Siliang Tang
16 Papers
101 Total Citations
Papers (16)
HealthGPT: A Medical Large Vision-Language Model for Unifying Comprehension and Generation via Heterogeneous Knowledge Adaptation
ICML 2025
63 citations
Generative Multimodal Pretraining with Discrete Diffusion Timestep Tokens
CVPR 2025
18 citations
Benchmarking Multimodal CoT Reward Model Stepwise by Visual Program
ICCV 2025
10 citations
Janus-Pro-R1: Advancing Collaborative Visual Comprehension and Generation via Reinforcement Learning
NeurIPS 2025
6 citations
Boosting Virtual Agent Learning and Reasoning: A Step-Wise, Multi-Dimensional, and Generalist Reward Model with Benchmark
ICML 2025
4 citations
The Best of Both Worlds: Integrating Language Models and Diffusion Models for Video Generation
ICCV 2025
0 citations
Data Shunt: Collaboration of Small and Large Models for Lower Costs and Better Performance
AAAI 2024
0 citations
DIEM: Decomposition-Integration Enhancing Multimodal Insights
CVPR 2024
0 citations
HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data
CVPR 2024
0 citations
Auto-Encoding Morph-Tokens for Multimodal LLM
ICML 2024
0 citations
Momentor: Advancing Video Large Language Model with Fine-Grained Temporal Reasoning
ICML 2024
0 citations
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models
ICML 2024
0 citations
AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea
CVPR 2025
0 citations
STEP: Enhancing Video-LLMs’ Compositional Reasoning by Spatio-Temporal Graph-guided Self-Training
CVPR 2025
0 citations
Iris: Breaking GUI Complexity with Adaptive Focus and Self-Refining
ICCV 2025
0 citations
Mastering Collaborative Multi-modal Data Selection: A Focus on Informativeness, Uniqueness, and Representativeness
ICCV 2025
0 citations