Siliang Tang

16 Papers
101 Total Citations

Papers (16)

HealthGPT: A Medical Large Vision-Language Model for Unifying Comprehension and Generation via Heterogeneous Knowledge Adaptation

ICML 2025, 63 citations

Generative Multimodal Pretraining with Discrete Diffusion Timestep Tokens

CVPR 2025, 18 citations

Benchmarking Multimodal CoT Reward Model Stepwise by Visual Program

ICCV 2025, 10 citations

Janus-Pro-R1: Advancing Collaborative Visual Comprehension and Generation via Reinforcement Learning

NeurIPS 2025, 6 citations

Boosting Virtual Agent Learning and Reasoning: A Step-Wise, Multi-Dimensional, and Generalist Reward Model with Benchmark

ICML 2025, 4 citations

The Best of Both Worlds: Integrating Language Models and Diffusion Models for Video Generation

ICCV 2025, 0 citations

Data Shunt: Collaboration of Small and Large Models for Lower Costs and Better Performance

AAAI 2024, 0 citations

DIEM: Decomposition-Integration Enhancing Multimodal Insights

CVPR 2024, 0 citations

HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data

CVPR 2024, 0 citations

Auto-Encoding Morph-Tokens for Multimodal LLM

ICML 2024, 0 citations

Momentor: Advancing Video Large Language Model with Fine-Grained Temporal Reasoning

ICML 2024, 0 citations

NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models

ICML 2024, 0 citations

AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea

CVPR 2025, 0 citations

STEP: Enhancing Video-LLMs’ Compositional Reasoning by Spatio-Temporal Graph-guided Self-Training

CVPR 2025, 0 citations

Iris: Breaking GUI Complexity with Adaptive Focus and Self-Refining

ICCV 2025, 0 citations

Mastering Collaborative Multi-modal Data Selection: A Focus on Informativeness, Uniqueness, and Representativeness

ICCV 2025, 0 citations