Yuhang Zang
17
Papers
119
Total Citations
2
Affiliations
Affiliations
Nanyang Technological University
Shanghai AI Lab
Papers (17)
OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?
CVPR 2025
37
citations
Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction
CVPR 2025, arXiv
31
citations
SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation
ICML 2025
21
citations
MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models
ICLR 2025, arXiv
19
citations
HiFlow: Training-free High-Resolution Image Generation with Flow-Aligned Guidance
NeurIPS 2025
6
citations
Bootstrapping Grounded Chain-of-Thought in Multimodal LLMs for Data-Efficient Model Adaptation
ICCV 2025
3
citations
Bootstrap3D: Improving Multi-view Diffusion Model with Synthetic Data
ICCV 2025
2
citations
Visual-RFT: Visual Reinforcement Fine-Tuning
ICCV 2025
0
citations
Light-A-Video: Training-free Video Relighting via Progressive Light Fusion
ICCV 2025
0
citations
SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree
ICCV 2025
0
citations
X-Prompt: Generalizable Auto-Regressive Visual Learning with In-Context Prompting
ICCV 2025
0
citations
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
CVPR 2024
0
citations
ByTheWay: Boost Your Text-to-Video Generation Model to Higher Quality in a Training-free Way
CVPR 2025
0
citations
Conical Visual Concentration for Efficient Large Vision-Language Models
CVPR 2025
0
citations
WildAvatar: Learning In-the-wild 3D Avatars from the Web
CVPR 2025
0
citations
Deciphering Cross-Modal Alignment in Large Vision-Language Models via Modality Integration Rate
ICCV 2025
0
citations
MM-IFEngine: Towards Multimodal Instruction Following
ICCV 2025
0
citations