Yinan He
11
Papers
2,765
Total Citations
Papers (11)
VBench: Comprehensive Benchmark Suite for Video Generative Models
CVPR 2024, arXiv
996
citations
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark
CVPR 2024, arXiv
864
citations
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation
ICLR 2024, arXiv
408
citations
VideoMamba: State Space Model for Efficient Video Understanding
ECCV 2024, arXiv
401
citations
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
ICLR 2025, arXiv
48
citations
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment
CVPR 2025, arXiv
19
citations
VideoChat-R1.5: Visual Test-Time Scaling to Reinforce Multimodal Reasoning by Iterative Perception
NeurIPS 2025, arXiv
13
citations
VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos
ICCV 2025, arXiv
8
citations
ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models
NeurIPS 2025, arXiv
7
citations
WISNet: Pseudo Label Generation on Unbalanced and Patch Annotated Waste Images
CVPR 2025
1
citation
DiffVSR: Revealing an Effective Recipe for Taming Robust Video Super-Resolution Against Complex Degradations
ICCV 2025
0
citations