Yinan He

11 Papers

2,765 Total Citations

Papers (11)

VBench: Comprehensive Benchmark Suite for Video Generative Models

MVBench: A Comprehensive Multi-modal Video Understanding Benchmark

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation

VideoMamba: State Space Model for Efficient Video Understanding

OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment

VideoChat-R1.5: Visual Test-Time Scaling to Reinforce Multimodal Reasoning by Iterative Perception

NeurIPS 2025, arXiv

VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos

ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models

NeurIPS 2025, arXiv

WISNet: Pseudo Label Generation on Unbalanced and Patch Annotated Waste Images

DiffVSR: Revealing an Effective Recipe for Taming Robust Video Super-Resolution Against Complex Degradations