Yinan He

Papers: 14

Total Citations: 2,679

Papers (14)

VBench: Comprehensive Benchmark Suite for Video Generative Models

MVBench: A Comprehensive Multi-modal Video Understanding Benchmark

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation

VideoMamba: State Space Model for Efficient Video Understanding

VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos

ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models

Unmasked Teacher: Towards Training-Efficient Video Foundation Models

WISNet: Pseudo Label Generation on Unbalanced and Patch Annotated Waste Images

X-Learner: Learning Cross Sources and Tasks for Universal Visual Representation

Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment

DiffVSR: Revealing an Effective Recipe for Taming Robust Video Super-Resolution Against Complex Degradations

ForgeryNet: A Versatile Benchmark for Comprehensive Forgery Analysis

VideoMAE V2: Scaling Video Masked Autoencoders With Dual Masking

UniFormerV2: Unlocking the Potential of Image ViTs for Video Understanding