Papers: 27
Total Citations: 3,615
h-index: 11

Papers (27)

VBench: Comprehensive Benchmark Suite for Video Generative Models

CVPR 2024
996
citations

MVBench: A Comprehensive Multi-modal Video Understanding Benchmark

CVPR 2024
864
citations

Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering

CVPR 2024
589
citations

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation

ICLR 2024
408
citations

VideoMamba: State Space Model for Efficient Video Understanding

ECCV 2024
401
citations

EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World

CVPR 2024
84
citations

Multiple Object Tracking as ID Prediction

CVPR 2025
53
citations

CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding

ICLR 2025
41
citations

Sparse Global Matching for Video Frame Interpolation with Large Motion

CVPR 2024
27
citations

LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis

CVPR 2025
25
citations

Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment

CVPR 2025
19
citations

Adapting Short-Term Transformers for Action Detection in Untrimmed Videos

CVPR 2024
17
citations

Scalable Image Tokenization with Index Backpropagation Quantization

ICCV 2025
16
citations

Asymmetric Masked Distillation for Pre-Training Small Foundation Models

CVPR 2024
12
citations

Modeling Fine-Grained Hand-Object Dynamics for Egocentric Video Representation Learning

ICLR 2025
11
citations

Tra-MoE: Learning Trajectory Prediction Model from Multiple Domains for Adaptive Policy Conditioning

CVPR 2025
11
citations

Online Video Understanding: OVBench and VideoChat-Online

CVPR 2025
9
citations

Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel

ICLR 2025
9
citations

VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos

ICCV 2025
8
citations

Contextual AD Narration with Interleaved Multimodal Sequence

CVPR 2025
7
citations

Make Your Training Flexible: Towards Deployment-Efficient Video Models

ICCV 2025
5
citations

StreamForest: Efficient Online Video Understanding with Persistent Event Memory

NeurIPS 2025
3
citations

Dual DETRs for Multi-Label Temporal Action Detection

CVPR 2024
0
citations

MobileViCLIP: An Efficient Video-Text Model for Mobile Devices

ICCV 2025
0
citations

p-MoD: Building Mixture-of-Depths MLLMs via Progressive Ratio Decay

ICCV 2025
0
citations

BIVDiff: A Training-Free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models

CVPR 2024
0
citations

SportsHHI: A Dataset for Human-Human Interaction Detection in Sports Videos

CVPR 2024
0
citations