Limin Wang
Papers: 27
Total citations: 3,615
h-index: 11
Papers (27)
VBench: Comprehensive Benchmark Suite for Video Generative Models
CVPR 2024
996
citations
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark
CVPR 2024
864
citations
Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering
CVPR 2024
589
citations
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation
ICLR 2024
408
citations
VideoMamba: State Space Model for Efficient Video Understanding
ECCV 2024
401
citations
EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World
CVPR 2024
84
citations
Multiple Object Tracking as ID Prediction
CVPR 2025
53
citations
CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding
ICLR 2025
41
citations
Sparse Global Matching for Video Frame Interpolation with Large Motion
CVPR 2024
27
citations
LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis
CVPR 2025
25
citations
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment
CVPR 2025
19
citations
Adapting Short-Term Transformers for Action Detection in Untrimmed Videos
CVPR 2024
17
citations
Scalable Image Tokenization with Index Backpropagation Quantization
ICCV 2025
16
citations
Asymmetric Masked Distillation for Pre-Training Small Foundation Models
CVPR 2024
12
citations
Modeling Fine-Grained Hand-Object Dynamics for Egocentric Video Representation Learning
ICLR 2025
11
citations
Tra-MoE: Learning Trajectory Prediction Model from Multiple Domains for Adaptive Policy Conditioning
CVPR 2025
11
citations
Online Video Understanding: OVBench and VideoChat-Online
CVPR 2025
9
citations
Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel
ICLR 2025
9
citations
VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos
ICCV 2025
8
citations
Contextual AD Narration with Interleaved Multimodal Sequence
CVPR 2025
7
citations
Make Your Training Flexible: Towards Deployment-Efficient Video Models
ICCV 2025
5
citations
StreamForest: Efficient Online Video Understanding with Persistent Event Memory
NeurIPS 2025
3
citations
Dual DETRs for Multi-Label Temporal Action Detection
CVPR 2024
0
citations
MobileViCLIP: An Efficient Video-Text Model for Mobile Devices
ICCV 2025
0
citations
p-MoD: Building Mixture-of-Depths MLLMs via Progressive Ratio Decay
ICCV 2025
0
citations
BIVDiff: A Training-Free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models
CVPR 2024
0
citations
SportsHHI: A Dataset for Human-Human Interaction Detection in Sports Videos
CVPR 2024
0
citations