Limin Wang

Papers: 27
Total Citations: 3,615
h-index: 11

Papers (27)

VBench: Comprehensive Benchmark Suite for Video Generative Models

MVBench: A Comprehensive Multi-modal Video Understanding Benchmark

Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation

VideoMamba: State Space Model for Efficient Video Understanding

EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World

Multiple Object Tracking as ID Prediction

CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding

Sparse Global Matching for Video Frame Interpolation with Large Motion

LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis

Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment

Adapting Short-Term Transformers for Action Detection in Untrimmed Videos

Scalable Image Tokenization with Index Backpropagation Quantization

Asymmetric Masked Distillation for Pre-Training Small Foundation Models

Modeling Fine-Grained Hand-Object Dynamics for Egocentric Video Representation Learning

Tra-MoE: Learning Trajectory Prediction Model from Multiple Domains for Adaptive Policy Conditioning

Online Video Understanding: OVBench and VideoChat-Online

Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel

VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos

Contextual AD Narration with Interleaved Multimodal Sequence

Make Your Training Flexible: Towards Deployment-Efficient Video Models

StreamForest: Efficient Online Video Understanding with Persistent Event Memory

Dual DETRs for Multi-Label Temporal Action Detection

MobileViCLIP: An Efficient Video-Text Model for Mobile Devices

p-MoD: Building Mixture-of-Depths MLLMs via Progressive Ratio Decay

BIVDiff: A Training-Free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models

SportsHHI: A Dataset for Human-Human Interaction Detection in Sports Videos