Limin Wang
79
Papers
3,511
Total Citations
11
h-index
Papers (79)
VBench: Comprehensive Benchmark Suite for Video Generative Models
CVPR 2024
996
citations
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark
CVPR 2024
864
citations
Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering
CVPR 2024
589
citations
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation
ICLR 2024
408
citations
VideoMamba: State Space Model for Efficient Video Understanding
ECCV 2024
396
citations
EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World
CVPR 2024
84
citations
CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding
ICLR 2025
39
citations
Sparse Global Matching for Video Frame Interpolation with Large Motion
CVPR 2024
27
citations
Adapting Short-Term Transformers for Action Detection in Untrimmed Videos
CVPR 2024
17
citations
Scalable Image Tokenization with Index Backpropagation Quantization
ICCV 2025
16
citations
Asymmetric Masked Distillation for Pre-Training Small Foundation Models
CVPR 2024
12
citations
Tra-MoE: Learning Trajectory Prediction Model from Multiple Domains for Adaptive Policy Conditioning
CVPR 2025
11
citations
Modeling Fine-Grained Hand-Object Dynamics for Egocentric Video Representation Learning
ICLR 2025
11
citations
Online Video Understanding: OVBench and VideoChat-Online
CVPR 2025 (arXiv)
9
citations
Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel
ICLR 2025
9
citations
VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos
ICCV 2025
8
citations
Contextual AD Narration with Interleaved Multimodal Sequence
CVPR 2025 (arXiv)
7
citations
Make Your Training Flexible: Towards Deployment-Efficient Video Models
ICCV 2025
5
citations
StreamForest: Efficient Online Video Understanding with Persistent Event Memory
NeurIPS 2025
3
citations
CGA-Net: Category Guided Aggregation for Point Cloud Semantic Segmentation
CVPR 2021
0
citations
Progressive Attention on Multi-Level Dense Difference Maps for Generic Event Boundary Detection
CVPR 2022 (arXiv)
0
citations
OCSampler: Compressing Videos to One Clip With Single-Step Sampling
CVPR 2022 (arXiv)
0
citations
Cross-Architecture Self-Supervised Video Representation Learning
CVPR 2022 (arXiv)
0
citations
AdaMixer: A Fast-Converging Query-Based Object Detector
CVPR 2022 (arXiv)
0
citations
Structured Sparse R-CNN for Direct Scene Graph Generation
CVPR 2022
0
citations
Task-Specific Inconsistency Alignment for Domain Adaptive Object Detection
CVPR 2022 (arXiv)
0
citations
MixFormer: End-to-End Tracking With Iterative Mixed Attention
CVPR 2022 (arXiv)
0
citations
Extracting Motion and Appearance via Inter-Frame Attention for Efficient Video Frame Interpolation
CVPR 2023 (arXiv)
0
citations
VideoMAE V2: Scaling Video Masked Autoencoders With Dual Masking
CVPR 2023 (arXiv)
0
citations
PDPP: Projected Diffusion for Procedure Planning in Instructional Videos
CVPR 2023 (arXiv)
0
citations
STMixer: A One-Stage Sparse Action Detector
CVPR 2023 (arXiv)
0
citations
LinK: Linear Kernel for LiDAR-Based 3D Perception
CVPR 2023 (arXiv)
0
citations
Temporal Action Detection With Structured Segment Networks
ICCV 2017 (arXiv)
0
citations
LIP: Local Importance-Based Pooling
ICCV 2019
0
citations
TAM: Temporal Adaptive Module for Video Recognition
ICCV 2021 (arXiv)
0
citations
Target Adaptive Context Aggregation for Video Scene Graph Generation
ICCV 2021 (arXiv)
0
citations
Mutual Supervision for Dense Object Detection
ICCV 2021 (arXiv)
0
citations
MGSampler: An Explainable Sampling Strategy for Video Action Recognition
ICCV 2021 (arXiv)
0
citations
PyMAF: 3D Human Pose and Shape Regression With Pyramidal Mesh Alignment Feedback Loop
ICCV 2021 (arXiv)
0
citations
Self Supervision to Distillation for Long-Tailed Visual Recognition
ICCV 2021 (arXiv)
0
citations
MultiSports: A Multi-Person Video Dataset of Spatio-Temporally Localized Sports Actions
ICCV 2021 (arXiv)
0
citations
Multiple Object Tracking as ID Prediction
CVPR 2025
0
citations
Memory-and-Anticipation Transformer for Online Action Understanding
ICCV 2023 (arXiv)
0
citations
UniFormerV2: Unlocking the Potential of Image ViTs for Video Understanding
ICCV 2023
0
citations
MGMAE: Motion Guided Masking for Video Masked Autoencoding
ICCV 2023 (arXiv)
0
citations
MeMOTR: Long-Term Memory-Augmented Transformer for Multi-Object Tracking
ICCV 2023 (arXiv)
0
citations
SportsMOT: A Large Multi-Object Tracking Dataset in Multiple Sports Scenes
ICCV 2023 (arXiv)
0
citations
Efficient Video Action Detection with Token Dropout and Context Refinement
ICCV 2023 (arXiv)
0
citations
Deep Equilibrium Object Detection
ICCV 2023 (arXiv)
0
citations
StageInteractor: Query-based Object Detector with Cross-stage Interaction
ICCV 2023 (arXiv)
0
citations
SparseBEV: High-Performance Sparse 3D Object Detection from Multi-Camera Videos
ICCV 2023 (arXiv)
0
citations
Unmasked Teacher: Towards Training-Efficient Video Foundation Models
ICCV 2023 (arXiv)
0
citations
Actions as Moving Points
ECCV 2020
0
citations
Boundary-Aware Cascade Networks for Temporal Action Segmentation
ECCV 2020
0
citations
Context-Aware RCNN: A Baseline for Action Detection in Videos
ECCV 2020
0
citations
Joint-Modal Label Denoising for Weakly-Supervised Audio-Visual Video Parsing
ECCV 2022
0
citations
Relaxed Transformer Decoders for Direct Action Proposal Generation
ICCV 2021 (arXiv)
0
citations
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment
CVPR 2025
0
citations
LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis
CVPR 2025
0
citations
p-MoD: Building Mixture-of-Depths MLLMs via Progressive Ratio Decay
ICCV 2025
0
citations
MobileViCLIP: An Efficient Video-Text Model for Mobile Devices
ICCV 2025 (arXiv)
0
citations
Dual DETRs for Multi-Label Temporal Action Detection
CVPR 2024
0
citations
BIVDiff: A Training-Free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models
CVPR 2024
0
citations
SportsHHI: A Dataset for Human-Human Interaction Detection in Sports Videos
CVPR 2024
0
citations
Action Recognition With Trajectory-Pooled Deep-Convolutional Descriptors
CVPR 2015
0
citations
Actionness Estimation Using Hybrid Fully Convolutional Networks
CVPR 2016
0
citations
Real-Time Action Recognition With Enhanced Motion Vector CNNs
CVPR 2016
0
citations
Thin-Slicing Network: A Deep Structured Model for Pose Estimation in Videos
CVPR 2017 (arXiv)
0
citations
UntrimmedNets for Weakly Supervised Action Recognition and Detection
CVPR 2017 (arXiv)
0
citations
Appearance-and-Relation Networks for Video Classification
CVPR 2018 (arXiv)
0
citations
Learning Actor Relation Graphs for Group Activity Recognition
CVPR 2019
0
citations
Translate-to-Recognize Networks for RGB-D Scene Recognition
CVPR 2019
0
citations
TEA: Temporal Excitation and Aggregation for Action Recognition
CVPR 2020 (arXiv)
0
citations
SketchyCOCO: Image Generation From Freehand Scene Sketches
CVPR 2020 (arXiv)
0
citations
TDN: Temporal Difference Networks for Efficient Action Recognition
CVPR 2021 (arXiv)
0
citations
VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
NeurIPS 2022
0
citations
PointTAD: Multi-Label Temporal Action Detection with Learnable Query Points
NeurIPS 2022
0
citations
JourneyDB: A Benchmark for Generative Image Understanding
NeurIPS 2023
0
citations
MixFormerV2: Efficient Fully Transformer Tracking
NeurIPS 2023
0
citations