Yizhuo Li
10
Papers
1,279
Total Citations
Papers (10)
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark
CVPR 2024
864
citations
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation
ICLR 2024
408
citations
Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation
CVPR 2025
7
citations
UniFormerV2: Unlocking the Potential of Image ViTs for Video Understanding
ICCV 2023
0
citations
Unmasked Teacher: Towards Training-Efficient Video Foundation Models
ICCV 2023, arXiv
0
citations
PGT: A Progressive Method for Training Models on Long Videos
CVPR 2021, arXiv
0
citations
Moto: Latent Motion Token as the Bridging Language for Learning Robot Manipulation from Videos
ICCV 2025
0
citations
TubeTK: Adopting Tubes to Track Multi-Object in a One-Step Training Model
CVPR 2020, arXiv
0
citations
HOI Analysis: Integrating and Decomposing Human-Object Interaction
NeurIPS 2020
0
citations
Test-Time Personalization with a Transformer for Human Pose Estimation
NeurIPS 2021
0
citations