Yizhuo Li
4 papers, 1,279 total citations
Papers (4)
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark
CVPR 2024
864 citations
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation
ICLR 2024
408 citations
Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation
CVPR 2025
7 citations
Moto: Latent Motion Token as the Bridging Language for Learning Robot Manipulation from Videos
ICCV 2025
0 citations