Kunchang Li
9
Papers
1,706
Total Citations
Papers (9)
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark
CVPR 2024
864
citations
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation
ICLR 2024
408
citations
VideoMamba: State Space Model for Efficient Video Understanding
ECCV 2024
396
citations
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment
CVPR 2025
19
citations
Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel
ICLR 2025
9
citations
V-Stylist: Video Stylization via Collaboration and Reflection of MLLM Agents
CVPR 2025
5
citations
Make Your Training Flexible: Towards Deployment-Efficient Video Models
ICCV 2025
5
citations
Muses: 3D-Controllable Image Generation via Multi-Modal Agent Collaboration
AAAI 2025
0
citations
Vlogger: Make Your Dream A Vlog
CVPR 2024
0
citations