Kunchang Li

9 Papers

1,706 Total Citations

Papers (9)

MVBench: A Comprehensive Multi-modal Video Understanding Benchmark

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation

VideoMamba: State Space Model for Efficient Video Understanding

Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment

Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel

V-Stylist: Video Stylization via Collaboration and Reflection of MLLM Agents

Make Your Training Flexible: Towards Deployment-Efficient Video Models

Muses: 3D-Controllable Image Generation via Multi-Modal Agent Collaboration

Vlogger: Make Your Dream A Vlog