Kunchang Li

Papers: 15

Total Citations: 1,706

Papers (15)

MVBench: A Comprehensive Multi-modal Video Understanding Benchmark

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation

VideoMamba: State Space Model for Efficient Video Understanding

Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment

Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel

Make Your Training Flexible: Towards Deployment-Efficient Video Models

V-Stylist: Video Stylization via Collaboration and Reflection of MLLM Agents

MorphMLP: An Efficient MLP-Like Backbone for Spatial-Temporal Representation Learning

Tip-Adapter: Training-Free Adaption of CLIP for Few-Shot Classification

Muses: 3D-Controllable Image Generation via Multi-Modal Agent Collaboration

Vlogger: Make Your Dream A Vlog

PointCLIP: Point Cloud Understanding by CLIP

UniFormerV2: Unlocking the Potential of Image ViTs for Video Understanding

Unmasked Teacher: Towards Training-Efficient Video Foundation Models

Self-Slimmed Vision Transformer