Mu Cai

Papers: 5

Total Citations: 218

Papers (5)

ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts

Matryoshka Multimodal Models

Removing Distributional Discrepancies in Captions Improves Image-Text Alignment

Magma: A Foundation Model for Multimodal AI Agents

LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models