Chaoyou Fu
10
Papers
2,450
Total Citations
17
h-index
Papers (10)
MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models
NeurIPS 2025
1,227
citations
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
CVPR 2025
858
citations
VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
NeurIPS 2025arXiv
130
citations
Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM
ICML 2025
103
citations
MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency
ICML 2025
88
citations
No Time to Train: Empowering Non-Parametric Networks for Few-shot 3D Scene Segmentation
CVPR 2024
27
citations
VITA-Audio: Fast Interleaved Audio-Text Token Generation for Efficient Large Speech-Language Model
NeurIPS 2025
17
citations
Learning Interleaved Image-Text Comprehension in Vision-Language Large Models
ICLR 2025
0
citations
InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption
CVPR 2025
0
citations
Aligning and Prompting Everything All at Once for Universal Visual Perception
CVPR 2024
0
citations