Chaoyou Fu
18
Papers
2,433
Total Citations
17
h-index
Papers (18)
MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models
NeurIPS 2025
1,227
citations
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
CVPR 2025
858
citations
VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
NeurIPS 2025 (arXiv)
130
citations
Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM
ICML 2025
103
citations
MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency
ICML 2025
88
citations
No Time to Train: Empowering Non-Parametric Networks for Few-shot 3D Scene Segmentation
CVPR 2024
27
citations
Rethinking Image Cropping: Exploring Diverse Compositions From Global Views
CVPR 2022
0
citations
CM-NAS: Cross-Modality Neural Architecture Search for Visible-Infrared Person Re-Identification
ICCV 2021
0
citations
InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption
CVPR 2025
0
citations
Learning Interleaved Image-Text Comprehension in Vision-Language Large Models
ICLR 2025
0
citations
Aligning and Prompting Everything All at Once for Universal Visual Perception
CVPR 2024
0
citations
Cross-Spectral Face Hallucination via Disentangling Independent Factors
CVPR 2020 (arXiv)
0
citations
Information Bottleneck Disentanglement for Identity Swapping
CVPR 2021
0
citations
Pareidolia Face Reenactment
CVPR 2021 (arXiv)
0
citations
Dual Variational Generation for Low Shot Heterogeneous Face Recognition
NeurIPS 2019
0
citations
AOT: Appearance Optimal Transport Based Identity Swapping for Forgery Detection
NeurIPS 2020
0
citations
Multi-modal Queried Object Detection in the Wild
NeurIPS 2023
0
citations
CAPro: Webly Supervised Learning with Cross-modality Aligned Prototypes
NeurIPS 2023
0
citations