18
Papers
2,433
Total Citations
17
h-index

Papers (18)

MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models

NeurIPS 2025
1,227
citations

Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

CVPR 2025
858
citations

VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

NeurIPS 2025 (arXiv)
130
citations

Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM

ICML 2025
103
citations

MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency

ICML 2025
88
citations

No Time to Train: Empowering Non-Parametric Networks for Few-shot 3D Scene Segmentation

CVPR 2024
27
citations

Rethinking Image Cropping: Exploring Diverse Compositions From Global Views

CVPR 2022
0
citations

CM-NAS: Cross-Modality Neural Architecture Search for Visible-Infrared Person Re-Identification

ICCV 2021
0
citations

InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption

CVPR 2025
0
citations

Learning Interleaved Image-Text Comprehension in Vision-Language Large Models

ICLR 2025
0
citations

Aligning and Prompting Everything All at Once for Universal Visual Perception

CVPR 2024
0
citations

Cross-Spectral Face Hallucination via Disentangling Independent Factors

CVPR 2020 (arXiv)
0
citations

Information Bottleneck Disentanglement for Identity Swapping

CVPR 2021
0
citations

Pareidolia Face Reenactment

CVPR 2021 (arXiv)
0
citations

Dual Variational Generation for Low Shot Heterogeneous Face Recognition

NeurIPS 2019
0
citations

AOT: Appearance Optimal Transport Based Identity Swapping for Forgery Detection

NeurIPS 2020
0
citations

Multi-modal Queried Object Detection in the Wild

NeurIPS 2023
0
citations

CAPro: Webly Supervised Learning with Cross-modality Aligned Prototypes

NeurIPS 2023
0
citations