Xing Sun
13
Papers
2,248
Total Citations
1
h-index
1
Affiliations
Affiliations
Tencent
Papers (13)
MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models
NeurIPS 2025
1,227
citations
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
CVPR 2025
858
citations
Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM
ICML 2025
103
citations
Enhancing Visual Document Understanding with Contrastive Learning in Large Visual-Language Models
CVPR 2024
37
citations
SPD-DDPM: Denoising Diffusion Probabilistic Models in the Symmetric Positive Definite Space
AAAI 2024
13
citations
Grab What You Need: Rethinking Complex Table Structure Recognition with Flexible Components Deliberation
AAAI 2024
10
citations
HRVDA: High-Resolution Visual Document Assistant
CVPR 2024
0
citations
Aligning and Prompting Everything All at Once for Universal Visual Perception
CVPR 2024
0
citations
DS-VLM: Diffusion Supervision Vision Language Model
ICML 2025
0
citations
Probability-Density-aware Semi-supervised Learning
AAAI 2025
0
citations
Learning Interleaved Image-Text Comprehension in Vision-Language Large Models
ICLR 2025
0
citations
Visual Hallucination Elevates Speech Recognition
AAAI 2024
0
citations
A General and Efficient Training for Transformer via Token Expansion
CVPR 2024
0
citations