Haoyu Cao
Papers: 4
Total Citations: 55
Papers (4)
Enhancing Visual Document Understanding with Contrastive Learning in Large Visual-Language Models
CVPR 2024
37 citations
VITA-Audio: Fast Interleaved Audio-Text Token Generation for Efficient Large Speech-Language Model
NeurIPS 2025
17 citations
BASIC: Boosting Visual Alignment with Intrinsic Refined Embeddings in Multimodal Large Language Models
ICCV 2025
1 citation
HRVDA: High-Resolution Visual Document Assistant
CVPR 2024
0 citations