Qinghao Ye
9
Papers
738
Total Citations
Papers (9)
mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration
CVPR 2024
601
citations
Hallucination Augmented Contrastive Learning for Multimodal Large Language Model
CVPR 2024
116
citations
Painting with Words: Elevating Detailed Image Captioning with Benchmark and Alignment Learning
ICLR 2025
15
citations
TiMix: Text-Aware Image Mixing for Effective Vision-Language Pre-training
AAAI 2024, arXiv
6
citations
LLaVA-Critic: Learning to Evaluate Multimodal Models
CVPR 2025
0
citations
BUS: Efficient and Effective Vision-Language Pre-Training with Bottom-Up Patch Summarization
ICCV 2023, arXiv
0
citations
Temporal Cue Guided Video Highlight Detection With Low-Rank Audio-Visual Fusion
ICCV 2021
0
citations
HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training
ICCV 2023, arXiv
0
citations
Learning Trajectory-Word Alignments for Video-Language Tasks
ICCV 2023, arXiv
0
citations