Qinghao Ye

9 Papers

738 Total Citations

Papers (9)

mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration

Hallucination Augmented Contrastive Learning for Multimodal Large Language Model

Painting with Words: Elevating Detailed Image Captioning with Benchmark and Alignment Learning

TiMix: Text-Aware Image Mixing for Effective Vision-Language Pre-training

LLaVA-Critic: Learning to Evaluate Multimodal Models

BUS: Efficient and Effective Vision-Language Pre-Training with Bottom-Up Patch Summarization

Temporal Cue Guided Video Highlight Detection With Low-Rank Audio-Visual Fusion

HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training

Learning Trajectory-Word Alignments for Video-Language Tasks