Haiyang Xu
12
Papers
759
Total Citations
Papers (12)
mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration
CVPR 2024
601
citations
Hallucination Augmented Contrastive Learning for Multimodal Large Language Model
CVPR 2024
116
citations
Bayesian Diffusion Models for 3D Shape Reconstruction
CVPR 2024
23
citations
SymDPO: Boosting In-Context Learning of Large Multimodal Models with Symbol Demonstration Direct Preference Optimization
CVPR 2025
7
citations
TiMix: Text-Aware Image Mixing for Effective Vision-Language Pre-training
AAAI 2024 · arXiv
6
citations
YOLO-Count: Differentiable Object Counting for Text-to-Image Generation
ICCV 2025
6
citations
DepR: Depth Guided Single-view Scene Reconstruction with Instance-level Diffusion
ICCV 2025
0
citations
Science-T2I: Addressing Scientific Illusions in Image Synthesis
CVPR 2025
0
citations
EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching
CVPR 2022 · arXiv
0
citations
HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training
ICCV 2023 · arXiv
0
citations
Learning Trajectory-Word Alignments for Video-Language Tasks
ICCV 2023 · arXiv
0
citations
BUS: Efficient and Effective Vision-Language Pre-Training with Bottom-Up Patch Summarization
ICCV 2023 · arXiv
0
citations