Papers (16)
mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration
CVPR 2024
601
citations
Hallucination Augmented Contrastive Learning for Multimodal Large Language Model
CVPR 2024
116
citations
DePT: Decoupled Prompt Tuning
CVPR 2024
60
citations
SubT-MRS Dataset: Pushing SLAM Towards All-weather Environments
CVPR 2024
49
citations
SymDPO: Boosting In-Context Learning of Large Multimodal Models with Symbol Demonstration Direct Preference Optimization
CVPR 2025
7
citations
A Simple yet Effective Layout Token in Large Language Models for Document Understanding
CVPR 2025 (arXiv)
7
citations
TiMix: Text-Aware Image Mixing for Effective Vision-Language Pre-training
AAAI 2024 (arXiv)
6
citations
AdaMMS: Model Merging for Heterogeneous Multimodal Large Language Models with Unsupervised Coefficient Optimization
CVPR 2025
4
citations
MOSAIC: Generating Consistent, Privacy-Preserving Scenes from Multiple Depth Views in Multi-Room Environments
ICCV 2025
2
citations
Graphical Contrastive Losses for Scene Graph Parsing
CVPR 2019
0
citations
Accurate Few-Shot Object Detection With Support-Query Mutual Guidance and Hybrid Loss
CVPR 2021
0
citations
Shifting More Attention to Visual Backbone: Query-Modulated Refinement Networks for End-to-End Visual Grounding
CVPR 2022 (arXiv)
0
citations
DETA: Denoised Task Adaptation for Few-Shot Learning
ICCV 2023 (arXiv)
0
citations
HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training
ICCV 2023 (arXiv)
0
citations
Skip Tuning: Pre-trained Vision-Language Models are Effective and Efficient Adapters Themselves
CVPR 2025
0
citations
Relationship Proposal Networks
CVPR 2017
0
citations