"vision-language alignment" Papers
14 papers found
$\Delta \mathrm{Energy}$: Optimizing Energy Change During Vision-Language Alignment Improves both OOD Detection and OOD Generalization
Lin Zhu, Yifeng Yang, Xinbing Wang et al.
NeurIPS 2025 | poster
Aligning Information Capacity Between Vision and Language via Dense-to-Sparse Feature Distillation for Image-Text Matching
Yang Liu, Wentao Feng, Zhuoyao Liu et al.
ICCV 2025 | poster | arXiv:2503.14953
1 citation
Assessing and Learning Alignment of Unimodal Vision and Language Models
Le Zhang, Qian Yang, Aishwarya Agrawal
CVPR 2025 | highlight | arXiv:2412.04616
14 citations
CompCap: Improving Multimodal Large Language Models with Composite Captions
Xiaohui Chen, Satya Narayan Shukla, Mahmoud Azab et al.
ICCV 2025 | poster | arXiv:2412.05243
6 citations
Corvid: Improving Multimodal Large Language Models Towards Chain-of-Thought Reasoning
Jingjing Jiang, Chao Ma, Xurui Song et al.
ICCV 2025 | highlight | arXiv:2507.07424
7 citations
Cross-Modal Safety Mechanism Transfer in Large Vision-Language Models
Shicheng Xu, Liang Pang, Yunchang Zhu et al.
ICLR 2025 | poster | arXiv:2410.12662
14 citations
DINOv2 Meets Text: A Unified Framework for Image- and Pixel-Level Vision-Language Alignment
Dahyun Kang, Piotr Bojanowski, Huy V. Vo et al.
CVPR 2025 | poster | arXiv:2412.16334
41 citations
Fix-CLIP: Dual-Branch Hierarchical Contrastive Learning via Synthetic Captions for Better Understanding of Long Text
Bingchao Wang, Zhiwei Ning, Jianyu Ding et al.
ICCV 2025 | poster | arXiv:2507.10095
7 citations
RadZero: Similarity-Based Cross-Attention for Explainable Vision-Language Alignment in Chest X-ray with Zero-Shot Multi-Task Capability
Jonggwon Park, Byungmu Yoon, Soobum Kim et al.
NeurIPS 2025 | poster | arXiv:2504.07416
1 citation
VL-SAE: Interpreting and Enhancing Vision-Language Alignment with a Unified Concept Set
Shufan Shen, Junshu Sun, Qingming Huang et al.
NeurIPS 2025 | poster | arXiv:2510.21323
1 citation
CLIM: Contrastive Language-Image Mosaic for Region Representation
Size Wu, Wenwei Zhang, Lumin XU et al.
AAAI 2024 | paper | arXiv:2312.11376
24 citations
Multi-Task Domain Adaptation for Language Grounding with 3D Objects
Penglei SUN, Yaoxian Song, Xinglin Pan et al.
ECCV 2024 | poster | arXiv:2407.02846
2 citations
Towards Natural Language-Guided Drones: GeoText-1652 Benchmark with Spatial Relation Matching
Meng Chu, Zhedong Zheng, Wei Ji et al.
ECCV 2024 | poster | arXiv:2311.12751
25 citations
Weakly Supervised Open-Vocabulary Object Detection
Jianghang Lin, Yunhang Shen, Bingquan Wang et al.
AAAI 2024 | paper | arXiv:2312.12437
16 citations