Poster Papers Matching "vision-language models"

475 papers found • Page 1 of 10

𝕏-Sample Contrastive Loss: Improving Contrastive Learning with Sample Similarity Graphs

Vlad Sobal, Mark Ibrahim, Randall Balestriero et al.

ICLR 2025 • arXiv:2407.18134 • 13 citations

3D-SPATIAL MULTIMODAL MEMORY

Xueyan Zou, Yuchen Song, Ri-Zhao Qiu et al.

ICLR 2025 • arXiv:2503.16413 • 2 citations

Active Data Curation Effectively Distills Large-Scale Multimodal Models

Vishaal Udandarao, Nikhil Parthasarathy, Muhammad Ferjad Naeem et al.

CVPR 2025 • arXiv:2411.18674 • 15 citations

AdaLRS: Loss-Guided Adaptive Learning Rate Search for Efficient Foundation Model Pretraining

Hongyuan Dong, Dingkang Yang, Xiao Liang et al.

NEURIPS 2025 • arXiv:2506.13274 • 3 citations

Adapting Text-to-Image Generation with Feature Difference Instruction for Generic Image Restoration

Chao Wang, Hehe Fan, Huichen Yang et al.

CVPR 2025 • 2 citations

Advancing Compositional Awareness in CLIP with Efficient Fine-Tuning

Amit Peleg, Naman Deep Singh, Matthias Hein

NEURIPS 2025 • arXiv:2505.24424 • 2 citations

Advancing Myopia To Holism: Fully Contrastive Language-Image Pre-training

Haicheng Wang, Chen Ju, Weixiong Lin et al.

CVPR 2025 • arXiv:2412.00440 • 10 citations

AgMMU: A Comprehensive Agricultural Multimodal Understanding Benchmark

Aruna Gauba, Irene Pi, Yunze Man et al.

NEURIPS 2025 • arXiv:2504.10568 • 2 citations

AgroBench: Vision-Language Model Benchmark in Agriculture

Risa Shinoda, Nakamasa Inoue, Hirokatsu Kataoka et al.

ICCV 2025 • arXiv:2507.20519 • 7 citations

AHA: A Vision-Language-Model for Detecting and Reasoning Over Failures in Robotic Manipulation

Jiafei Duan, Wilbert Pumacay, Nishanth Kumar et al.

ICLR 2025 • arXiv:2410.00371 • 85 citations

Aligning by Misaligning: Boundary-aware Curriculum Learning for Multimodal Alignment

Hua Ye, Hang Ding, Siyuan Chen et al.

NEURIPS 2025 • arXiv:2511.08399

Aligning Vision to Language: Annotation-Free Multimodal Knowledge Graph Construction for Enhanced LLMs Reasoning

Junming Liu, Siyuan Meng, Yanting Gao et al.

ICCV 2025 • arXiv:2503.12972 • 19 citations

Aligning Visual Contrastive learning models via Preference Optimization

Amirabbas Afzali, Borna Khodabandeh, Ali Rasekh et al.

ICLR 2025 • arXiv:2411.08923 • 3 citations

AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Document Understanding

Ahmed Masry, Juan Rodriguez, Tianyu Zhang et al.

NEURIPS 2025 • arXiv:2502.01341 • 1 citation

All in One: Visual-Description-Guided Unified Point Cloud Segmentation

Zongyan Han, Mohamed El Amine Boudjoghra, Jiahua Dong et al.

ICCV 2025 • arXiv:2507.05211 • 1 citation

AmorLIP: Efficient Language-Image Pretraining via Amortization

Haotian Sun, Yitong Li, Yuchen Zhuang et al.

NEURIPS 2025 • arXiv:2505.18983 • 2 citations

A Multimodal Benchmark for Framing of Oil & Gas Advertising and Potential Greenwashing Detection

Gaku Morio, Harri Rowlands, Dominik Stammbach et al.

NEURIPS 2025 • arXiv:2510.21679

An Information-theoretical Framework for Understanding Out-of-distribution Detection with Pretrained Vision-Language Models

Bo Peng, Jie Lu, Guangquan Zhang et al.

NEURIPS 2025

An Intelligent Agentic System for Complex Image Restoration Problems

Kaiwen Zhu, Jinjin Gu, Zhiyuan You et al.

ICLR 2025 • arXiv:2410.17809 • 26 citations

Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data and Metric Perspectives

Shaoyuan Xie, Lingdong Kong, Yuhao Dong et al.

ICCV 2025 • arXiv:2501.04003 • 74 citations

Articulate-Anything: Automatic Modeling of Articulated Objects via a Vision-Language Foundation Model

Long Le, Jason Xie, William Liang et al.

ICLR 2025 • arXiv:2410.13882 • 44 citations

A Stitch in Time Saves Nine: Small VLM is a Precise Guidance for Accelerating Large VLMs

Wangbo Zhao, Yizeng Han, Jiasheng Tang et al.

CVPR 2025 • arXiv:2412.03324 • 27 citations

Attention! Your Vision Language Model Could Be Maliciously Manipulated

Xiaosen Wang, Shaokang Wang, Zhijin Ge et al.

NEURIPS 2025 • arXiv:2505.19911 • 3 citations

Attribute-based Visual Reprogramming for Vision-Language Models

Chengyi Cai, Zesheng Ye, Lei Feng et al.

ICLR 2025 • arXiv:2501.13982 • 5 citations

Automated Model Discovery via Multi-modal & Multi-step Pipeline

Lee Jung-Mok, Nam Hyeon-Woo, Moon Ye-Bin et al.

NEURIPS 2025 • arXiv:2509.25946

A Visual Leap in CLIP Compositionality Reasoning through Generation of Counterfactual Sets

Zexi Jia, Chuanwei Huang, Yeshuang Zhu et al.

ICCV 2025 • arXiv:2507.04699 • 3 citations

BACON: Improving Clarity of Image Captions via Bag-of-Concept Graphs

Zhantao Yang, Ruili Feng, Keyu Yan et al.

CVPR 2025 • arXiv:2407.03314 • 3 citations

Bayesian Prompt Flow Learning for Zero-Shot Anomaly Detection

Zhen Qu, Xian Tao, Xinyi Gong et al.

CVPR 2025 • arXiv:2503.10080 • 26 citations

Bayesian Test-Time Adaptation for Vision-Language Models

Lihua Zhou, Mao Ye, Shuaifeng Li et al.

CVPR 2025 • arXiv:2503.09248 • 11 citations

BeliefMapNav: 3D Voxel-Based Belief Map for Zero-Shot Object Navigation

Zibo Zhou, Yue Hu, Lingkai Zhang et al.

NEURIPS 2025 • arXiv:2506.06487 • 5 citations

Beyond Text-Visual Attention: Exploiting Visual Cues for Effective Token Pruning in VLMs

Qizhe Zhang, Aosong Cheng, Ming Lu et al.

ICCV 2025 • arXiv:2412.01818 • 45 citations

Beyond Words: Augmenting Discriminative Richness via Diffusions in Unsupervised Prompt Learning

Hairui Ren, Fan Tang, He Zhao et al.

CVPR 2025 • arXiv:2504.11930

Bilateral Collaboration with Large Vision-Language Models for Open Vocabulary Human-Object Interaction Detection

Yupeng Hu, Changxing Ding, Chang Sun et al.

ICCV 2025 • arXiv:2507.06510 • 4 citations

Black Sheep in the Herd: Playing with Spuriously Correlated Attributes for Vision-Language Recognition

Xinyu Tian, Shu Zou, Zhaoyuan Yang et al.

ICLR 2025 • arXiv:2502.15809 • 5 citations

BlueSuffix: Reinforced Blue Teaming for Vision-Language Models Against Jailbreak Attacks

Yunhan Zhao, Xiang Zheng, Lin Luo et al.

ICLR 2025 • arXiv:2410.20971 • 20 citations

BOLT: Boost Large Vision-Language Model Without Training for Long-form Video Understanding

Shuming Liu, Chen Zhao, Tianqi Xu et al.

CVPR 2025 • arXiv:2503.21483 • 28 citations

Boosting the visual interpretability of CLIP via adversarial fine-tuning

Shizhan Gong, Haoyu Lei, Qi Dou et al.

ICLR 2025 • 7 citations

Bridging the Gap Between Ideal and Real-world Evaluation: Benchmarking AI-Generated Image Detection in Challenging Scenarios

Chunxiao Li, Xiaoxiao Wang, Meiling Li et al.

ICCV 2025 • arXiv:2509.09172 • 1 citation

Bridging the gap to real-world language-grounded visual concept learning

Whie Jung, Semin Kim, Junee Kim et al.

NEURIPS 2025 • arXiv:2510.21412

Causality-guided Prompt Learning for Vision-language Models via Visual Granulation

Mengyu Gao, Qiulei Dong

ICCV 2025 • arXiv:2509.03803 • 1 citation

C-CLIP: Multimodal Continual Learning for Vision-Language Model

Wenzhuo Liu, Fei Zhu, Longhui Wei et al.

ICLR 2025 • 13 citations

CCL-LGS: Contrastive Codebook Learning for 3D Language Gaussian Splatting

Lei Tian, Xiaomin Li, Liqian Ma et al.

ICCV 2025 • arXiv:2505.20469 • 2 citations

CF-VLM: CounterFactual Vision-Language Fine-tuning

Jusheng Zhang, Kaitong Cai, Yijia Fan et al.

NEURIPS 2025

Chain of Attack: On the Robustness of Vision-Language Models Against Transfer-Based Adversarial Attacks

Peng Xie, Yequan Bie, Jianda Mao et al.

CVPR 2025 • arXiv:2411.15720 • 13 citations

ChatReID: Open-ended Interactive Person Retrieval via Hierarchical Progressive Tuning for Vision Language Models

Ke Niu, Haiyang Yu, Mengyang Zhao et al.

ICCV 2025 • arXiv:2502.19958 • 8 citations

Class Distribution-induced Attention Map for Open-vocabulary Semantic Segmentations

Dong Un Kang, Hayeon Kim, Se Young Chun

ICLR 2025

CogCoM: A Visual Language Model with Chain-of-Manipulations Reasoning

Ji Qi, Ming Ding, Weihan Wang et al.

ICLR 2025 • arXiv:2402.04236 • 36 citations

Collaborative Instance Object Navigation: Leveraging Uncertainty-Awareness to Minimize Human-Agent Dialogues

Francesco Taioli, Edoardo Zorzi, Gianni Franchi et al.

ICCV 2025 • arXiv:2412.01250 • 4 citations

Compositional Entailment Learning for Hyperbolic Vision-Language Models

Avik Pal, Max van Spengler, Guido D'Amely di Melendugno et al.

ICLR 2025 • arXiv:2410.06912 • 37 citations

Context-Aware Academic Emotion Dataset and Benchmark

Luming Zhao, Jingwen Xuan, Jiamin Lou et al.

ICCV 2025 • arXiv:2507.00586