ICLR Poster Papers matching "vision-language models"
30 papers found
$\mathbb{X}$-Sample Contrastive Loss: Improving Contrastive Learning with Sample Similarity Graphs
Vlad Sobal, Mark Ibrahim, Randall Balestriero et al.
AHA: A Vision-Language-Model for Detecting and Reasoning Over Failures in Robotic Manipulation
Jiafei Duan, Wilbert Pumacay, Nishanth Kumar et al.
Aligning Visual Contrastive learning models via Preference Optimization
Amirabbas Afzali, Borna khodabandeh, Ali Rasekh et al.
Articulate-Anything: Automatic Modeling of Articulated Objects via a Vision-Language Foundation Model
Long Le, Jason Xie, William Liang et al.
Attribute-based Visual Reprogramming for Vision-Language Models
Chengyi Cai, Zesheng Ye, Lei Feng et al.
C-CLIP: Multimodal Continual Learning for Vision-Language Model
Wenzhuo Liu, Fei Zhu, Longhui Wei et al.
CogCoM: A Visual Language Model with Chain-of-Manipulations Reasoning
Ji Qi, Ming Ding, Weihan Wang et al.
Cross the Gap: Exposing the Intra-modal Misalignment in CLIP via Modality Inversion
Marco Mistretta, Alberto Baldrati, Lorenzo Agnolucci et al.
DAMO: Decoding by Accumulating Activations Momentum for Mitigating Hallucinations in Vision-Language Models
Kaishen Wang, Hengrui Gu, Meijun Gao et al.
Divergence-enhanced Knowledge-guided Context Optimization for Visual-Language Prompt Tuning
Yilun Li, Miaomiao Cheng, Xu Han et al.
Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference under Ambiguities
Zheyuan Zhang, Fengyuan Hu, Jayjun Lee et al.
Enhancing Cognition and Explainability of Multimodal Foundation Models with Self-Synthesized Data
Yucheng Shi, Quanzheng Li, Jin Sun et al.
Enhancing Vision-Language Model with Unmasked Token Alignment
Hongsheng Li, Jihao Liu, Boxiao Liu et al.
Language-Assisted Feature Transformation for Anomaly Detection
EungGu Yun, Heonjin Ha, Yeongwoo Nam et al.
Locality Alignment Improves Vision-Language Models
Ian Covert, Tony Sun, James Y Zou et al.
MediConfusion: Can you trust your AI radiologist? Probing the reliability of multimodal medical foundation models
Mohammad Shahab Sepehri, Zalan Fabian, Maryam Soltanolkotabi et al.
MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models
Ziyu Liu, Yuhang Zang, Xiaoyi Dong et al.
MRAG-Bench: Vision-Centric Evaluation for Retrieval-Augmented Multimodal Models
Wenbo Hu, Jia-Chen Gu, Zi-Yi Dou et al.
Multi-Label Test-Time Adaptation with Bound Entropy Minimization
Xiangyu Wu, Feng Yu, Yang Yang et al.
Noisy Test-Time Adaptation in Vision-Language Models
Chentao Cao, Zhun Zhong, Zhanke Zhou et al.
Privacy-Preserving Personalized Federated Prompt Learning for Multimodal Large Language Models
Linh Tran, Wei Sun, Stacy Patterson et al.
RA-TTA: Retrieval-Augmented Test-Time Adaptation for Vision-Language Models
Youngjun Lee, Doyoung Kim, Junhyeok Kang et al.
Reflexive Guidance: Improving OoDD in Vision-Language Models via Self-Guided Image-Adaptive Concept Generation
Jihyo Kim, Seulbi Lee, Sangheum Hwang
SANER: Annotation-free Societal Attribute Neutralizer for Debiasing CLIP
Yusuke Hirota, Min-Hung Chen, Chien-Yi Wang et al.
Self-Introspective Decoding: Alleviating Hallucinations for Large Vision-Language Models
Fushuo Huo, Wenchao Xu, Zhong Zhang et al.
Should VLMs be Pre-trained with Image Data?
Sedrick Keh, Jean Mercat, Samir Yitzhak Gadre et al.
TaskGalaxy: Scaling Multi-modal Instruction Fine-tuning with Tens of Thousands Vision Task Types
Jiankang Chen, Tianke Zhang, Changyi Liu et al.
Teaching Human Behavior Improves Content Understanding Abilities Of VLMs
Somesh Singh, Harini S I, Yaman Singla et al.
VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents
Shi Yu, Chaoyue Tang, Bokai Xu et al.
What Makes a Maze Look Like a Maze?
Joy Hsu, Jiayuan Mao, Joshua B Tenenbaum et al.