ICML "vision-language models" Papers

41 papers found

Amend to Alignment: Decoupled Prompt Tuning for Mitigating Spurious Correlation in Vision-Language Models

Jie Zhang, Xiaosong Ma, Song Guo et al.

ICML 2024 poster

A Multimodal Automated Interpretability Agent

Tamar Rott Shaham, Sarah Schwettmann, Franklin Wang et al.

ICML 2024 poster · arXiv:2404.14394

An Empirical Study Into What Matters for Calibrating Vision-Language Models

Weijie Tu, Weijian Deng, Dylan Campbell et al.

ICML 2024 poster · arXiv:2402.07417

ArtWhisperer: A Dataset for Characterizing Human-AI Interactions in Artistic Creations

Kailas Vodrahalli, James Zou

ICML 2024 poster · arXiv:2306.08141

A Touch, Vision, and Language Dataset for Multimodal Alignment

Letian Fu, Gaurav Datta, Huang Huang et al.

ICML 2024 poster · arXiv:2402.13232

Beyond Sole Strength: Customized Ensembles for Generalized Vision-Language Models

Zhihe Lu, Jiawang Bai, Xin Li et al.

ICML 2024 poster · arXiv:2311.17091

Bridging Environments and Language with Rendering Functions and Vision-Language Models

Théo Cachet, Christopher Dance, Olivier Sigaud

ICML 2024 poster

Candidate Pseudolabel Learning: Enhancing Vision-Language Models by Prompt Tuning with Unlabeled Data

Jiahan Zhang, Qi Wei, Feng Liu et al.

ICML 2024 poster · arXiv:2406.10502

Cascade-CLIP: Cascaded Vision-Language Embeddings Alignment for Zero-Shot Semantic Segmentation

Yunheng Li, Zhong-Yu Li, Quan-Sheng Zeng et al.

ICML 2024 poster · arXiv:2406.00670 · 20 citations

Code as Reward: Empowering Reinforcement Learning with VLMs

David Venuto, Mohammad Sami Nur Islam, Martin Klissarov et al.

ICML 2024 spotlight · arXiv:2402.04764

Connecting the Dots: Collaborative Fine-tuning for Black-Box Vision-Language Models

Zhengbo Wang, Jian Liang, Ran He et al.

ICML 2024 poster · arXiv:2402.04050

DeCoOp: Robust Prompt Tuning with Out-of-Distribution Detection

Zhi Zhou, Ming Yang, Jiang-Xin Shi et al.

ICML 2024 poster · arXiv:2406.00345

Efficient Black-box Adversarial Attacks via Bayesian Optimization Guided by a Function Prior

Shuyu Cheng, Yibo Miao, Yinpeng Dong et al.

ICML 2024 poster · arXiv:2405.19098

Envisioning Outlier Exposure by Large Language Models for Out-of-Distribution Detection

Chentao Cao, Zhun Zhong, Zhanke Zhou et al.

ICML 2024 poster · arXiv:2406.00806

Evaluating and Analyzing Relationship Hallucinations in Large Vision-Language Models

Mingrui Wu, Jiayi Ji, Oucheng Huang et al.

ICML 2024 poster · arXiv:2406.16449

Exploring Intrinsic Dimension for Vision-Language Model Pruning

Hanzhang Wang, Jiawen Zhang, Qingyuan Ma

ICML 2024 poster

Extracting Training Data From Document-Based VQA Models

Francesco Pinto, Nathalie Rauschmayr, Florian Tramer et al.

ICML 2024 poster · arXiv:2407.08707

Fool Your (Vision and) Language Model with Embarrassingly Simple Permutations

Yongshuo Zong, Tingyang Yu, Ruchika Chavhan et al.

ICML 2024 poster · arXiv:2310.01651

GeoReasoner: Geo-localization with Reasoning in Street Views using a Large Vision-Language Model

Ling Li, Yu Ye, Bingchuan Jiang et al.

ICML 2024 poster · arXiv:2406.18572

Gradient-based Visual Explanation for Transformer-based CLIP

Chenyang Zhao, Kun Wang, Xingyu Zeng et al.

ICML 2024 poster

HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding

Zhaorun Chen, Zhuokai Zhao, Hongyin Luo et al.

ICML 2024 poster · arXiv:2403.00425

Harmonizing Generalization and Personalization in Federated Prompt Learning

Tianyu Cui, Hongxia Li, Jingya Wang et al.

ICML 2024 poster · arXiv:2405.09771

Image Hijacks: Adversarial Images can Control Generative Models at Runtime

Luke Bailey, Euan Ong, Stuart Russell et al.

ICML 2024 poster · arXiv:2309.00236

Improving fine-grained understanding in image-text pre-training

Ioana Bica, Anastasija Ilic, Matthias Bauer et al.

ICML 2024 poster · arXiv:2401.09865

Language-Driven Cross-Modal Classifier for Zero-Shot Multi-Label Image Recognition

Yicheng Liu, Jie Wen, Chengliang Liu et al.

ICML 2024 poster

Let Go of Your Labels with Unsupervised Transfer

Artyom Gadetsky, Yulun Jiang, Maria Brbic

ICML 2024 poster · arXiv:2406.07236

MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions

Kai Zhang, Yi Luan, Hexiang Hu et al.

ICML 2024 poster · arXiv:2403.19651

Memory-Space Visual Prompting for Efficient Vision-Language Fine-Tuning

Shibo Jie, Yehui Tang, Ning Ding et al.

ICML 2024 poster · arXiv:2405.05615

MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI

Kaining Ying, Fanqing Meng, Jin Wang et al.

ICML 2024 poster · arXiv:2404.16006

Modeling Caption Diversity in Contrastive Vision-Language Pretraining

Samuel Lavoie, Polina Kirichenko, Mark Ibrahim et al.

ICML 2024 poster · arXiv:2405.00740

Open-Vocabulary Calibration for Fine-tuned CLIP

Shuoyuan Wang, Jindong Wang, Guoqing Wang et al.

ICML 2024 poster · arXiv:2402.04655

Position: The Platonic Representation Hypothesis

Minyoung Huh, Brian Cheung, Tongzhou Wang et al.

ICML 2024 poster

Realistic Unsupervised CLIP Fine-tuning with Universal Entropy Optimization

Jian Liang, Lijun Sheng, Zhengbo Wang et al.

ICML 2024 spotlight · arXiv:2308.12919

Referee Can Play: An Alternative Approach to Conditional Generation via Model Inversion

Xuantong Liu, Tianyang Hu, Wenjia Wang et al.

ICML 2024 poster · arXiv:2402.16305

Revisiting the Role of Language Priors in Vision-Language Models

Zhiqiu Lin, Xinyue Chen, Deepak Pathak et al.

ICML 2024 poster · arXiv:2306.01879

Robust CLIP: Unsupervised Adversarial Fine-Tuning of Vision Embeddings for Robust Large Vision-Language Models

Christian Schlarmann, Naman Singh, Francesco Croce et al.

ICML 2024 poster · arXiv:2402.12336

Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models

Yongshuo Zong, Ondrej Bohdal, Tingyang Yu et al.

ICML 2024 poster · arXiv:2402.02207

SceneCraft: An LLM Agent for Synthesizing 3D Scenes as Blender Code

Ziniu Hu, Ahmet Iscen, Aashi Jain et al.

ICML 2024 poster · arXiv:2403.01248

SyCoCa: Symmetrizing Contrastive Captioners with Attentive Masking for Multimodal Alignment

Ziping Ma, Furong Xu, Jian Liu et al.

ICML 2024 poster · arXiv:2401.02137

Understanding Retrieval-Augmented Task Adaptation for Vision-Language Models

Yifei Ming, Sharon Li

ICML 2024 poster · arXiv:2405.01468

Visual-Text Cross Alignment: Refining the Similarity Score in Vision-Language Models

Jinhao Li, Haopeng Li, Sarah Erfani et al.

ICML 2024 poster · arXiv:2406.02915