ICML Poster Papers Matching "vision-language models"
39 papers found
Amend to Alignment: Decoupled Prompt Tuning for Mitigating Spurious Correlation in Vision-Language Models
Jie Zhang, Xiaosong Ma, Song Guo et al.
A Multimodal Automated Interpretability Agent
Tamar Rott Shaham, Sarah Schwettmann, Franklin Wang et al.
An Empirical Study Into What Matters for Calibrating Vision-Language Models
Weijie Tu, Weijian Deng, Dylan Campbell et al.
ArtWhisperer: A Dataset for Characterizing Human-AI Interactions in Artistic Creations
Kailas Vodrahalli, James Zou
A Touch, Vision, and Language Dataset for Multimodal Alignment
Letian Fu, Gaurav Datta, Huang Huang et al.
Beyond Sole Strength: Customized Ensembles for Generalized Vision-Language Models
Zhihe Lu, Jiawang Bai, Xin Li et al.
Bridging Environments and Language with Rendering Functions and Vision-Language Models
Théo Cachet, Christopher Dance, Olivier Sigaud
Candidate Pseudolabel Learning: Enhancing Vision-Language Models by Prompt Tuning with Unlabeled Data
Jiahan Zhang, Qi Wei, Feng Liu et al.
Cascade-CLIP: Cascaded Vision-Language Embeddings Alignment for Zero-Shot Semantic Segmentation
Yunheng Li, Zhong-Yu Li, Quan-Sheng Zeng et al.
Connecting the Dots: Collaborative Fine-tuning for Black-Box Vision-Language Models
Zhengbo Wang, Jian Liang, Ran He et al.
DeCoOp: Robust Prompt Tuning with Out-of-Distribution Detection
Zhi Zhou, Ming Yang, Jiang-Xin Shi et al.
Efficient Black-box Adversarial Attacks via Bayesian Optimization Guided by a Function Prior
Shuyu Cheng, Yibo Miao, Yinpeng Dong et al.
Envisioning Outlier Exposure by Large Language Models for Out-of-Distribution Detection
Chentao Cao, Zhun Zhong, Zhanke Zhou et al.
Evaluating and Analyzing Relationship Hallucinations in Large Vision-Language Models
Mingrui Wu, Jiayi Ji, Oucheng Huang et al.
Exploring Intrinsic Dimension for Vision-Language Model Pruning
Hanzhang Wang, Jiawen Zhang, Qingyuan Ma
Extracting Training Data From Document-Based VQA Models
Francesco Pinto, Nathalie Rauschmayr, Florian Tramèr et al.
Fool Your (Vision and) Language Model with Embarrassingly Simple Permutations
Yongshuo Zong, Tingyang Yu, Ruchika Chavhan et al.
GeoReasoner: Geo-localization with Reasoning in Street Views using a Large Vision-Language Model
Ling Li, Yu Ye, Bingchuan Jiang et al.
Gradient-based Visual Explanation for Transformer-based CLIP
Chenyang Zhao, Kun Wang, Xingyu Zeng et al.
HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding
Zhaorun Chen, Zhuokai Zhao, Hongyin Luo et al.
Harmonizing Generalization and Personalization in Federated Prompt Learning
Tianyu Cui, Hongxia Li, Jingya Wang et al.
Image Hijacks: Adversarial Images can Control Generative Models at Runtime
Luke Bailey, Euan Ong, Stuart Russell et al.
Improving fine-grained understanding in image-text pre-training
Ioana Bica, Anastasija Ilic, Matthias Bauer et al.
Language-Driven Cross-Modal Classifier for Zero-Shot Multi-Label Image Recognition
Yicheng Liu, Jie Wen, Chengliang Liu et al.
Let Go of Your Labels with Unsupervised Transfer
Artyom Gadetsky, Yulun Jiang, Maria Brbic
MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions
Kai Zhang, Yi Luan, Hexiang Hu et al.
Memory-Space Visual Prompting for Efficient Vision-Language Fine-Tuning
Shibo Jie, Yehui Tang, Ning Ding et al.
MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI
Kaining Ying, Fanqing Meng, Jin Wang et al.
Modeling Caption Diversity in Contrastive Vision-Language Pretraining
Samuel Lavoie, Polina Kirichenko, Mark Ibrahim et al.
Open-Vocabulary Calibration for Fine-tuned CLIP
Shuoyuan Wang, Jindong Wang, Guoqing Wang et al.
Position: The Platonic Representation Hypothesis
Minyoung Huh, Brian Cheung, Tongzhou Wang et al.
Referee Can Play: An Alternative Approach to Conditional Generation via Model Inversion
Xuantong Liu, Tianyang Hu, Wenjia Wang et al.
Revisiting the Role of Language Priors in Vision-Language Models
Zhiqiu Lin, Xinyue Chen, Deepak Pathak et al.
Robust CLIP: Unsupervised Adversarial Fine-Tuning of Vision Embeddings for Robust Large Vision-Language Models
Christian Schlarmann, Naman Singh, Francesco Croce et al.
Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models
Yongshuo Zong, Ondrej Bohdal, Tingyang Yu et al.
SceneCraft: An LLM Agent for Synthesizing 3D Scenes as Blender Code
Ziniu Hu, Ahmet Iscen, Aashi Jain et al.
SyCoCa: Symmetrizing Contrastive Captioners with Attentive Masking for Multimodal Alignment
Ziping Ma, Furong Xu, Jian Liu et al.
Understanding Retrieval-Augmented Task Adaptation for Vision-Language Models
Yifei Ming, Sharon Li
Visual-Text Cross Alignment: Refining the Similarity Score in Vision-Language Models
Jinhao Li, Haopeng Li, Sarah Erfani et al.