Poster "vision-language models" Papers

178 papers found • Page 3 of 4

Vision-Language Models Can't See the Obvious

Yasser Abdelaziz Dahou Djilali, Ngoc Huynh, Phúc Lê Khắc et al.

ICCV 2025 • poster • arXiv:2507.04741 • 7 citations

Vision-Language Models Do Not Understand Negation

Kumail Alhamoud, Shaden Alshammari, Yonglong Tian et al.

CVPR 2025 • poster • arXiv:2501.09425 • 36 citations

Vision-Language-Vision Auto-Encoder: Scalable Knowledge Distillation from Diffusion Models

Tiezheng Zhang, Yitong Li, Yu-Cheng Chou et al.

NeurIPS 2025 • poster • arXiv:2507.07104 • 2 citations

ViSpec: Accelerating Vision-Language Models with Vision-Aware Speculative Decoding

Jialiang Kang, Han Shu, Wenshuo Li et al.

NeurIPS 2025 • poster • arXiv:2509.15235 • 2 citations

VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents

Shi Yu, Chaoyue Tang, Bokai Xu et al.

ICLR 2025 • poster • arXiv:2410.10594 • 121 citations

VladVA: Discriminative Fine-tuning of LVLMs

Yassine Ouali, Adrian Bulat, Alexandros Xenos et al.

CVPR 2025 • poster • arXiv:2412.04378 • 11 citations

Vocabulary-Guided Gait Recognition

Panjian Huang, Saihui Hou, Chunshui Cao et al.

NeurIPS 2025 • poster

Words or Vision: Do Vision-Language Models Have Blind Faith in Text?

Ailin Deng, Tri Cao, Zhirui Chen et al.

CVPR 2025 • poster • arXiv:2503.02199 • 33 citations

Adapt2Reward: Adapting Video-Language Models to Generalizable Robotic Rewards via Failure Prompts

Yanting Yang, Minghao Chen, Qibo Qiu et al.

ECCV 2024 • poster • arXiv:2407.14872 • 4 citations

Adaptive Multi-task Learning for Few-shot Object Detection

Yan Ren, Yanling Li, Wai-Kin Adams Kong

ECCV 2024 • poster

Adapt without Forgetting: Distill Proximity from Dual Teachers in Vision-Language Models

Mengyu Zheng, Yehui Tang, Zhiwei Hao et al.

ECCV 2024 • poster • 6 citations

Amend to Alignment: Decoupled Prompt Tuning for Mitigating Spurious Correlation in Vision-Language Models

Jie Zhang, Xiaosong Ma, Song Guo et al.

ICML 2024 • poster

A Multimodal Automated Interpretability Agent

Tamar Rott Shaham, Sarah Schwettmann, Franklin Wang et al.

ICML 2024 • poster

An Empirical Study Into What Matters for Calibrating Vision-Language Models

Weijie Tu, Weijian Deng, Dylan Campbell et al.

ICML 2024 • poster

An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models

Liang Chen, Haozhe Zhao, Tianyu Liu et al.

ECCV 2024 • poster • arXiv:2403.06764 • 343 citations

ArtWhisperer: A Dataset for Characterizing Human-AI Interactions in Artistic Creations

Kailas Vodrahalli, James Zou

ICML 2024 • poster

A Touch, Vision, and Language Dataset for Multimodal Alignment

Letian Fu, Gaurav Datta, Huang Huang et al.

ICML 2024 • poster

Beyond Sole Strength: Customized Ensembles for Generalized Vision-Language Models

Zhihe Lu, Jiawang Bai, Xin Li et al.

ICML 2024 • poster

Bridging Environments and Language with Rendering Functions and Vision-Language Models

Théo Cachet, Christopher Dance, Olivier Sigaud

ICML 2024 • poster

Candidate Pseudolabel Learning: Enhancing Vision-Language Models by Prompt Tuning with Unlabeled Data

Jiahan Zhang, Qi Wei, Feng Liu et al.

ICML 2024 • poster

Cascade-CLIP: Cascaded Vision-Language Embeddings Alignment for Zero-Shot Semantic Segmentation

Yunheng Li, Zhong-Yu Li, Quan-Sheng Zeng et al.

ICML 2024 • poster • 20 citations

CLAP: Isolating Content from Style through Contrastive Learning with Augmented Prompts

Yichao Cai, Yuhang Liu, Zhen Zhang et al.

ECCV 2024 • poster • arXiv:2311.16445 • 11 citations

Conceptual Codebook Learning for Vision-Language Models

Yi Zhang, Ke Yu, Siqi Wu et al.

ECCV 2024 • poster • arXiv:2407.02350 • 6 citations

Connecting the Dots: Collaborative Fine-tuning for Black-Box Vision-Language Models

Zhengbo Wang, Jian Liang, Ran He et al.

ICML 2024 • poster

Contrasting Deepfakes Diffusion via Contrastive Learning and Global-Local Similarities

Lorenzo Baraldi, Federico Cocchi, Marcella Cornia et al.

ECCV 2024 • poster • arXiv:2407.20337 • 31 citations

DeCoOp: Robust Prompt Tuning with Out-of-Distribution Detection

Zhi Zhou, Ming Yang, Jiang-Xin Shi et al.

ICML 2024 • poster

Efficient Black-box Adversarial Attacks via Bayesian Optimization Guided by a Function Prior

Shuyu Cheng, Yibo Miao, Yinpeng Dong et al.

ICML 2024 • poster

Envisioning Outlier Exposure by Large Language Models for Out-of-Distribution Detection

Chentao Cao, Zhun Zhong, Zhanke Zhou et al.

ICML 2024 • poster

Evaluating and Analyzing Relationship Hallucinations in Large Vision-Language Models

Mingrui Wu, Jiayi Ji, Oucheng Huang et al.

ICML 2024 • poster

EventBind: Learning a Unified Representation to Bind Them All for Event-based Open-world Understanding

Jiazhou Zhou, Xu Zheng, Yuanhuiyi Lyu et al.

ECCV 2024 • poster • arXiv:2308.03135 • 28 citations

Explore the Potential of CLIP for Training-Free Open Vocabulary Semantic Segmentation

Tong Shao, Zhuotao Tian, Hang Zhao et al.

ECCV 2024 • poster • arXiv:2407.08268 • 44 citations

Exploring Intrinsic Dimension for Vision-Language Model Pruning

Hanzhang Wang, Jiawen Zhang, Qingyuan Ma

ICML 2024 • poster

Extracting Training Data From Document-Based VQA Models

Francesco Pinto, Nathalie Rauschmayr, Florian Tramer et al.

ICML 2024 • poster

FineMatch: Aspect-based Fine-grained Image and Text Mismatch Detection and Correction

Hang Hua, Jing Shi, Kushal Kafle et al.

ECCV 2024 • poster • arXiv:2404.14715 • 14 citations

Fool Your (Vision and) Language Model with Embarrassingly Simple Permutations

Yongshuo Zong, Tingyang Yu, Ruchika Chavhan et al.

ICML 2024 • poster

GalLoP: Learning Global and Local Prompts for Vision-Language Models

Marc Lafon, Elias Ramzi, Clément Rambour et al.

ECCV 2024 • poster • arXiv:2407.01400 • 39 citations

Generalizing to Unseen Domains via Text-guided Augmentation

Daiqing Qi, Handong Zhao, Aidong Zhang et al.

ECCV 2024 • poster

GeoReasoner: Geo-localization with Reasoning in Street Views using a Large Vision-Language Model

Ling Li, Yu Ye, Bingchuan Jiang et al.

ICML 2024 • poster

Gradient-based Visual Explanation for Transformer-based CLIP

Chenyang Zhao, Kun Wang, Xingyu Zeng et al.

ICML 2024 • poster

HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding

Zhaorun Chen, Zhuokai Zhao, Hongyin Luo et al.

ICML 2024 • poster

Harmonizing Generalization and Personalization in Federated Prompt Learning

Tianyu Cui, Hongxia Li, Jingya Wang et al.

ICML 2024 • poster

Image Hijacks: Adversarial Images can Control Generative Models at Runtime

Luke Bailey, Euan Ong, Stuart Russell et al.

ICML 2024 • poster

Improving fine-grained understanding in image-text pre-training

Ioana Bica, Anastasija Ilic, Matthias Bauer et al.

ICML 2024 • poster

Improving Zero-Shot Generalization for CLIP with Variational Adapter

Ziqian Lu, Fengli Shen, Mushui Liu et al.

ECCV 2024 • poster • 7 citations

Language-Driven Cross-Modal Classifier for Zero-Shot Multi-Label Image Recognition

Yicheng Liu, Jie Wen, Chengliang Liu et al.

ICML 2024 • poster

Latent-INR: A Flexible Framework for Implicit Representations of Videos with Discriminative Semantics

Shishira R Maiya, Anubhav Anubhav, Matthew Gwilliam et al.

ECCV 2024 • poster • arXiv:2408.02672 • 7 citations

Let Go of Your Labels with Unsupervised Transfer

Artyom Gadetsky, Yulun Jiang, Maria Brbic

ICML 2024 • poster

MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions

Kai Zhang, Yi Luan, Hexiang Hu et al.

ICML 2024 • poster

Memory-Space Visual Prompting for Efficient Vision-Language Fine-Tuning

Shibo Jie, Yehui Tang, Ning Ding et al.

ICML 2024 • poster

MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI

Kaining Ying, Fanqing Meng, Jin Wang et al.

ICML 2024 • poster