"vision-language models" Papers

570 papers found • Page 9 of 12

VladVA: Discriminative Fine-tuning of LVLMs

Yassine Ouali, Adrian Bulat, Alexandros Xenos et al.

CVPR 2025 • arXiv:2412.04378
11 citations

VLDrive: Vision-Augmented Lightweight MLLMs for Efficient Language-grounded Autonomous Driving

Ruifei Zhang, Wei Zhang, Xiao Tan et al.

ICCV 2025 • arXiv:2511.06256
5 citations

VLMaterial: Procedural Material Generation with Large Vision-Language Models

Beichen Li, Rundi Wu, Armando Solar-Lezama et al.

ICLR 2025 • arXiv:2501.18623
5 citations

VLMs can Aggregate Scattered Training Patches

Zhanhui Zhou, Lingjie Chen, Chao Yang et al.

NEURIPS 2025 • arXiv:2506.03614

VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning

Haozhe Wang, Chao Qu, Zuming Huang et al.

NEURIPS 2025 (spotlight) • arXiv:2504.08837
183 citations

Vocabulary-Guided Gait Recognition

Panjian Huang, Saihui Hou, Chunshui Cao et al.

NEURIPS 2025

VT-FSL: Bridging Vision and Text with LLMs for Few-Shot Learning

Wenhao Li, Qiangchang Wang, Xianjing Meng et al.

NEURIPS 2025 • arXiv:2509.25033
4 citations

Weakly-Supervised Learning of Dense Functional Correspondences

Stefan Stojanov, Linan Zhao, Yunzhi Zhang et al.

ICCV 2025 • arXiv:2509.03893

Web Artifact Attacks Disrupt Vision Language Models

Maan Qraitem, Piotr Teterwak, Kate Saenko et al.

ICCV 2025 • arXiv:2503.13652
2 citations

What Makes a Maze Look Like a Maze?

Joy Hsu, Jiayuan Mao, Joshua B Tenenbaum et al.

ICLR 2025 • arXiv:2409.08202
13 citations

When Large Vision-Language Model Meets Large Remote Sensing Imagery: Coarse-to-Fine Text-Guided Token Pruning

Junwei Luo, Yingying Zhang, Xue Yang et al.

ICCV 2025 • arXiv:2503.07588
14 citations

Why 1 + 1 < 1 in Visual Token Pruning: Beyond Naive Integration via Multi-Objective Balanced Covering

Yangfu Li, Hongjian Zhan, Tianyi Chen et al.

NEURIPS 2025 • arXiv:2505.10118
1 citation

Why LVLMs Are More Prone to Hallucinations in Longer Responses: The Role of Context

Ge Zheng, Jiaye Qian, Jiajin Tang et al.

ICCV 2025 • arXiv:2510.20229
8 citations

Words or Vision: Do Vision-Language Models Have Blind Faith in Text?

Ailin Deng, Tri Cao, Zhirui Chen et al.

CVPR 2025 • arXiv:2503.02199
34 citations

Your Large Vision-Language Model Only Needs A Few Attention Heads For Visual Grounding

Seil Kang, Jinyeong Kim, Junhyeok Kim et al.

CVPR 2025 (highlight) • arXiv:2503.06287
37 citations

A Closer Look at the Few-Shot Adaptation of Large Vision-Language Models

Julio Silva-Rodríguez, Sina Hajimiri, Ismail Ben Ayed et al.

CVPR 2024 • arXiv:2312.12730
65 citations

Adapt2Reward: Adapting Video-Language Models to Generalizable Robotic Rewards via Failure Prompts

Yanting Yang, Minghao Chen, Qibo Qiu et al.

ECCV 2024 • arXiv:2407.14872
5 citations

Adaptive Multi-task Learning for Few-shot Object Detection

Yan Ren, Yanling Li, Wai-Kin Adams Kong

ECCV 2024
6 citations

Adapt without Forgetting: Distill Proximity from Dual Teachers in Vision-Language Models

Mengyu Zheng, Yehui Tang, Zhiwei Hao et al.

ECCV 2024
6 citations

Adversarial Prompt Tuning for Vision-Language Models

Jiaming Zhang, Xingjun Ma, Xin Wang et al.

ECCV 2024 • arXiv:2311.11261
34 citations

Amend to Alignment: Decoupled Prompt Tuning for Mitigating Spurious Correlation in Vision-Language Models

Jie Zhang, Xiaosong Ma, Song Guo et al.

ICML 2024

A Multimodal Automated Interpretability Agent

Tamar Rott Shaham, Sarah Schwettmann, Franklin Wang et al.

ICML 2024 • arXiv:2404.14394
45 citations

Anchor-based Robust Finetuning of Vision-Language Models

Jinwei Han, Zhiwen Lin, Zhongyisun Sun et al.

CVPR 2024 • arXiv:2404.06244
10 citations

An Empirical Study Into What Matters for Calibrating Vision-Language Models

Weijie Tu, Weijian Deng, Dylan Campbell et al.

ICML 2024 • arXiv:2402.07417
15 citations

An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models

Liang Chen, Haozhe Zhao, Tianyu Liu et al.

ECCV 2024 • arXiv:2403.06764
368 citations

AnomalyGPT: Detecting Industrial Anomalies Using Large Vision-Language Models

Zhaopeng Gu, Bingke Zhu, Guibo Zhu et al.

AAAI 2024 • arXiv:2308.15366
252 citations

ArtVLM: Attribute Recognition Through Vision-Based Prefix Language Modeling

William Zhu, Keren Ye, Junjie Ke et al.

ECCV 2024 • arXiv:2408.04102
2 citations

ArtWhisperer: A Dataset for Characterizing Human-AI Interactions in Artistic Creations

Kailas Vodrahalli, James Zou

ICML 2024 • arXiv:2306.08141
9 citations

A Touch, Vision, and Language Dataset for Multimodal Alignment

Letian Fu, Gaurav Datta, Huang Huang et al.

ICML 2024 • arXiv:2402.13232
74 citations

Attention Prompting on Image for Large Vision-Language Models

Runpeng Yu, Weihao Yu, Xinchao Wang

ECCV 2024 • arXiv:2409.17143
28 citations

BEAF: Observing BEfore-AFter Changes to Evaluate Hallucination in Vision-language Models

Ye-Bin Moon, Nam Hyeon-Woo, Wonseok Choi et al.

ECCV 2024 • arXiv:2407.13442
10 citations

Beyond Sole Strength: Customized Ensembles for Generalized Vision-Language Models

Zhihe Lu, Jiawang Bai, Xin Li et al.

ICML 2024 • arXiv:2311.17091
17 citations

BlenderAlchemy: Editing 3D Graphics with Vision-Language Models

Ian Huang, Guandao Yang, Leonidas Guibas

ECCV 2024 • arXiv:2404.17672
11 citations

Bootstrapping Variational Information Pursuit with Large Language and Vision Models for Interpretable Image Classification

Aditya Chattopadhyay, Kwan Ho Ryan Chan, Rene Vidal

ICLR 2024

Bridging Environments and Language with Rendering Functions and Vision-Language Models

Théo Cachet, Christopher Dance, Olivier Sigaud

ICML 2024

Candidate Pseudolabel Learning: Enhancing Vision-Language Models by Prompt Tuning with Unlabeled Data

Jiahan Zhang, Qi Wei, Feng Liu et al.

ICML 2024 • arXiv:2406.10502
22 citations

Can I Trust Your Answer? Visually Grounded Video Question Answering

Junbin Xiao, Angela Yao, Yicong Li et al.

CVPR 2024 (highlight) • arXiv:2309.01327
113 citations

Cascade-CLIP: Cascaded Vision-Language Embeddings Alignment for Zero-Shot Semantic Segmentation

Yunheng Li, Zhong-Yu Li, Quan-Sheng Zeng et al.

ICML 2024 • arXiv:2406.00670
20 citations

CFPL-FAS: Class Free Prompt Learning for Generalizable Face Anti-spoofing

Ajian Liu, Shuai Xue, Gan Jianwen et al.

CVPR 2024 (highlight) • arXiv:2403.14333
52 citations

CLAP: Isolating Content from Style through Contrastive Learning with Augmented Prompts

Yichao Cai, Yuhang Liu, Zhen Zhang et al.

ECCV 2024 • arXiv:2311.16445
11 citations

CLIM: Contrastive Language-Image Mosaic for Region Representation

Size Wu, Wenwei Zhang, Lumin Xu et al.

AAAI 2024 • arXiv:2312.11376
25 citations

Code as Reward: Empowering Reinforcement Learning with VLMs

David Venuto, Mohammad Sami Nur Islam, Martin Klissarov et al.

ICML 2024 (spotlight) • arXiv:2402.04764
27 citations

Collaborative Vision-Text Representation Optimizing for Open-Vocabulary Segmentation

Siyu Jiao, Hongguang Zhu, Yunchao Wei et al.

ECCV 2024 • arXiv:2408.00744
36 citations

COMMA: Co-articulated Multi-Modal Learning

Lianyu Hu, Liqing Gao, Zekang Liu et al.

AAAI 2024 • arXiv:2401.00268
7 citations

Compound Text-Guided Prompt Tuning via Image-Adaptive Cues

Hao Tan, Jun Li, Yizhuang Zhou et al.

AAAI 2024 • arXiv:2312.06401
14 citations

Conceptual Codebook Learning for Vision-Language Models

Yi Zhang, Ke Yu, Siqi Wu et al.

ECCV 2024 • arXiv:2407.02350
7 citations

Connecting the Dots: Collaborative Fine-tuning for Black-Box Vision-Language Models

Zhengbo Wang, Jian Liang, Ran He et al.

ICML 2024 • arXiv:2402.04050

Consistency and Uncertainty: Identifying Unreliable Responses From Black-Box Vision-Language Models for Selective Visual Question Answering

Zaid Khan, Yun Fu

CVPR 2024 • arXiv:2404.10193
26 citations

Contrasting Deepfakes Diffusion via Contrastive Learning and Global-Local Similarities

Lorenzo Baraldi, Federico Cocchi, Marcella Cornia et al.

ECCV 2024 • arXiv:2407.20337
32 citations

DeCoOp: Robust Prompt Tuning with Out-of-Distribution Detection

Zhi Zhou, Ming Yang, Jiang-Xin Shi et al.

ICML 2024 • arXiv:2406.00345
12 citations