"vision-language models" Papers
570 papers found • Page 3 of 12
Enhancing Cognition and Explainability of Multimodal Foundation Models with Self-Synthesized Data
Yucheng Shi, Quanzheng Li, Jin Sun et al.
Enhancing Compositional Reasoning in CLIP via Reconstruction and Alignment of Text Descriptions
Jihoon Kwon, Kyle Min, Jy-yong Sohn
Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation
Yudi Shi, Shangzhe Di, Qirui Chen et al.
Enhancing Vision-Language Model Reliability with Uncertainty-Guided Dropout Decoding
Yixiong Fang, Ziran Yang, Zhaorun Chen et al.
Enhancing Vision-Language Model with Unmasked Token Alignment
Hongsheng Li, Jihao Liu, Boxiao Liu et al.
Escaping the SpuriVerse: Can Large Vision-Language Models Generalize Beyond Seen Spurious Correlations?
Yiwei Yang, Chung Peng Lee, Shangbin Feng et al.
Evading Data Provenance in Deep Neural Networks
Hongyu Zhu, Sichu Liang, Wenwen Wang et al.
Evaluating Model Perception of Color Illusions in Photorealistic Scenes
Lingjun Mao, Zineng Tang, Alane Suhr
Evaluating Vision-Language Models as Evaluators in Path Planning
Mohamed Aghzal, Xiang Yue, Erion Plaku et al.
EvolvedGRPO: Unlocking Reasoning in LVLMs via Progressive Instruction Evolution
Zhebei Shen, Qifan Yu, Juncheng Li et al.
ExACT: Teaching AI Agents to Explore with Reflective-MCTS and Exploratory Learning
Xiao Yu, Baolin Peng, Vineeth Vajipey et al.
Explaining Domain Shifts in Language: Concept Erasing for Interpretable Image Classification
Zequn Zeng, Yudi Su, Jianqiao Sun et al.
Exploiting Domain Properties in Language-Driven Domain Generalization for Semantic Segmentation
Seogkyu Jeon, Kibeom Hong, Hyeran Byun
Exploiting the Asymmetric Uncertainty Structure of Pre-trained VLMs on the Unit Hypersphere
Li Ju, Max Andersson, Stina Fredriksson et al.
Exploring Visual Vulnerabilities via Multi-Loss Adversarial Search for Jailbreaking Vision-Language Models
Shuyang Hao, Bryan Hooi, Jun Liu et al.
Extract Free Dense Misalignment from CLIP
JeongYeon Nam, Jinbae Im, Wonjae Kim et al.
FA: Forced Prompt Learning of Vision-Language Models for Out-of-Distribution Detection
Xinhua Lu, Runhe Lai, Yanqi Wu et al.
Failures to Find Transferable Image Jailbreaks Between Vision-Language Models
Rylan Schaeffer, Dan Valentine, Luke Bailey et al.
FedMGP: Personalized Federated Learning with Multi-Group Text-Visual Prompts
Weihao Bo, Yanpeng Sun, Yu Wang et al.
FedMVP: Federated Multimodal Visual Prompt Tuning for Vision-Language Models
Mainak Singha, Subhankar Roy, Sarthak Mehrotra et al.
Few-Shot Image Quality Assessment via Adaptation of Vision-Language Models
Xudong Li, Zihao Huang, Yan Zhang et al.
FigStep: Jailbreaking Large Vision-Language Models via Typographic Visual Prompts
Yichen Gong, Delong Ran, Jinyuan Liu et al.
FINECAPTION: Compositional Image Captioning Focusing on Wherever You Want at Any Granularity
Hang Hua, Qing Liu, Lingzhi Zhang et al.
Fine-grained Abnormality Prompt Learning for Zero-shot Anomaly Detection
Jiawen Zhu, Yew-Soon Ong, Chunhua Shen et al.
Fine-Grained Evaluation of Large Vision-Language Models in Autonomous Driving
Yue Li, Meng Tian, Zhenyu Lin et al.
Fine-Grained Preference Optimization Improves Spatial Reasoning in VLMs
Yifan Shen, Yuanzhe Liu, Jingyuan Zhu et al.
FineLIP: Extending CLIP’s Reach via Fine-Grained Alignment with Longer Text Inputs
Mothilal Asokan, Kebin Wu, Fatima Albreiki
FlySearch: Exploring how vision-language models explore
Adam Pardyl, Dominik Matuszek, Mateusz Przebieracz et al.
From Flat to Hierarchical: Extracting Sparse Representations with Matching Pursuit
Valérie Costa, Thomas Fel, Ekdeep S Lubana et al.
From Head to Tail: Towards Balanced Representation in Large Vision-Language Models through Adaptive Data Calibration
Mingyang Song, Xiaoye Qu, Jiawei Zhou et al.
From Panels to Prose: Generating Literary Narratives from Comics
Ragav Sachdeva, Andrew Zisserman
Galaxy Walker: Geometry-aware VLMs For Galaxy-scale Understanding
Tianyu Chen, Xingcheng Fu, Yisen Gao et al.
GaussianProperty: Integrating Physical Properties to 3D Gaussians with LMMs
Xinli Xu, Wenhang Ge, Dicong Qiu et al.
Generalized Few-shot 3D Point Cloud Segmentation with Vision-Language Model
Zhaochong An, Guolei Sun, Yun Liu et al.
Generate, but Verify: Reducing Hallucination in Vision-Language Models with Retrospective Resampling
Tsung-Han (Patrick) Wu, Heekyung Lee, Jiaxin Ge et al.
Generate, Transduct, Adapt: Iterative Transduction with VLMs
Oindrila Saha, Logan Lawrence, Grant Van Horn et al.
Generating CAD Code with Vision-Language Models for 3D Designs
Kamel Alrashedy, Pradyumna Tambwekar, Zulfiqar Haider Zaidi et al.
Genesis: Multimodal Driving Scene Generation with Spatio-Temporal and Cross-Modal Consistency
Xiangyu Guo, Zhanqian Wu, Kaixin Xiong et al.
GenIR: Generative Visual Feedback for Mental Image Retrieval
Diji Yang, Minghao Liu, Chung-Hsiang Lo et al.
GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks
Muhammad Danish, Muhammad Akhtar Munir, Syed Shah et al.
GeoRanker: Distance-Aware Ranking for Worldwide Image Geolocalization
Pengyue Jia, Seongheon Park, Song Gao et al.
GFlowVLM: Enhancing Multi-step Reasoning in Vision-Language Models with Generative Flow Networks
Haoqiang Kang, Enna Sachdeva, Piyush Gupta et al.
Glance2Gaze: Efficient Vision-Language Models from Glance Fusion to Gaze Compression
Juan Chen, Honglin Liu, Yingying Ao et al.
Global-Local Tree Search in VLMs for 3D Indoor Scene Generation
Wei Deng, Mengshi Qi, Huadong Ma
GLSim: Detecting Object Hallucinations in LVLMs via Global-Local Similarity
Seongheon Park, Sharon Li
GoalLadder: Incremental Goal Discovery with Vision-Language Models
Alexey Zakharov, Shimon Whiteson
Grounding 3D Object Affordance with Language Instructions, Visual Observations and Interactions
He Zhu, Quyu Kong, Kechun Xu et al.
Grounding Language with Vision: A Conditional Mutual Information Calibrated Decoding Strategy for Reducing Hallucinations in LVLMs
Hao Fang, Changle Zhou, Jiawei Kong et al.
GroundingSuite: Measuring Complex Multi-Granular Pixel Grounding
Rui Hu, Yuxuan Zhang, Lianghui Zhu et al.
Ground-V: Teaching VLMs to Ground Complex Instructions in Pixels
Yongshuo Zong, Qin Zhang, Dongsheng An et al.