NEURIPS 2025 "vision-language models" Papers

131 papers found • Page 1 of 3

3D-RAD: A Comprehensive 3D Radiology Med-VQA Dataset with Multi-Temporal Analysis and Diverse Diagnostic Tasks

Xiaotang Gai, Jiaxiang Liu, Yichen Li et al.

NEURIPS 2025 • oral • arXiv:2506.11147 • 4 citations

AdaLRS: Loss-Guided Adaptive Learning Rate Search for Efficient Foundation Model Pretraining

Hongyuan Dong, Dingkang Yang, Xiao Liang et al.

NEURIPS 2025 • poster • arXiv:2506.13274 • 3 citations

Advancing Compositional Awareness in CLIP with Efficient Fine-Tuning

Amit Peleg, Naman Deep Singh, Matthias Hein

NEURIPS 2025 • poster • arXiv:2505.24424 • 2 citations

AgMMU: A Comprehensive Agricultural Multimodal Understanding Benchmark

Aruna Gauba, Irene Pi, Yunze Man et al.

NEURIPS 2025 • poster • arXiv:2504.10568 • 2 citations

Aligning by Misaligning: Boundary-aware Curriculum Learning for Multimodal Alignment

Hua Ye, Hang Ding, Siyuan Chen et al.

NEURIPS 2025 • poster • arXiv:2511.08399

AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Document Understanding

Ahmed Masry, Juan Rodriguez, Tianyu Zhang et al.

NEURIPS 2025 • poster • arXiv:2502.01341

AmorLIP: Efficient Language-Image Pretraining via Amortization

Haotian Sun, Yitong Li, Yuchen Zhuang et al.

NEURIPS 2025 • poster • arXiv:2505.18983 • 2 citations

A Multimodal Benchmark for Framing of Oil & Gas Advertising and Potential Greenwashing Detection

Gaku Morio, Harri Rowlands, Dominik Stammbach et al.

NEURIPS 2025 • poster • arXiv:2510.21679

An Information-theoretical Framework for Understanding Out-of-distribution Detection with Pretrained Vision-Language Models

Bo Peng, Jie Lu, Guangquan Zhang et al.

NEURIPS 2025 • poster

Approximate Domain Unlearning for Vision-Language Models

Kodai Kawamura, Yuta Goto, Rintaro Yanagi et al.

NEURIPS 2025 • spotlight • arXiv:2510.08132

Attention! Your Vision Language Model Could Be Maliciously Manipulated

Xiaosen Wang, Shaokang Wang, Zhijin Ge et al.

NEURIPS 2025 • poster • arXiv:2505.19911 • 3 citations

Automated Model Discovery via Multi-modal & Multi-step Pipeline

Lee Jung-Mok, Nam Hyeon-Woo, Moon Ye-Bin et al.

NEURIPS 2025 • poster • arXiv:2509.25946

BeliefMapNav: 3D Voxel-Based Belief Map for Zero-Shot Object Navigation

Zibo Zhou, Yue Hu, Lingkai Zhang et al.

NEURIPS 2025 • poster • arXiv:2506.06487 • 2 citations

Bridging the gap to real-world language-grounded visual concept learning

Whie Jung, Semin Kim, Junee Kim et al.

NEURIPS 2025 • poster • arXiv:2510.21412

CF-VLM: CounterFactual Vision-Language Fine-tuning

Jusheng Zhang, Kaitong Cai, Yijia Fan et al.

NEURIPS 2025 • poster

Chain-of-Zoom: Extreme Super-Resolution via Scale Autoregression and Preference Alignment

Bryan Sangwoo Kim, Jeongsol Kim, Jong Chul Ye

NEURIPS 2025 • spotlight • arXiv:2505.18600

Conditional Representation Learning for Customized Tasks

Honglin Liu, Chao Sun, Peng Hu et al.

NEURIPS 2025 • spotlight • arXiv:2510.04564

Cross-modal Associations in Vision and Language Models: Revisiting the Bouba-Kiki Effect

Tom Kouwenhoven, Kiana Shahrasbi, Tessa Verhoef

NEURIPS 2025 • poster • arXiv:2507.10013

CrypticBio: A Large Multimodal Dataset for Visually Confusing Species

Georgiana Manolache, Gerard Schouten, Joaquin Vanschoren

NEURIPS 2025 • oral

CURV: Coherent Uncertainty-Aware Reasoning in Vision-Language Models for X-Ray Report Generation

Ziao Wang, Sixing Yan, Kejing Yin et al.

NEURIPS 2025 • poster

CXReasonBench: A Benchmark for Evaluating Structured Diagnostic Reasoning in Chest X-rays

Hyungyung Lee, Geon Choi, Jung-Oh Lee et al.

NEURIPS 2025 • spotlight • arXiv:2505.18087 • 3 citations

Diffusion Model as a Noise-Aware Latent Reward Model for Step-Level Preference Optimization

Tao Zhang, Cheng Da, Kun Ding et al.

NEURIPS 2025 • poster • arXiv:2502.01051 • 13 citations

Disentanglement Beyond Static vs. Dynamic: A Benchmark and Evaluation Framework for Multi-Factor Sequential Representations

Tal Barami, Nimrod Berman, Ilan Naiman et al.

NEURIPS 2025 • poster • arXiv:2510.17313 • 2 citations

Do LVLMs Truly Understand Video Anomalies? Revealing Hallucination via Co-Occurrence Patterns

Menghao Zhang, Huazheng Wang, Pengfei Ren et al.

NEURIPS 2025 • poster

DrVD-Bench: Do Vision-Language Models Reason Like Human Doctors in Medical Image Diagnosis?

Tianhong Zhou, Xu Yin, Yingtao Zhu et al.

NEURIPS 2025 • poster • arXiv:2505.24173 • 5 citations

DualCnst: Enhancing Zero-Shot Out-of-Distribution Detection via Text-Image Consistency in Vision-Language Models

Fayi Le, Wenwu He, Chentao Cao et al.

NEURIPS 2025 • poster

Dual-Stage Value-Guided Inference with Margin-Based Reward Adjustment for Fast and Faithful VLM Captioning

Ankan Deria, Adinath Dukre, Feilong Tang et al.

NEURIPS 2025 • oral • arXiv:2506.15649

DyMU: Dynamic Merging and Virtual Unmerging for Efficient Variable-Length VLMs

Zhenhailong Wang, Senthil Purushwalkam, Caiming Xiong et al.

NEURIPS 2025 • poster • 6 citations

EA3D: Online Open-World 3D Object Extraction from Streaming Videos

Xiaoyu Zhou, Jingqi Wang, Yuang Jia et al.

NEURIPS 2025 • poster • arXiv:2510.25146 • 1 citation

Each Complexity Deserves a Pruning Policy

Hanshi Wang, Yuhao Xu, Zekun Xu et al.

NEURIPS 2025 • poster • arXiv:2509.23931

EmoNet-Face: An Expert-Annotated Benchmark for Synthetic Emotion Recognition

Christoph Schuhmann, Robert Kaczmarczyk, Gollam Rabby et al.

NEURIPS 2025 • poster • arXiv:2505.20033 • 3 citations

Enhancing Compositional Reasoning in CLIP via Reconstruction and Alignment of Text Descriptions

Jihoon Kwon, Kyle Min, Jy-yong Sohn

NEURIPS 2025 • poster • arXiv:2510.16540

Enhancing Vision-Language Model Reliability with Uncertainty-Guided Dropout Decoding

Yixiong Fang, Ziran Yang, Zhaorun Chen et al.

NEURIPS 2025 • poster • arXiv:2412.06474 • 13 citations

Escaping the SpuriVerse: Can Large Vision-Language Models Generalize Beyond Seen Spurious Correlations?

Yiwei Yang, Chung Peng Lee, Shangbin Feng et al.

NEURIPS 2025 • poster • arXiv:2506.18322 • 3 citations

EvolvedGRPO: Unlocking Reasoning in LVLMs via Progressive Instruction Evolution

Zhebei Shen, Qifan Yu, Juncheng Li et al.

NEURIPS 2025 • poster

Exploiting the Asymmetric Uncertainty Structure of Pre-trained VLMs on the Unit Hypersphere

Li Ju, Max Andersson, Stina Fredriksson et al.

NEURIPS 2025 • poster • arXiv:2505.11029 • 2 citations

FedMGP: Personalized Federated Learning with Multi-Group Text-Visual Prompts

Weihao Bo, Yanpeng Sun, Yu Wang et al.

NEURIPS 2025 • poster • arXiv:2511.00480

Fine-Grained Preference Optimization Improves Spatial Reasoning in VLMs

Yifan Shen, Yuanzhe Liu, Jingyuan Zhu et al.

NEURIPS 2025 • poster • arXiv:2506.21656 • 3 citations

FlySearch: Exploring how vision-language models explore

Adam Pardyl, Dominik Matuszek, Mateusz Przebieracz et al.

NEURIPS 2025 • poster • arXiv:2506.02896 • 3 citations

From Flat to Hierarchical: Extracting Sparse Representations with Matching Pursuit

Valérie Costa, Thomas Fel, Ekdeep S Lubana et al.

NEURIPS 2025 • poster • arXiv:2506.03093 • 10 citations

Generate, but Verify: Reducing Hallucination in Vision-Language Models with Retrospective Resampling

Tsung-Han (Patrick) Wu, Heekyung Lee, Jiaxin Ge et al.

NEURIPS 2025 • poster • arXiv:2504.13169 • 10 citations

Genesis: Multimodal Driving Scene Generation with Spatio-Temporal and Cross-Modal Consistency

Xiangyu Guo, Zhanqian Wu, Kaixin Xiong et al.

NEURIPS 2025 • oral • arXiv:2506.07497 • 8 citations

GenIR: Generative Visual Feedback for Mental Image Retrieval

Diji Yang, Minghao Liu, Chung-Hsiang Lo et al.

NEURIPS 2025 • poster • arXiv:2506.06220

GeoRanker: Distance-Aware Ranking for Worldwide Image Geolocalization

Pengyue Jia, Seongheon Park, Song Gao et al.

NEURIPS 2025 • poster • arXiv:2505.13731 • 3 citations

Glance2Gaze: Efficient Vision-Language Models from Glance Fusion to Gaze Compression

Juan Chen, Honglin Liu, Yingying Ao et al.

NEURIPS 2025 • poster

GLSim: Detecting Object Hallucinations in LVLMs via Global-Local Similarity

Seongheon Park, Sharon Li

NEURIPS 2025 • poster • arXiv:2508.19972

GoalLadder: Incremental Goal Discovery with Vision-Language Models

Alexey Zakharov, Shimon Whiteson

NEURIPS 2025 • poster • arXiv:2506.16396 • 1 citation

Grounding Language with Vision: A Conditional Mutual Information Calibrated Decoding Strategy for Reducing Hallucinations in LVLMs

Hao Fang, Changle Zhou, Jiawei Kong et al.

NEURIPS 2025 • poster • arXiv:2505.19678 • 6 citations

GTR-Loc: Geospatial Text Regularization Assisted Outdoor LiDAR Localization

Shangshu Yu, Wen Li, Xiaotian Sun et al.

NEURIPS 2025 • poster

HQA-VLAttack: Towards High Quality Adversarial Attack on Vision-Language Pre-Trained Models

Han Liu, Jiaqi Li, Zhi Xu et al.

NEURIPS 2025 • poster