NEURIPS 2025 "vision-language models" Papers

131 papers found • Page 1 of 3

Filters:NEURIPS 2025 vision-language models Clear all

Conference

AAAI 2025 (3,028)CVPR 2025 (2,873)ICCV 2025 (2,701)ICLR 2025 (3,827)ICML 2025 (3,340)ISMAR 2025 (229)NEURIPS 2025 (5,858)AAAI 2024 (2,289)CVPR 2024 (2,716)ECCV 2024 (2,387)ICLR 2024 (2,297)ICML 2024 (2,635)

Paper Type

poster (24,624)paper (8,140)oral (1,594)spotlight (1,421)highlight (975)

3D-RAD: A Comprehensive 3D Radiology Med-VQA Dataset with Multi-Temporal Analysis and Diverse Diagnostic Tasks

Xiaotang Gai, Jiaxiang Liu, Yichen Li et al.

NEURIPS 2025oralarXiv:2506.11147

citations

AdaLRS: Loss-Guided Adaptive Learning Rate Search for Efficient Foundation Model Pretraining

Hongyuan Dong, Dingkang Yang, Xiao Liang et al.

NEURIPS 2025posterarXiv:2506.13274

citations

Advancing Compositional Awareness in CLIP with Efficient Fine-Tuning

Amit Peleg, Naman Deep Singh, Matthias Hein

NEURIPS 2025posterarXiv:2505.24424

citations

AgMMU: A Comprehensive Agricultural Multimodal Understanding Benchmark

Aruna Gauba, Irene Pi, Yunze Man et al.

NEURIPS 2025posterarXiv:2504.10568

citations

Aligning by Misaligning: Boundary-aware Curriculum Learning for Multimodal Alignment

Hua Ye, Hang Ding, Siyuan Chen et al.

NEURIPS 2025posterarXiv:2511.08399

AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Document Understanding

Ahmed Masry, Juan Rodriguez, Tianyu Zhang et al.

NEURIPS 2025posterarXiv:2502.01341

AmorLIP: Efficient Language-Image Pretraining via Amortization

Haotian Sun, Yitong Li, Yuchen Zhuang et al.

NEURIPS 2025posterarXiv:2505.18983

citations

A Multimodal Benchmark for Framing of Oil & Gas Advertising and Potential Greenwashing Detection

Gaku Morio, Harri Rowlands, Dominik Stammbach et al.

NEURIPS 2025posterarXiv:2510.21679

An Information-theoretical Framework for Understanding Out-of-distribution Detection with Pretrained Vision-Language Models

Bo Peng, Jie Lu, Guangquan Zhang et al.

NEURIPS 2025poster

Approximate Domain Unlearning for Vision-Language Models

Kodai Kawamura, Yuta Goto, Rintaro Yanagi et al.

NEURIPS 2025spotlightarXiv:2510.08132

Attention! Your Vision Language Model Could Be Maliciously Manipulated

Xiaosen Wang, Shaokang Wang, Zhijin Ge et al.

NEURIPS 2025posterarXiv:2505.19911

citations

Automated Model Discovery via Multi-modal & Multi-step Pipeline

Lee Jung-Mok, Nam Hyeon-Woo, Moon Ye-Bin et al.

NEURIPS 2025posterarXiv:2509.25946

BeliefMapNav: 3D Voxel-Based Belief Map for Zero-Shot Object Navigation

Zibo Zhou, Yue Hu, Lingkai Zhang et al.

NEURIPS 2025posterarXiv:2506.06487

citations

Bridging the gap to real-world language-grounded visual concept learning

whie jung, Semin Kim, Junee Kim et al.

NEURIPS 2025posterarXiv:2510.21412

CF-VLM：CounterFactual Vision-Language Fine-tuning

jusheng zhang, Kaitong Cai, Yijia Fan et al.

NEURIPS 2025poster

Chain-of-Zoom: Extreme Super-Resolution via Scale Autoregression and Preference Alignment

Bryan Sangwoo Kim, Jeongsol Kim, Jong Chul Ye

NEURIPS 2025spotlightarXiv:2505.18600

Conditional Representation Learning for Customized Tasks

Honglin Liu, Chao Sun, Peng Hu et al.

NEURIPS 2025spotlightarXiv:2510.04564

Cross-modal Associations in Vision and Language Models: Revisiting the Bouba-Kiki Effect

Tom Kouwenhoven, Kiana Shahrasbi, Tessa Verhoef

NEURIPS 2025posterarXiv:2507.10013

CrypticBio: A Large Multimodal Dataset for Visually Confusing Species

Georgiana Manolache, Gerard Schouten, Joaquin Vanschoren

NEURIPS 2025oral

CURV: Coherent Uncertainty-Aware Reasoning in Vision-Language Models for X-Ray Report Generation

Ziao Wang, Sixing Yan, Kejing Yin et al.

NEURIPS 2025poster

CXReasonBench: A Benchmark for Evaluating Structured Diagnostic Reasoning in Chest X-rays

Hyungyung Lee, Geon Choi, Jung-Oh Lee et al.

NEURIPS 2025spotlightarXiv:2505.18087

citations

Diffusion Model as a Noise-Aware Latent Reward Model for Step-Level Preference Optimization

Tao Zhang, Cheng Da, Kun Ding et al.

NEURIPS 2025posterarXiv:2502.01051

citations

Disentanglement Beyond Static vs. Dynamic: A Benchmark and Evaluation Framework for Multi-Factor Sequential Representations

Tal Barami, Nimrod Berman, Ilan Naiman et al.

NEURIPS 2025posterarXiv:2510.17313

citations

Do LVLMs Truly Understand Video Anomalies? Revealing Hallucination via Co-Occurrence Patterns

Menghao Zhang, Huazheng Wang, Pengfei Ren et al.

NEURIPS 2025poster

DrVD-Bench: Do Vision-Language Models Reason Like Human Doctors in Medical Image Diagnosis?

Tianhong Zhou, xu yin, Yingtao Zhu et al.

NEURIPS 2025posterarXiv:2505.24173

citations

DualCnst: Enhancing Zero-Shot Out-of-Distribution Detection via Text-Image Consistency in Vision-Language Models

Fayi Le, Wenwu He, Chentao Cao et al.

NEURIPS 2025poster

Dual-Stage Value-Guided Inference with Margin-Based Reward Adjustment for Fast and Faithful VLM Captioning

Ankan Deria, Adinath Dukre, feilong tang et al.

NEURIPS 2025oralarXiv:2506.15649

DyMU: Dynamic Merging and Virtual Unmerging for Efficient Variable-Length VLMs

Zhenhailong Wang, Senthil Purushwalkam, Caiming Xiong et al.

NEURIPS 2025poster

citations

EA3D: Online Open-World 3D Object Extraction from Streaming Videos

Xiaoyu Zhou, Jingqi Wang, Yuang Jia et al.

NEURIPS 2025posterarXiv:2510.25146

citations

Each Complexity Deserves a Pruning Policy

Hanshi Wang, Yuhao Xu, Zekun Xu et al.

NEURIPS 2025posterarXiv:2509.23931

EmoNet-Face: An Expert-Annotated Benchmark for Synthetic Emotion Recognition

Christoph Schuhmann, Robert Kaczmarczyk, Gollam Rabby et al.

NEURIPS 2025posterarXiv:2505.20033

citations

Enhancing Compositional Reasoning in CLIP via Reconstruction and Alignment of Text Descriptions

Jihoon Kwon, Kyle Min, Jy-yong Sohn

NEURIPS 2025posterarXiv:2510.16540

Enhancing Vision-Language Model Reliability with Uncertainty-Guided Dropout Decoding

Yixiong Fang, Ziran Yang, Zhaorun Chen et al.

NEURIPS 2025posterarXiv:2412.06474

citations

Escaping the SpuriVerse: Can Large Vision-Language Models Generalize Beyond Seen Spurious Correlations?

Yiwei Yang, Chung Peng Lee, Shangbin Feng et al.

NEURIPS 2025posterarXiv:2506.18322

citations

EvolvedGRPO: Unlocking Reasoning in LVLMs via Progressive Instruction Evolution

Zhebei Shen, Qifan Yu, Juncheng Li et al.

NEURIPS 2025poster

Exploiting the Asymmetric Uncertainty Structure of Pre-trained VLMs on the Unit Hypersphere

Li Ju, Max Andersson, Stina Fredriksson et al.

NEURIPS 2025posterarXiv:2505.11029

citations

FedMGP: Personalized Federated Learning with Multi-Group Text-Visual Prompts

Weihao Bo, Yanpeng Sun, Yu Wang et al.

NEURIPS 2025posterarXiv:2511.00480

Fine-Grained Preference Optimization Improves Spatial Reasoning in VLMs

Yifan Shen, Yuanzhe Liu, Jingyuan Zhu et al.

NEURIPS 2025posterarXiv:2506.21656

citations

FlySearch: Exploring how vision-language models explore

Adam Pardyl, Dominik Matuszek, Mateusz Przebieracz et al.

NEURIPS 2025posterarXiv:2506.02896

citations

From Flat to Hierarchical: Extracting Sparse Representations with Matching Pursuit

Valérie Costa, Thomas Fel, Ekdeep S Lubana et al.

NEURIPS 2025posterarXiv:2506.03093

citations

Generate, but Verify: Reducing Hallucination in Vision-Language Models with Retrospective Resampling

Tsung-Han (Patrick) Wu, Heekyung Lee, Jiaxin Ge et al.

NEURIPS 2025posterarXiv:2504.13169

citations

Genesis: Multimodal Driving Scene Generation with Spatio-Temporal and Cross-Modal Consistency

Xiangyu Guo, Zhanqian Wu, Kaixin Xiong et al.

NEURIPS 2025oralarXiv:2506.07497

citations

GenIR: Generative Visual Feedback for Mental Image Retrieval

Diji Yang, Minghao Liu, Chung-Hsiang Lo et al.

NEURIPS 2025posterarXiv:2506.06220

GeoRanker: Distance-Aware Ranking for Worldwide Image Geolocalization

Pengyue Jia, Seongheon Park, Song Gao et al.

NEURIPS 2025posterarXiv:2505.13731

citations

Glance2Gaze: Efficient Vision-Language Models from Glance Fusion to Gaze Compression

Juan Chen, Honglin liu, Yingying Ao et al.

NEURIPS 2025poster

GLSim: Detecting Object Hallucinations in LVLMs via Global-Local Similarity

Seongheon Park, Sharon Li

NEURIPS 2025posterarXiv:2508.19972

GoalLadder: Incremental Goal Discovery with Vision-Language Models

Alexey Zakharov, Shimon Whiteson

NEURIPS 2025posterarXiv:2506.16396

citations

Grounding Language with Vision: A Conditional Mutual Information Calibrated Decoding Strategy for Reducing Hallucinations in LVLMs

Hao Fang, Changle Zhou, Jiawei Kong et al.

NEURIPS 2025posterarXiv:2505.19678

citations

GTR-Loc: Geospatial Text Regularization Assisted Outdoor LiDAR Localization

Shangshu Yu, Wen Li, Xiaotian Sun et al.

NEURIPS 2025poster

HQA-VLAttack: Towards High Quality Adversarial Attack on Vision-Language Pre-Trained Models

Han Liu, Jiaqi Li, Zhi Xu et al.

NEURIPS 2025poster

← Previous

1 2 3