NEURIPS Poster "vision-language models" Papers

102 papers found • Page 1 of 3

AdaLRS: Loss-Guided Adaptive Learning Rate Search for Efficient Foundation Model Pretraining

Hongyuan Dong, Dingkang Yang, Xiao Liang et al.

NEURIPS 2025 • poster • arXiv:2506.13274 • 3 citations

Advancing Compositional Awareness in CLIP with Efficient Fine-Tuning

Amit Peleg, Naman Deep Singh, Matthias Hein

NEURIPS 2025 • poster • arXiv:2505.24424 • 2 citations

AgMMU: A Comprehensive Agricultural Multimodal Understanding Benchmark

Aruna Gauba, Irene Pi, Yunze Man et al.

NEURIPS 2025 • poster • arXiv:2504.10568 • 2 citations

Aligning by Misaligning: Boundary-aware Curriculum Learning for Multimodal Alignment

Hua Ye, Hang Ding, Siyuan Chen et al.

NEURIPS 2025 • poster • arXiv:2511.08399

AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Document Understanding

Ahmed Masry, Juan Rodriguez, Tianyu Zhang et al.

NEURIPS 2025 • poster • arXiv:2502.01341

AmorLIP: Efficient Language-Image Pretraining via Amortization

Haotian Sun, Yitong Li, Yuchen Zhuang et al.

NEURIPS 2025 • poster • arXiv:2505.18983 • 2 citations

A Multimodal Benchmark for Framing of Oil & Gas Advertising and Potential Greenwashing Detection

Gaku Morio, Harri Rowlands, Dominik Stammbach et al.

NEURIPS 2025 • poster • arXiv:2510.21679

An Information-theoretical Framework for Understanding Out-of-distribution Detection with Pretrained Vision-Language Models

Bo Peng, Jie Lu, Guangquan Zhang et al.

NEURIPS 2025 • poster

Attention! Your Vision Language Model Could Be Maliciously Manipulated

Xiaosen Wang, Shaokang Wang, Zhijin Ge et al.

NEURIPS 2025 • poster • arXiv:2505.19911 • 3 citations

Automated Model Discovery via Multi-modal & Multi-step Pipeline

Lee Jung-Mok, Nam Hyeon-Woo, Moon Ye-Bin et al.

NEURIPS 2025 • poster • arXiv:2509.25946

BeliefMapNav: 3D Voxel-Based Belief Map for Zero-Shot Object Navigation

Zibo Zhou, Yue Hu, Lingkai Zhang et al.

NEURIPS 2025 • poster • arXiv:2506.06487 • 2 citations

Bridging the gap to real-world language-grounded visual concept learning

Whie Jung, Semin Kim, Junee Kim et al.

NEURIPS 2025 • poster • arXiv:2510.21412

CF-VLM: CounterFactual Vision-Language Fine-tuning

Jusheng Zhang, Kaitong Cai, Yijia Fan et al.

NEURIPS 2025 • poster

Cross-modal Associations in Vision and Language Models: Revisiting the Bouba-Kiki Effect

Tom Kouwenhoven, Kiana Shahrasbi, Tessa Verhoef

NEURIPS 2025 • poster • arXiv:2507.10013

CURV: Coherent Uncertainty-Aware Reasoning in Vision-Language Models for X-Ray Report Generation

Ziao Wang, Sixing Yan, Kejing Yin et al.

NEURIPS 2025 • poster

Diffusion Model as a Noise-Aware Latent Reward Model for Step-Level Preference Optimization

Tao Zhang, Cheng Da, Kun Ding et al.

NEURIPS 2025 • poster • arXiv:2502.01051 • 13 citations

Disentanglement Beyond Static vs. Dynamic: A Benchmark and Evaluation Framework for Multi-Factor Sequential Representations

Tal Barami, Nimrod Berman, Ilan Naiman et al.

NEURIPS 2025 • poster • arXiv:2510.17313 • 2 citations

Do LVLMs Truly Understand Video Anomalies? Revealing Hallucination via Co-Occurrence Patterns

Menghao Zhang, Huazheng Wang, Pengfei Ren et al.

NEURIPS 2025 • poster

DrVD-Bench: Do Vision-Language Models Reason Like Human Doctors in Medical Image Diagnosis?

Tianhong Zhou, Xu Yin, Yingtao Zhu et al.

NEURIPS 2025 • poster • arXiv:2505.24173 • 5 citations

DualCnst: Enhancing Zero-Shot Out-of-Distribution Detection via Text-Image Consistency in Vision-Language Models

Fayi Le, Wenwu He, Chentao Cao et al.

NEURIPS 2025 • poster

DyMU: Dynamic Merging and Virtual Unmerging for Efficient Variable-Length VLMs

Zhenhailong Wang, Senthil Purushwalkam, Caiming Xiong et al.

NEURIPS 2025 • poster • 6 citations

EA3D: Online Open-World 3D Object Extraction from Streaming Videos

Xiaoyu Zhou, Jingqi Wang, Yuang Jia et al.

NEURIPS 2025 • poster • arXiv:2510.25146 • 1 citation

Each Complexity Deserves a Pruning Policy

Hanshi Wang, Yuhao Xu, Zekun Xu et al.

NEURIPS 2025 • poster • arXiv:2509.23931

EmoNet-Face: An Expert-Annotated Benchmark for Synthetic Emotion Recognition

Christoph Schuhmann, Robert Kaczmarczyk, Gollam Rabby et al.

NEURIPS 2025 • poster • arXiv:2505.20033 • 3 citations

Enhancing Compositional Reasoning in CLIP via Reconstruction and Alignment of Text Descriptions

Jihoon Kwon, Kyle Min, Jy-yong Sohn

NEURIPS 2025 • poster • arXiv:2510.16540

Enhancing Vision-Language Model Reliability with Uncertainty-Guided Dropout Decoding

Yixiong Fang, Ziran Yang, Zhaorun Chen et al.

NEURIPS 2025 • poster • arXiv:2412.06474 • 13 citations

Escaping the SpuriVerse: Can Large Vision-Language Models Generalize Beyond Seen Spurious Correlations?

Yiwei Yang, Chung Peng Lee, Shangbin Feng et al.

NEURIPS 2025 • poster • arXiv:2506.18322 • 3 citations

EvolvedGRPO: Unlocking Reasoning in LVLMs via Progressive Instruction Evolution

Zhebei Shen, Qifan Yu, Juncheng Li et al.

NEURIPS 2025 • poster

Exploiting the Asymmetric Uncertainty Structure of Pre-trained VLMs on the Unit Hypersphere

Li Ju, Max Andersson, Stina Fredriksson et al.

NEURIPS 2025 • poster • arXiv:2505.11029 • 2 citations

FedMGP: Personalized Federated Learning with Multi-Group Text-Visual Prompts

Weihao Bo, Yanpeng Sun, Yu Wang et al.

NEURIPS 2025 • poster • arXiv:2511.00480

Fine-Grained Preference Optimization Improves Spatial Reasoning in VLMs

Yifan Shen, Yuanzhe Liu, Jingyuan Zhu et al.

NEURIPS 2025 • poster • arXiv:2506.21656 • 3 citations

FlySearch: Exploring how vision-language models explore

Adam Pardyl, Dominik Matuszek, Mateusz Przebieracz et al.

NEURIPS 2025 • poster • arXiv:2506.02896 • 3 citations

From Flat to Hierarchical: Extracting Sparse Representations with Matching Pursuit

Valérie Costa, Thomas Fel, Ekdeep S Lubana et al.

NEURIPS 2025 • poster • arXiv:2506.03093 • 10 citations

Generate, but Verify: Reducing Hallucination in Vision-Language Models with Retrospective Resampling

Tsung-Han (Patrick) Wu, Heekyung Lee, Jiaxin Ge et al.

NEURIPS 2025 • poster • arXiv:2504.13169 • 10 citations

GenIR: Generative Visual Feedback for Mental Image Retrieval

Diji Yang, Minghao Liu, Chung-Hsiang Lo et al.

NEURIPS 2025 • poster • arXiv:2506.06220

GeoRanker: Distance-Aware Ranking for Worldwide Image Geolocalization

Pengyue Jia, Seongheon Park, Song Gao et al.

NEURIPS 2025 • poster • arXiv:2505.13731 • 3 citations

Glance2Gaze: Efficient Vision-Language Models from Glance Fusion to Gaze Compression

Juan Chen, Honglin Liu, Yingying Ao et al.

NEURIPS 2025 • poster

GLSim: Detecting Object Hallucinations in LVLMs via Global-Local Similarity

Seongheon Park, Sharon Li

NEURIPS 2025 • poster • arXiv:2508.19972

GoalLadder: Incremental Goal Discovery with Vision-Language Models

Alexey Zakharov, Shimon Whiteson

NEURIPS 2025 • poster • arXiv:2506.16396 • 1 citation

Grounding Language with Vision: A Conditional Mutual Information Calibrated Decoding Strategy for Reducing Hallucinations in LVLMs

Hao Fang, Changle Zhou, Jiawei Kong et al.

NEURIPS 2025 • poster • arXiv:2505.19678 • 6 citations

GTR-Loc: Geospatial Text Regularization Assisted Outdoor LiDAR Localization

Shangshu Yu, Wen Li, Xiaotian Sun et al.

NEURIPS 2025 • poster

HQA-VLAttack: Towards High Quality Adversarial Attack on Vision-Language Pre-Trained Models

Han Liu, Jiaqi Li, Zhi Xu et al.

NEURIPS 2025 • poster

iFinder: Structured Zero-Shot Vision-Based LLM Grounding for Dash-Cam Video Reasoning

Manyi Yao, Bingbing Zhuang, Sparsh Garg et al.

NEURIPS 2025 • poster • arXiv:2509.19552 • 1 citation

Intervene-All-Paths: Unified Mitigation of LVLM Hallucinations across Alignment Formats

Jiaye Qian, Ge Zheng, Yuchen Zhu et al.

NEURIPS 2025 • poster • arXiv:2511.17254 • 2 citations

Learning a Cross-Modal Schrödinger Bridge for Visual Domain Generalization

Hao Zheng, Jingjun Yi, Qi Bi et al.

NEURIPS 2025 • poster

LISAt: Language-Instructed Segmentation Assistant for Satellite Imagery

Jerome Quenum, Wen-Han Hsieh, Tsung-Han (Patrick) Wu et al.

NEURIPS 2025 • poster • arXiv:2505.02829 • 4 citations

LMFusion: Adapting Pretrained Language Models for Multimodal Generation

Weijia Shi, Xiaochuang Han, Chunting Zhou et al.

NEURIPS 2025 • poster • arXiv:2412.15188 • 79 citations

LOMIA: Label-Only Membership Inference Attacks against Pre-trained Large Vision-Language Models

Yihao Liu, Xinqi Lyu, Dong Wang et al.

NEURIPS 2025 • poster

LongVPO: From Anchored Cues to Self-Reasoning for Long-Form Video Preference Optimization

Zhenpeng Huang, Jiaqi Li, Zihan Jia et al.

NEURIPS 2025 • poster • arXiv:2602.02341

MemEIC: A Step Toward Continual and Compositional Knowledge Editing

Jin Seong, Jiyun Park, Wencke Liermann et al.

NEURIPS 2025 • poster • arXiv:2510.25798