Poster "vision-language models" Papers

475 papers found • Page 4 of 10

LayoutVLM: Differentiable Optimization of 3D Layout via Vision-Language Models

Fan-Yun Sun, Weiyu Liu, Siyi Gu et al.

CVPR 2025 • arXiv:2412.02193 • 50 citations

Learnable Expansion of Graph Operators for Multi-Modal Feature Fusion

Dexuan Ding, Lei Wang, Liyun Zhu et al.

ICLR 2025 • arXiv:2410.01506 • 17 citations

Learning a Cross-Modal Schrödinger Bridge for Visual Domain Generalization

Hao Zheng, Jingjun Yi, Qi Bi et al.

NEURIPS 2025

Learning Interleaved Image-Text Comprehension in Vision-Language Large Models

Chenyu Zhou, Mengdan Zhang, Peixian Chen et al.

ICLR 2025 • arXiv:2406.10228 • 5 citations

Learning Visual Proxy for Compositional Zero-Shot Learning

Shiyu Zhang, Cheng Yan, Yang Liu et al.

ICCV 2025 • arXiv:2501.13859

LICORICE: Label-Efficient Concept-Based Interpretable Reinforcement Learning

Zhuorui Ye, Stephanie Milani, Geoff Gordon et al.

ICLR 2025 • arXiv:2407.15786 • 5 citations

Lightweight Neural App Control

Filippos Christianos, Georgios Papoudakis, Thomas Coste et al.

ICLR 2025 • arXiv:2410.17883 • 11 citations

LISAt: Language-Instructed Segmentation Assistant for Satellite Imagery

Jerome Quenum, Wen-Han Hsieh, Tsung-Han (Patrick) Wu et al.

NEURIPS 2025 • arXiv:2505.02829 • 4 citations

LLaVA-CoT: Let Vision Language Models Reason Step-by-Step

Guowei Xu, Peng Jin, Ziang Wu et al.

ICCV 2025 • arXiv:2411.10440 • 360 citations

LMFusion: Adapting Pretrained Language Models for Multimodal Generation

Weijia Shi, Xiaochuang Han, Chunting Zhou et al.

NEURIPS 2025 • arXiv:2412.15188 • 86 citations

Locality Alignment Improves Vision-Language Models

Ian Covert, Tony Sun, James Y. Zou et al.

ICLR 2025 • arXiv:2410.11087 • 11 citations

Locality-Aware Zero-Shot Human-Object Interaction Detection

Sanghyun Kim, Deunsol Jung, Minsu Cho

CVPR 2025 • arXiv:2505.19503 • 5 citations

LOMIA: Label-Only Membership Inference Attacks against Pre-trained Large Vision-Language Models

Yihao Liu, Xinqi Lyu, Dong Wang et al.

NEURIPS 2025

LongVPO: From Anchored Cues to Self-Reasoning for Long-Form Video Preference Optimization

Zhenpeng Huang, Jiaqi Li, Zihan Jia et al.

NEURIPS 2025 • arXiv:2602.02341

Mask-Adapter: The Devil is in the Masks for Open-Vocabulary Segmentation

Yongkang Li, Tianheng Cheng, Bin Feng et al.

CVPR 2025 • arXiv:2412.04533 • 11 citations

MBQ: Modality-Balanced Quantization for Large Vision-Language Models

Shiyao Li, Yingchun Hu, Xuefei Ning et al.

CVPR 2025 • arXiv:2412.19509 • 15 citations

MediConfusion: Can you trust your AI radiologist? Probing the reliability of multimodal medical foundation models

Mohammad Shahab Sepehri, Zalan Fabian, Maryam Soltanolkotabi et al.

ICLR 2025 • arXiv:2409.15477 • 20 citations

MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks

Jiacheng Chen, Tianhao Liang, Sherman Siu et al.

ICLR 2025 • arXiv:2410.10563 • 30 citations

MemEIC: A Step Toward Continual and Compositional Knowledge Editing

Jin Seong, Jiyun Park, Wencke Liermann et al.

NEURIPS 2025 • arXiv:2510.25798

MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models

Ziyu Liu, Yuhang Zang, Xiaoyi Dong et al.

ICLR 2025 • arXiv:2410.17637 • 22 citations

MiCo: Multi-image Contrast for Reinforcement Visual Reasoning

Xi Chen, Mingkang Zhu, Shaoteng Liu et al.

NEURIPS 2025 • arXiv:2506.22434

Mint: A Simple Test-Time Adaptation of Vision-Language Models against Common Corruptions

Wenxuan Bao, Ruxi Deng, Jingrui He

NEURIPS 2025 • arXiv:2510.22127 • 1 citation

MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning

Xinyan Chen, Renrui Zhang, Dongzhi Jiang et al.

NEURIPS 2025 • arXiv:2506.05331 • 24 citations

MIP against Agent: Malicious Image Patches Hijacking Multimodal OS Agents

Lukas Aichberger, Alasdair Paren, Guohao Li et al.

NEURIPS 2025 • arXiv:2503.10809 • 10 citations

MMCR: Benchmarking Cross-Source Reasoning in Scientific Papers

Yang Tian, Zheng Lu, Mingqi Gao et al.

ICCV 2025 • arXiv:2503.16856 • 2 citations

MMCSBench: A Fine-Grained Benchmark for Large Vision-Language Models in Camouflage Scenes

Jin Zhang, Ruiheng Zhang, Zhe Cao et al.

NEURIPS 2025

MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models

Fanqing Meng, Jin Wang, Chuanhao Li et al.

ICLR 2025 • arXiv:2408.02718 • 48 citations

MM-OPERA: Benchmarking Open-ended Association Reasoning for Large Vision-Language Models

Zimeng Huang, Jinxin Ke, Xiaoxuan Fan et al.

NEURIPS 2025 • arXiv:2510.26937

mmWalk: Towards Multi-modal Multi-view Walking Assistance

Kedi Ying, Ruiping Liu, Chongyan Chen et al.

NEURIPS 2025 • arXiv:2510.11520

Modality-Specialized Synergizers for Interleaved Vision-Language Generalists

Zhiyang Xu, Minqian Liu, Ying Shen et al.

ICLR 2025 • arXiv:2407.03604 • 8 citations

Modeling dynamic social vision highlights gaps between deep learning and humans

Kathy Garcia, Emalie McMahon, Colin Conwell et al.

ICLR 2025

MRAG-Bench: Vision-Centric Evaluation for Retrieval-Augmented Multimodal Models

Wenbo Hu, Jia-Chen Gu, Zi-Yi Dou et al.

ICLR 2025 • arXiv:2410.08182 • 30 citations

Multi-Label Test-Time Adaptation with Bound Entropy Minimization

Xiangyu Wu, Feng Yu, Yang Yang et al.

ICLR 2025 • arXiv:2502.03777 • 5 citations

Multi-modal Agent Tuning: Building a VLM-Driven Agent for Efficient Tool Usage

Zhi Gao, Bofei Zhang, Pengxiang Li et al.

ICLR 2025 • arXiv:2412.15606 • 38 citations

Multimodal Causal Reasoning for UAV Object Detection

Nianxin Li, Mao Ye, Lihua Zhou et al.

NEURIPS 2025

Multi-modal Knowledge Distillation-based Human Trajectory Forecasting

Jaewoo Jeong, Seohee Lee, Daehee Park et al.

CVPR 2025 • arXiv:2503.22201 • 8 citations

Multimodal Prompt Alignment for Facial Expression Recognition

Fuyan Ma, Yiran He, Bin Sun et al.

ICCV 2025 • arXiv:2506.21017 • 2 citations

Multimodal Unsupervised Domain Generalization by Retrieving Across the Modality Gap

Christopher Liao, Christian So, Theodoros Tsiligkaridis et al.

ICLR 2025 • arXiv:2402.04416 • 1 citation

Multi-View Slot Attention Using Paraphrased Texts for Face Anti-Spoofing

Jeongmin Yu, Susang Kim, Kisu Lee et al.

ICCV 2025 • arXiv:2509.06336

MUNBa: Machine Unlearning via Nash Bargaining

Jing Wu, Mehrtash Harandi

ICCV 2025 • arXiv:2411.15537 • 8 citations

MUSE-VL: Modeling Unified VLM through Semantic Discrete Encoding

Rongchang Xie, Chen Du, Ping Song et al.

ICCV 2025 • arXiv:2411.17762 • 27 citations

NegRefine: Refining Negative Label-Based Zero-Shot OOD Detection

Amirhossein Ansari, Ke Wang, Pulei Xiong

ICCV 2025 • arXiv:2507.09795

Noise Diffusion for Enhancing Semantic Faithfulness in Text-to-Image Synthesis

Boming Miao, Chunxiao Li, Xiaoxiao Wang et al.

CVPR 2025 • arXiv:2411.16503 • 3 citations

Noise Matters: Optimizing Matching Noise for Diffusion Classifiers

Yanghao Wang, Long Chen

NEURIPS 2025 • arXiv:2508.11330 • 2 citations

Noisy Test-Time Adaptation in Vision-Language Models

Chentao Cao, Zhun Zhong, Zhanke (Andrew) Zhou et al.

ICLR 2025 • arXiv:2502.14604 • 4 citations

Nullu: Mitigating Object Hallucinations in Large Vision-Language Models via HalluSpace Projection

Le Yang, Ziwei Zheng, Boxu Chen et al.

CVPR 2025 • arXiv:2412.13817 • 26 citations

OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations

Linke Ouyang, Yuan Qu, Hongbin Zhou et al.

CVPR 2025 • arXiv:2412.07626 • 47 citations

OmniDrive: A Holistic Vision-Language Dataset for Autonomous Driving with Counterfactual Reasoning

Shihao Wang, Zhiding Yu, Xiaohui Jiang et al.

CVPR 2025 • arXiv:2504.04348 • 86 citations

One Head to Rule Them All: Amplifying LVLM Safety through a Single Critical Attention Head

Junhao Xia, Haotian Zhu, Shuchao Pang et al.

NEURIPS 2025

Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning

Yana Wei, Liang Zhao, Jianjian Sun et al.

NEURIPS 2025 • arXiv:2507.05255 • 14 citations