2025 "large vision-language models" Papers

14 papers found

Compress & Cache: Vision token compression for efficient generation and retrieval

Adrian Bulat, Yassine Ouali, Georgios Tzimiropoulos

NeurIPS 2025, poster

Cross-Modal Safety Mechanism Transfer in Large Vision-Language Models

Shicheng Xu, Liang Pang, Yunchang Zhu et al.

ICLR 2025, poster, arXiv:2410.12662

DSBench: How Far Are Data Science Agents from Becoming Data Science Experts?

Liqiang Jing, Zhehui Huang, Xiaoyang Wang et al.

ICLR 2025, poster, arXiv:2409.07703

Dysca: A Dynamic and Scalable Benchmark for Evaluating Perception Ability of LVLMs

Jie Zhang, Zhongqi Wang, Mengqi Lei et al.

ICLR 2025, poster, arXiv:2406.18849

How Does Vision-Language Adaptation Impact the Safety of Vision Language Models?

Seongyun Lee, Geewook Kim, Jiyeon Kim et al.

ICLR 2025, poster, arXiv:2410.07571

ICT: Image-Object Cross-Level Trusted Intervention for Mitigating Object Hallucination in Large Vision-Language Models

Junzhe Chen, Tianshu Zhang, Shiyu Huang et al.

CVPR 2025, poster, arXiv:2411.15268

Latent Chain-of-Thought for Visual Reasoning

Guohao Sun, Hang Hua, Jian Wang et al.

NeurIPS 2025, poster, arXiv:2510.23925

LVLM-Driven Attribute-Aware Modeling for Visible-Infrared Person Re-Identification

Zhiqi Pang, Lingling Zhao, Junjie Wang et al.

NeurIPS 2025, poster

MMFakeBench: A Mixed-Source Multimodal Misinformation Detection Benchmark for LVLMs

Xuannan Liu, Zekun Li, Pei Li et al.

ICLR 2025, poster, arXiv:2406.08772

Provoking Multi-modal Few-Shot LVLM via Exploration-Exploitation In-Context Learning

Cheng Chen, Yunpeng Zhai, Yifan Zhao et al.

CVPR 2025, poster, arXiv:2506.09473

SoMi-ToM: Evaluating Multi-Perspective Theory of Mind in Embodied Social Interactions

Xianzhe Fan, Xuhui Zhou, Chuanyang Jin et al.

NeurIPS 2025, poster, arXiv:2506.23046

Towards Understanding and Quantifying Uncertainty for Text-to-Image Generation

Gianni Franchi, Nacim Belkhir, Dat Nguyen et al.

CVPR 2025, poster, arXiv:2412.03178

Visual Structures Help Visual Reasoning: Addressing the Binding Problem in LVLMs

Amirmohammad Izadi, Mohammadali Banayeeanzade, Fatemeh Askari et al.

NeurIPS 2025, poster

VladVA: Discriminative Fine-tuning of LVLMs

Yassine Ouali, Adrian Bulat, Alexandros Xenos et al.

CVPR 2025, poster, arXiv:2412.04378
