2025 Poster "vision language models" Papers

44 papers found

Aligning Effective Tokens with Video Anomaly in Large Language Models

YINGXIAN Chen, Jiahui Liu, Ruidi Fan et al.

ICCV 2025posterarXiv:2508.06350
1
citations

Are Large Vision Language Models Good Game Players?

Xinyu Wang, Bohan Zhuang, Qi Wu

ICLR 2025posterarXiv:2503.02358
13
citations

ATP-LLaVA: Adaptive Token Pruning for Large Vision Language Models

Xubing Ye, Yukang Gan, Yixiao Ge et al.

CVPR 2025posterarXiv:2412.00447
31
citations

Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation

Yuhui Zhang, Yuchang Su, Yiming Liu et al.

CVPR 2025posterarXiv:2501.03225
21
citations

Balanced Token Pruning: Accelerating Vision Language Models Beyond Local Optimization

kaiyuan Li, Xiaoyue Chen, Chen Gao et al.

NEURIPS 2025posterarXiv:2505.22038
4
citations

Building a Mind Palace: Structuring Environment-Grounded Semantic Graphs for Effective Long Video Analysis with LLMs

Zeyi Huang, Yuyang Ji, Xiaofang Wang et al.

CVPR 2025posterarXiv:2501.04336
7
citations

Can We Talk Models Into Seeing the World Differently?

Paul Gavrikov, Jovita Lukasik, Steffen Jung et al.

ICLR 2025posterarXiv:2403.09193
15
citations

CAPTURE: Evaluating Spatial Reasoning in Vision Language Models via Occluded Object Counting

Atin Pothiraj, Jaemin Cho, Elias Stengel-Eskin et al.

ICCV 2025posterarXiv:2504.15485
19
citations

ColPali: Efficient Document Retrieval with Vision Language Models

Manuel Faysse, Hugues Sibille, Tony Wu et al.

ICLR 2025posterarXiv:2407.01449
91
citations

COMBO: Compositional World Models for Embodied Multi-Agent Cooperation

Hongxin Zhang, Zeyuan Wang, Qiushi Lyu et al.

ICLR 2025posterarXiv:2404.10775
33
citations

ETA: Evaluating Then Aligning Safety of Vision Language Models at Inference Time

Yi Ding, Bolian Li, Ruqi Zhang

ICLR 2025posterarXiv:2410.06625
42
citations

FastVLM: Efficient Vision Encoding for Vision Language Models

Pavan Kumar Anasosalu Vasu, Fartash Faghri, Chun-Liang Li et al.

CVPR 2025posterarXiv:2412.13303
38
citations

Forensics-Bench: A Comprehensive Forgery Detection Benchmark Suite for Large Vision Language Models

Jin Wang, Chenghui Lv, Xian Li et al.

CVPR 2025posterarXiv:2503.15024
10
citations

FrameFusion: Combining Similarity and Importance for Video Token Reduction on Large Vision Language Models

Tianyu Fu, Tengxuan Liu, Qinghao Han et al.

ICCV 2025posterarXiv:2501.01986
22
citations

GUI Exploration Lab: Enhancing Screen Navigation in Agents via Multi-Turn Reinforcement Learning

Haolong Yan, Yeqing Shen, Xin Huang et al.

NEURIPS 2025posterarXiv:2512.02423

HalLoc: Token-level Localization of Hallucinations for Vision Language Models

Eunkyu Park, Minyeong Kim, Gunhee Kim

CVPR 2025posterarXiv:2506.10286
3
citations

Hate in Plain Sight: On the Risks of Moderating AI-Generated Hateful Illusions

Yiting Qu, Ziqing Yang, Yihan Ma et al.

ICCV 2025posterarXiv:2507.22617

HRScene: How Far Are VLMs from Effective High-Resolution Image Understanding?

Yusen Zhang, Wenliang Zheng, Aashrith Madasu et al.

ICCV 2025posterarXiv:2504.18406

Improving Large Vision and Language Models by Learning from a Panel of Peers

Jefferson Hernandez, Jing Shi, Simon Jenni et al.

ICCV 2025posterarXiv:2509.01610
1
citations

Inference-Time Text-to-Video Alignment with Diffusion Latent Beam Search

Yuta Oshima, Masahiro Suzuki, Yutaka Matsuo et al.

NEURIPS 2025posterarXiv:2501.19252
20
citations

Keyframe-oriented Vision Token Pruning: Enhancing Efficiency of Large Vision Language Models on Long-Form Video Processing

Yudong Liu, Jingwei Sun, Yueqian Lin et al.

ICCV 2025posterarXiv:2503.10742
6
citations

Knowledge Transfer from Interaction Learning

Yilin Gao, Kangyi Chen, Zhongxing Peng et al.

ICCV 2025posterarXiv:2509.18733

Mechanistic Interpretability Meets Vision Language Models: Insights and Limitations

Yiming Liu, Yuhui Zhang, Serena Yeung

ICLR 2025poster

METEOR: Multi-Encoder Collaborative Token Pruning for Efficient Vision Language Models

Yuchen Liu, Yaoming Wang, Bowen Shi et al.

ICCV 2025posterarXiv:2507.20842
1
citations

MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO

Yicheng Xiao, Lin Song, Yukang Chen et al.

NEURIPS 2025posterarXiv:2505.13031
18
citations

MuSLR: Multimodal Symbolic Logical Reasoning

Jundong Xu, Hao Fei, Yuhui Zhang et al.

NEURIPS 2025posterarXiv:2509.25851

OOD-Barrier: Build a Middle-Barrier for Open-Set Single-Image Test Time Adaptation via Vision Language Models

Boyang Peng, Sanqing Qu, Tianpei Zou et al.

NEURIPS 2025poster

Open-ended Hierarchical Streaming Video Understanding with Vision Language Models

Hyolim Kang, Yunsu Park, Youngbeom Yoo et al.

ICCV 2025posterarXiv:2509.12145
3
citations

PARC: A Quantitative Framework Uncovering the Symmetries within Vision Language Models

Jenny Schmalfuss, Nadine Chang, Vibashan VS et al.

CVPR 2025posterarXiv:2506.14808
1
citations

Progress-Aware Video Frame Captioning

Zihui Xue, Joungbin An, Xitong Yang et al.

CVPR 2025posterarXiv:2412.02071
7
citations

Pruning All-Rounder: Rethinking and Improving Inference Efficiency for Large Vision Language Models

Wei Suo, Ji Ma, Mengyang Sun et al.

ICCV 2025posterarXiv:2412.06458
1
citations

Rethinking Layered Graphic Design Generation with a Top-Down Approach

Jingye Chen, Zhaowen Wang, Nanxuan Zhao et al.

ICCV 2025posterarXiv:2507.05601
3
citations

Semantic Discrepancy-aware Detector for Image Forgery Identification

Wang Ziye, Minghang Yu, Chunyan Xu et al.

ICCV 2025posterarXiv:2508.12341

SharpZO: Hybrid Sharpness-Aware Vision Language Model Prompt Tuning via Forward-Only Passes

Yifan Yang, Zhen Zhang, Rupak Vignesh Swaminathan et al.

NEURIPS 2025posterarXiv:2506.20990
1
citations

SparseVILA: Decoupling Visual Sparsity for Efficient VLM Inference

Samir Khaki, Junxian Guo, Jiaming Tang et al.

ICCV 2025posterarXiv:2510.17777
1
citations

SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Models

Yongting Zhang, Lu Chen, Guodong Zheng et al.

CVPR 2025posterarXiv:2406.12030
61
citations

SpiritSight Agent: Advanced GUI Agent with One Look

Zhiyuan Huang, Ziming Cheng, Junting Pan et al.

CVPR 2025posterarXiv:2503.03196
11
citations

Systematic Reward Gap Optimization for Mitigating VLM Hallucinations

Lehan He, Zeren Chen, Zhelun Shi et al.

NEURIPS 2025posterarXiv:2411.17265
3
citations

Training-Free Personalization via Retrieval and Reasoning on Fingerprints

Deepayan Das, Davide Talon, Yiming Wang et al.

ICCV 2025posterarXiv:2503.18623
2
citations

Video-XL: Extra-Long Vision Language Model for Hour-Scale Video Understanding

Yan Shu, Zheng Liu, Peitian Zhang et al.

CVPR 2025posterarXiv:2409.14485
144
citations

VLIPP: Towards Physically Plausible Video Generation with Vision and Language Informed Physical Prior

Xindi Yang, Baolu Li, Yiming Zhang et al.

ICCV 2025posterarXiv:2503.23368
17
citations

VLM4D: Towards Spatiotemporal Awareness in Vision Language Models

Shijie Zhou, Alexander Vilesov, Xuehai He et al.

ICCV 2025posterarXiv:2508.02095
15
citations

Weighted Multi-Prompt Learning with Description-free Large Language Model Distillation

Sua Lee, Kyubum Shin, Jung Ho Park

ICLR 2025posterarXiv:2507.07147
1
citations

Zero-Shot Vision Encoder Grafting via LLM Surrogates

Kaiyu Yue, Vasu Singla, Menglin Jia et al.

ICCV 2025posterarXiv:2505.22664