ICCV "vision language models" Papers

18 papers found

Aligning Effective Tokens with Video Anomaly in Large Language Models

Yingxian Chen, Jiahui Liu, Ruidi Fan et al.

ICCV 2025 | poster | arXiv:2508.06350 | 1 citation

CAPTURE: Evaluating Spatial Reasoning in Vision Language Models via Occluded Object Counting

Atin Pothiraj, Jaemin Cho, Elias Stengel-Eskin et al.

ICCV 2025 | poster | arXiv:2504.15485 | 19 citations

FrameFusion: Combining Similarity and Importance for Video Token Reduction on Large Vision Language Models

Tianyu Fu, Tengxuan Liu, Qinghao Han et al.

ICCV 2025 | poster | arXiv:2501.01986 | 22 citations

Hate in Plain Sight: On the Risks of Moderating AI-Generated Hateful Illusions

Yiting Qu, Ziqing Yang, Yihan Ma et al.

ICCV 2025 | poster | arXiv:2507.22617

HRScene: How Far Are VLMs from Effective High-Resolution Image Understanding?

Yusen Zhang, Wenliang Zheng, Aashrith Madasu et al.

ICCV 2025 | poster | arXiv:2504.18406

Improving Large Vision and Language Models by Learning from a Panel of Peers

Jefferson Hernandez, Jing Shi, Simon Jenni et al.

ICCV 2025 | poster | arXiv:2509.01610 | 1 citation

Keyframe-oriented Vision Token Pruning: Enhancing Efficiency of Large Vision Language Models on Long-Form Video Processing

Yudong Liu, Jingwei Sun, Yueqian Lin et al.

ICCV 2025 | poster | arXiv:2503.10742 | 6 citations

Knowledge Transfer from Interaction Learning

Yilin Gao, Kangyi Chen, Zhongxing Peng et al.

ICCV 2025 | poster | arXiv:2509.18733

METEOR: Multi-Encoder Collaborative Token Pruning for Efficient Vision Language Models

Yuchen Liu, Yaoming Wang, Bowen Shi et al.

ICCV 2025 | poster | arXiv:2507.20842 | 1 citation

Open-ended Hierarchical Streaming Video Understanding with Vision Language Models

Hyolim Kang, Yunsu Park, Youngbeom Yoo et al.

ICCV 2025 | poster | arXiv:2509.12145 | 3 citations

Pruning All-Rounder: Rethinking and Improving Inference Efficiency for Large Vision Language Models

Wei Suo, Ji Ma, Mengyang Sun et al.

ICCV 2025 | poster | arXiv:2412.06458 | 1 citation

Rethinking Layered Graphic Design Generation with a Top-Down Approach

Jingye Chen, Zhaowen Wang, Nanxuan Zhao et al.

ICCV 2025 | poster | arXiv:2507.05601 | 3 citations

Semantic Discrepancy-aware Detector for Image Forgery Identification

Wang Ziye, Minghang Yu, Chunyan Xu et al.

ICCV 2025 | poster | arXiv:2508.12341

SparseVILA: Decoupling Visual Sparsity for Efficient VLM Inference

Samir Khaki, Junxian Guo, Jiaming Tang et al.

ICCV 2025 | poster | arXiv:2510.17777 | 1 citation

Training-Free Personalization via Retrieval and Reasoning on Fingerprints

Deepayan Das, Davide Talon, Yiming Wang et al.

ICCV 2025 | poster | arXiv:2503.18623 | 2 citations

VLIPP: Towards Physically Plausible Video Generation with Vision and Language Informed Physical Prior

Xindi Yang, Baolu Li, Yiming Zhang et al.

ICCV 2025 | poster | arXiv:2503.23368 | 17 citations

VLM4D: Towards Spatiotemporal Awareness in Vision Language Models

Shijie Zhou, Alexander Vilesov, Xuehai He et al.

ICCV 2025 | poster | arXiv:2508.02095 | 15 citations

Zero-Shot Vision Encoder Grafting via LLM Surrogates

Kaiyu Yue, Vasu Singla, Menglin Jia et al.

ICCV 2025 | poster | arXiv:2505.22664