2025 Papers

21,856 papers found • Page 424 of 438

Vision-Language Interactive Relation Mining for Open-Vocabulary Scene Graph Generation

Yukuan Min, Muli Yang, Jinhao Zhang et al.

ICCV 2025 • poster

Vision-Language Model IP Protection via Prompt-based Learning

Lianyu Wang, Meng Wang, Huazhu Fu et al.

CVPR 2025 • poster • arXiv:2503.02393

Vision Language Models are In-Context Value Learners

Yecheng Jason Ma, Joey Hejna, Chuyuan Fu et al.

ICLR 2025 • oral • arXiv:2411.04549
43 citations

Vision-Language Models Can't See the Obvious

Yasser Abdelaziz Dahou Djilali, Ngoc Huynh, Phúc Lê Khắc et al.

ICCV 2025 • poster • arXiv:2507.04741
7 citations

Vision-Language Models Create Cross-Modal Task Representations

Grace Luo, Trevor Darrell, Amir Bar

ICML 2025 • poster • arXiv:2410.22330
7 citations

Vision-Language Models Do Not Understand Negation

Kumail Alhamoud, Shaden Alshammari, Yonglong Tian et al.

CVPR 2025 • poster • arXiv:2501.09425
36 citations

Vision-Language Model Selection and Reuse for Downstream Adaptation

Hao-Zhe Tan, Zhi Zhou, Yu-Feng Li et al.

ICML 2025 • poster • arXiv:2501.18271
2 citations

Vision-Language Neural Graph Featurization for Extracting Retinal Lesions

Taimur Hassan, Anabia Sohail, Muzammal Naseer et al.

ICCV 2025 • poster

Vision-Language-Vision Auto-Encoder: Scalable Knowledge Distillation from Diffusion Models

Tiezheng Zhang, Yitong Li, Yu-Cheng Chou et al.

NeurIPS 2025 • poster • arXiv:2507.07104
2 citations

Vision-LSTM: xLSTM as Generic Vision Backbone

Benedikt Alkin, Maximilian Beck, Korbinian Pöppel et al.

ICLR 2025 • poster • arXiv:2406.04303
85 citations

VisionMath: Vision-Form Mathematical Problem-Solving

Zongyang Ma, Yuxin Chen, Ziqi Zhang et al.

ICCV 2025 • poster

VisionPAD: A Vision-Centric Pre-training Paradigm for Autonomous Driving

Haiming Zhang, Wending Zhou et al.

CVPR 2025 • poster • arXiv:2411.14716
10 citations

Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures

Yuchen Duan, Weiyun Wang, Zhe Chen et al.

ICLR 2025 • poster • arXiv:2403.02308

VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning

Senqiao Yang, Junyi Li, Xin Lai et al.

NeurIPS 2025 • poster • arXiv:2507.13348

Vision Transformers Beat WideResNets on Small Scale Datasets Adversarial Robustness

Juntao Wu, Ziyu Song, Xiaoyu Zhang et al.

AAAI 2025 • paper

Vision Transformers Don't Need Trained Registers

Nicholas Jiang, Amil Dravid, Alexei Efros et al.

NeurIPS 2025 • spotlight • arXiv:2506.08010
12 citations

Vision Transformers with Self-Distilled Registers

Zipeng Yan, Yinjie Chen, Chong Zhou et al.

NeurIPS 2025 • spotlight • arXiv:2505.21501
4 citations

VisionTS: Visual Masked Autoencoders Are Free-Lunch Zero-Shot Time Series Forecasters

Mouxiang Chen, Lefei Shen, Zhuo Li et al.

ICML 2025 • poster • arXiv:2408.17253

VISION-XL: High Definition Video Inverse Problem Solver using Latent Image Diffusion Models

Taesung Kwon, Jong Chul Ye

ICCV 2025 • poster • arXiv:2412.00156
8 citations

VisionZip: Longer is Better but Not Necessary in Vision Language Models

Senqiao Yang, Yukang Chen, Zhuotao Tian et al.

CVPR 2025 • poster • arXiv:2412.04467

VisNumBench: Evaluating Number Sense of Multimodal Large Language Models

Tengjin Weng, Jingyi Wang, Wenhao Jiang et al.

ICCV 2025 • poster • arXiv:2503.14939

VISO: Accelerating In-orbit Object Detection with Language-Guided Mask Learning and Sparse Inference

Meiqi Wang, Han Qiu

ICCV 2025 • poster

ViSpeak: Visual Instruction Feedback in Streaming Videos

Shenghao Fu, Qize Yang, Yuan-Ming Li et al.

ICCV 2025 • poster • arXiv:2503.12769
11 citations

ViSpec: Accelerating Vision-Language Models with Vision-Aware Speculative Decoding

Jialiang Kang, Han Shu, Wenshuo Li et al.

NeurIPS 2025 • poster • arXiv:2509.15235
2 citations

ViSPLA: Visual Iterative Self-Prompting for Language-Guided 3D Affordance Learning

Hritam Basak, Zhaozheng Yin

NeurIPS 2025 • poster

VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents

Shi Yu, Chaoyue Tang, Bokai Xu et al.

ICLR 2025 • poster • arXiv:2410.10594
121 citations

VisRec: A Semi-Supervised Approach to Visibility Data Reconstruction in Radio Astronomy

Ruoqi Wang, Haitao Wang, Qiong Luo et al.

AAAI 2025 • paper

VisRL: Intention-Driven Visual Perception via Reinforced Reasoning

Zhangquan Chen, Xufang Luo, Dongsheng Li

ICCV 2025 • poster • arXiv:2503.07523

VISTA3D: A Unified Segmentation Foundation Model For 3D Medical Imaging

Yufan He, Pengfei Guo, Yucheng Tang et al.

CVPR 2025 • poster • arXiv:2406.05285
38 citations

VistaDream: Sampling multiview consistent images for single-view scene reconstruction

Haiping Wang, Yuan Liu, Ziwei Liu et al.

ICCV 2025 • poster • arXiv:2410.16892
27 citations

VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by Video Spatiotemporal Augmentation

Weiming Ren, Huan Yang, Jie Min et al.

CVPR 2025 • poster • arXiv:2412.00927
9 citations

VISTREAM: Improving Computation Efficiency of Visual Streaming Perception via Law-of-Charge-Conservation Inspired Spiking Neural Network

Kang You, Ziling Wei, Jing Yan et al.

CVPR 2025 • poster
2 citations

Visual Abstraction: A Plug-and-Play Approach for Text-Visual Retrieval

Guofeng Ding, Yiding Lu, Peng Hu et al.

ICML 2025 • poster

VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents

Xiao Liu, Tianjie Zhang, Yu Gu et al.

ICLR 2025 • poster • arXiv:2408.06327
67 citations

Visual Agentic AI for Spatial Reasoning with a Dynamic API

Damiano Marsili, Rohun Agrawal, Yisong Yue et al.

CVPR 2025 • poster • arXiv:2502.06787
30 citations

Visual Agents as Fast and Slow Thinkers

Guangyan Sun, Mingyu Jin, Zhenting Wang et al.

ICLR 2025 • poster • arXiv:2408.08862
44 citations

Visual Anagrams Reveal Hidden Differences in Holistic Shape Processing Across Vision Models

Fenil Doshi, Thomas Fel, Talia Konkle et al.

NeurIPS 2025 • poster • arXiv:2507.00493

Visual and Auditory Feedback of Vibration, and Particle Effects for Enhancing Pseudo-Haptic Button Interaction in VR

Myeongji Ko, Woojoo Kim

ISMAR 2025 • paper

Visual and Domain Knowledge for Professional-level Graph-of-Thought Medical Reasoning

Rina Bao, Shilong Dong, Zhenfang Chen et al.

ICML 2025 • spotlight

Visual and Semantic Prompt Collaboration for Generalized Zero-Shot Learning

Huajie Jiang, Zhengxian Li, Xiaohan Yu et al.

CVPR 2025 • poster • arXiv:2503.23030
1 citation

Visual Attention Never Fades: Selective Progressive Attention ReCalibration for Detailed Image Captioning in Multimodal Large Language Models

Mingi Jung, Saehyung Lee, Eunji Kim et al.

ICML 2025 • poster • arXiv:2502.01419

Visual Autoregressive Modeling for Image Super-Resolution

Yunpeng Qu, Kun Yuan, Jinhua Hao et al.

ICML 2025 • poster • arXiv:2501.18993

Visual Chronicles: Using Multimodal LLMs to Analyze Massive Collections of Images

Boyang Deng, Kyle Genova, Songyou Peng et al.

ICCV 2025 • highlight • arXiv:2504.08727

VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning

Zhong-Yu Li, Ruoyi Du, Juncheng Yan et al.

ICCV 2025 • poster • arXiv:2504.07960
20 citations

Visual Consensus Prompting for Co-Salient Object Detection

Jie Wang, Nana Yu, Zihao Zhang et al.

CVPR 2025 • poster • arXiv:2504.14254

Visual Description Grounding Reduces Hallucinations and Boosts Reasoning in LVLMs

Sreyan Ghosh, Chandra Kiran Evuru, Sonal Kumar et al.

ICLR 2025 • poster • arXiv:2405.15683
15 citations

Visual Diversity and Region-aware Prompt Learning for Zero-shot HOI Detection

Chanhyeong Yang, Taehoon Song, Jihwan Park et al.

NeurIPS 2025 • poster • arXiv:2510.25094

Visual Generation Without Guidance

Huayu Chen, Kai Jiang, Kaiwen Zheng et al.

ICML 2025 • poster • arXiv:2501.15420
10 citations

Visual Graph Arena: Evaluating Visual Conceptualization of Vision and Multimodal Large Language Models

Zahra Babaiee, Peyman M. Kiasari, Daniela Rus et al.

ICML 2025 • oral • arXiv:2506.06242
1 citation

Visual Haystacks: A Vision-Centric Needle-In-A-Haystack Benchmark

Tsung-Han Wu, Giscard Biamby, Jerome Quenum et al.

ICLR 2025 • poster • arXiv:2407.13766
30 citations