2025 Papers

21,856 papers found • Page 426 of 438

VLForgery Face Triad: Detection, Localization and Attribution via Multimodal Large Language Models

Xinan He, Yue Zhou, Bing Fan et al.

NeurIPS 2025posterarXiv:2503.06142

VL-ICL Bench: The Devil in the Details of Multimodal In-Context Learning

Yongshuo Zong, Ondrej Bohdal, Timothy Hospedales

ICLR 2025posterarXiv:2403.13164
17
citations

VLIPP: Towards Physically Plausible Video Generation with Vision and Language Informed Physical Prior

Xindi Yang, Baolu Li, Yiming Zhang et al.

ICCV 2025posterarXiv:2503.23368
17
citations

VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks

Ziyan Jiang, Rui Meng, Xinyi Yang et al.

ICLR 2025poster

VLM4D: Towards Spatiotemporal Awareness in Vision Language Models

Shijie Zhou, Alexander Vilesov, Xuehai He et al.

ICCV 2025posterarXiv:2508.02095
15
citations

VLMaterial: Procedural Material Generation with Large Vision-Language Models

Beichen Li, Rundi Wu, Armando Solar-Lezama et al.

ICLR 2025posterarXiv:2501.18623
5
citations

VLM in a flash: I/O-Efficient Sparsification of Vision-Language Model via Neuron Chunking

Kichang Yang, Seonjun Kim, Minjae Kim et al.

NeurIPS 2025posterarXiv:2511.18692

VLMLight: Safety-Critical Traffic Signal Control via Vision-Language Meta-Control and Dual-Branch Reasoning Architecture

Maonan Wang, Yirong Chen, Aoyu Pang et al.

NeurIPS 2025posterarXiv:2505.19486

VLM-R³: Region Recognition, Reasoning, and Refinement for Enhanced Multimodal Chain-of-Thought

Chaoya Jiang, Yongrui Heng, Wei Ye et al.

NeurIPS 2025poster
18
citations

VLMs can Aggregate Scattered Training Patches

Zhanhui Zhou, Lingjie Chen, Chao Yang et al.

NeurIPS 2025posterarXiv:2506.03614

VLMs-Guided Representation Distillation for Efficient Vision-Based Reinforcement Learning

Haoran Xu, Peixi Peng, Guang Tan et al.

CVPR 2025poster

VLMs have Tunnel Vision: Evaluating Nonlocal Visual Reasoning in Leading VLMs

Shmuel Berman, Jia Deng

NeurIPS 2025spotlight

VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis

Enric Corona, Andrei Zanfir, Eduard Gabriel Bazavan et al.

CVPR 2025posterarXiv:2403.08764
46
citations

VLog: Video-Language Models by Generative Retrieval of Narration Vocabulary

Kevin Qinghong Lin, Mike Zheng Shou

CVPR 2025posterarXiv:2503.09402

VLR-Driver: Large Vision-Language-Reasoning Models for Embodied Autonomous Driving

Fanjie Kong, Yitong Li, Weihuang Chen et al.

ICCV 2025poster

VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning

Haozhe Wang, Chao Qu, Zuming Huang et al.

NeurIPS 2025spotlightarXiv:2504.08837
169
citations

VL-RewardBench: A Challenging Benchmark for Vision-Language Generative Reward Models

Lei Li, wei yuancheng, Zhihui Xie et al.

CVPR 2025highlight

VLRMBench: A Comprehensive and Challenging Benchmark for Vision-Language Reward Models

JIACHENG RUAN, Wenzhen Yuan, Xian Gao et al.

ICCV 2025posterarXiv:2503.07478
15
citations

VL-SAE: Interpreting and Enhancing Vision-Language Alignment with a Unified Concept Set

Shufan Shen, Junshu Sun, Qingming Huang et al.

NeurIPS 2025posterarXiv:2510.21323
1
citations

VL-SAM-V2: Open-World Object Detection with General and Specific Query Fusion

Zhiwei Lin, Yongtao Wang

NeurIPS 2025posterarXiv:2505.18986
1
citations

VLScene: Vision-Language Guidance Distillation for Camera-Based 3D Semantic Scene Completion

Meng Wang, Huilong Pi, Ruihui Li et al.

AAAI 2025paperarXiv:2503.06219

VLsI: Verbalized Layers-to-Interactions from Large to Small Vision Language Models

Byung-Kwan Lee, Ryo Hachiuma, Yu-Chiang Frank Wang et al.

CVPR 2025posterarXiv:2412.01822

VMBench: A Benchmark for Perception-Aligned Video Motion Generation

Xinran Ling, Chen Zhu, Meiqi Wu et al.

ICCV 2025poster
15
citations

VMDT: Decoding the Trustworthiness of Video Foundation Models

Yujin Potter, Zhun Wang, Nicholas Crispino et al.

NeurIPS 2025posterarXiv:2511.05682

VMem: Consistent Interactive Video Scene Generation with Surfel-Indexed View Memory

Runjia Li, Philip Torr, Andrea Vedaldi et al.

ICCV 2025highlightarXiv:2506.18903
20
citations

Vocabulary-Guided Gait Recognition

Panjian Huang, Saihui Hou, Chunshui Cao et al.

NeurIPS 2025poster

Vocabulary In-Context Learning in Transformers: Benefits of Positional Encoding

Qian Ma, Ruoxiang Xu, Yongqiang Cai

NeurIPS 2025posterarXiv:2511.06376

VOccl3D: A Video Benchmark Dataset for 3D Human Pose and Shape Estimation under real Occlusions

Yash Garg, Saketh Bachu, Arindam Dutta et al.

ICCV 2025posterarXiv:2508.06757

VoCo-LLaMA: Towards Vision Compression with Large Language Models

Xubing Ye, Yukang Gan, Xiaoke Huang et al.

CVPR 2025posterarXiv:2406.12275

VODiff: Controlling Object Visibility Order in Text-to-Image Generation

Dong Liang, Jinyuan Jia, Yuhao Liu et al.

CVPR 2025poster
3
citations

VoiceCraft-Dub: Automated Video Dubbing with Neural Codec Language Models

Kim Sung-Bin, Jeongsoo Choi, Puyuan Peng et al.

ICCV 2025posterarXiv:2504.02386
6
citations

VOILA: Complexity-Aware Universal Segmentation of CT Images by Voxel Interacting with Language

Zishuo Wan, Yu Gao, Wanyuan Pang et al.

AAAI 2025paperarXiv:2501.03482

VOILA: Evaluation of MLLMs For Perceptual Understanding and Analogical Reasoning

Nilay Yilmaz, Maitreya Patel, Lawrence Luo et al.

ICLR 2025posterarXiv:2503.00043
1
citations

VolFormer: Explore More Comprehensive Cube Interaction for Hyperspectral Image Restoration and Beyond

Dabing Yu, Zheng Gao

CVPR 2025poster

VolleyBots: A Testbed for Multi-Drone Volleyball Game Combining Motion Control and Strategic Play

Zelai Xu, Ruize Zhang, Chao Yu et al.

NeurIPS 2025posterarXiv:2502.01932

VoluMe – Authentic 3D Video Calls from Live Gaussian Splat Prediction

Martin de La Gorce, Charlie Hewitt, Tibor Takács et al.

ICCV 2025posterarXiv:2507.21311
2
citations

Volume-Aware Distance for Robust Similarity Learning

Shuo Chen, Chen Gong, Jun Li et al.

ICML 2025poster

Volume Optimality in Conformal Prediction with Structured Prediction Sets

Chao Gao, Liren Shan, Vaidehi Srinivas et al.

ICML 2025posterarXiv:2502.16658
6
citations

Volume Tells: Dual Cycle-Consistent Diffusion for 3D Fluorescence Microscopy De-noising and Super-Resolution

ZELIN LI, Chenwei Wang, Zhaoke Huang et al.

CVPR 2025highlight

Volume Transmission Implements Context Factorization to Target Online Credit Assignment and Enable Compositional Generalization

Matthew Bull, Po-Chen Kuo, Andrew Smith et al.

NeurIPS 2025poster

Volumetrically Consistent 3D Gaussian Rasterization

Chinmay Talegaonkar, Yash Belhe, Ravi Ramamoorthi et al.

CVPR 2025highlight

VolumetricSMPL: A Neural Volumetric Body Model for Efficient Interactions, Contacts, and Collisions

Marko Mihajlovic, Siwei Zhang, Gen Li et al.

ICCV 2025highlightarXiv:2506.23236
3
citations

Volumetric Surfaces: Representing Fuzzy Geometries with Layered Meshes

Stefano Esposito, Anpei Chen, Christian Reiser et al.

CVPR 2025posterarXiv:2409.02482

Voronoi-grid-based Pareto Front Learning and Its Application to Collaborative Federated Learning

Mengmeng Chen, Xiaohu Wu, QIQI LIU et al.

ICML 2025posterarXiv:2505.20648

VORTA: Efficient Video Diffusion via Routing Sparse Attention

Wenhao Sun, Rong-Cheng Tu, Yifu Ding et al.

NeurIPS 2025posterarXiv:2505.18809
7
citations

VoteFlow: Enforcing Local Rigidity in Self-Supervised Scene Flow

Yancong Lin, Shiming Wang, Liangliang Nan et al.

CVPR 2025posterarXiv:2503.22328
4
citations

Voter Priming Campaigns: Strategies, Equilibria, and Algorithms

Jonathan Shaki, Yonatan Aumann, Sarit Kraus

AAAI 2025paperarXiv:2412.13380

VoteSplat: Hough Voting Gaussian Splatting for 3D Scene Understanding

Minchao Jiang, Shunyu Jia, Jiaming Gu et al.

ICCV 2025posterarXiv:2506.22799
3
citations

VOVTrack: Exploring the Potentiality in Raw Videos for Open-Vocabulary Multi-Object Tracking

Zekun Qian, Ruize Han, Junhui Hou et al.

ICCV 2025poster

VoxDet: Rethinking 3D Semantic Scene Completion as Dense Object Detection

Wuyang Li, Zhu Yu, Alexandre Alahi

NeurIPS 2025spotlight