ICCV 2025 Papers

2,701 papers found • Page 53 of 55

VLABench: A Large-Scale Benchmark for Language-Conditioned Robotics Manipulation with Long-Horizon Reasoning Tasks

shiduo zhang, Zhe Xu, Peiju Liu et al.

ICCV 2025poster

VLDrive: Vision-Augmented Lightweight MLLMs for Efficient Language-grounded Autonomous Driving

Ruifei Zhang, Wei Zhang, Xiao Tan et al.

ICCV 2025poster

VLIPP: Towards Physically Plausible Video Generation with Vision and Language Informed Physical Prior

Xindi Yang, Baolu Li, Yiming Zhang et al.

ICCV 2025poster

VLM4D: Towards Spatiotemporal Awareness in Vision Language Models

Shijie Zhou, Alexander Vilesov, Xuehai He et al.

ICCV 2025poster
15
citations

VLR-Driver: Large Vision-Language-Reasoning Models for Embodied Autonomous Driving

Fanjie Kong, Yitong Li, Weihuang Chen et al.

ICCV 2025poster

VLRMBench: A Comprehensive and Challenging Benchmark for Vision-Language Reward Models

JIACHENG RUAN, Wenzhen Yuan, Xian Gao et al.

ICCV 2025poster

VMBench: A Benchmark for Perception-Aligned Video Motion Generation

Xinran Ling, Chen Zhu, Meiqi Wu et al.

ICCV 2025poster
15
citations

VMem: Consistent Interactive Video Scene Generation with Surfel-Indexed View Memory

Runjia Li, Philip Torr, Andrea Vedaldi et al.

ICCV 2025highlight
20
citations

VOccl3D: A Video Benchmark Dataset for 3D Human Pose and Shape Estimation under real Occlusions

Yash Garg, Saketh Bachu, Arindam Dutta et al.

ICCV 2025poster

VoiceCraft-Dub: Automated Video Dubbing with Neural Codec Language Models

Kim Sung-Bin, Jeongsoo Choi, Puyuan Peng et al.

ICCV 2025poster
6
citations

VoluMe – Authentic 3D Video Calls from Live Gaussian Splat Prediction

Martin de La Gorce, Charlie Hewitt, Tibor Takács et al.

ICCV 2025poster

VolumetricSMPL: A Neural Volumetric Body Model for Efficient Interactions, Contacts, and Collisions

Marko Mihajlovic, Siwei Zhang, Gen Li et al.

ICCV 2025highlight

VoteSplat: Hough Voting Gaussian Splatting for 3D Scene Understanding

Minchao Jiang, Shunyu Jia, Jiaming Gu et al.

ICCV 2025poster
3
citations

VOVTrack: Exploring the Potentiality in Raw Videos for Open-Vocabulary Multi-Object Tracking

Zekun Qian, Ruize Han, Junhui Hou et al.

ICCV 2025poster

VoxelKP: A Voxel-based Network Architecture for Human Keypoint Estimation in LiDAR Data

Jian Shi, Peter Wonka

ICCV 2025poster

Voyaging into Perpetual Dynamic Scenes from a Single View

Fengrui Tian, Tianjiao Ding, Jinqi Luo et al.

ICCV 2025poster

VPO: Aligning Text-to-Video Generation Models with Prompt Optimization

Jiale Cheng, Ruiliang Lyu, Xiaotao Gu et al.

ICCV 2025poster

VPR-Cloak: A First Look at Privacy Cloak Against Visual Place Recognition

Shuting Dong, Mingzhi Chen, Feng Lu et al.

ICCV 2025poster

VQ-SGen: A Vector Quantized Stroke Representation for Creative Sketch Generation

Jiawei Wang, Zhiming Cui, Changjian Li

ICCV 2025poster
1
citations

VQ-VLA: Improving Vision-Language-Action Models via Scaling Vector-Quantized Action Tokenizers

Yating Wang, Haoyi Zhu, Mingyu Liu et al.

ICCV 2025poster
16
citations

VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos

Jiashuo Yu, Yue Wu, Meng Chu et al.

ICCV 2025poster
8
citations

VRM: Knowledge Distillation via Virtual Relation Matching

Weijia Zhang, Fei Xie, Weidong Cai et al.

ICCV 2025highlight

VSC: Visual Search Compositional Text-to-Image Diffusion Model

Do Dat, Nam Hyeon-Woo, Po-Yuan Mao et al.

ICCV 2025poster
2
citations

VSP: Diagnosing the Dual Challenges of Perception and Reasoning in Spatial Planning Tasks for MLLMs

Qiucheng Wu, Handong Zhao, Michael Saxon et al.

ICCV 2025poster

VSRM: A Robust Mamba-Based Framework for Video Super-Resolution

Phu Tran Dinh, Hung Dao, Daeyoung Kim

ICCV 2025poster

VSSD: Vision Mamba with Non-Causal State Space Duality

Yuheng Shi, Mingjia Li, Minjing Dong et al.

ICCV 2025posterarXiv:2407.18559
24
citations

VTimeCoT: Thinking by Drawing for Video Temporal Grounding and Reasoning

Jinglei Zhang, Yuanfan Guo, Rolandos Alexandros Potamias et al.

ICCV 2025poster

Vulnerability-Aware Spatio-Temporal Learning for Generalizable Deepfake Video Detection

Dat NGUYEN, Marcella Astrid, Anis Kacem et al.

ICCV 2025posterarXiv:2501.01184

WalkVLM: Aid Visually Impaired People Walking by Vision Language Model

Zhiqiang Yuan, Ting Zhang, Yeshuang Zhu et al.

ICCV 2025poster

WarpHE4D: Dense 4D Head Map toward Full Head Reconstruction

Jongseob Yun, Yong-Hoon Kwon, Min-Gyu Park et al.

ICCV 2025poster

Wasserstein Style Distribution Analysis and Transform for Stylized Image Generation

Xi Yu, Xiang Gu, Zhihao Shi et al.

ICCV 2025highlight
1
citations

Wavelet Policy: Lifting Scheme for Policy Learning in Long-Horizon Tasks

Hao Huang, Shuaihang Yuan, Geeta Chandra Raju Bethala et al.

ICCV 2025poster

Wave-MambaAD: Wavelet-driven State Space Model for Multi-class Unsupervised Anomaly Detection

Qiao Zhang, Mingwen Shao, Xinyuan Chen et al.

ICCV 2025poster

WaveMamba: Wavelet-Driven Mamba Fusion for RGB-Infrared Object Detection

Haodong Zhu, Wenhao Dong, Linlin Yang et al.

ICCV 2025poster
5
citations

WAVE: Warp-Based View Guidance for Consistent Novel View Synthesis Using a Single Image

Jiwoo Park, Tae Choi, Youngjun Jun et al.

ICCV 2025poster

Weakly-Supervised Learning of Dense Functional Correspondences

Stefan Stojanov, Linan Zhao, Yunzhi Zhang et al.

ICCV 2025poster

Weakly Supervised Visible-Infrared Person Re-Identification via Heterogeneous Expert Collaborative Consistency Learning

Yafei Zhang, Lingqi Kong, Huafeng Li et al.

ICCV 2025poster

WeaveSeg: Iterative Contrast-weaving and Spectral Feature-refining for Nuclei Instance Segmentation

Jiajia Li, Huisi Wu, Jing Qin

ICCV 2025highlight

Web Artifact Attacks Disrupt Vision Language Models

Maan Qraitem, Piotr Teterwak, Kate Saenko et al.

ICCV 2025poster
1
citations

What Changed and What Could Have Changed? State-Change Counterfactuals for Procedure-Aware Video Representation Learning

Chi-Hsi Kung, Frangil Ramirez, Juhyung Ha et al.

ICCV 2025poster
2
citations

What Changed? Detecting and Evaluating Instruction-Guided Image Edits with Multimodal Large Language Models

Lorenzo Baraldi, Davide Bucciarelli, Federico Betti et al.

ICCV 2025poster
2
citations

What If: Understanding Motion Through Sparse Interactions

Stefan A. Baumann, Nick Stracke, Timy Phan et al.

ICCV 2025poster

What Makes for Text to 360-degree Panorama Generation with Stable Diffusion?

Jinhong Ni, Chang-Bin Zhang, Qiang Zhang et al.

ICCV 2025poster

What's in a Latent? Leveraging Diffusion Latent Space for Domain Generalization

Xavier Thomas, Deepti Ghadiyaram

ICCV 2025poster
2
citations

What's Making That Sound Right Now? Video-centric Audio-Visual Localization

hahyeon choi, Junhoo Lee, Nojun Kwak

ICCV 2025poster

What to Distill? Fast Knowledge Distillation with Adaptive Sampling

Byungchul Chae, Seonyeong Heo

ICCV 2025highlight

What we need is explicit controllability: Training 3D gaze estimator using only facial images

Tingwei Li, Jun Bao, Zhenzhong Kuang et al.

ICCV 2025poster

What You Have is What You Track: Adaptive and Robust Multimodal Tracking

Yuedong Tan, Jiawei Shao, Eduard Zamfir et al.

ICCV 2025poster
3
citations

When Anchors Meet Cold Diffusion: A Multi-Stage Approach to Lane Detection

Bo-Lun Huang, Tzu-Hsiang Ni, Feng-Kai Huang et al.

ICCV 2025poster

When and Where do Data Poisons Attack Textual Inversion?

Jeremy Styborski, Mingzhi Lyu, Jiayou Lu et al.

ICCV 2025poster