ICCV 2025 Papers

UniVerse: Unleashing the Scene Prior of Video Diffusion Models for Robust Radiance Field Reconstruction

Jin Cao, Hongrui Wu, Ziyong Feng et al.

UniVG: A Generalist Diffusion Model for Unified Image Generation and Editing

Tsu-Jui Fu, Yusu Qian, Chen Chen et al.

Unknown Text Learning for CLIP-based Few-Shot Open-set Recognition

Rui Ma, Qilong Wang, Bing Cao et al.

Unlearning the Noisy Correspondence Makes CLIP More Robust

Haochen Han, Alex Jinpeng Wang, Peijun Ye et al.

Unleashing High-Quality Image Generation in Diffusion Sampling Using Second-Order Levenberg-Marquardt-Langevin

Fangyikang Wang, Hubery Yin, Lei Qian et al.

ICCV 2025posterarXiv:2508.02288

Unleashing the Temporal Potential of Stereo Event Cameras for Continuous-Time 3D Object Detection

Jae Young Kang, Hoonhee Cho, Kuk-Jin Yoon

Unleashing Vecset Diffusion Model for Fast Shape Generation

Zeqiang Lai, Zhao Yunfei, Zibo Zhao et al.

Unlocking Constraints: Source-Free Occlusion-Aware Seamless Segmentation

Yihong Cao, Jiaming Zhang, Xu Zheng et al.

Unlocking the Potential of Diffusion Priors in Blind Face Restoration

Yunqi Miao, Zhiyu Qu, Mingqi Gao et al.

UnMix-NeRF: Spectral Unmixing Meets Neural Radiance Fields

Fabian Perez, Sara Rojas Martinez, Carlos Hinojosa et al.

Unraveling the Effects of Synthetic Data on End-to-End Autonomous Driving

Junhao Ge, Zuhong Liu, Longteng Fan et al.

Unraveling the Smoothness Properties of Diffusion Models: A Gaussian Mixture Perspective

Yingyu Liang, Zhizhou Sha, Zhenmei Shi et al.

UnrealZoo: Enriching Photo-realistic Virtual Worlds for Embodied AI

Fangwei Zhong, Kui Wu, Churan Wang et al.

Unsupervised Histopathological Image Semantic Segmentation with Overlapping Patches Consistency Constraint

Wentian Cai, Weizhao Weng, Zihao Huang et al.

Unsupervised Identification of Protein Compositions and Conformations via Implicit Content-Transformation Disentanglement

Mostofa Rafid Uddin, Jana Armouti, Min Xu

ICCV 2025posterarXiv:2506.14605

Unsupervised Imaging Inverse Problems with Diffusion Distribution Matching

Giacomo Meanti, Thomas Ryckeboer, Michael Arbel et al.

Unsupervised Joint Learning of Optical Flow and Intensity with Event Cameras

Shuang Guo, Friedhelm Hamann, Guillermo Gallego

Unsupervised Part Discovery via Descriptor-Based Masked Image Restoration with Optimized Constraints

Jiahao Xia, Yike Wu, Wenjian Huang et al.

Unsupervised RGB-D Point Cloud Registration for Scenes with Low Overlap and Photometric Inconsistency

yejun Shou, Haocheng Wang, Lingfeng Shen et al.

Unsupervised Visible-Infrared Person Re-identification under Unpaired Settings

Haoyu Yao, Bin Yang, Wenke Huang et al.

Unsupervised Visual Chain-of-Thought Reasoning via Preference Optimization

Kesen Zhao, Beier Zhu, Qianru Sun et al.

Unveiling the Invisible: Reasoning Complex Occlusions Amodally with AURA

Zhixuan Li, Hyunse Yoon, Sanghoon Lee et al.

UnZipLoRA: Separating Content and Style from a Single Image

Chang Liu, Viraj Shah, Aiyu Cui et al.

UPP: Unified Point-Level Prompting for Robust Point Cloud Analysis

Zixiang Ai, Zhenyu Cui, Yuxin Peng et al.

ICCV 2025posterarXiv:2507.00721

UPRE: Zero-Shot Domain Adaptation for Object Detection via Unified Prompt and Representation Enhancement

Xiao Zhang, Fei Wei, Yong Wang et al.

UrbanLLaVA: A Multi-modal Large Language Model for Urban Intelligence

Jie Feng, Shengyuan Wang, Tianhui Liu et al.

ICCV 2025posterarXiv:2503.06132

USP: Unified Self-Supervised Pretraining for Image Generation and Understanding

Xiangxiang Chu, Renda Li, Yong Wang

UST-SSM: Unified Spatio-Temporal State Space Models for Point Cloud Video Modeling

Peiming Li, Ziyi Wang, Yulin Yuan et al.

U-ViLAR: Uncertainty-Aware Visual Localization for Autonomous Driving via Differentiable Association and Registration

Xiaofan Li, Zhihao Xu, Chenming Wu et al.

V2M4: 4D Mesh Animation Reconstruction from a Single Monocular Video

Jianqi Chen, Biao Zhang, Xiangjun Tang et al.

V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding

Junqi Ge, Ziyi Chen, Jintao Lin et al.

V2XPnP: Vehicle-to-Everything Spatio-Temporal Fusion for Multi-Agent Perception and Prediction

Zewei Zhou, Hao Xiang, Zhaoliang Zheng et al.

V2XScenes: A Multiple Challenging Traffic Conditions Dataset for Large-Range Vehicle-Infrastructure Collaborative Perception

Bowen Wang, Yafei Wang, Wei Gong et al.

ICCV 2025posterarXiv:2503.07598

VACE: All-in-One Video Creation and Editing

Zeyinzi Jiang, Zhen Han, Chaojie Mao et al.

169

VAFlow: Video-to-Audio Generation with Cross-Modality Flow Matching

Xihua Wang, Xin Cheng, Yuyue Wang et al.

VAGUE: Visual Contexts Clarify Ambiguous Expressions

Heejeong Nam, Jinwoo Ahn, Keummin Ka et al.

VALLR: Visual ASR Language Model for Lip Reading

Marshall Thomas, Edward Fish, Richard Bowden

Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers

Weiming Ren, Wentao Ma, Huan Yang et al.

VA-MoE: Variables-Adaptive Mixture of Experts for Incremental Weather Forecasting

Hao Chen, Tao Han, Song Guo et al.

Variance-Based Pruning for Accelerating and Compressing Trained Networks

Uranik Berisha, Jens Mehnert, Alexandru Condurache

VCA: Video Curious Agent for Long Video Understanding

Zeyuan Yang, Delin Chen, Xueyang Yu et al.

Vector Contrastive Learning For Pixel-Wise Pretraining In Medical Vision

Yuting He, Shuo Li

VEGGIE: Instructional Editing and Reasoning Video Concepts with Grounded Generation

Shoubin Yu, Difan Liu, Ziqiao Ma et al.

VehicleMAE: View-asymmetry Mutual Learning for Vehicle Re-identification Pre-training via Masked AutoEncoders

Qi Wang, Zeyu Zhang, Dong Wang et al.

Verbalized Representation Learning for Interpretable Few-Shot Generalization

Cheng-Fu Yang, Da Yin, Wenbo Hu et al.

Versatile Transition Generation with Image-to-Video Diffusion

Zuhao Yang, Jiahui Zhang, Yingchen Yu et al.

VertexRegen: Mesh Generation with Continuous Level of Detail

Xiang Zhang, Yawar Siddiqui, Armen Avetisyan et al.

VFlowOpt: A Token Pruning Framework for LMMs with Visual Information Flow-Guided Optimization

Sihan Yang, Runsen Xu, Chenhang Cui et al.

ICCV 2025posterarXiv:2508.08237

VGGSounder: Audio-Visual Evaluations for Foundation Models

Daniil Zverev, Thaddäus Wiedemer, Ameya Prabhu et al.