ICCV 2025 Papers
2,701 papers found • Page 4 of 55
ART: Adaptive Relation Tuning for Generalized Relation Prediction
Gopika Sudhakaran, Hikaru Shindo, Patrick Schramowski et al.
ArtEditor: Learning Customized Instructional Image Editor from Few-Shot Examples
Shijie Huang, Yiren Song, Yuxuan Zhang et al.
Articulate3D: Holistic Understanding of 3D Scenes as Universal Scene Description
Anna-Maria Halacheva, Yang Miao, Jan-Nico Zaech et al.
Arti-PG: A Toolbox for Procedurally Synthesizing Large-Scale and Diverse Articulated Objects with Rich Annotations
Jianhua Sun, Yuxuan Li, Jiude Wei et al.
AR-VRM: Imitating Human Motions for Visual Robot Manipulation with Analogical Reasoning
Dejie Yang, Zijing Zhao, Yang Liu
ASCENT: Annotation-free Self-supervised Contrastive Embeddings for 3D Neuron Tracking in Fluorescence Microscopy
Haejun Han, Hang Lu
ASGS: Single-Domain Generalizable Open-Set Object Detection via Adaptive Subgraph Searching
Yuxuan Yuan, Luyao Tang, Chaoqi Chen et al.
A Simple yet Mighty Hartley Diffusion Versatilist for Generalizable Dense Vision Tasks
Qi Bi, Jingjun Yi, Huimin Huang et al.
Ask and Remember: A Questions-Only Replay Strategy for Continual Visual Question Answering
Imad Eddine MAROUF, Enzo Tartaglione, Stéphane Lathuilière et al.
AstroLoc: Robust Space to Ground Image Localizer
Gabriele Berton, Alex Stoken, Carlo Masone
A Structure-aware and Motion-adaptive Framework for 3D Human Pose Estimation with Mamba
Ye Lu, Jie Wang, Jianjun Gao et al.
Asynchronous Event Error-Minimizing Noise for Safeguarding Event Dataset
Ruofei WANG, Peiqi Duan, Boxin Shi et al.
ATAS: Any-to-Any Self-Distillation for Enhanced Open-Vocabulary Dense Prediction
Soonwoo Cha, Jiwoo Song, Juan Yeo et al.
ATCTrack: Aligning Target-Context Cues with Dynamic Target States for Robust Vision-Language Tracking
Xiaokun Feng, Shiyu Hu, Xuchen Li et al.
A Tiny Change, A Giant Leap: Long-Tailed Class-Incremental Learning via Geometric Prototype Alignment
xinyi lai, Luojun Lin, Weijie Chen et al.
ATLAS: Decoupling Skeletal and Shape Parameters for Expressive Parametric Human Modeling
Jinhyung Park, Javier Romero, Shunsuke Saito et al.
A Token-level Text Image Foundation Model for Document Understanding
Tongkun Guan, Zining Wang, Pei Fu et al.
Att-Adapter: A Robust and Precise Domain-Specific Multi-Attributes T2I Diffusion Adapter via Conditional Variational Autoencoder
Wonwoong Cho, Yan-Ying Chen, Matthew Klenk et al.
Attention to Neural Plagiarism: Diffusion Models Can Plagiarize Your Copyrighted Images!
zihang zou, Boqing Gong, Liqiang Wang
Attention to the Burtiness in Visual Prompt Tuning!
Yuzhu Wang, Manni Duan, Shu Kong
Attention to Trajectory: Trajectory-Aware Open-Vocabulary Tracking
Yunhao Li, Yifan Jiao, Dan Meng et al.
AU-Blendshape for Fine-grained Stylized 3D Facial Expression Manipulation
Hao Li, Ju Dai, Feng Zhou et al.
Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation
Fating Hong, Zunnan Xu, Zixiang Zhou et al.
Augmented and Softened Matching for Unsupervised Visible-Infrared Person Re-Identification
Zhiqi Pang, Chunyu Wang, Lingling Zhao et al.
Augmented Mass-Spring Model for Real-Time Dense Hair Simulation
Jorge Herrera, Yi Zhou, Xin Sun et al.
Augmenting Moment Retrieval: Zero-Dependency Two-Stage Learning
Zhengxuan Wei, Jiajin Tang, Sibei Yang
A Unified Framework for Industrial Cel-Animation Colorization with Temporal-Structural Awareness
Xiaoyi Feng, Tao Huang, Peng Wang et al.
A Unified Framework for Motion Reasoning and Generation in Human Interaction
Jeongeun Park, Sungjoon Choi, Sangdoo Yun
A Unified Framework to BRIDGE Complete and Incomplete Deep Multi-View Clustering under Non-IID Missing Patterns
Xiaorui Jiang, Buyun He, Peng Yuan Zhou et al.
A Unified Interpretation of Training-Time Out-of-Distribution Detection
Xu Cheng, Xin Jiang, Zechao Li
AURELIA: Test-time Reasoning Distillation in Audio-Visual LLMs
Sanjoy Chowdhury, Hanan Gani, Nishit Anand et al.
Authentic 4D Driving Simulation with a Video Generation Model
Lening Wang, Wenzhao Zheng, Dalong Du et al.
AutoComPose: Automatic Generation of Pose Transition Descriptions for Composed Pose Retrieval Using Multimodal LLMs
Yi-Ting Shen, Sungmin Eum, Doheon Lee et al.
Auto-Controlled Image Perception in MLLMs via Visual Perception Tokens
Runpeng Yu, Xinyin Ma, Xinchao Wang
Automated Model Evaluation for Object Detection via Prediction Consistency and Reliability
Seungju Yoo, Hyuk Kwon, Joong-Won Hwang et al.
Automated Red Teaming for Text-to-Image Models through Feedback-Guided Prompt Iteration with Vision-Language Models
Wei Xu, Kangjie Chen, Jiawei Qiu et al.
AutoOcc: Automatic Open-Ended Semantic Occupancy Annotation via Vision-Language Guided Gaussian Splatting
Xiaoyu Zhou, Jingqi Wang, Yongtao Wang et al.
AutoPrompt: Automated Red-Teaming of Text-to-Image Models via LLM-Driven Adversarial Prompts
Yufan Liu, Wanqian Zhang, Huashan Chen et al.
Autoregressive Denoising Score Matching is a Good Video Anomaly Detector
hanwen Zhang, Congqi Cao, Qinyi Lv et al.
Auto-Regressively Generating Multi-View Consistent Images
JiaKui Hu, Yuxiao Yang, Jialun Liu et al.
Auto-Regressive Transformation for Image Alignment
Kanggeon Lee, Soochahn Lee, Kyoung Mu Lee
AutoScape: Geometry-Consistent Long-Horizon Scene Generation
Jiacheng Chen, Ziyu Jiang, Mingfu Liang et al.
Auto-Vocabulary Semantic Segmentation
Osman Ülger, Maksymilian Kulicki, Yuki Asano et al.
Auxiliary Prompt Tuning of Vision-Language Models for Few-Shot Out-of-Distribution Detection
Wenjun Miao, Guansong Pang, Zihan Wang et al.
AVAM: a Universal Training-free Adaptive Visual Anchoring Embedded into Multimodal Large Language Model for Multi-image Question Answering
Kang Zeng, Guojin Zhong, Jintao Cheng et al.
Avat3r: Large Animatable Gaussian Reconstruction Model for High-fidelity 3D Head Avatars
Tobias Kirschstein, Javier Romero, Artem Sevastopolsky et al.
AV-Flow: Transforming Text to Audio-Visual Human-like Interactions
Aggelina Chatziagapi, Louis-Philippe Morency, Hongyu Gong et al.
A View-consistent Sampling Method for Regularized Training of Neural Radiance Fields
Aoxiang Fan, Corentin Dumery, Nicolas Talabot et al.
A Visual Leap in CLIP Compositionality Reasoning through Generation of Counterfactual Sets
Zexi Jia, Chuanwei Huang, Yeshuang Zhu et al.
AV-Link: Temporally-Aligned Diffusion Features for Cross-Modal Audio-Video Generation
Moayed Haji-Ali, Willi Menapace, Aliaksandr Siarohin et al.