2025 Highlight Papers

ICCV 2025highlightarXiv:2506.18368

Sequential keypoint density estimator: an overlooked baseline of skeleton-based video anomaly detection

Anja Delić, Matej Grcic, Siniša Šegvić

CVPR 2025highlightarXiv:2504.14687

Seurat: From Moving Points to Depth

Seokju Cho, Gabriel Huang, Seungryong Kim et al.

ICCV 2025highlightarXiv:2508.02278

SGAD: Semantic and Geometric-aware Descriptor for Local Feature Matching

Xiangzeng Liu, CHI WANG, Guanglu Shi et al.

Shape Abstraction via Marching Differentiable Support Functions

Sunkyung Park, Jeongmin Lee, Dongjun Lee

ICCV 2025highlightarXiv:2407.13764

Shape of Motion: 4D Reconstruction from a Single Video

Qianqian Wang, Vickie Ye, Hang Gao et al.

Show and Tell: Visually Explainable Deep Neural Nets via Spatially-Aware Concept Bottleneck Models

Itay Benou, Tammy Riklin Raviv

CVPR 2025highlightarXiv:2502.20134

ICCV 2025highlightarXiv:2507.00585

Similarity Memory Prior is All You Need for Medical Image Segmentation

Hao Tang, Zhiqing Guo, Liejun Wang et al.

SimLingo: Vision-Only Closed-Loop Autonomous Driving with Language-Action Alignment

Katrin Renz, Long Chen, Elahe Arani et al.

CVPR 2025highlightarXiv:2503.09594

CVPR 2025highlightarXiv:2411.03745

Simulator HC: Regression-based Online Simulation of Starting Problem-Solution Pairs for Homotopy Continuation in Geometric Vision

Xinyue Zhang, Zijia Dai, Wanting Xu et al.

SKDream: Controllable Multi-view and 3D Generation with Arbitrary Skeletons

Yuanyou Xu, Zongxin Yang, Yi Yang

CVPR 2025highlightarXiv:2408.15270

SkillMimic: Learning Basketball Interaction Skills from Demonstrations

Yinhuai Wang, Qihan Zhao, Runyi Yu et al.

CVPR 2025highlightarXiv:2412.09401

SLAM3R: Real-Time Dense Scene Reconstruction from Monocular RGB Videos

Yuzheng Liu, Siyan Dong, Shuzhe Wang et al.

Sliced Wasserstein Bridge for Open-Vocabulary Video Instance Segmentation

Zheyun Qin, Deng Yu, Chuanchen Luo et al.

CVPR 2025highlightarXiv:2507.22264

SmartCLIP: Modular Vision-language Alignment with Identification Guarantees

Shaoan Xie, Lingjing Kong, Yujia Zheng et al.

SMSTracker: Tri-path Score Mask Sigma Fusion for Multi-Modal Tracking

Sixian Chan, Zedong Li, Xiaoqin Zhang et al.

CVPR 2025highlightarXiv:2412.09619

SnapGen: Taming High-Resolution Text-to-Image Models for Mobile Devices with Efficient Architectures and Training

Jierun Chen, Dongting Hu, Xijie Huang et al.

SoMA: Singular Value Decomposed Minor Components Adaptation for Domain Generalizable Representation Learning

Seokju Yun, Seunghye Chae, Dongheon Lee et al.

CVPR 2025highlightarXiv:2412.04077

CVPR 2025highlightarXiv:2503.16429

Sonata: Self-Supervised Learning of Reliable Point Representations

Xiaoyang Wu, Daniel DeTone, Duncan Frost et al.

CVPR 2025highlightarXiv:2504.05576

SoundVista: Novel-View Ambient Sound Synthesis via Visual-Acoustic Binding

Mingfei Chen, Israel D. Gebru, Ishwarya Ananthabhotla et al.

SP3D: Boosting Sparsely-Supervised 3D Object Detection via Accurate Cross-Modal Semantic Prompts

Shijia Zhao, Qiming Xia, Xusheng Guo et al.

CVPR 2025highlightarXiv:2503.06467

ICCV 2025highlightarXiv:2505.02178

Sparfels: Fast Reconstruction from Sparse Unposed Imagery

Shubhendu Jena, Amine Ouasfi, Mae Younes et al.

Spatial457: A Diagnostic Benchmark for 6D Spatial Reasoning of Large Mutimodal Models

Xingrui Wang, Wufei Ma, Tiezheng Zhang et al.

CVPR 2025highlightarXiv:2505.00788

SpatialLLM: A Compound 3D-Informed Design towards Spatially-Intelligent Large Multimodal Models

Wufei Ma, Luoxin Ye, Nessa McWeeney et al.

Spatio-Spectral Pattern Illumination for Direct and Indirect Separation from a Single Hyperspectral Image

Shin Ishihara, Imari Sato

SpecTRe-GS: Modeling Highly Specular Surfaces with Reflected Nearby Objects by Tracing Rays in 3D Gaussian Splatting

Jiajun Tang, Fan Fei, Zhihao Li et al.

CVPR 2025highlightarXiv:2411.15482

SplatFlow: Self-Supervised Dynamic Gaussian Splatting in Neural Motion Flow Field for Autonomous Driving

Su Sun, Cheng Zhao, Zhuoyang Sun et al.

ICCV 2025highlightarXiv:2507.04263

SRefiner: Soft-Braid Attention for Multi-Agent Trajectory Refinement

Liwen Xiao, Zhiyu Pan, Zhicheng Wang et al.

SSHNet: Unsupervised Cross-modal Homography Estimation via Problem Reformulation and Split Optimization

Junchen Yu, Siyuan Cao, Runmin Zhang et al.

CVPR 2025highlightarXiv:2409.17993

StableDepth: Scene-Consistent and Scale-Invariant Monocular Depth

Zheng Zhang, Lihe Yang, Tianyu Yang et al.

ICCV 2025highlightarXiv:2507.23483

Stable-Sim2Real: Exploring Simulation of Real-Captured 3D Data with Two-Stage Depth Diffusion

Mutian Xu, Chongjie Ye, Haolin Liu et al.

ICCV 2025highlightarXiv:2503.05549

Stereo Any Video: Temporally Consistent Stereo Matching

Junpeng Jing, Weixun Luo, Ye Mao et al.

STEREO: A Two-Stage Framework for Adversarially Robust Concept Erasing from Text-to-Image Diffusion Models

Koushik Srivatsan, Fahad Shamshad, Muzammal Naseer et al.

CVPR 2025highlightarXiv:2408.16807

CVPR 2025highlightarXiv:2504.02823

STING-BEE: Towards Vision-Language Model for Real-World X-ray Baggage Security Inspection

Divya Velayudhan, Abdelfatah Ahmed, Mohamad Alansari et al.

ICCV 2025highlightarXiv:2412.03489

Stochastic Gradient Estimation for Higher-Order Differentiable Rendering

Zican Wang, Michael Fischer, Tobias Ritschel

StolenLoRA: Exploring LoRA Extraction Attacks via Synthetic Data

Yixu Wang, Yan Teng, Yingchun Wang et al.

ICCV 2025highlightarXiv:2509.23594

Straighten Viscous Rectified Flow via Noise Optimization

Jimin Dai, Jiexi Yan, Jian Yang et al.

ICCV 2025highlightarXiv:2507.10218

Structure-Aware Correspondence Learning for Relative Pose Estimation

Yihan Chen, Wenfei Yang, Huan Ren et al.

CVPR 2025highlightarXiv:2503.18671

Structured 3D Latents for Scalable and Versatile 3D Generation

Jianfeng XIANG, Zelong Lv, Sicheng Xu et al.

CVPR 2025highlightarXiv:2412.01506

Structure from Collision

Takuhiro Kaneko

CVPR 2025highlightarXiv:2505.21335

Structure-from-Motion with a Non-Parametric Camera Model

Yihan Wang, Linfei Pan, Marc Pollefeys et al.

ICCV 2025highlightarXiv:2507.18944

Structure Matters: Revisiting Boundary Refinement in Video Object Segmentation

Guanyi Qin, Ziyue Wang, Daiyun Shen et al.

Style-Editor: Text-driven Object-centric Style Editing

Jihun Park, Jongmin Gim, Kyoungmin Lee et al.

CVPR 2025highlightarXiv:2408.08461

Style Evolving along Chain-of-Thought for Unknown-Domain Object Detection

Zihao Zhang, Aming Wu, Yahong Han

CVPR 2025highlightarXiv:2503.09968

CVPR 2025highlightarXiv:2501.11319

StyleSSP: Sampling StartPoint Enhancement for Training-free Diffusion-based Method for Style Transfer

ruojun xu, Weijie Xi, Xiaodi Wang et al.

ICCV 2025highlightarXiv:2510.08458

SummDiff: Generative Modeling of Video Summarization with Diffusion

Kwanseok Kim, Jaehoon Hahm, Sumin Kim et al.

SuperEvent: Cross-Modal Learning of Event-based Keypoint Detection for SLAM

Yannick Burkhardt, Simon Schaefer, Stefan Leutenegger

ICCV 2025highlightarXiv:2504.00139

Super Resolved Imaging with Adaptive Optics

Robin Swanson, Esther Y. H. Lin, Masen Lamb et al.

ICCV 2025highlightarXiv:2508.04648

Supervising Sound Localization by In-the-wild Egomotion

Anna Min, Ziyang Chen, Hang Zhao et al.

CVPR 2025highlightarXiv:2503.20354

SURGEON: Memory-Adaptive Fully Test-Time Adaptation via Dynamic Activation Sparsity

Ke Ma, Jiaqi Tang, Bin Guo et al.