Highlight Papers

ICCV 2025highlightarXiv:2506.18368

Sequential keypoint density estimator: an overlooked baseline of skeleton-based video anomaly detection

Anja Delić, Matej Grcic, Siniša Šegvić

Seurat: From Moving Points to Depth

Seokju Cho, Gabriel Huang, Seungryong Kim et al.

SGAD: Semantic and Geometric-aware Descriptor for Local Feature Matching

Xiangzeng Liu, CHI WANG, Guanglu Shi et al.

Shape Abstraction via Marching Differentiable Support Functions

Sunkyung Park, Jeongmin Lee, Dongjun Lee

Shape of Motion: 4D Reconstruction from a Single Video

Qianqian Wang, Vickie Ye, Hang Gao et al.

CVPR 2025highlightarXiv:2502.20134

Show and Tell: Visually Explainable Deep Neural Nets via Spatially-Aware Concept Bottleneck Models

Itay Benou, Tammy Riklin Raviv

Similarity Memory Prior is All You Need for Medical Image Segmentation

Hao Tang, Zhiqing Guo, Liejun Wang et al.

SimLingo: Vision-Only Closed-Loop Autonomous Driving with Language-Action Alignment

Katrin Renz, Long Chen, Elahe Arani et al.

CVPR 2025highlightarXiv:2411.03745

Simulator HC: Regression-based Online Simulation of Starting Problem-Solution Pairs for Homotopy Continuation in Geometric Vision

Xinyue Zhang, Zijia Dai, Wanting Xu et al.

SKDream: Controllable Multi-view and 3D Generation with Arbitrary Skeletons

Yuanyou Xu, Zongxin Yang, Yi Yang

CVPR 2025highlightarXiv:2408.15270

SkillMimic: Learning Basketball Interaction Skills from Demonstrations

Yinhuai Wang, Qihan Zhao, Runyi Yu et al.

SLAM3R: Real-Time Dense Scene Reconstruction from Monocular RGB Videos

Yuzheng Liu, Siyan Dong, Shuzhe Wang et al.

Sliced Wasserstein Bridge for Open-Vocabulary Video Instance Segmentation

Zheyun Qin, Deng Yu, Chuanchen Luo et al.

CVPR 2025highlightarXiv:2507.22264

SmartCLIP: Modular Vision-language Alignment with Identification Guarantees

Shaoan Xie, Lingjing Kong, Yujia Zheng et al.

SMSTracker: Tri-path Score Mask Sigma Fusion for Multi-Modal Tracking

Sixian Chan, Zedong Li, Xiaoqin Zhang et al.

SnapGen: Taming High-Resolution Text-to-Image Models for Mobile Devices with Efficient Architectures and Training

Jierun Chen, Dongting Hu, Xijie Huang et al.

CVPR 2025highlightarXiv:2412.04077

SoMA: Singular Value Decomposed Minor Components Adaptation for Domain Generalizable Representation Learning

Seokju Yun, Seunghye Chae, Dongheon Lee et al.

Sonata: Self-Supervised Learning of Reliable Point Representations

Xiaoyang Wu, Daniel DeTone, Duncan Frost et al.

SoundVista: Novel-View Ambient Sound Synthesis via Visual-Acoustic Binding

Mingfei Chen, Israel D. Gebru, Ishwarya Ananthabhotla et al.

SP3D: Boosting Sparsely-Supervised 3D Object Detection via Accurate Cross-Modal Semantic Prompts

Shijia Zhao, Qiming Xia, Xusheng Guo et al.

Sparfels: Fast Reconstruction from Sparse Unposed Imagery

Shubhendu Jena, Amine Ouasfi, Mae Younes et al.

Spatial457: A Diagnostic Benchmark for 6D Spatial Reasoning of Large Mutimodal Models

Xingrui Wang, Wufei Ma, Tiezheng Zhang et al.

SpatialLLM: A Compound 3D-Informed Design towards Spatially-Intelligent Large Multimodal Models

Wufei Ma, Luoxin Ye, Nessa McWeeney et al.

Spatio-Spectral Pattern Illumination for Direct and Indirect Separation from a Single Hyperspectral Image

Shin Ishihara, Imari Sato

SpecTRe-GS: Modeling Highly Specular Surfaces with Reflected Nearby Objects by Tracing Rays in 3D Gaussian Splatting

Jiajun Tang, Fan Fei, Zhihao Li et al.

CVPR 2025highlightarXiv:2411.15482

SplatFlow: Self-Supervised Dynamic Gaussian Splatting in Neural Motion Flow Field for Autonomous Driving

Su Sun, Cheng Zhao, Zhuoyang Sun et al.

SRefiner: Soft-Braid Attention for Multi-Agent Trajectory Refinement

Liwen Xiao, Zhiyu Pan, Zhicheng Wang et al.

CVPR 2025highlightarXiv:2409.17993

SSHNet: Unsupervised Cross-modal Homography Estimation via Problem Reformulation and Split Optimization

Junchen Yu, Siyuan Cao, Runmin Zhang et al.

StableDepth: Scene-Consistent and Scale-Invariant Monocular Depth

Zheng Zhang, Lihe Yang, Tianyu Yang et al.

Stable-Sim2Real: Exploring Simulation of Real-Captured 3D Data with Two-Stage Depth Diffusion

Mutian Xu, Chongjie Ye, Haolin Liu et al.

Stereo Any Video: Temporally Consistent Stereo Matching

Junpeng Jing, Weixun Luo, Ye Mao et al.

STEREO: A Two-Stage Framework for Adversarially Robust Concept Erasing from Text-to-Image Diffusion Models

Koushik Srivatsan, Fahad Shamshad, Muzammal Naseer et al.

CVPR 2025highlightarXiv:2504.02823

STING-BEE: Towards Vision-Language Model for Real-World X-ray Baggage Security Inspection

Divya Velayudhan, Abdelfatah Ahmed, Mohamad Alansari et al.

Stochastic Gradient Estimation for Higher-Order Differentiable Rendering

Zican Wang, Michael Fischer, Tobias Ritschel

StolenLoRA: Exploring LoRA Extraction Attacks via Synthetic Data

Yixu Wang, Yan Teng, Yingchun Wang et al.

Straighten Viscous Rectified Flow via Noise Optimization

Jimin Dai, Jiexi Yan, Jian Yang et al.

CVPR 2025highlightarXiv:2503.18671

Structure-Aware Correspondence Learning for Relative Pose Estimation

Yihan Chen, Wenfei Yang, Huan Ren et al.

Structured 3D Latents for Scalable and Versatile 3D Generation

Jianfeng XIANG, Zelong Lv, Sicheng Xu et al.

Structure from Collision

Takuhiro Kaneko

Structure-from-Motion with a Non-Parametric Camera Model

Yihan Wang, Linfei Pan, Marc Pollefeys et al.

Structure Matters: Revisiting Boundary Refinement in Video Object Segmentation

Guanyi Qin, Ziyue Wang, Daiyun Shen et al.

Style-Editor: Text-driven Object-centric Style Editing

Jihun Park, Jongmin Gim, Kyoungmin Lee et al.

Style Evolving along Chain-of-Thought for Unknown-Domain Object Detection

Zihao Zhang, Aming Wu, Yahong Han

CVPR 2025highlightarXiv:2501.11319

StyleSSP: Sampling StartPoint Enhancement for Training-free Diffusion-based Method for Style Transfer

ruojun xu, Weijie Xi, Xiaodi Wang et al.

ICCV 2025highlightarXiv:2510.08458

SummDiff: Generative Modeling of Video Summarization with Diffusion

Kwanseok Kim, Jaehoon Hahm, Sumin Kim et al.

SuperEvent: Cross-Modal Learning of Event-based Keypoint Detection for SLAM

Yannick Burkhardt, Simon Schaefer, Stefan Leutenegger

Super Resolved Imaging with Adaptive Optics

Robin Swanson, Esther Y. H. Lin, Masen Lamb et al.

Supervising Sound Localization by In-the-wild Egomotion

Anna Min, Ziyang Chen, Hang Zhao et al.

SURGEON: Memory-Adaptive Fully Test-Time Adaptation via Dynamic Activation Sparsity

Ke Ma, Jiaqi Tang, Bin Guo et al.