CVPR Highlight Papers

712 papers found • Page 7 of 15

SimLingo: Vision-Only Closed-Loop Autonomous Driving with Language-Action Alignment

Katrin Renz, Long Chen, Elahe Arani et al.

CVPR 2025highlight
45
citations

Simulator HC: Regression-based Online Simulation of Starting Problem-Solution Pairs for Homotopy Continuation in Geometric Vision

Xinyue Zhang, Zijia Dai, Wanting Xu et al.

CVPR 2025highlightarXiv:2411.03745

SKDream: Controllable Multi-view and 3D Generation with Arbitrary Skeletons

Yuanyou Xu, Zongxin Yang, Yi Yang

CVPR 2025highlight

SkillMimic: Learning Basketball Interaction Skills from Demonstrations

Yinhuai Wang, Qihan Zhao, Runyi Yu et al.

CVPR 2025highlightarXiv:2408.15270
13
citations

SLAM3R: Real-Time Dense Scene Reconstruction from Monocular RGB Videos

Yuzheng Liu, Siyan Dong, Shuzhe Wang et al.

CVPR 2025highlight

SmartCLIP: Modular Vision-language Alignment with Identification Guarantees

Shaoan Xie, Lingjing Kong, Yujia Zheng et al.

CVPR 2025highlightarXiv:2507.22264
4
citations

SnapGen: Taming High-Resolution Text-to-Image Models for Mobile Devices with Efficient Architectures and Training

Jierun Chen, Dongting Hu, Xijie Huang et al.

CVPR 2025highlight

SoMA: Singular Value Decomposed Minor Components Adaptation for Domain Generalizable Representation Learning

Seokju Yun, Seunghye Chae, Dongheon Lee et al.

CVPR 2025highlightarXiv:2412.04077
8
citations

Sonata: Self-Supervised Learning of Reliable Point Representations

Xiaoyang Wu, Daniel DeTone, Duncan Frost et al.

CVPR 2025highlight
39
citations

SoundVista: Novel-View Ambient Sound Synthesis via Visual-Acoustic Binding

Mingfei Chen, Israel D. Gebru, Ishwarya Ananthabhotla et al.

CVPR 2025highlight

SP3D: Boosting Sparsely-Supervised 3D Object Detection via Accurate Cross-Modal Semantic Prompts

Shijia Zhao, Qiming Xia, Xusheng Guo et al.

CVPR 2025highlight
2
citations

Spatial457: A Diagnostic Benchmark for 6D Spatial Reasoning of Large Mutimodal Models

Xingrui Wang, Wufei Ma, Tiezheng Zhang et al.

CVPR 2025highlight

SpatialLLM: A Compound 3D-Informed Design towards Spatially-Intelligent Large Multimodal Models

Wufei Ma, Luoxin Ye, Nessa McWeeney et al.

CVPR 2025highlight

SpecTRe-GS: Modeling Highly Specular Surfaces with Reflected Nearby Objects by Tracing Rays in 3D Gaussian Splatting

Jiajun Tang, Fan Fei, Zhihao Li et al.

CVPR 2025highlight

SplatFlow: Self-Supervised Dynamic Gaussian Splatting in Neural Motion Flow Field for Autonomous Driving

Su Sun, Cheng Zhao, Zhuoyang Sun et al.

CVPR 2025highlightarXiv:2411.15482
10
citations

SSHNet: Unsupervised Cross-modal Homography Estimation via Problem Reformulation and Split Optimization

Junchen Yu, Siyuan Cao, Runmin Zhang et al.

CVPR 2025highlightarXiv:2409.17993
4
citations

STEREO: A Two-Stage Framework for Adversarially Robust Concept Erasing from Text-to-Image Diffusion Models

Koushik Srivatsan, Fahad Shamshad, Muzammal Naseer et al.

CVPR 2025highlight
12
citations

STING-BEE: Towards Vision-Language Model for Real-World X-ray Baggage Security Inspection

Divya Velayudhan, Abdelfatah Ahmed, Mohamad Alansari et al.

CVPR 2025highlightarXiv:2504.02823
2
citations

Structure-Aware Correspondence Learning for Relative Pose Estimation

Yihan Chen, Wenfei Yang, Huan Ren et al.

CVPR 2025highlightarXiv:2503.18671

Structured 3D Latents for Scalable and Versatile 3D Generation

Jianfeng XIANG, Zelong Lv, Sicheng Xu et al.

CVPR 2025highlight

Structure from Collision

Takuhiro Kaneko

CVPR 2025highlight

Structure-from-Motion with a Non-Parametric Camera Model

Yihan Wang, Linfei Pan, Marc Pollefeys et al.

CVPR 2025highlight

Style-Editor: Text-driven Object-centric Style Editing

Jihun Park, Jongmin Gim, Kyoungmin Lee et al.

CVPR 2025highlight

Style Evolving along Chain-of-Thought for Unknown-Domain Object Detection

Zihao Zhang, Aming Wu, Yahong Han

CVPR 2025highlightarXiv:2503.09968
1
citations

StyleSSP: Sampling StartPoint Enhancement for Training-free Diffusion-based Method for Style Transfer

ruojun xu, Weijie Xi, Xiaodi Wang et al.

CVPR 2025highlightarXiv:2501.11319
6
citations

Supervising Sound Localization by In-the-wild Egomotion

Anna Min, Ziyang Chen, Hang Zhao et al.

CVPR 2025highlight

SURGEON: Memory-Adaptive Fully Test-Time Adaptation via Dynamic Activation Sparsity

Ke Ma, Jiaqi Tang, Bin Guo et al.

CVPR 2025highlightarXiv:2503.20354
3
citations

Symmetry Strikes Back: From Single-Image Symmetry Detection to 3D Generation

Xiang Li, Zixuan Huang, Anh Thai et al.

CVPR 2025highlight
4
citations

T2ICount: Enhancing Cross-modal Understanding for Zero-Shot Counting

Yifei Qian, Zhongliang Guo, Bowen Deng et al.

CVPR 2025highlightarXiv:2502.20625
8
citations

Taming Video Diffusion Prior with Scene-Grounding Guidance for 3D Gaussian Splatting from Sparse Inputs

Yingji Zhong, Zhihao Li, Dave Zhenyu Chen et al.

CVPR 2025highlightarXiv:2503.05082
11
citations

TaoAvatar: Real-Time Lifelike Full-Body Talking Avatars for Augmented Reality via 3D Gaussian Splatting

Jianchuan Chen, Jingchuan Hu, Gaige Wang et al.

CVPR 2025highlightarXiv:2503.17032
11
citations

Task-driven Image Fusion with Learnable Fusion Loss

Haowen Bai, Jiangshe Zhang, Zixiang Zhao et al.

CVPR 2025highlightarXiv:2412.03240
19
citations

Text-guided Sparse Voxel Pruning for Efficient 3D Visual Grounding

Wenxuan Guo, Xiuwei Xu, Ziwei Wang et al.

CVPR 2025highlight

TFCustom: Customized Image Generation with Time-Aware Frequency Feature Guidance

Mushui Liu, Dong She, Qihan Huang et al.

CVPR 2025highlight
5
citations

Theoretical Insights in Model Inversion Robustness and Conditional Entropy Maximization for Collaborative Inference Systems

Song Xia, Yi Yu, Wenhan Yang et al.

CVPR 2025highlight

The Scene Language: Representing Scenes with Programs, Words, and Embeddings

Yunzhi Zhang, Zizhang Li, Matt Zhou et al.

CVPR 2025highlightarXiv:2410.16770
15
citations

TIDE: Training Locally Interpretable Domain Generalization Models Enables Test-time Correction

Aishwarya Agarwal, Srikrishna Karanam, Vineet Gandhi

CVPR 2025highlightarXiv:2411.16788

Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model

Feng Liu, Shiwei Zhang, Xiaofeng Wang et al.

CVPR 2025highlight

TinyFusion: Diffusion Transformers Learned Shallow

Gongfan Fang, Kunjun Li, Xinyin Ma et al.

CVPR 2025highlight

TKG-DM: Training-free Chroma Key Content Generation Diffusion Model

Ryugo Morita, Stanislav Frolov, Brian Bernhard Moser et al.

CVPR 2025highlight
4
citations

Towards Autonomous Micromobility through Scalable Urban Simulation

Wayne Wu, Honglin He, Chaoyuan Zhang et al.

CVPR 2025highlight

Towards Enhanced Image Inpainting: Mitigating Unwanted Object Insertion and Preserving Color Consistency

Yikai Wang, Chenjie Cao, Junqiu Yu et al.

CVPR 2025highlight

Towards Explainable and Unprecedented Accuracy in Matching Challenging Finger Crease Patterns

Zhenyu Zhou, Chengdong Dong, Ajay Kumar

CVPR 2025highlight

Towards Improved Text-Aligned Codebook Learning: Multi-Hierarchical Codebook-Text Alignment with Long Text

Guotao liang, Baoquan Zhang, Zhiyuan Wen et al.

CVPR 2025highlight

Towards In-the-wild 3D Plane Reconstruction from a Single Image

Jiachen Liu, Rui Yu, Sili Chen et al.

CVPR 2025highlightarXiv:2506.02493
5
citations

Towards RAW Object Detection in Diverse Conditions

Zhong-Yu Li, Xin Jin, Bo-Yuan Sun et al.

CVPR 2025highlight
5
citations

Towards Scalable Human-aligned Benchmark for Text-guided Image Editing

Suho Ryu, Kihyun Kim, Eugene Baek et al.

CVPR 2025highlightarXiv:2505.00502
1
citations

Towards Unbiased and Robust Spatio-Temporal Scene Graph Generation and Anticipation

Rohith Peddi, Saurabh ., Ayush Abhay Shrivastava et al.

CVPR 2025highlightarXiv:2411.13059
3
citations

Towards Zero-Shot Anomaly Detection and Reasoning with Multimodal Large Language Models

Jiacong Xu, Shao-Yuan Lo, Bardia Safaei et al.

CVPR 2025highlight

Tracktention: Leveraging Point Tracking to Attend Videos Faster and Better

Zihang Lai, Andrea Vedaldi

CVPR 2025highlightarXiv:2503.19904
3
citations