CVPR Highlight Papers
712 papers found • Page 7 of 15
SimLingo: Vision-Only Closed-Loop Autonomous Driving with Language-Action Alignment
Katrin Renz, Long Chen, Elahe Arani et al.
Simulator HC: Regression-based Online Simulation of Starting Problem-Solution Pairs for Homotopy Continuation in Geometric Vision
Xinyue Zhang, Zijia Dai, Wanting Xu et al.
SKDream: Controllable Multi-view and 3D Generation with Arbitrary Skeletons
Yuanyou Xu, Zongxin Yang, Yi Yang
SkillMimic: Learning Basketball Interaction Skills from Demonstrations
Yinhuai Wang, Qihan Zhao, Runyi Yu et al.
SLAM3R: Real-Time Dense Scene Reconstruction from Monocular RGB Videos
Yuzheng Liu, Siyan Dong, Shuzhe Wang et al.
SmartCLIP: Modular Vision-language Alignment with Identification Guarantees
Shaoan Xie, Lingjing Kong, Yujia Zheng et al.
SnapGen: Taming High-Resolution Text-to-Image Models for Mobile Devices with Efficient Architectures and Training
Jierun Chen, Dongting Hu, Xijie Huang et al.
SoMA: Singular Value Decomposed Minor Components Adaptation for Domain Generalizable Representation Learning
Seokju Yun, Seunghye Chae, Dongheon Lee et al.
Sonata: Self-Supervised Learning of Reliable Point Representations
Xiaoyang Wu, Daniel DeTone, Duncan Frost et al.
SoundVista: Novel-View Ambient Sound Synthesis via Visual-Acoustic Binding
Mingfei Chen, Israel D. Gebru, Ishwarya Ananthabhotla et al.
SP3D: Boosting Sparsely-Supervised 3D Object Detection via Accurate Cross-Modal Semantic Prompts
Shijia Zhao, Qiming Xia, Xusheng Guo et al.
Spatial457: A Diagnostic Benchmark for 6D Spatial Reasoning of Large Mutimodal Models
Xingrui Wang, Wufei Ma, Tiezheng Zhang et al.
SpatialLLM: A Compound 3D-Informed Design towards Spatially-Intelligent Large Multimodal Models
Wufei Ma, Luoxin Ye, Nessa McWeeney et al.
SpecTRe-GS: Modeling Highly Specular Surfaces with Reflected Nearby Objects by Tracing Rays in 3D Gaussian Splatting
Jiajun Tang, Fan Fei, Zhihao Li et al.
SplatFlow: Self-Supervised Dynamic Gaussian Splatting in Neural Motion Flow Field for Autonomous Driving
Su Sun, Cheng Zhao, Zhuoyang Sun et al.
SSHNet: Unsupervised Cross-modal Homography Estimation via Problem Reformulation and Split Optimization
Junchen Yu, Siyuan Cao, Runmin Zhang et al.
STEREO: A Two-Stage Framework for Adversarially Robust Concept Erasing from Text-to-Image Diffusion Models
Koushik Srivatsan, Fahad Shamshad, Muzammal Naseer et al.
STING-BEE: Towards Vision-Language Model for Real-World X-ray Baggage Security Inspection
Divya Velayudhan, Abdelfatah Ahmed, Mohamad Alansari et al.
Structure-Aware Correspondence Learning for Relative Pose Estimation
Yihan Chen, Wenfei Yang, Huan Ren et al.
Structured 3D Latents for Scalable and Versatile 3D Generation
Jianfeng XIANG, Zelong Lv, Sicheng Xu et al.
Structure from Collision
Takuhiro Kaneko
Structure-from-Motion with a Non-Parametric Camera Model
Yihan Wang, Linfei Pan, Marc Pollefeys et al.
Style-Editor: Text-driven Object-centric Style Editing
Jihun Park, Jongmin Gim, Kyoungmin Lee et al.
Style Evolving along Chain-of-Thought for Unknown-Domain Object Detection
Zihao Zhang, Aming Wu, Yahong Han
StyleSSP: Sampling StartPoint Enhancement for Training-free Diffusion-based Method for Style Transfer
ruojun xu, Weijie Xi, Xiaodi Wang et al.
Supervising Sound Localization by In-the-wild Egomotion
Anna Min, Ziyang Chen, Hang Zhao et al.
SURGEON: Memory-Adaptive Fully Test-Time Adaptation via Dynamic Activation Sparsity
Ke Ma, Jiaqi Tang, Bin Guo et al.
Symmetry Strikes Back: From Single-Image Symmetry Detection to 3D Generation
Xiang Li, Zixuan Huang, Anh Thai et al.
T2ICount: Enhancing Cross-modal Understanding for Zero-Shot Counting
Yifei Qian, Zhongliang Guo, Bowen Deng et al.
Taming Video Diffusion Prior with Scene-Grounding Guidance for 3D Gaussian Splatting from Sparse Inputs
Yingji Zhong, Zhihao Li, Dave Zhenyu Chen et al.
TaoAvatar: Real-Time Lifelike Full-Body Talking Avatars for Augmented Reality via 3D Gaussian Splatting
Jianchuan Chen, Jingchuan Hu, Gaige Wang et al.
Task-driven Image Fusion with Learnable Fusion Loss
Haowen Bai, Jiangshe Zhang, Zixiang Zhao et al.
Text-guided Sparse Voxel Pruning for Efficient 3D Visual Grounding
Wenxuan Guo, Xiuwei Xu, Ziwei Wang et al.
TFCustom: Customized Image Generation with Time-Aware Frequency Feature Guidance
Mushui Liu, Dong She, Qihan Huang et al.
Theoretical Insights in Model Inversion Robustness and Conditional Entropy Maximization for Collaborative Inference Systems
Song Xia, Yi Yu, Wenhan Yang et al.
The Scene Language: Representing Scenes with Programs, Words, and Embeddings
Yunzhi Zhang, Zizhang Li, Matt Zhou et al.
TIDE: Training Locally Interpretable Domain Generalization Models Enables Test-time Correction
Aishwarya Agarwal, Srikrishna Karanam, Vineet Gandhi
Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model
Feng Liu, Shiwei Zhang, Xiaofeng Wang et al.
TinyFusion: Diffusion Transformers Learned Shallow
Gongfan Fang, Kunjun Li, Xinyin Ma et al.
TKG-DM: Training-free Chroma Key Content Generation Diffusion Model
Ryugo Morita, Stanislav Frolov, Brian Bernhard Moser et al.
Towards Autonomous Micromobility through Scalable Urban Simulation
Wayne Wu, Honglin He, Chaoyuan Zhang et al.
Towards Enhanced Image Inpainting: Mitigating Unwanted Object Insertion and Preserving Color Consistency
Yikai Wang, Chenjie Cao, Junqiu Yu et al.
Towards Explainable and Unprecedented Accuracy in Matching Challenging Finger Crease Patterns
Zhenyu Zhou, Chengdong Dong, Ajay Kumar
Towards Improved Text-Aligned Codebook Learning: Multi-Hierarchical Codebook-Text Alignment with Long Text
Guotao liang, Baoquan Zhang, Zhiyuan Wen et al.
Towards In-the-wild 3D Plane Reconstruction from a Single Image
Jiachen Liu, Rui Yu, Sili Chen et al.
Towards RAW Object Detection in Diverse Conditions
Zhong-Yu Li, Xin Jin, Bo-Yuan Sun et al.
Towards Scalable Human-aligned Benchmark for Text-guided Image Editing
Suho Ryu, Kihyun Kim, Eugene Baek et al.
Towards Unbiased and Robust Spatio-Temporal Scene Graph Generation and Anticipation
Rohith Peddi, Saurabh ., Ayush Abhay Shrivastava et al.
Towards Zero-Shot Anomaly Detection and Reasoning with Multimodal Large Language Models
Jiacong Xu, Shao-Yuan Lo, Bardia Safaei et al.
Tracktention: Leveraging Point Tracking to Attend Videos Faster and Better
Zihang Lai, Andrea Vedaldi