Most Cited CVPR &quot;planning&quot; Papers

CVPR 2024arXiv:2405.19819

#3802

Gated Fields: Learning Scene Reconstruction from Gated Videos

Andrea Ramazzina, Stefanie Walz, Pragyan Dahal et al.

CVPR 2025arXiv:2503.10257

#3803

AMR-Transformer: Enabling Efficient Long-range Interaction for Complex Neural Fluid Simulation

Zeyi Xu, Jinfan Liu, Kuangxu Chen et al.

CVPR 2025arXiv:2412.17741

#3804

Reasoning to Attend: Try to Understand How <SEG> Token Works

Rui Qian, Xin Yin, Dejing Dou

CVPR 2025arXiv:2412.05161

#3805

DNF: Unconditional 4D Generation with Dictionary-based Neural Fields

Xinyi Zhang, Naiqi Li, Angela Dai

CVPR 2025arXiv:2503.19776

#3806

Resilient Sensor Fusion Under Adverse Sensor Failures via Multi-Modal Expert Fusion

Konyul Park, Yecheol Kim, Daehun Kim et al.

CVPR 2025arXiv:2411.11361

#3807

Scalable Autoregressive Monocular Depth Estimation

Jinhong Wang, Jintai Chen, Jian liu et al.

#3808

POCE: Primal Policy Optimization with Conservative Estimation for Multi-constraint Offline Reinforcement Learning

Jiayi Guan, Li Shen, Ao Zhou et al.

CVPR 2025arXiv:2503.16096

#3809

MarkushGrapher: Joint Visual and Textual Recognition of Markush Structures

Lucas Morin, Valery Weber, Ahmed Nassar et al.

CVPR 2025arXiv:2504.16023

#3810

PointLoRA: Low-Rank Adaptation with Token Selection for Point Cloud Learning

Song Wang, Xiaolu Liu, Lingdong Kong et al.

CVPR 2025arXiv:2410.11374

#3811

Preserve or Modify? Context-Aware Evaluation for Balancing Preservation and Modification in Text-Guided Image Editing

Yoonjeon Kim, Soohyun Ryu, Yeonsung Jung et al.

CVPR 2025arXiv:2506.01037

#3812

Self-supervised ControlNet with Spatio-Temporal Mamba for Real-world Video Super-resolution

Shijun Shi, Jing Xu, Lijing Lu et al.

#3813

On the Out-Of-Distribution Generalization of Large Multimodal Models

Xingxuan Zhang, Jiansheng Li, Wenjing Chu et al.

CVPR 2025highlightarXiv:2409.17993

#3814

SSHNet: Unsupervised Cross-modal Homography Estimation via Problem Reformulation and Split Optimization

Junchen Yu, Siyuan Cao, Runmin Zhang et al.

CVPR 2025arXiv:2504.16030

#3815

LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale

Joya Chen, Yiqi Lin, Ziyun Zeng et al.

CVPR 2025arXiv:2503.22328

#3816

VoteFlow: Enforcing Local Rigidity in Self-Supervised Scene Flow

Yancong Lin, Shiming Wang, Liangliang Nan et al.

CVPR 2024arXiv:2404.00524

#3817

TexVocab: Texture Vocabulary-conditioned Human Avatars

Yuxiao Liu, Zhe Li, Yebin Liu et al.

CVPR 2025arXiv:2505.24476

#3818

Period-LLM: Extending the Periodic Capability of Multimodal Large Language Model

Yuting Zhang, Hao Lu, Qingyong Hu et al.

CVPR 2025arXiv:2406.05826

#3819

PSBD: Prediction Shift Uncertainty Unlocks Backdoor Detection

Wei Li, Pin-Yu Chen, Sijia Liu et al.

CVPR 2025arXiv:2411.18082

#3820

Dual-view X-ray Detection: Can AI Detect Prohibited Items from Dual-view X-ray Images like Humans?

Renshuai Tao, Haoyu Wang, Yuzhe Guo et al.

CVPR 2025highlightarXiv:2503.15019

#3821

Learning 4D Panoptic Scene Graph Generation from Rich 2D Visual Scene

Shengqiong Wu, Hao Fei, Jingkang Yang et al.

#3822

Dual-Agent Optimization framework for Cross-Domain Few-Shot Segmentation

Zhaoyang Li, Yuan Wang, Wangkai Li et al.

CVPR 2024arXiv:2404.06350

#3823

Rolling Shutter Correction with Intermediate Distortion Flow Estimation

Mingdeng Cao, Sidi Yang, Yujiu Yang et al.

CVPR 2025arXiv:2509.00649

#3824

MV-SSM: Multi-View State Space Modeling for 3D Human Pose Estimation

Aviral Chharia, Wenbo Gou, Haoye Dong

CVPR 2025arXiv:2504.05457

#3825

Taxonomy-Aware Evaluation of Vision-Language Models

Vésteinn Snæbjarnarson, Kevin Du, Niklas Stoehr et al.

CVPR 2025arXiv:2504.04756

#3826

Continuous Locomotive Crowd Behavior Generation

Inhwan Bae, Junoh Lee, Hae-Gon Jeon

CVPR 2025arXiv:2412.00965

#3827

Token Cropr: Faster ViTs for Quite a Few Tasks

Benjamin Bergner, Christoph Lippert, Aravindh Mahendran

CVPR 2025arXiv:2503.03651

#3828

DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles

Rui Zhao, Weijia Mao, Mike Zheng Shou

CVPR 2025arXiv:2506.03737

#3829

ComRoPE: Scalable and Robust Rotary Position Embedding Parameterized by Trainable Commuting Angle Matrices

Hao Yu, Tangyu Jiang, Shuning Jia et al.

CVPR 2025arXiv:2504.21749

#3830

Common3D: Self-Supervised Learning of 3D Morphable Models for Common Objects in Neural Feature Space

Leonhard Sommer, Olaf Dünkel, Christian Theobalt et al.

CVPR 2025highlightarXiv:2412.05826

#3831

Doppelgangers++: Improved Visual Disambiguation with Geometric 3D Features

Yuanbo Xiangli, Ruojin Cai, Hanyu Chen et al.

CVPR 2025arXiv:2503.21777

#3832

Test-Time Visual In-Context Tuning

Jiahao Xie, Alessio Tonioni, Nathalie Rauschmayr et al.

CVPR 2025arXiv:2504.20468

#3833

Antidote: A Unified Framework for Mitigating LVLM Hallucinations in Counterfactual Presupposition and Object Perception

Yuanchen Wu, Lu Zhang, Hang Yao et al.

#3834

Diffusion Model is Effectively Its Own Teacher

Xinyin Ma, Runpeng Yu, Songhua Liu et al.

#3835

SnowMaster: Comprehensive Real-world Image Desnowing via MLLM with Multi-Model Feedback Optimization

Jianyu LAI, Sixiang Chen, yunlong lin et al.

CVPR 2025highlightarXiv:2411.18180

#3836

DistinctAD: Distinctive Audio Description Generation in Contexts

Bo Fang, Wenhao Wu, Qiangqiang Wu et al.

CVPR 2025highlightarXiv:2506.06898

#3837

NSD-Imagery: A Benchmark Dataset for Extending fMRI Vision Decoding Methods to Mental Imagery

Reese Kneeland, Paul Scotti, Ghislain St-Yves et al.

CVPR 2025highlightarXiv:2412.06191

#3838

Event Fields: Capturing Light Fields at High Speed, Resolution, and Dynamic Range

Ziyuan Qu, Zihao Zou, Vivek Boominathan et al.

#3839

Radio Frequency Ray Tracing with Neural Object Representation for Enhanced RF Modeling

Xingyu Chen, Zihao Feng, Kun Qian et al.

CVPR 2025highlightarXiv:2505.04656

#3840

MeshGen: Generating PBR Textured Mesh with Render-Enhanced Auto-Encoder and Generative Data Augmentation

Zilong Chen, Yikai Wang, Wenqiang Sun et al.

CVPR 2025arXiv:2404.10620

#3841

PyTorchGeoNodes: Enabling Differentiable Shape Programs for 3D Shape Reconstruction

Sinisa Stekovic, Arslan Artykov, Stefan Ainetter et al.

CVPR 2025arXiv:2503.04718

#3842

Floxels: Fast Unsupervised Voxel Based Scene Flow Estimation

David T. Hoffmann, Syed Haseeb Raza, Hanqiu Jiang et al.

CVPR 2025arXiv:2411.17786

#3843

DreamCache: Finetuning-Free Lightweight Personalized Image Generation via Feature Caching

Emanuele Aiello, Umberto Michieli, Diego Valsesia et al.

#3844

GraphMimic: Graph-to-Graphs Generative Modeling from Videos for Policy Learning

Guangyan Chen, Te Cui, Meiling Wang et al.

CVPR 2024arXiv:2404.02176

#3845

Versatile Navigation Under Partial Observability via Value-guided Diffusion Policy

Gengyu Zhang, Hao Tang, Yan Yan

CVPR 2025highlightarXiv:2403.11295

#3846

Order-One Rolling Shutter Cameras

Marvin Anas Hahn, Kathlén Kohn, Orlando Marigliano et al.

CVPR 2025arXiv:2411.14797

#3847

Continual SFT Matches Multimodal RLHF with Negative Supervision

Ke Zhu, Yu Wang, Yanpeng Sun et al.

CVPR 2025arXiv:2505.21755

#3848

FRAMES-VQA: Benchmarking Fine-Tuning Robustness across Multi-Modal Shifts in Visual Question Answering

Chengyue Huang, Brisa Maneechotesuwan, Shivang Chopra et al.

CVPR 2025arXiv:2503.19897

#3849

Scaling Down Text Encoders of Text-to-Image Diffusion Models

Lifu Wang, Daqing Liu, Xinchen Liu et al.

#3850

Noise-Resistant Video Anomaly Detection via RGB Error-Guided Multiscale Predictive Coding and Dynamic Memory

Han Hu, Wenli Du, Peng Liao et al.

CVPR 2024arXiv:2404.17620

#3851

Neural Modes: Self-supervised Learning of Nonlinear Modal Subspaces

Jiahong Wang, Yinwei DU, Stelian Coros et al.

CVPR 2025arXiv:2411.14169

#3852

Spatiotemporal Decoupling for Efficient Vision-Based Occupancy Forecasting

Jingyi Xu, Xieyuanli Chen, Junyi Ma et al.

CVPR 2025arXiv:2503.02593

#3853

CMMLoc: Advancing Text-to-PointCloud Localization with Cauchy-Mixture-Model Based Framework

Yanlong Xu, Haoxuan Qu, Jun Liu et al.

CVPR 2025arXiv:2411.12817

#3854

What Makes a Good Dataset for Knowledge Distillation?

Logan Frank, Jim Davis

CVPR 2024arXiv:2401.02937

#3855

Locally Adaptive Neural 3D Morphable Models

Michail Tarasiou, Rolandos Alexandros Potamias, Eimear O' Sullivan et al.

CVPR 2024arXiv:2310.08332

#3856

Real-Time Neural BRDF with Spherically Distributed Primitives

Yishun Dou, Zhong Zheng, Qiaoqiao Jin et al.

CVPR 2025arXiv:2411.13549

#3857

Generating 3D-Consistent Videos from Unposed Internet Photos

Gene Chou, Kai Zhang, Sai Bi et al.

CVPR 2025arXiv:2503.15110

#3858

GIVEPose: Gradual Intra-class Variation Elimination for RGB-based Category-Level Object Pose Estimation

Ziqin Huang, Gu Wang, Chenyangguang Zhang et al.

CVPR 2025arXiv:2411.11223

#3859

Efficient Transfer Learning for Video-language Foundation Models

Haoxing Chen, Zizheng Huang, Yan Hong et al.

CVPR 2024arXiv:2401.10217

#3860

Explaining the Implicit Neural Canvas: Connecting Pixels to Neurons by Tracing their Contributions

Namitha Padmanabhan, Matthew A Gwilliam, Pulkit Kumar et al.

CVPR 2025arXiv:2412.04227

#3861

Foundations of the Theory of Performance-Based Ranking

Sébastien Piérard, Anaïs Halin, Anthony Cioppa et al.

#3862

Mono3DVLT: Monocular-Video-Based 3D Visual Language Tracking

Hongkai Wei, YANG YANG, Shijie Sun et al.

CVPR 2025arXiv:2411.12592

#3863

SPARS3R: Semantic Prior Alignment and Regularization for Sparse 3D Reconstruction

Yutao Tang, Yuxiang Guo, Deming Li et al.

CVPR 2025arXiv:2503.12460

#3864

Exploring Contextual Attribute Density in Referring Expression Counting

Zhicheng Wang, Zhiyu Pan, Zhan Peng et al.

CVPR 2025arXiv:2412.04317

#3865

FlashSloth : Lightning Multimodal Large Language Models via Embedded Visual Compression

Bo Tong, Bokai Lai, Yiyi Zhou et al.

CVPR 2025highlightarXiv:2506.07087

#3866

UCOD-DPL: Unsupervised Camouflaged Object Detection via Dynamic Pseudo-label Learning

Weiqi Yan, Lvhai Chen, Huaijia Kou et al.

CVPR 2025arXiv:2503.23702

#3867

3D Dental Model Segmentation with Geometrical Boundary Preserving

Shufan Xi, Zexian Liu, Junlin Chang et al.

CVPR 2025arXiv:2503.20936

#3868

LATTE-MV: Learning to Anticipate Table Tennis Hits from Monocular Videos

Daniel Etaat, Dvij Rajesh Kalaria, Nima Rahmanian et al.

CVPR 2025highlightarXiv:2503.20354

#3869

SURGEON: Memory-Adaptive Fully Test-Time Adaptation via Dynamic Activation Sparsity

Ke Ma, Jiaqi Tang, Bin Guo et al.

CVPR 2025arXiv:2412.11767

#3870

IDEA-Bench: How Far are Generative Models from Professional Designing?

Chen Liang, Lianghua Huang, Jingwu Fang et al.

CVPR 2025arXiv:2412.11423

#3871

Nearly Zero-Cost Protection Against Mimicry by Personalized Diffusion Models

Namhyuk Ahn, KiYoon Yoo, Wonhyuk Ahn et al.

CVPR 2025arXiv:2503.18074

#3872

WISE: A Framework for Gigapixel Whole-Slide-Image Lossless Compression

Yu Mao, Jun Wang, Nan Guan et al.

CVPR 2025arXiv:2509.00332

#3873

CryptoFace: End-to-End Encrypted Face Recognition

Wei Ao, Vishnu Naresh Boddeti

CVPR 2025arXiv:2410.05869

#3874

Believing is Seeing: Unobserved Object Detection using Generative Models

Subhransu S. Bhattacharjee, Dylan Campbell, Rahul Shome

CVPR 2024arXiv:2403.18554

#3875

CosalPure: Learning Concept from Group Images for Robust Co-Saliency Detection

Jiayi Zhu, Qing Guo, Felix Juefei Xu et al.

CVPR 2025arXiv:2503.12840

#3876

Dynamic Derivation and Elimination: Audio Visual Segmentation with Enhanced Audio Semantics

Chen Liu, Liying Yang, Peike Li et al.

CVPR 2024arXiv:2403.04368

#3877

Learning to Remove Wrinkled Transparent Film with Polarized Prior

Jiaqi Tang, RUIZHENG WU, Xiaogang Xu et al.

CVPR 2025arXiv:2503.06369

#3878

Spectral State Space Model for Rotation-Invariant Visual Representation Learning

Sahar Dastani, Ali Bahri, Moslem Yazdanpanah et al.

CVPR 2025arXiv:2411.15255

#3879

OSMamba: Omnidirectional Spectral Mamba with Dual-Domain Prior Generator for Exposure Correction

Gehui Li, Bin Chen, Chen Zhao et al.

#3880

DaCapo: Score Distillation as Stacked Bridge for Fast and High-quality 3D Editing

Yufei Huang, Bangyan Liao, Yuqi Hu et al.

CVPR 2025arXiv:2504.01428

#3881

MuTri: Multi-view Tri-alignment for OCT to OCTA 3D Image Translation

zhuangzhuang chen, hualiang wang, Chubin Ou et al.

CVPR 2024arXiv:2403.17502

#3882

SeNM-VAE: Semi-Supervised Noise Modeling with Hierarchical Variational Autoencoder

Dihan Zheng, Yihang Zou, Xiaowen Zhang et al.

CVPR 2024arXiv:2403.15139

#3883

Deep Generative Model based Rate-Distortion for Image Downscaling Assessment

yuanbang liang, Bhavesh Garg, Paul L. Rosin et al.

CVPR 2025arXiv:2503.00325

#3884

CADRef: Robust Out-of-Distribution Detection via Class-Aware Decoupled Relative Feature Leveraging

Zhiwei Ling, Yachen Chang, Hailiang Zhao et al.

CVPR 2025arXiv:2503.18703

#3885

Channel Consistency Prior and Self-Reconstruction Strategy Based Unsupervised Image Deraining

Guanglu Dong, Tianheng Zheng, Yuanzhouhan Cao et al.

CVPR 2024arXiv:2403.02769

#3886

HUNTER: Unsupervised Human-centric 3D Detection via Transferring Knowledge from Synthetic Instances to Real Scenes

Yichen Yao, Zimo Jiang, YUJING SUN et al.

CVPR 2025highlightarXiv:2503.08306

#3887

Reasoning in Visual Navigation of End-to-end Trained Agents: A Dynamical Systems Approach

Steeven JANNY, Hervé Poirier, Leonid Antsfeld et al.

CVPR 2025arXiv:2411.11288

#3888

Neuron: Learning Context-Aware Evolving Representations for Zero-Shot Skeleton Action Recognition

Yang Chen, Jingcai Guo, Song Guo et al.

CVPR 2025arXiv:2503.18406

#3889

Instruct-CLIP: Improving Instruction-Guided Image Editing with Automated Data Refinement Using Contrastive Learning

Sherry X. Chen, Misha Sra, Pradeep Sen

CVPR 2024arXiv:2403.13293

#3890

Building Optimal Neural Architectures using Interpretable Knowledge

Keith Mills, Fred Han, Mohammad Salameh et al.

CVPR 2025arXiv:2411.02818

#3891

LiVOS: Light Video Object Segmentation with Gated Linear Matching

Qin Liu, Jianfeng Wang, Zhengyuan Yang et al.

CVPR 2025arXiv:2504.08125

#3892

Gen3DEval: Using vLLMs for Automatic Evaluation of Generated 3D Objects

Shalini Maiti, Lourdes Agapito, Filippos Kokkinos

CVPR 2025arXiv:2504.00247

#3893

MultiMorph: On-demand Atlas Construction

Mazdak Abulnaga, Andrew Hoopes, Neel Dey et al.

CVPR 2024highlightarXiv:2406.05533

#3894

PAPR in Motion: Seamless Point-level 3D Scene Interpolation

Shichong Peng, Yanshu Zhang, Ke Li

CVPR 2025arXiv:2503.23241

#3895

Geometry in Style: 3D Stylization via Surface Normal Deformation

Nam Anh Dinh, Itai Lang, Hyunwoo Kim et al.

CVPR 2025arXiv:2503.07481

#3896

Learning Physics-Based Full-Body Human Reaching and Grasping from Brief Walking References

Yitang Li, Mingxian Lin, Zhuo Lin et al.

CVPR 2025arXiv:2503.01175

#3897

HOP: Heterogeneous Topology-based Multimodal Entanglement for Co-Speech Gesture Generation

Hongye Cheng, Tianyu Wang, guangsi shi et al.

CVPR 2024arXiv:2303.12050

#3898

CurveCloudNet: Processing Point Clouds with 1D Structure

Colton Stearns, Alex Fu, Jiateng Liu et al.

CVPR 2025highlightarXiv:2507.22264

#3899

SmartCLIP: Modular Vision-language Alignment with Identification Guarantees

Shaoan Xie, Lingjing Kong, Yujia Zheng et al.

#3900

OmniStereo: Real-time Omnidireactional Depth Estimation with Multiview Fisheye Cameras

Jiaxi Deng, Yushen Wang, Haitao Meng et al.

CVPR 2025highlightarXiv:2503.17142

#3901

Not Only Text: Exploring Compositionality of Visual Representations in Vision-Language Models

Davide Berasi, Matteo Farina, Massimiliano Mancini et al.

CVPR 2024arXiv:2403.13870

#3902

ExMap: Leveraging Explainability Heatmaps for Unsupervised Group Robustness to Spurious Correlations

Rwiddhi Chakraborty, Adrian de Sena Sletten, Michael C. Kampffmeyer

CVPR 2025arXiv:2411.16468

#3903

Efficient Video Face Enhancement with Enhanced Spatial-Temporal Consistency

Yutong Wang, Jiajie Teng, Jiajiong Cao et al.

CVPR 2025arXiv:2412.11519

#3904

LineArt: A Knowledge-guided Training-free High-quality Appearance Transfer for Design Drawing with Diffusion Model

Xi Wang, Hongzhen Li, Heng Fang et al.

CVPR 2024arXiv:2405.18810

#3905

UniPTS: A Unified Framework for Proficient Post-Training Sparsity

JingJing Xie, Yuxin Zhang, Mingbao Lin et al.

#3906

HyperSeg: Hybrid Segmentation Assistant with Fine-grained Visual Perceiver

Cong Wei, Haoxian Tan, Yujie Zhong et al.

CVPR 2025arXiv:2503.01167

#3907

Enhancing Vision-Language Compositional Understanding with Multimodal Synthetic Data

Haoxin Li, Boyang Li

CVPR 2025arXiv:2503.10740

#3908

Subnet-Aware Dynamic Supernet Training for Neural Architecture Search

Jeimin Jeon, Youngmin Oh, Junghyup Lee et al.

#3909

Neural Hierarchical Decomposition for Single Image Plant Modeling

Zhihao Liu, Zhanglin Cheng, Naoto Yokoya

CVPR 2025arXiv:2502.19955

#3910

RUBIK: A Structured Benchmark for Image Matching across Geometric Challenges

Thibaut Loiseau, Guillaume Bourmaud

CVPR 2024arXiv:2406.04961

#3911

Multiplane Prior Guided Few-Shot Aerial Scene Rendering

Zihan Gao, Licheng Jiao, Lingling Li et al.

CVPR 2024arXiv:2404.17184

#3912

Low-Rank Knowledge Decomposition for Medical Foundation Models

Yuhang Zhou, Haolin li, Siyuan Du et al.

#3913

BiLoRA: Almost-Orthogonal Parameter Spaces for Continual Learning

Hao Zhu, Yifei Zhang, Junhao Dong et al.

CVPR 2025arXiv:2503.16406

#3914

VerbDiff: Text-Only Diffusion Models with Enhanced Interaction Awareness

SeungJu Cha, Kwanyoung Lee, Ye-Chan Kim et al.

CVPR 2025arXiv:2503.00548

#3915

Unbiased Video Scene Graph Generation via Visual and Semantic Dual Debiasing

Yanjun Li, Zhaoyang Li, Honghui Chen et al.

CVPR 2024arXiv:2312.02528

#3916

Towards Automatic Power Battery Detection: New Challenge Benchmark Dataset and Baseline

Xiaoqi Zhao, Youwei Pang, Zhenyu Chen et al.

CVPR 2025arXiv:2411.14762

#3917

Efficient Long Video Tokenization via Coordinate-based Patch Reconstruction

Huiwon Jang, Sihyun Yu, Jinwoo Shin et al.

CVPR 2025arXiv:2503.18987

#3918

Balanced Direction from Multifarious Choices: Arithmetic Meta-Learning for Domain Generalization

Xiran Wang, Jian Zhang, Lei Qi et al.

CVPR 2025arXiv:2503.17984

#3919

Taste More, Taste Better: Diverse Data and Strong Model Boost Semi-Supervised Crowd Counting

Maochen Yang, Zekun Li, Jian Zhang et al.

CVPR 2025arXiv:2503.18267

#3920

Enhancing Dataset Distillation via Non-Critical Region Refinement

Minh-Tuan Tran, Trung Le, Xuan-May Le et al.

CVPR 2025arXiv:2505.05505

#3921

Apply Hierarchical-Chain-of-Generation to Complex Attributes Text-to-3D Generation

Yiming Qin, Zhu Xu, Yang Liu

#3922

Robust-MVTON: Learning Cross-Pose Feature Alignment and Fusion for Robust Multi-View Virtual Try-On

Nannan Zhang, Yijiang Li, Dong Du et al.

CVPR 2025arXiv:2412.09910

#3923

Prompt2Perturb (P2P): Text-Guided Diffusion-Based Adversarial Attack on Breast Ultrasound Images

Yasamin Medghalchi, Moein Heidari, Clayton Allard et al.

CVPR 2024arXiv:2407.04260

#3924

Efficient Detection of Long Consistent Cycles and its Application to Distributed Synchronization

Shaohan Li, Yunpeng Shi, Gilad Lerman

CVPR 2024arXiv:2404.00979

#3925

PDF: A Probability-Driven Framework for Open World 3D Point Cloud Semantic Segmentation

Jinfeng Xu, Siyuan Yang, Xianzhi Li et al.

CVPR 2025highlightarXiv:2411.17313

#3926

Event Ellipsometer: Event-based Mueller-Matrix Video Imaging

Ryota Maeda, Yunseong Moon, Seung-Hwan Baek

CVPR 2025arXiv:2411.17249

#3927

Buffer Anytime: Zero-Shot Video Depth and Normal from Image Priors

Zhengfei Kuang, Tianyuan Zhang, Kai Zhang et al.

CVPR 2024arXiv:2312.13328

#3928

NeLF-Pro: Neural Light Field Probes for Multi-Scale Novel View Synthesis

Zinuo You, Andreas Geiger, Anpei Chen

CVPR 2025highlightarXiv:2411.10504

#3929

USP-Gaussian: Unifying Spike-based Image Reconstruction, Pose Correction and Gaussian Splatting

Kang Chen, Jiyuan Zhang, Zecheng Hao et al.

CVPR 2025arXiv:2503.00147

#3930

Precise Event Spotting in Sports Videos: Solving Long-Range Dependency and Class Imbalance

Sanchayan Santra, Vishal Chudasama, Pankaj Wasnik et al.

CVPR 2025arXiv:2502.20981

#3931

Distribution Prototype Diffusion Learning for Open-set Supervised Anomaly Detection

Fuyun Wang, Tong Zhang, Yuanzhi Wang et al.

CVPR 2025arXiv:2504.02397

#3932

Learning Audio-guided Video Representation with Gated Attention for Video-Text Retrieval

Boseung Jeong, Jicheol Park, Sungyeon Kim et al.

#3933

Simplification Is All You Need against Out-of-Distribution Overconfidence

Keke Tang, Chao Hou, Weilong Peng et al.

#3934

Unity in Diversity: Video Editing via Gradient-Latent Purification

Junyu Gao, Kunlin Yang, Xuan Yao et al.

CVPR 2025arXiv:2505.06218

#3935

Let Humanoids Hike! Integrative Skill Development on Complex Trails

Kwan-Yee Lin, Stella X. Yu

CVPR 2025arXiv:2503.21150

#3936

The Devil is in Low-Level Features for Cross-Domain Few-Shot Segmentation

Yuhan Liu, Yixiong Zou, Yuhua Li et al.

CVPR 2024arXiv:2312.10634

#3937

Anomaly Score: Evaluating Generative Models and Individual Generated Images based on Complexity and Vulnerability

Jaehui Hwang, Junghyuk Lee, Jong-Seok Lee

#3938

Omni-Q: Omni-Directional Scene Understanding for Unsupervised Visual Grounding

Sai Wang, Yutian Lin, Yu Wu

CVPR 2025arXiv:2503.18817

#3939

Enhanced OoD Detection through Cross-Modal Alignment of Multi-Modal Representations

Jeonghyeon Kim, Sangheum Hwang

#3940

Fine-grained Prototypical Voting with Heterogeneous Mixup for Semi-supervised 2D-3D Cross-modal Retrieval

Fan Zhang, Xian-Sheng Hua, Chong Chen et al.

CVPR 2025arXiv:2506.04453

#3941

Gradient Inversion Attacks on Parameter-Efficient Fine-Tuning

Hasin Us Sami, Swapneel Sen, Amit K. Roy-Chowdhury et al.

CVPR 2025arXiv:2503.16997

#3942

Steady Progress Beats Stagnation: Mutual Aid of Foundation and Conventional Models in Mixed Domain Semi-Supervised Medical Image Segmentation

Qinghe Ma, Jian Zhang, Zekun Li et al.

CVPR 2025arXiv:2503.07390

#3943

PersonaBooth: Personalized Text-to-Motion Generation

Boeun Kim, Hea In Jeong, JungHoon Sung et al.

CVPR 2025arXiv:2501.04815

#3944

Towards Generalizable Trajectory Prediction using Dual-Level Representation Learning and Adaptive Prompting

Kaouther Messaoud, Matthieu Cord, Alex Alahi

CVPR 2025arXiv:2505.19799

#3945

A Regularization-Guided Equivariant Approach for Image Restoration

Yulu Bai, Jiahong Fu, Qi Xie et al.

CVPR 2025arXiv:2506.02781

#3946

FreeScene: Mixed Graph Diffusion for 3D Scene Synthesis from Free Prompts

Tongyuan Bai, Wangyuanfan Bai, Dong Chen et al.

CVPR 2025arXiv:2505.11676

#3947

DPSeg: Dual-Prompt Cost Volume Learning for Open-Vocabulary Semantic Segmentation

Ziyu Zhao, Xiaoguang Li, Lingjia Shi et al.

CVPR 2025highlightarXiv:2503.19904

#3948

Tracktention: Leveraging Point Tracking to Attend Videos Faster and Better

Zihang Lai, Andrea Vedaldi

#3949

SlowFormer: Adversarial Attack on Compute and Energy Consumption of Efficient Vision Transformers

Navaneet K L, Soroush Abbasi Koohpayegani, Essam Sleiman et al.

CVPR 2024arXiv:2404.15707

#3950

ESR-NeRF: Emissive Source Reconstruction Using LDR Multi-view Images

Jinseo Jeong, Junseo Koo, Qimeng Zhang et al.

CVPR 2024highlightarXiv:2403.16194

#3951

Pose-Guided Self-Training with Two-Stage Clustering for Unsupervised Landmark Discovery

Siddharth Tourani, Ahmed Alwheibi, Arif Mahmood et al.

CVPR 2025highlightarXiv:2501.06481

#3952

Focus-N-Fix: Region-Aware Fine-Tuning for Text-to-Image Generation

Xiaoying Xing, Avinab Saha, Junfeng He et al.

CVPR 2025arXiv:2503.20282

#3953

Faster Parameter-Efficient Tuning with Token Redundancy Reduction

Kwonyoung Kim, Jungin Park, Jin Kim et al.

CVPR 2025arXiv:2412.01160

#3954

ControlFace: Harnessing Facial Parametric Control for Face Rigging

Wooseok Jang, Youngjun Hong, Geonho Cha et al.

CVPR 2025arXiv:2502.10060

#3955

DiSciPLE: Learning Interpretable Programs for Scientific Visual Discovery

Utkarsh Mall, Cheng Perng Phoo, Mia Chiquier et al.

#3956

Supervising Sound Localization by In-the-wild Egomotion

Anna Min, Ziyang Chen, Hang Zhao et al.

#3957

Towards Consistent Multi-Task Learning: Unlocking the Potential of Task-Specific Parameters

Xiaohan Qin, Xiaoxing Wang, Junchi Yan

CVPR 2025arXiv:2503.08387

#3958

Recognition-Synergistic Scene Text Editing

Zhengyao Fang, Pengyuan Lyu, Jingjing Wu et al.

CVPR 2025arXiv:2501.12216

#3959

RL-RC-DoT: A Block-level RL agent for Task-Aware Video Compression

Uri Gadot, Shie Mannor, Assaf Shocher et al.

CVPR 2025arXiv:2211.09810

#3960

Tightening Robustness Verification of MaxPool-based Neural Networks via Minimizing the Over-Approximation Zone

Yuan Xiao, Yuchen Chen, Shiqing Ma et al.

CVPR 2025arXiv:2503.19824

#3961

AudCast: Audio-Driven Human Video Generation by Cascaded Diffusion Transformers

Jiazhi Guan, Kaisiyuan Wang, Zhiliang Xu et al.

CVPR 2025arXiv:2407.03314

#3962

BACON: Improving Clarity of Image Captions via Bag-of-Concept Graphs

Zhantao Yang, Ruili Feng, Keyu Yan et al.

#3963

Identity-Clothing Similarity Modeling for Unsupervised Clothing Change Person Re-Identification

Zhiqi Pang, Junjie Wang, Lingling Zhao et al.

CVPR 2025arXiv:2504.02579

#3964

Bridging the Gap between Gaussian Diffusion Models and Universal Quantization for Image Compression

Lucas Relic, Roberto Azevedo, Yang Zhang et al.

#3965

Sensitivity-Aware Efficient Fine-Tuning via Compact Dynamic-Rank Adaptation

Tianran Chen, Jiarui Chen, Baoquan Zhang et al.

#3966

Less Attention is More: Prompt Transformer for Generalized Category Discovery

Wei Zhang, Baopeng Zhang, Zhu Teng et al.

CVPR 2025arXiv:2404.00916

#3967

Gyro-based Neural Single Image Deblurring

Heemin Yang, Jaesung Rim, Seungyong Lee et al.

CVPR 2025arXiv:2505.16971

#3968

UniPhy: Learning a Unified Constitutive Model for Inverse Physics Simulation

Himangi Mittal, Peiye Zhuang, Hsin-Ying Lee et al.

#3969

BrepGiff: Lightweight Generation of Complex B-rep with 3D GAT Diffusion

Hao Guo, Xiaoshui Huang, Hao jiacheng et al.

CVPR 2024arXiv:2506.14263

#3970

Towards Robust Learning to Optimize with Theoretical Guarantees

Qingyu Song, Wei Lin, Juncheng Wang et al.

#3971

DriveScape: High-Resolution Driving Video Generation by Multi-View Feature Fusion

Wei Wu, Xi Guo, Weixuan TANG et al.

#3972

RivuletMLP: An MLP-based Architecture for Efficient Compressed Video Quality Enhancement

Gang He, Weiran Wang, Guancheng Quan et al.

#3973

Leveraging SD Map to Augment HD Map-based Trajectory Prediction

Zhiwei Dong, Ran Ding, Wei Li et al.

#3974

VODiff: Controlling Object Visibility Order in Text-to-Image Generation

Dong Liang, Jinyuan Jia, Yuhao Liu et al.

CVPR 2025arXiv:2503.14161

#3975

CoSpace: Benchmarking Continuous Space Perception Ability for Vision-Language Models

Yiqi Zhu, Ziyue Wang, Can Zhang et al.

CVPR 2025arXiv:2412.16460

#3976

Positive2Negative: Breaking the Information-Lossy Barrier in Self-Supervised Single Image Denoising

Tong Li, Lizhi Wang, Zhiyuan Xu et al.

#3977

ICP: Immediate Compensation Pruning for Mid-to-high Sparsity

Xin Luo, Fu Xueming, Zihang Jiang et al.

CVPR 2025arXiv:2510.08791

#3978

Alignment, Mining and Fusion: Representation Alignment with Hard Negative Mining and Selective Knowledge Fusion for Medical Visual Question Answering

Yuanhao Zou, Zhaozheng Yin

#3979

Wavelet and Prototype Augmented Query-based Transformer for Pixel-level Surface Defect Detection

Feng Yan, Xiaoheng Jiang, Yang Lu et al.

CVPR 2025arXiv:2504.01466

#3980

Mesh Mamba: A Unified State Space Model for Saliency Prediction in Non-Textured and Textured Meshes

Kaiwei Zhang, Dandan Zhu, Xiongkuo Min et al.

#3981

GPAvatar: High-fidelity Head Avatars by Learning Efficient Gaussian Projections

Weiqi Feng, Dong Han, Zekang Zhou et al.

CVPR 2025arXiv:2504.15159

#3982

Acquire and then Adapt: Squeezing out Text-to-Image Model for Image Restoration

Junyuan Deng, Xinyi Wu, Yongxing Yang et al.

CVPR 2024arXiv:2401.03785

#3983

Identifying Important Group of Pixels using Interactions

Kosuke Sumiyasu, Kazuhiko Kawamoto, Hiroshi Kera

CVPR 2025arXiv:2505.20764

#3984

ConText-CIR: Learning from Concepts in Text for Composed Image Retrieval

Eric Xing, Pranavi Kolouju, Robert Pless et al.

CVPR 2025highlightarXiv:2412.00932

#3985

FIction: 4D Future Interaction Prediction from Video

Kumar Ashutosh, Georgios Pavlakos, Kristen Grauman

CVPR 2025arXiv:2503.00068

#3986

PI-HMR: Towards Robust In-bed Temporal Human Shape Reconstruction with Contact Pressure Sensing

Ziyu Wu, Yufan Xiong, Mengting Niu et al.

CVPR 2024arXiv:2312.09913

#3987

LAENeRF: Local Appearance Editing for Neural Radiance Fields

Lukas Radl, Michael Steiner, Andreas Kurz et al.

CVPR 2025arXiv:2503.16630

#3988

TriTex: Learning Texture from a Single Mesh via Triplane Semantic Features

Dana Cohen-Bar, Daniel Cohen-Or, Gal Chechik et al.

CVPR 2025arXiv:2503.21140

#3989

Recurrent Feature Mining and Keypoint Mixup Padding for Category-Agnostic Pose Estimation

Junjie Chen, Weilong Chen, Yifan Zuo et al.

#3990

FedCALM: Conflict-aware Layer-wise Mitigation for Selective Aggregation in Deeper Personalized Federated Learning

Hao Zheng, Zhigang Hu, Boyu Wang et al.

#3991

Spherical Manifold Guided Diffusion Model for Panoramic Image Generation

Xiancheng Sun, Mai Xu, Shengxi Li et al.

#3992

Annotation Ambiguity Aware Semi-Supervised Medical Image Segmentation

Suruchi Kumari, Pravendra Singh

CVPR 2025arXiv:2506.08964

#3993

ORIDa: Object-centric Real-world Image Composition Dataset

Jinwoo Kim, Sangmin Han, Jinho Jeong et al.

#3994

BWFormer: Building Wireframe Reconstruction from Airborne LiDAR Point Cloud with Transformer

Yuzhou Liu, Lingjie Zhu, Hanqiao Ye et al.

CVPR 2025arXiv:2503.04446

#3995

SMTPD: A New Benchmark for Temporal Prediction of Social Media Popularity

Yijie Xu, Bolun Zheng, Wei Zhu et al.

CVPR 2025arXiv:2505.10679

#3996

Are Spatial-Temporal Graph Convolution Networks for Human Action Recognition Over-Parameterized?

Jianyang Xie, Yitian Zhao, Yanda Meng et al.

CVPR 2025arXiv:2306.11339

#3997

Masking meets Supervision: A Strong Learning Alliance

Byeongho Heo, Taekyung Kim, Sangdoo Yun et al.

CVPR 2024highlightarXiv:2403.20018

#3998

SCINeRF: Neural Radiance Fields from a Snapshot Compressive Image

Yunhao Li, Xiaodong Wang, Ping Wang et al.

CVPR 2025arXiv:2409.09318

#3999

ODE: Open-Set Evaluation of Hallucinations in Multimodal Large Language Models

Yahan Tu, Rui Hu, Jitao Sang

CVPR 2025arXiv:2504.14860

#4000

Bridge the Gap: From Weak to Full Supervision for Temporal Action Localization with PseudoFormer

Ziyi Liu, Yangcen Liu