Most Cited CVPR "object categories" Papers

5,589 papers found • Page 28 of 28

#5401

SVFR: A Unified Framework for Generalized Video Face Restoration

Zhiyao Wang, Xu Chen, Chengming Xu et al.

CVPR 2025 • poster • arXiv:2501.01235
#5402

Decoupling Fine Detail and Global Geometry for Compressed Depth Map Super-Resolution

Huan Zheng, Wencheng Han, Jianbing Shen

CVPR 2025 • poster • arXiv:2411.03239
#5403

Multi-Scale Neighborhood Occupancy Masked Autoencoder for Self-Supervised Learning in LiDAR Point Clouds

Mohamed Abdelsamad, Michael Ulrich, Claudius Glaeser et al.

CVPR 2025 • poster • arXiv:2502.20316
#5404

CAP-Net: A Unified Network for 6D Pose and Size Estimation of Categorical Articulated Parts from a Single RGB-D Image

Jingshun Huang, Haitao Lin, Tianyu Wang et al.

CVPR 2025 • highlight • arXiv:2504.11230
#5405

Plug-and-Play PPO: An Adaptive Point Prompt Optimizer Making SAM Greater

Xueyu Liu, Rui Wang, Yexin Lai et al.

CVPR 2025 • poster
#5406

Patch Matters: Training-free Fine-grained Image Caption Enhancement via Local Perception

Ruotian Peng, Haiying He, Yake Wei et al.

CVPR 2025 • poster • arXiv:2504.06666
#5407

SCAP: Transductive Test-Time Adaptation via Supportive Clique-based Attribute Prompting

Chenyu Zhang, Kunlun Xu, Zichen Liu et al.

CVPR 2025 • poster • arXiv:2503.12866
#5408

Neuro-3D: Towards 3D Visual Decoding from EEG Signals

Zhanqiang Guo, Jiamin Wu, Yonghao Song et al.

CVPR 2025 • poster • arXiv:2411.12248
#5409

Adventurer: Optimizing Vision Mamba Architecture Designs for Efficiency

Feng Wang, Timing Yang, Yaodong Yu et al.

CVPR 2025 • poster • arXiv:2410.07599
#5410

WeGen: A Unified Model for Interactive Multimodal Generation as We Chat

Zhipeng Huang, Shaobin Zhuang, Canmiao Fu et al.

CVPR 2025 • poster • arXiv:2503.01115
#5411

Reducing Class-wise Confusion for Incremental Learning with Disentangled Manifolds

Huitong Chen, Yu Wang, Yan Fan et al.

CVPR 2025 • poster • arXiv:2503.17677
#5412

Beyond Words: Augmenting Discriminative Richness via Diffusions in Unsupervised Prompt Learning

Hairui Ren, Fan Tang, He Zhao et al.

CVPR 2025 • poster • arXiv:2504.11930
#5413

TokenMotion: Decoupled Motion Control via Token Disentanglement for Human-centric Video Generation

Ruineng Li, Daitao Xing, Huiming Sun et al.

CVPR 2025 • poster • arXiv:2504.08181
#5414

Task-Aware Clustering for Prompting Vision-Language Models

Fusheng Hao, Fengxiang He, Fuxiang Wu et al.

CVPR 2025 • poster
#5415

Hunyuan-Portrait: Implicit Condition Control for Enhanced Portrait Animation

Zunnan Xu, Zhentao Yu, Zixiang Zhou et al.

CVPR 2025 • poster
#5416

MeshArt: Generating Articulated Meshes with Structure-Guided Transformers

Daoyi Gao, Mohd Yawar Nihal Siddiqui, Lei Li et al.

CVPR 2025 • poster • arXiv:2412.11596
#5417

Non-Natural Image Understanding with Advancing Frequency-based Vision Encoders

Wang Lin, Qingsong Wang, Yueying Feng et al.

CVPR 2025 • poster
#5418

Navigating the Unseen: Zero-shot Scene Graph Generation via Capsule-Based Equivariant Features

Wenhuan Huang, Yi Ji, Guiqian Zhu et al.

CVPR 2025 • poster
#5419

AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea

Qifan Yu, Wei Chow, Zhongqi Yue et al.

CVPR 2025 • poster • arXiv:2411.15738
#5420

RefPose: Leveraging Reference Geometric Correspondences for Accurate 6D Pose Estimation of Unseen Objects

Jaeguk Kim, Jaewoo Park, Keuntek Lee et al.

CVPR 2025 • poster • arXiv:2505.10841
#5421

GRAE-3DMOT: Geometry Relation-Aware Encoder for Online 3D Multi-Object Tracking

Hyunseop Kim, Hyo-Jun Lee, Yonguk Lee et al.

CVPR 2025 • poster
#5422

Generative Modeling of Class Probability for Multi-Modal Representation Learning

JungKyoo Shin, Bumsoo Kim, Eunwoo Kim

CVPR 2025 • highlight • arXiv:2503.17417
#5423

Unified Medical Lesion Segmentation via Self-referring Indicator

Shijie Chang, Xiaoqi Zhao, Lihe Zhang et al.

CVPR 2025 • poster
#5424

SGSST: Scaling Gaussian Splatting Style Transfer

Bruno Galerne, Jianling Wang, Lara Raad et al.

CVPR 2025 • poster
#5425

DIO: Decomposable Implicit 4D Occupancy-Flow World Model

Christopher Diehl, Quinlan Sykora, Ben Agro et al.

CVPR 2025 • poster
#5426

HERA: Hybrid Explicit Representation for Ultra-Realistic Head Avatars

Hongrui Cai, Yuting Xiao, Xuan Wang et al.

CVPR 2025 • poster
#5427

Hierarchical Adaptive Filtering Network for Text Image Specular Highlight Removal

Zhi Jiang, Jingbo Hu, Ling Zhang et al.

CVPR 2025 • poster
#5428

Conformal Prediction and MLLM aided Uncertainty Quantification in Scene Graph Generation

Sayak Nag, Udita Ghosh, Calvin-Khang Ta et al.

CVPR 2025 • poster • arXiv:2503.13947
#5429

Move-in-2D: 2D-Conditioned Human Motion Generation

Hsin-Ping Huang, Yang Zhou, Jui-Hsien Wang et al.

CVPR 2025 • poster • arXiv:2412.13185
#5430

Boosting Domain Incremental Learning: Selecting the Optimal Parameters is All You Need

Qiang Wang, Xiang Song, Yuhang He et al.

CVPR 2025 • poster • arXiv:2505.23744
#5431

Phoenix: A Motion-based Self-Reflection Framework for Fine-grained Robotic Action Correction

Wenke Xia, Ruoxuan Feng, Dong Wang et al.

CVPR 2025 • poster • arXiv:2504.14588
#5432

4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models

Wanhua Li, Renping Zhou, Jiawei Zhou et al.

CVPR 2025 • poster • arXiv:2503.10437
#5433

Quaffure: Real-Time Quasi-Static Neural Hair Simulation

Tuur Stuyck, Gene Wei-Chin Lin, Egor Larionov et al.

CVPR 2025 • poster • arXiv:2412.10061
#5434

Implicit Bias Injection Attacks against Text-to-Image Diffusion Models

Huayang Huang, Xiangye Jin, Jiaxu Miao et al.

CVPR 2025 • poster • arXiv:2504.01819
#5435

Reversing Flow for Image Restoration

Haina Qin, Wenyang Luo, Bing Li et al.

CVPR 2025 • poster • arXiv:2506.16961
#5436

MTADiffusion: Mask Text Alignment Diffusion Model for Object Inpainting

Jun Huang, Ting Liu, Yihang Wu et al.

CVPR 2025 • poster • arXiv:2506.23482
#5437

DAGSM: Disentangled Avatar Generation with GS-enhanced Mesh

Jingyu Zhuang, Di Kang, Linchao Bao et al.

CVPR 2025 • poster • arXiv:2411.15205
#5438

Open-Canopy: Towards Very High Resolution Forest Monitoring

Fajwel Fogel, Yohann Perron, Nikola Besic et al.

CVPR 2025 • highlight • arXiv:2407.09392
#5439

S2D-LFE: Sparse-to-Dense Light Field Event Generation

Yutong Liu, Wenming Weng, Yueyi Zhang et al.

CVPR 2025 • poster
#5440

Beyond Generation: A Diffusion-based Low-level Feature Extractor for Detecting AI-generated Images

Nan Zhong, Haoyu Chen, Yiran Xu et al.

CVPR 2025 • poster
#5441

Image is All You Need to Empower Large-scale Diffusion Models for In-Domain Generation

Pu Cao, Feng Zhou, Lu Yang et al.

CVPR 2025 • poster • arXiv:2312.08195
#5442

MedUnifier: Unifying Vision-and-Language Pre-training on Medical Data with Vision Generation Task using Discrete Visual Representations

Ziyang Zhang, Yang Yu, Yucheng Chen et al.

CVPR 2025 • poster • arXiv:2503.01019
#5443

Inference-Scale Complexity in ANN-SNN Conversion for High-Performance and Low-Power Applications

Tong Bu, Maohua Li, Zhaofei Yu

CVPR 2025 • poster • arXiv:2409.03368
#5444

GazeGene: Large-scale Synthetic Gaze Dataset with 3D Eyeball Annotations

Yiwei Bao, Zhiming Wang, Feng Lu

CVPR 2025 • poster
#5445

Multirate Neural Image Compression with Adaptive Lattice Vector Quantization

Hao Xu, Xiaolin Wu, Xi Zhang

CVPR 2025 • highlight
#5446

VideoGEM: Training-free Action Grounding in Videos

Felix Vogel, Walid Bousselham, Anna Kukleva et al.

CVPR 2025 • poster • arXiv:2503.20348
#5447

SceneFactor: Factored Latent 3D Diffusion for Controllable 3D Scene Generation

Aleksei Bokhovkin, Quan Meng, Shubham Tulsiani et al.

CVPR 2025 • poster • arXiv:2412.01801
#5448

DefMamba: Deformable Visual State Space Model

Leiye Liu, Miao Zhang, Jihao Yin et al.

CVPR 2025 • poster • arXiv:2504.05794
#5449

PEER Pressure: Model-to-Model Regularization for Single Source Domain Generalization

Dongkyu Cho, Inwoo Hwang, Sanghack Lee

CVPR 2025 • poster • arXiv:2505.12745
#5450

Steering Away from Harm: An Adaptive Approach to Defending Vision Language Model Against Jailbreaks

Han Wang, Gang Wang, Huan Zhang

CVPR 2025 • poster • arXiv:2411.16721
#5451

Less is More: Efficient Image Vectorization with Adaptive Parameterization

Kaibo Zhao, Liang Bao, Yufei Li et al.

CVPR 2025 • poster
#5452

Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention

Wenbin An, Feng Tian, Sicong Leng et al.

CVPR 2025 • poster • arXiv:2406.12718
#5453

PERSE: Personalized 3D Generative Avatars from A Single Portrait

Hyunsoo Cha, Inhee Lee, Hanbyul Joo

CVPR 2025 • poster • arXiv:2412.21206
#5454

Towards Explainable and Unprecedented Accuracy in Matching Challenging Finger Crease Patterns

Zhenyu Zhou, Chengdong Dong, Ajay Kumar

CVPR 2025 • highlight
#5455

Animate and Sound an Image

Xihua Wang, Ruihua Song, Chongxuan Li et al.

CVPR 2025 • poster
#5456

Enhancing Few-Shot Class-Incremental Learning via Training-Free Bi-Level Modality Calibration

Yiyang Chen, Tianyu Ding, Lei Wang et al.

CVPR 2025 • poster
#5457

Discovering Hidden Visual Concepts Beyond Linguistic Input in Infant Learning

Xueyi Ke, Satoshi Tsutsui, Yayun Zhang et al.

CVPR 2025 • poster • arXiv:2501.05205
#5458

BEVDiffuser: Plug-and-Play Diffusion Model for BEV Denoising with Ground-Truth Guidance

Xin Ye, Burhan Yaman, Sheng Cheng et al.

CVPR 2025 • highlight • arXiv:2502.19694
#5459

Stop Learning it all to Mitigate Visual Hallucination, Focus on the Hallucination Target.

Dokyoon Yoon, Youngsook Song, Woomyoung Park

CVPR 2025 • poster • arXiv:2506.11417
#5460

LeanGaussian: Breaking Pixel or Point Cloud Correspondence in Modeling 3D Gaussians

Jiamin Wu, Kenkun Liu, Han Gao et al.

CVPR 2025 • poster • arXiv:2404.16323
#5461

Rethinking Temporal Fusion with a Unified Gradient Descent View for 3D Semantic Occupancy Prediction

Dubing Chen, Huan Zheng, Jin Fang et al.

CVPR 2025 • poster • arXiv:2504.12959
#5462

Star with Bilinear Mapping

Zelin Peng, Yu Huang, Zhengqin Xu et al.

CVPR 2025 • poster
#5463

OSMamba: Omnidirectional Spectral Mamba with Dual-Domain Prior Generator for Exposure Correction

Gehui Li, Bin Chen, Chen Zhao et al.

CVPR 2025 • poster • arXiv:2411.15255
#5464

ArticulatedGS: Self-supervised Digital Twin Modeling of Articulated Objects using 3D Gaussian Splatting

Guo Junfu, Yu Xin, Gaoyi Liu et al.

CVPR 2025 • poster • arXiv:2503.08135
#5465

Motions as Queries: One-Stage Multi-Person Holistic Human Motion Capture

Kenkun Liu, Yurong Fu, Weihao Yuan et al.

CVPR 2025 • poster
#5466

Large-scale Multi-view Tensor Clustering with Implicit Linear Kernels

Jiyuan Liu, Xinwang Liu, Chuankun Li et al.

CVPR 2025 • poster
#5467

Accurate Scene Text Recognition with Efficient Model Scaling and Cloze Self-Distillation

Andrea Maracani, Savas Ozkan, Sijun Cho et al.

CVPR 2025 • poster • arXiv:2503.16184
#5468

Analyzing the Synthetic-to-Real Domain Gap in 3D Hand Pose Estimation

Zhuoran Zhao, Linlin Yang, Pengzhan Sun et al.

CVPR 2025 • poster • arXiv:2503.19307
#5469

Zero-Shot 4D Lidar Panoptic Segmentation

Yushan Zhang, Aljoša Ošep, Laura Leal-Taixe et al.

CVPR 2025 • poster • arXiv:2504.00848
#5470

POPEN: Preference-Based Optimization and Ensemble for LVLM-Based Reasoning Segmentation

Lanyun Zhu, Tianrun Chen, Qianxiong Xu et al.

CVPR 2025 • poster • arXiv:2504.00640
#5471

HyperNet Fields: Efficiently Training Hypernetworks without Ground Truth by Learning Weight Trajectories

Eric Hedlin, Munawar Hayat, Fatih Porikli et al.

CVPR 2025 • poster • arXiv:2412.17040
#5472

RICCARDO: Radar Hit Prediction and Convolution for Camera-Radar 3D Object Detection

Yunfei Long, Abhinav Kumar, Xiaoming Liu et al.

CVPR 2025 • poster • arXiv:2504.09086
#5473

IDEA: Inverted Text with Cooperative Deformable Aggregation for Multi-modal Object Re-Identification

Yuhao Wang, Yongfeng Lv, Pingping Zhang et al.

CVPR 2025 • poster • arXiv:2503.10324
#5474

MobilePortrait: Real-Time One-Shot Neural Head Avatars on Mobile Devices

Jianwen Jiang, Gaojie Lin, Zhengkun Rong et al.

CVPR 2025 • poster • arXiv:2407.05712
#5475

Jailbreaking the Non-Transferable Barrier via Test-Time Data Disguising

Yongli Xiang, Ziming Hong, Lina Yao et al.

CVPR 2025 • poster • arXiv:2503.17198
#5476

Joint Vision-Language Social Bias Removal for CLIP

Haoyu Zhang, Yangyang Guo, Mohan Kankanhalli

CVPR 2025 • poster • arXiv:2411.12785
#5477

CheXWorld: Exploring Image World Modeling for Radiograph Representation Learning

Yang Yue, Yulin Wang, Chenxin Tao et al.

CVPR 2025 • poster • arXiv:2504.13820
#5478

Knowledge Bridger: Towards Training-Free Missing Modality Completion

Guanzhou Ke, Shengfeng He, Xiao-Li Wang et al.

CVPR 2025 • poster • arXiv:2502.19834
#5479

SkySense-O: Towards Open-World Remote Sensing Interpretation with Vision-Centric Visual-Language Modeling

Qi Zhu, Jiangwei Lao, Deyi Ji et al.

CVPR 2025 • poster
#5480

AdaDARE-gamma: Balancing Stability and Plasticity in Multi-modal LLMs through Efficient Adaptation

Jingyi Xie, Jintao Yang, Zhunchen Luo et al.

CVPR 2025 • poster
#5481

Fingerprinting Denoising Diffusion Probabilistic Models

Huan Teng, Yuhui Quan, Chengyu Wang et al.

CVPR 2025 • poster
#5482

Query Efficient Black-Box Visual Prompting with Subspace Learning

Haozhen Zhang, Zhaogeng Liu, Hualin Zhang et al.

CVPR 2025 • poster
#5483

Efficient Depth Estimation for Unstable Stereo Camera Systems on AR Glasses

Yongfan Liu, Hyoukjun Kwon

CVPR 2025 • poster • arXiv:2411.10013
#5484

Seeing is Not Believing: Adversarial Natural Object Optimization for Hard-Label 3D Scene Attacks

Daizong Liu, Wei Hu

CVPR 2025 • poster
#5485

Once-Tuning-Multiple-Variants: Tuning Once and Expanded as Multiple Vision-Language Model Variants

Chong Yu, Tao Chen, Zhongxue Gan

CVPR 2025 • poster
#5486

Heterogeneous Skeleton-Based Action Representation Learning

Xiaoyan Ma, Jidong Kuang, Hongsong Wang et al.

CVPR 2025 • poster • arXiv:2506.03481
#5487

Recurrent Feature Mining and Keypoint Mixup Padding for Category-Agnostic Pose Estimation

Junjie Chen, Weilong Chen, Yifan Zuo et al.

CVPR 2025 • poster • arXiv:2503.21140
#5488

Seeing Speech and Sound: Distinguishing and Locating Audio Sources in Visual Scenes

Hyeonggon Ryu, Seongyu Kim, Joon Chung et al.

CVPR 2025 • poster
#5489

FASTer: Focal token Acquiring-and-Scaling Transformer for Long-term 3D Object Detection

Chenxu Dang, Pei An, Xinmin Zhang et al.

CVPR 2025 • poster • arXiv:2503.01899
#5490

Seeking Consistent Flat Minima for Better Domain Generalization via Refining Loss Landscapes

Aodi Li, Liansheng Zhuang, Xiao Long et al.

CVPR 2025 • poster • arXiv:2412.13573
#5491

Detect-and-Guide: Self-regulation of Diffusion Models for Safe Text-to-Image Generation via Guideline Token Optimization

Feifei Li, Mi Zhang, Yiming Sun et al.

CVPR 2025 • poster • arXiv:2503.15197
#5492

Beyond Local Sharpness: Communication-Efficient Global Sharpness-aware Minimization for Federated Learning

Debora Caldarola, Pietro Cagnasso, Barbara Caputo et al.

CVPR 2025 • poster • arXiv:2412.03752
#5493

LoKi: Low-dimensional KAN for Efficient Fine-tuning Image Models

Xuan Cai, Renjie Pan, Hua Yang

CVPR 2025 • poster
#5494

DIV-FF: Dynamic Image-Video Feature Fields For Environment Understanding in Egocentric Videos

Lorenzo Mur-Labadia, Jose J. Guerrero, Ruben Martinez-Cantin

CVPR 2025 • highlight • arXiv:2503.08344
#5495

AdMiT: Adaptive Multi-Source Tuning in Dynamic Environments

Xiangyu Chang, Fahim Faisal Niloy, Sk Miraj Ahmed et al.

CVPR 2025 • poster
#5496

Viewpoint Rosetta Stone: Unlocking Unpaired Ego-Exo Videos for View-invariant Representation Learning

Mi Luo, Zihui Xue, Alex Dimakis et al.

CVPR 2025 • poster
#5497

GIF: Generative Inspiration for Face Recognition at Scale

Mohammad Saadabadi Saadabadi, Sahar Rahimi Malakshan, Ali Dabouei et al.

CVPR 2025 • poster
#5498

CrossSDF: 3D Reconstruction of Thin Structures From Cross-Sections

Thomas Walker, Salvatore Esposito, Daniel Rebain et al.

CVPR 2025 • poster • arXiv:2412.04120
#5499

Investigating the Role of Weight Decay in Enhancing Nonconvex SGD

Tao Sun, Yuhao Huang, Li Shen et al.

CVPR 2025 • poster
#5500

MaskGWM: A Generalizable Driving World Model with Video Mask Reconstruction

Jingcheng Ni, Yuxin Guo, Yichen Liu et al.

CVPR 2025 • poster • arXiv:2502.11663
#5501

AVQACL: A Novel Benchmark for Audio-Visual Question Answering Continual Learning

Kaixuan Wu, Xinde Li, Xinglin Li et al.

CVPR 2025 • poster
#5502

Divide and Conquer: Heterogeneous Noise Integration for Diffusion-based Adversarial Purification

Gaozheng Pei, Shaojie Lyu, Gong Chen et al.

CVPR 2025 • poster • arXiv:2503.01407
#5503

Foley-Flow: Coordinated Video-to-Audio Generation with Masked Audio-Visual Alignment and Dynamic Conditional Flows

Shentong Mo, Yibing Song

CVPR 2025 • poster
#5504

DL2G: Degradation-guided Local-to-Global Restoration for Eyeglass Reflection Removal

Yizhilv, Xiao Lu, Hong Ding et al.

CVPR 2025 • poster
#5505

Efficient Decoupled Feature 3D Gaussian Splatting via Hierarchical Compression

Zhenqi Dai, Ting Liu, Yanning Zhang

CVPR 2025 • poster
#5506

Hearing Hands: Generating Sounds from Physical Interactions in 3D Scenes

Yiming Dou, Wonseok Oh, Yuqing Luo et al.

CVPR 2025 • poster • arXiv:2506.09989
#5507

Domain Generalization in CLIP via Learning with Diverse Text Prompts

Changsong Wen, Zelin Peng, Yu Huang et al.

CVPR 2025 • poster
#5508

Alias-Free Latent Diffusion Models: Improving Fractional Shift Equivariance of Diffusion Latent Space

Yifan Zhou, Zeqi Xiao, Shuai Yang et al.

CVPR 2025 • poster • arXiv:2503.09419
#5509

Weakly Supervised Contrastive Adversarial Training for Learning Robust Features from Semi-supervised Data

Lilin Zhang, Chengpei Wu, Ning Yang

CVPR 2025 • poster • arXiv:2503.11032
#5510

CarPlanner: Consistent Auto-regressive Trajectory Planning for Large-Scale Reinforcement Learning in Autonomous Driving

Dongkun Zhang, Jiaming Liang, Ke Guo et al.

CVPR 2025 • poster • arXiv:2502.19908
#5511

UCM-VeID V2: A Richer Dataset and A Pre-training Method for UAV Cross-Modality Vehicle Re-Identification

Xingyue Liu, Jiahao Qi, Chen Chen et al.

CVPR 2025 • poster
#5512

Unboxed: Geometrically and Temporally Consistent Video Outpainting

Zhongrui Yu, Martina Megaro-Boldini, Robert Sumner et al.

CVPR 2025 • poster
#5513

Visual Lexicon: Rich Image Features in Language Space

XuDong Wang, Xingyi Zhou, Alireza Fathi et al.

CVPR 2025 • poster • arXiv:2412.06774
#5514

Continual SFT Matches Multimodal RLHF with Negative Supervision

Ke Zhu, Yu Wang, Yanpeng Sun et al.

CVPR 2025 • poster • arXiv:2411.14797
#5515

Decoupled Motion Expression Video Segmentation

Hao Fang, Runmin Cong, Xiankai Lu et al.

CVPR 2025 • poster
#5516

Mixture of Submodules for Domain Adaptive Person Search

Minsu Kim, Seungryong Kim, Kwanghoon Sohn

CVPR 2025 • poster
#5517

Unsupervised Discovery of Facial Landmarks and Head Pose

Satyajit Tourani, Siddharth Tourani, Arif Mahmood et al.

CVPR 2025 • poster
#5518

Dynamic Integration of Task-Specific Adapters for Class Incremental Learning

Jiashuo Li, Shaokun Wang, Bo Qian et al.

CVPR 2025 • poster • arXiv:2409.14983
#5519

Test-time Augmentation Improves Efficiency in Conformal Prediction

Divya M Shanmugam, Helen Lu, Swami Sankaranarayanan et al.

CVPR 2025 • poster • arXiv:2505.22764
#5520

GREAT: Geometry-Intention Collaborative Inference for Open-Vocabulary 3D Object Affordance Grounding

Yawen Shao, Wei Zhai, Yuhang Yang et al.

CVPR 2025 • poster • arXiv:2411.19626
#5521

Dual Diffusion for Unified Image Generation and Understanding

Zijie Li, Henry Li, Yichun Shi et al.

CVPR 2025 • poster • arXiv:2501.00289
#5522

Commonsense Video Question Answering through Video-Grounded Entailment Tree Reasoning

Huabin Liu, Filip Ilievski, Cees G. M. Snoek

CVPR 2025 • poster • arXiv:2501.05069
#5523

Enduring, Efficient and Robust Trajectory Prediction Attack in Autonomous Driving via Optimization-Driven Multi-Frame Perturbation Framework

Yi Yu, Weizhen Han, Libing Wu et al.

CVPR 2025 • highlight
#5524

UNEM: UNrolled Generalized EM for Transductive Few-Shot Learning

Long Zhou, Fereshteh Shakeri, Aymen Sadraoui et al.

CVPR 2025 • poster • arXiv:2412.16739
#5525

Flash-Split: 2D Reflection Removal with Flash Cues and Latent Diffusion Separation

Tianfu Wang, Mingyang Xie, Haoming Cai et al.

CVPR 2025 • poster • arXiv:2501.00637
#5526

Test-Time Backdoor Detection for Object Detection Models

Hangtao Zhang, Yichen Wang, Shihui Yan et al.

CVPR 2025 • poster • arXiv:2503.15293
#5527

Classifier-Free Guidance Inside the Attraction Basin May Cause Memorization

Anubhav Jain, Yuya Kobayashi, Takashi Shibuya et al.

CVPR 2025 • poster • arXiv:2411.16738
#5528

Reconciling Stochastic and Deterministic Strategies for Zero-shot Image Restoration using Diffusion Model in Dual

Chong Wang, Lanqing Guo, Zixuan Fu et al.

CVPR 2025 • poster • arXiv:2503.01288
#5529

Harnessing Frequency Spectrum Insights for Image Copyright Protection Against Diffusion Models

Zhenguang Liu, Chao Shuai, Shaojing Fan et al.

CVPR 2025 • poster • arXiv:2503.11071
#5530

Gain from Neighbors: Boosting Model Robustness in the Wild via Adversarial Perturbations Toward Neighboring Classes

Zhou Yang, Mingtao Feng, Tao Huang et al.

CVPR 2025 • poster
#5531

Enhancing Creative Generation on Stable Diffusion-based Models

Jiyeon Han, Dahee Kwon, Gayoung Lee et al.

CVPR 2025 • poster • arXiv:2503.23538
#5532

EquiPose: Exploiting Permutation Equivariance for Relative Camera Pose Estimation

Yuzhen Liu, Qiulei Dong

CVPR 2025 • poster
#5533

EarthDial: Turning Multi-sensory Earth Observations to Interactive Dialogues

Sagar Soni, Akshay Dudhane, Hiyam Debary et al.

CVPR 2025 • poster • arXiv:2412.15190
#5534

Visual Consensus Prompting for Co-Salient Object Detection

Jie Wang, Nana Yu, Zihao Zhang et al.

CVPR 2025 • poster • arXiv:2504.14254
#5535

Classifier-guided CLIP Distillation for Unsupervised Multi-label Classification

Dongseob Kim, Hyunjung Shim

CVPR 2025 • poster • arXiv:2503.16873
#5536

UniGoal: Towards Universal Zero-shot Goal-oriented Navigation

Hang Yin, Xiuwei Xu, Linqing Zhao et al.

CVPR 2025 • poster • arXiv:2503.10630
#5537

Floating No More: Object-Ground Reconstruction from a Single Image

Yunze Man, Yichen Sheng, Jianming Zhang et al.

CVPR 2025 • poster • arXiv:2407.18914
#5538

pFedMxF: Personalized Federated Class-Incremental Learning with Mixture of Frequency Aggregation

Yifei Zhang, Hao Zhu, Alysa Ziying Tan et al.

CVPR 2025 • poster
#5539

The Art of Deception: Color Visual Illusions and Diffusion Models

Alexandra Gomez-Villa, Kai Wang, C. Alejandro Parraga et al.

CVPR 2025 • poster • arXiv:2412.10122
#5540

ACAttack: Adaptive Cross Attacking RGB-T Tracker via Multi-Modal Response Decoupling

Xinyu Xiang, Qinglong Yan, Hao Zhang et al.

CVPR 2025 • poster
#5541

Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis

Bingda Tang, Sayak Paul, Boyang Zheng et al.

CVPR 2025 • poster • arXiv:2505.10046
#5542

SemiETS: Integrating Spatial and Content Consistencies for Semi-Supervised End-to-end Text Spotting

Dongliang Luo, Hanshen Zhu, Ziyang Zhang et al.

CVPR 2025 • poster • arXiv:2504.09966
#5543

Distilling Long-tailed Datasets

Zhenghao Zhao, Haoxuan Wang, Yuzhang Shang et al.

CVPR 2025 • poster • arXiv:2408.14506
#5544

Knowledge Memorization and Rumination for Pre-trained Model-based Class-Incremental Learning

Zijian Gao, Wangwang Jia, Xingxing Zhang et al.

CVPR 2025 • poster
#5545

PQPP: A Joint Benchmark for Text-to-Image Prompt and Query Performance Prediction

Eduard Poesina, Adriana Valentina Costache, Adrian-Gabriel Chifu et al.

CVPR 2025 • poster • arXiv:2406.04746
#5546

CheXwhatsApp: A Dataset for Exploring Challenges in the Diagnosis of Chest X-rays through Mobile Devices

Mariamma Antony, Rajiv Porana, Sahil M. Lathiya et al.

CVPR 2025 • poster
#5547

From Prototypes to General Distributions: An Efficient Curriculum for Masked Image Modeling

Jinhong Lin, Cheng-En Wu, Huanran Li et al.

CVPR 2025 • poster • arXiv:2411.10685
#5548

SINR: Sparsity Driven Compressed Implicit Neural Representations

Dhananjaya Jayasundara, Sudarshan Rajagopalan, Yasiru Ranasinghe et al.

CVPR 2025 • poster • arXiv:2503.19576
#5549

Erase Diffusion: Empowering Object Removal Through Calibrating Diffusion Pathways

Yi Liu, Hao Zhou, Benlei Cui et al.

CVPR 2025 • highlight • arXiv:2503.07026
#5550

DVHGNN: Multi-Scale Dilated Vision HGNN for Efficient Vision Recognition

Caoshuo Li, Tanzhe Li, Xiaobin Hu et al.

CVPR 2025 • poster • arXiv:2503.14867
#5551

RobSense: A Robust Multi-modal Foundation Model for Remote Sensing with Static, Temporal, and Incomplete Data Adaptability

Minh Kha Do, Kang Han, Phu Lai et al.

CVPR 2025 • poster
#5552

MESC-3D: Mining Effective Semantic Cues for 3D Reconstruction from a Single Image

Shaoming Li, Qing Cai, Songqi Kong et al.

CVPR 2025 • poster
#5553

Probing the Mid-level Vision Capabilities of Self-Supervised Learning

Xuweiyi Chen, Markus Marks, Zezhou Cheng

CVPR 2025 • poster • arXiv:2411.17474
#5554

ConceptGuard: Continual Personalized Text-to-Image Generation with Forgetting and Confusion Mitigation

Zirun Guo, Tao Jin

CVPR 2025 • poster • arXiv:2503.10358
#5555

Diffusion Bridge: Leveraging Diffusion Model to Reduce the Modality Gap Between Text and Vision for Zero-Shot Image Captioning

Jeongryong Lee, Yejee Shin, Geonhui Son et al.

CVPR 2025 • poster
#5556

The Devil is in the Prompts: Retrieval-Augmented Prompt Optimization for Text-to-Video Generation

Bingjie Gao, Xinyu Gao, Xiaoxue Wu et al.

CVPR 2025 • poster • arXiv:2504.11739
#5557

VLMs-Guided Representation Distillation for Efficient Vision-Based Reinforcement Learning

Haoran Xu, Peixi Peng, Guang Tan et al.

CVPR 2025 • poster
#5558

Beyond Image Classification: A Video Benchmark and Dual-Branch Hybrid Discrimination Framework for Compositional Zero-Shot Learning

Dongyao Jiang, Haodong Jing, Yongqiang Ma et al.

CVPR 2025 • poster
#5559

AniGrad: Anisotropic Gradient-Adaptive Sampling for 3D Reconstruction From Monocular Video

Noah Stier, Alex Rich, Pradeep Sen et al.

CVPR 2025 • poster
#5560

Easy-editable Image Vectorization with Multi-layer Multi-scale Distributed Visual Feature Embedding

Ye Chen, Zhangli Hu, Zhongyin Zhao et al.

CVPR 2025 • poster
#5561

DynFocus: Dynamic Cooperative Network Empowers LLMs with Video Understanding

Yudong Han, Qingpei Guo, Liyuan Pan et al.

CVPR 2025 • poster • arXiv:2411.12355
#5562

Automated Proof of Polynomial Inequalities via Reinforcement Learning

Banglong Liu, Niuniu Qi, Xia Zeng et al.

CVPR 2025 • poster • arXiv:2503.06592
#5563

BIP3D: Bridging 2D Images and 3D Perception for Embodied Intelligence

Xuewu Lin, Tianwei Lin, Alan Huang et al.

CVPR 2025 • poster • arXiv:2411.14869
#5564

MASt3R-SLAM: Real-Time Dense SLAM with 3D Reconstruction Priors

Riku Murai, Eric Dexheimer, Andrew J. Davison

CVPR 2025 • highlight • arXiv:2412.12392
#5565

Toward Robust Neural Reconstruction from Sparse Point Sets

Amine Ouasfi, Shubhendu Jena, Eric Marchand et al.

CVPR 2025 • poster • arXiv:2412.16361
#5566

Just Dance with pi! A Poly-modal Inductor for Weakly-supervised Video Anomaly Detection

Snehashis Majhi, Giacomo D'Amicantonio, Antitza Dantcheva et al.

CVPR 2025 • highlight
#5567

Gaussian Splashing: Unified Particles for Versatile Motion Synthesis and Rendering

Yutao Feng, Xiang Feng, Yintong Shang et al.

CVPR 2025 • poster • arXiv:2401.15318
#5568

Improving Accuracy and Calibration via Differentiated Deep Mutual Learning

Han Liu, Peng Cui, Bingning Wang et al.

CVPR 2025 • poster
#5569

Hallo3: Highly Dynamic and Realistic Portrait Image Animation with Video Diffusion Transformer

Jiahao Cui, Hui Li, Qingkun Su et al.

CVPR 2025 • poster • arXiv:2412.00733
#5570

Improving the Transferability of Adversarial Attacks on Face Recognition with Diverse Parameters Augmentation

Fengfan Zhou, Bangjie Yin, Hefei Ling et al.

CVPR 2025 • poster • arXiv:2411.15555
#5571

Reward Fine-Tuning Two-Step Diffusion Models via Learning Differentiable Latent-Space Surrogate Reward

Zhiwei Jia, Yuesong Nan, Huixi Zhao et al.

CVPR 2025 • poster • arXiv:2411.15247
#5572

CaricatureBooth: Data-Free Interactive Caricature Generation in a Photo Booth

Zhiyu Qu, Yunqi Miao, Zhensong Zhang et al.

CVPR 2025 • poster
#5573

Channel-wise Noise Scheduled Diffusion for Inverse Rendering in Indoor Scenes

JunYong Choi, Min-Cheol Sagong, SeokYeong Lee et al.

CVPR 2025 • poster • arXiv:2503.09993
#5574

Discovering Fine-Grained Visual-Concept Relations by Disentangled Optimal Transport Concept Bottleneck Models

Yan Xie, Zequn Zeng, Hao Zhang et al.

CVPR 2025 • poster • arXiv:2505.07209
#5575

Embodied Scene Understanding for Vision Language Models via MetaVQA

Weizhen Wang, Chenda Duan, Zhenghao Peng et al.

CVPR 2025 • poster • arXiv:2501.09167
#5576

Learning Flow Fields in Attention for Controllable Person Image Generation

Zijian Zhou, Shikun Liu, Xiao Han et al.

CVPR 2025 • poster • arXiv:2412.08486
#5577

Forming Auxiliary High-confident Instance-level Loss to Promote Learning from Label Proportions

Tianhao Ma, Han Chen, Juncheng Hu et al.

CVPR 2025 • poster • arXiv:2411.10364
#5578

Towards Open-Vocabulary Audio-Visual Event Localization

Jinxing Zhou, Dan Guo, Ruohao Guo et al.

CVPR 2025 • poster • arXiv:2411.11278
#5579

ADU: Adaptive Detection of Unknown Categories in Black-Box Domain Adaptation

Yushan Lai, Guowen Li, Haoyuan Liang et al.

CVPR 2025 • poster
#5580

Coeff-Tuning: A Graph Filter Subspace View for Tuning Attention-Based Large Models

Zichen Miao, Wei Chen, Qiang Qiu

CVPR 2025 • highlight • arXiv:2503.18337
#5581

MVBoost: Boost 3D Reconstruction with Multi-View Refinement

Xiangyu Liu, Xiaomei Zhang, Zhiyuan Ma et al.

CVPR 2025 • poster • arXiv:2411.17772
#5582

CARL: A Framework for Equivariant Image Registration

Hastings Greer, Lin Tian, François-Xavier Vialard et al.

CVPR 2025 • poster • arXiv:2405.16738
#5583

AeroGen: Enhancing Remote Sensing Object Detection with Diffusion-Driven Data Generation

Datao Tang, Xiangyong Cao, Xuan Wu et al.

CVPR 2025 • poster • arXiv:2411.15497
#5584

UniSTD: Towards Unified Spatio-Temporal Learning across Diverse Disciplines

Chen Tang, Xinzhu Ma, Encheng Su et al.

CVPR 2025 • poster • arXiv:2503.20748
#5585

RENO: Real-Time Neural Compression for 3D LiDAR Point Clouds

Kang You, Tong Chen, Dandan Ding et al.

CVPR 2025 • poster • arXiv:2503.12382
#5586

MaskGaussian: Adaptive 3D Gaussian Representation from Probabilistic Masks

Yifei Liu, Zhihang Zhong, Yifan Zhan et al.

CVPR 2025 • poster • arXiv:2412.20522
#5587

Forensics Adapter: Adapting CLIP for Generalizable Face Forgery Detection

Xinjie Cui, Yuezun Li, Ao Luo et al.

CVPR 2025 • poster
#5588

Co-Speech Gesture Video Generation with Implicit Motion-Audio Entanglement

Xinjie Li, Ziyi Chen, Xinlu Yu et al.

CVPR 2025 • poster
#5589

Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation

Chengyue Wu, Xiaokang Chen, Zhiyu Wu et al.

CVPR 2025 • poster • arXiv:2410.13848