Most Cited CVPR "real-time decision-making" Papers

5,589 papers found • Page 3 of 28

#401

One Prompt Word is Enough to Boost Adversarial Robustness for Pre-trained Vision-Language Models

Lin Li, Haoyan Guan, Jianing Qiu et al.

CVPR 2024 • poster • arXiv:2403.01849
44 citations
#402

Vision-and-Language Navigation via Causal Learning

Liuyi Wang, Zongtao He, Ronghao Dang et al.

CVPR 2024 • poster • arXiv:2404.10241
44 citations
#403

Accurate Spatial Gene Expression Prediction by Integrating Multi-Resolution Features

Youngmin Chung, Ji Hun Ha, Kyeong Chan Im et al.

CVPR 2024 • poster • arXiv:2403.07592
44 citations
#404

Boosting Object Detection with Zero-Shot Day-Night Domain Adaptation

Zhipeng Du, Miaojing Shi, Jiankang Deng

CVPR 2024 • poster • arXiv:2312.01220
43 citations
#405

Posterior Distillation Sampling

Juil Koo, Chanho Park, Minhyuk Sung

CVPR 2024 • poster • arXiv:2311.13831
43 citations
#406

4D-DRESS: A 4D Dataset of Real-World Human Clothing With Semantic Annotations

Wenbo Wang, Hsuan-I Ho, Chen Guo et al.

CVPR 2024 • highlight • arXiv:2404.18630
43 citations
#407

MET3R: Measuring Multi-View Consistency in Generated Images

Mohammad Asim, Christopher Wewer, Thomas Wimmer et al.

CVPR 2025 • poster • arXiv:2501.06336
43 citations
#408

Fair Federated Learning under Domain Skew with Local Consistency and Domain Diversity

Yuhang Chen, Wenke Huang, Mang Ye

CVPR 2024 • poster • arXiv:2405.16585
43 citations
#409

Error Detection in Egocentric Procedural Task Videos

Shih-Po Lee, Zijia Lu, Zekun Zhang et al.

CVPR 2024 • poster
43 citations
#410

Point2RBox: Combine Knowledge from Synthetic Visual Patterns for End-to-end Oriented Object Detection with Single Point Supervision

Yi Yu, Xue Yang, Qingyun Li et al.

CVPR 2024 • poster • arXiv:2311.14758
43 citations
#411

Efficient Multi-scale Network with Learnable Discrete Wavelet Transform for Blind Motion Deblurring

Xin Gao, Tianheng Qiu, Xinyu Zhang et al.

CVPR 2024 • poster • arXiv:2401.00027
43 citations
#412

Neural Sign Actors: A Diffusion Model for 3D Sign Language Production from Text

Vasileios Baltatzis, Rolandos Alexandros Potamias, Evangelos Ververas et al.

CVPR 2024 • poster • arXiv:2312.02702
43 citations
#413

SeeGround: See and Ground for Zero-Shot Open-Vocabulary 3D Visual Grounding

Rong Li, Shijie Li, Lingdong Kong et al.

CVPR 2025 • poster • arXiv:2412.04383
43 citations
#414

DAVE - A Detect-and-Verify Paradigm for Low-Shot Counting

Jer Pelhan, Alan Lukezic, Vitjan Zavrtanik et al.

CVPR 2024 • poster • arXiv:2404.16622
43 citations
#415

LEAD: Learning Decomposition for Source-free Universal Domain Adaptation

Sanqing Qu, Tianpei Zou, Lianghua He et al.

CVPR 2024 • poster • arXiv:2403.03421
43 citations
#416

DINOv2 Meets Text: A Unified Framework for Image- and Pixel-Level Vision-Language Alignment

Dahyun Kang, Piotr Bojanowski, Huy V. Vo et al.

CVPR 2025 • poster • arXiv:2412.16334
42 citations
#417

GEM: A Generalizable Ego-Vision Multimodal World Model for Fine-Grained Ego-Motion, Object Dynamics, and Scene Composition Control

Mariam Hassan, Sebastian Stapf, Ahmad Rahimi et al.

CVPR 2025 • poster • arXiv:2412.11198
42 citations
#418

Learning the 3D Fauna of the Web

Zizhang Li, Dor Litvak, Ruining Li et al.

CVPR 2024 • poster • arXiv:2401.02400
42 citations
#419

HouseCat6D - A Large-Scale Multi-Modal Category Level 6D Object Perception Dataset with Household Objects in Realistic Scenarios

HyunJun Jung, Shun-Cheng Wu, Patrick Ruhkamp et al.

CVPR 2024 • highlight • arXiv:2212.10428
42 citations
#420

Emotional Speech-driven 3D Body Animation via Disentangled Latent Diffusion

Kiran Chhatre, Radek Danecek, Nikos Athanasiou et al.

CVPR 2024 • poster • arXiv:2312.04466
42 citations
#421

UniBind: LLM-Augmented Unified and Balanced Representation Space to Bind Them All

Yuanhuiyi Lyu, Xu Zheng, Jiazhou Zhou et al.

CVPR 2024 • poster • arXiv:2403.12532
42 citations
#422

CAGE: Controllable Articulation GEneration

Jiayi Liu, Hou In Ivan Tam, Ali Mahdavi Amiri et al.

CVPR 2024 • poster • arXiv:2312.09570
42 citations
#423

Universal Actions for Enhanced Embodied Foundation Models

Jinliang Zheng, Jianxiong Li, Dongxiu Liu et al.

CVPR 2025 • poster • arXiv:2501.10105
42 citations
#424

SemCity: Semantic Scene Generation with Triplane Diffusion

Jumin Lee, Sebin Lee, Changho Jo et al.

CVPR 2024 • poster • arXiv:2403.07773
42 citations
#425

AllSpark: Reborn Labeled Features from Unlabeled in Transformer for Semi-Supervised Semantic Segmentation

Haonan Wang, Qixiang Zhang, Yi Li et al.

CVPR 2024 • poster • arXiv:2403.01818
42 citations
#426

Continual Self-supervised Learning: Towards Universal Multi-modal Medical Data Representation Learning

Yiwen Ye, Yutong Xie, Jianpeng Zhang et al.

CVPR 2024 • highlight • arXiv:2311.17597
42 citations
#427

All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages

Ashmal Vayani, Dinura Dissanayake, Hasindri Watawana et al.

CVPR 2025 • highlight • arXiv:2411.16508
42 citations
#428

Exploiting Diffusion Prior for Generalizable Dense Prediction

Hsin-Ying Lee, Hung-Yu Tseng, Hsin-Ying Lee et al.

CVPR 2024 • poster • arXiv:2311.18832
42 citations
#429

Towards Efficient Replay in Federated Incremental Learning

Yichen Li, Qunwei Li, Haozhao Wang et al.

CVPR 2024 • poster • arXiv:2403.05890
41 citations
#430

Generative Proxemics: A Prior for 3D Social Interaction from Images

Lea Müller, Vickie Ye, Georgios Pavlakos et al.

CVPR 2024 • poster • arXiv:2306.09337
41 citations
#431

DiffAvatar: Simulation-Ready Garment Optimization with Differentiable Simulation

Yifei Li, Hsiaoyu Chen, Egor Larionov et al.

CVPR 2024 • poster • arXiv:2311.12194
41 citations
#432

FlowVQTalker: High-Quality Emotional Talking Face Generation through Normalizing Flow and Quantization

Shuai Tan, Bin Ji, Ye Pan

CVPR 2024 • poster • arXiv:2403.06375
41 citations
#433

Test-Time Domain Generalization for Face Anti-Spoofing

Qianyu Zhou, Ke-Yue Zhang, Taiping Yao et al.

CVPR 2024 • poster • arXiv:2403.19334
41 citations
#434

VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM

Yuqian Yuan, Hang Zhang, Wentong Li et al.

CVPR 2025 • poster • arXiv:2501.00599
41 citations
#435

Linguistic-Aware Patch Slimming Framework for Fine-grained Cross-Modal Alignment

Zheren Fu, Lei Zhang, Hou Xia et al.

CVPR 2024 • poster
41 citations
#436

Towards Robust Event-guided Low-Light Image Enhancement: A Large-Scale Real-World Event-Image Dataset and Novel Approach

Guoqiang Liang, Kanghao Chen, Hangyu Li et al.

CVPR 2024 • poster • arXiv:2404.00834
41 citations
#437

Investigating and Mitigating the Side Effects of Noisy Views for Self-Supervised Clustering Algorithms in Practical Multi-View Scenarios

Jie Xu, Yazhou Ren, Xiaolong Wang et al.

CVPR 2024 • poster • arXiv:2303.17245
41 citations
#438

ID-like Prompt Learning for Few-Shot Out-of-Distribution Detection

Yichen Bai, Zongbo Han, Bing Cao et al.

CVPR 2024 • poster • arXiv:2311.15243
40 citations
#439

A Distractor-Aware Memory for Visual Object Tracking with SAM2

Alan Lukezic, Jovana Videnović, Matej Kristan

CVPR 2025 • poster • arXiv:2411.17576
40 citations
#440

HD-EPIC: A Highly-Detailed Egocentric Video Dataset

Toby Perrett, Ahmad Darkhalil, Saptarshi Sinha et al.

CVPR 2025 • poster • arXiv:2502.04144
40 citations
#441

Don't Shake the Wheel: Momentum-Aware Planning in End-to-End Autonomous Driving

Ziying Song, Caiyan Jia, Lin Liu et al.

CVPR 2025 • poster • arXiv:2503.03125
40 citations
#442

Generalizing Deepfake Video Detection with Plug-and-Play: Video-Level Blending and Spatiotemporal Adapter Tuning

Zhiyuan Yan, Yandan Zhao, Shen Chen et al.

CVPR 2025 • poster • arXiv:2408.17065
40 citations
#443

Prompt Learning via Meta-Regularization

Jinyoung Park, Juyeon Ko, Hyunwoo J. Kim

CVPR 2024 • poster • arXiv:2404.00851
40 citations
#444

SceneTex: High-Quality Texture Synthesis for Indoor Scenes via Diffusion Priors

Dave Zhenyu Chen, Haoxuan Li, Hsin-Ying Lee et al.

CVPR 2024 • highlight • arXiv:2311.17261
40 citations
#445

Multi-subject Open-set Personalization in Video Generation

Tsai-Shien Chen, Aliaksandr Siarohin, Willi Menapace et al.

CVPR 2025 • poster • arXiv:2501.06187
40 citations
#446

AA-CLIP: Enhancing Zero-Shot Anomaly Detection via Anomaly-Aware CLIP

Wenxin Ma, Xu Zhang, Qingsong Yao et al.

CVPR 2025 • poster • arXiv:2503.06661
40 citations
#447

Scene Adaptive Sparse Transformer for Event-based Object Detection

Yansong Peng, Li Hebei, Yueyi Zhang et al.

CVPR 2024 • poster • arXiv:2404.01882
40 citations
#448

DEFOM-Stereo: Depth Foundation Model Based Stereo Matching

Hualie Jiang, Zhiqiang Lou, Laiyan Ding et al.

CVPR 2025 • poster • arXiv:2501.09466
40 citations
#449

A Vision Check-up for Language Models

Pratyusha Sharma, Tamar Rott Shaham, Manel Baradad et al.

CVPR 2024 • highlight • arXiv:2401.01862
40 citations
#450

OpenHumanVid: A Large-Scale High-Quality Dataset for Enhancing Human-Centric Video Generation

Hui Li, Mingwang Xu, Qingkun Su et al.

CVPR 2025 • highlight • arXiv:2412.00115
40 citations
#451

Texture-Preserving Diffusion Models for High-Fidelity Virtual Try-On

Xu Yang, Changxing Ding, Zhibin Hong et al.

CVPR 2024 • poster • arXiv:2404.01089
40 citations
#452

NOPE: Novel Object Pose Estimation from a Single Image

Van Nguyen Nguyen, Thibault Groueix, Georgy Ponimatkin et al.

CVPR 2024 • poster • arXiv:2303.13612
40 citations
#453

UnScene3D: Unsupervised 3D Instance Segmentation for Indoor Scenes

David Rozenberszki, Or Litany, Angela Dai

CVPR 2024 • poster • arXiv:2303.14541
40 citations
#454

PAD: Patch-Agnostic Defense against Adversarial Patch Attacks

Lihua Jing, Rui Wang, Wenqi Ren et al.

CVPR 2024 • poster • arXiv:2404.16452
39 citations
#455

DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception

Yibo Wang, Ruiyuan Gao, Kai Chen et al.

CVPR 2024 • poster • arXiv:2403.13304
39 citations
#456

Portrait4D: Learning One-Shot 4D Head Avatar Synthesis using Synthetic Data

Yu Deng, Duomin Wang, Xiaohang Ren et al.

CVPR 2024 • poster • arXiv:2311.18729
39 citations
#457

Learning Diffusion Texture Priors for Image Restoration

Tian Ye, Sixiang Chen, Wenhao Chai et al.

CVPR 2024 • highlight
39 citations
#458

NoiseCLR: A Contrastive Learning Approach for Unsupervised Discovery of Interpretable Directions in Diffusion Models

Yusuf Dalva, Pinar Yanardag

CVPR 2024 • poster • arXiv:2312.05390
39 citations
#459

Gradient Reweighting: Towards Imbalanced Class-Incremental Learning

Jiangpeng He

CVPR 2024 • poster • arXiv:2402.18528
39 citations
#460

Sonata: Self-Supervised Learning of Reliable Point Representations

Xiaoyang Wu, Daniel DeTone, Duncan Frost et al.

CVPR 2025 • highlight • arXiv:2503.16429
39 citations
#461

A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames

Pinelopi Papalampidi, Skanda Koppula, Shreya Pathak et al.

CVPR 2024 • poster • arXiv:2312.07395
39 citations
#462

AiOS: All-in-One-Stage Expressive Human Pose and Shape Estimation

Qingping Sun, Yanjun Wang, Ailing Zeng et al.

CVPR 2024 • poster • arXiv:2403.17934
39 citations
#463

3D-LLaVA: Towards Generalist 3D LMMs with Omni Superpoint Transformer

Jiajun Deng, Tianyu He, Li Jiang et al.

CVPR 2025 • poster • arXiv:2501.01163
39 citations
#464

OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?

Junbo Niu, Yifei Li, Ziyang Miao et al.

CVPR 2025 • poster • arXiv:2501.05510
39 citations
#465

ZeroShape: Regression-based Zero-shot Shape Reconstruction

Zixuan Huang, Stefan Stojanov, Anh Thai et al.

CVPR 2024 • poster • arXiv:2312.14198
39 citations
#466

GLACE: Global Local Accelerated Coordinate Encoding

Fangjinhua Wang, Xudong Jiang, Silvano Galliani et al.

CVPR 2024 • poster • arXiv:2406.04340
39 citations
#467

DrVideo: Document Retrieval Based Long Video Understanding

Ziyu Ma, Chenhui Gou, Hengcan Shi et al.

CVPR 2025 • poster • arXiv:2406.12846
39 citations
#468

SmartRefine: A Scenario-Adaptive Refinement Framework for Efficient Motion Prediction

Yang Zhou, Hao Shao, Letian Wang et al.

CVPR 2024 • poster • arXiv:2403.11492
39 citations
#469

DiG: Scalable and Efficient Diffusion Models with Gated Linear Attention

Lianghui Zhu, Zilong Huang, Bencheng Liao et al.

CVPR 2025 • poster • arXiv:2405.18428
38 citations
#470

Sherpa3D: Boosting High-Fidelity Text-to-3D Generation via Coarse 3D Prior

Fangfu Liu, Diankun Wu, Yi Wei et al.

CVPR 2024 • poster • arXiv:2312.06655
38 citations
#471

5%>100%: Breaking Performance Shackles of Full Fine-Tuning on Visual Recognition Tasks

Dongshuo Yin, Leiyi Hu, Bin Li et al.

CVPR 2025 • poster • arXiv:2408.08345
38 citations
#472

Multi-view Aggregation Network for Dichotomous Image Segmentation

Qian Yu, Xiaoqi Zhao, Youwei Pang et al.

CVPR 2024 • highlight • arXiv:2404.07445
38 citations
#473

EmoVIT: Revolutionizing Emotion Insights with Visual Instruction Tuning

Hongxia Xie, Chu-Jun Peng, Yu-Wen Tseng et al.

CVPR 2024 • poster • arXiv:2404.16670
38 citations
#474

MIDI: Multi-Instance Diffusion for Single Image to 3D Scene Generation

Zehuan Huang, Yuanchen Guo, Xingqiao An et al.

CVPR 2025 • poster • arXiv:2412.03558
38 citations
#475

FastVLM: Efficient Vision Encoding for Vision Language Models

Pavan Kumar Anasosalu Vasu, Fartash Faghri, Chun-Liang Li et al.

CVPR 2025 • poster • arXiv:2412.13303
38 citations
#476

RealCustom: Narrowing Real Text Word for Real-Time Open-Domain Text-to-Image Customization

Mengqi Huang, Zhendong Mao, Mingcong Liu et al.

CVPR 2024 • poster • arXiv:2403.00483
38 citations
#477

Video-Guided Foley Sound Generation with Multimodal Controls

Ziyang Chen, Prem Seetharaman, Bryan Russell et al.

CVPR 2025 • poster • arXiv:2411.17698
38 citations
#478

VISTA3D: A Unified Segmentation Foundation Model For 3D Medical Imaging

Yufan He, Pengfei Guo, Yucheng Tang et al.

CVPR 2025 • poster • arXiv:2406.05285
38 citations
#479

Adaptive Fusion of Single-View and Multi-View Depth for Autonomous Driving

JunDa Cheng, Wei Yin, Kaixuan Wang et al.

CVPR 2024 • poster • arXiv:2403.07535
38 citations
#480

GaussianFormer-2: Probabilistic Gaussian Superposition for Efficient 3D Occupancy Prediction

Yuanhui Huang, Amonnut Thammatadatrakoon, Wenzhao Zheng et al.

CVPR 2025 • poster • arXiv:2412.04384
38 citations
#481

PEM: Prototype-based Efficient MaskFormer for Image Segmentation

Niccolò Cavagnero, Gabriele Rosi, Claudia Cuttano et al.

CVPR 2024 • poster • arXiv:2402.19422
37 citations
#482

Unbiased Faster R-CNN for Single-source Domain Generalized Object Detection

Yajing Liu, Shijun Zhou, Xiyao Liu et al.

CVPR 2024 • highlight • arXiv:2405.15225
37 citations
#483

SimAC: A Simple Anti-Customization Method for Protecting Face Privacy against Text-to-Image Synthesis of Diffusion Models

Feifei Wang, Zhentao Tan, Tianyi Wei et al.

CVPR 2024 • poster • arXiv:2312.07865
37 citations
#484

FinePOSE: Fine-Grained Prompt-Driven 3D Human Pose Estimation via Diffusion Models

Jinglin Xu, Yijie Guo, Yuxin Peng

CVPR 2024 • highlight • arXiv:2405.05216
37 citations
#485

Rethinking Diffusion for Text-Driven Human Motion Generation: Redundant Representations, Evaluation, and Masked Autoregression

Zichong Meng, Yiming Xie, Xiaogang Peng et al.

CVPR 2025 • poster • arXiv:2411.16575
37 citations
#486

TSD-SR: One-Step Diffusion with Target Score Distillation for Real-World Image Super-Resolution

Linwei Dong, Qingnan Fan, Yihong Guo et al.

CVPR 2025 • poster • arXiv:2411.18263
37 citations
#487

ElasticDiffusion: Training-free Arbitrary Size Image Generation through Global-Local Content Separation

Moayed Haji Ali, Guha Balakrishnan, Vicente Ordonez

CVPR 2024 • poster • arXiv:2311.18822
37 citations
#488

FedSelect: Personalized Federated Learning with Customized Selection of Parameters for Fine-Tuning

Rishub Tamirisa, Chulin Xie, Wenxuan Bao et al.

CVPR 2024 • poster • arXiv:2404.02478
37 citations
#489

MVD-Fusion: Single-view 3D via Depth-consistent Multi-view Generation

Hanzhe Hu, Zhizhuo Zhou, Varun Jampani et al.

CVPR 2024 • poster • arXiv:2404.03656
37 citations
#490

Scaling Diffusion Models to Real-World 3D LiDAR Scene Completion

Lucas Nunes, Rodrigo Marcuzzi, Benedikt Mersch et al.

CVPR 2024 • poster • arXiv:2403.13470
37 citations
#491

Enhancing Visual Document Understanding with Contrastive Learning in Large Visual-Language Models

Xin Li, Yunfei Wu, Xinghua Jiang et al.

CVPR 2024 • poster • arXiv:2402.19014
37 citations
#492

HIR-Diff: Unsupervised Hyperspectral Image Restoration Via Improved Diffusion Models

Li Pang, Xiangyu Rui, Long Cui et al.

CVPR 2024 • poster • arXiv:2402.15865
37 citations
#493

Disentangled Prompt Representation for Domain Generalization

De Cheng, Zhipeng Xu, Xinyang Jiang et al.

CVPR 2024 • poster
37 citations
#494

DifFlow3D: Toward Robust Uncertainty-Aware Scene Flow Estimation with Iterative Diffusion-Based Refinement

Jiuming Liu, Guangming Wang, Weicai Ye et al.

CVPR 2024 • poster
37 citations
#495

Revisiting Single Image Reflection Removal In the Wild

Yurui Zhu, Bo Li, Xueyang Fu et al.

CVPR 2024 • poster • arXiv:2311.17320
37 citations
#496

Question Aware Vision Transformer for Multimodal Reasoning

Roy Ganz, Yair Kittenplon, Aviad Aberdam et al.

CVPR 2024 • highlight • arXiv:2402.05472
37 citations
#497

SplatAD: Real-Time Lidar and Camera Rendering with 3D Gaussian Splatting for Autonomous Driving

Georg Hess, Carl Lindström, Maryam Fatemi et al.

CVPR 2025 • poster • arXiv:2411.16816
37 citations
#498

VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation

XuDong Wang, Ishan Misra, Ziyun Zeng et al.

CVPR 2024 • poster • arXiv:2308.14710
37 citations
#499

Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model

Zhicai Wang, Longhui Wei, Tan Wang et al.

CVPR 2024 • poster • arXiv:2403.19600
37 citations
#500

Vision-Language Models Do Not Understand Negation

Kumail Alhamoud, Shaden Alshammari, Yonglong Tian et al.

CVPR 2025 • poster • arXiv:2501.09425
36 citations
#501

Speedy-Splat: Fast 3D Gaussian Splatting with Sparse Pixels and Sparse Primitives

Alex Hanson, Allen Tu, Geng Lin et al.

CVPR 2025 • poster • arXiv:2412.00578
36 citations
#502

Distilling Semantic Priors from SAM to Efficient Image Restoration Models

Quan Zhang, Xiaoyu Liu, Wei Li et al.

CVPR 2024 • poster • arXiv:2403.16368
36 citations
#503

View-decoupled Transformer for Person Re-identification under Aerial-ground Camera Network

Quan Zhang, Lei Wang, Vishal M. Patel et al.

CVPR 2024 • poster • arXiv:2403.14513
36 citations
#504

IDOL: Instant Photorealistic 3D Human Creation from a Single Image

Yiyu Zhuang, Jiaxi Lv, Hao Wen et al.

CVPR 2025 • poster • arXiv:2412.14963
36 citations
#505

Mitigating Motion Blur in Neural Radiance Fields with Events and Frames

Marco Cannici, Davide Scaramuzza

CVPR 2024 • poster • arXiv:2403.19780
36 citations
#506

Control4D: Efficient 4D Portrait Editing with Text

Ruizhi Shao, Jingxiang Sun, Cheng Peng et al.

CVPR 2024 • poster • arXiv:2305.20082
36 citations
#507

Amodal Completion via Progressive Mixed Context Diffusion

Katherine Xu, Lingzhi Zhang, Jianbo Shi

CVPR 2024 • highlight • arXiv:2312.15540
36 citations
#508

DiffAM: Diffusion-based Adversarial Makeup Transfer for Facial Privacy Protection

Yuhao Sun, Lingyun Yu, Hongtao Xie et al.

CVPR 2024 • poster • arXiv:2405.09882
36 citations
#509

VicTR: Video-conditioned Text Representations for Activity Recognition

Kumara Kahatapitiya, Anurag Arnab, Arsha Nagrani et al.

CVPR 2024 • poster • arXiv:2304.02560
36 citations
#510

Large-Scale Text-to-Image Model with Inpainting is a Zero-Shot Subject-Driven Image Generator

Chaehun Shin, Jooyoung Choi, Heeseung Kim et al.

CVPR 2025 • poster • arXiv:2411.15466
36 citations
#511

Re-thinking Temporal Search for Long-Form Video Understanding

Jinhui Ye, Zihan Wang, Haosen Sun et al.

CVPR 2025 • poster • arXiv:2504.02259
36 citations
#512

How to Configure Good In-Context Sequence for Visual Question Answering

Li Li, Jiawei Peng, Huiyi Chen et al.

CVPR 2024 • poster • arXiv:2312.01571
36 citations
#513

Communication-Efficient Federated Learning with Accelerated Client Gradient

Geeho Kim, Jinkyu Kim, Bohyung Han

CVPR 2024 • poster • arXiv:2201.03172
36 citations
#514

Real Acoustic Fields: An Audio-Visual Room Acoustics Dataset and Benchmark

Ziyang Chen, Israel D. Gebru, Christian Richardt et al.

CVPR 2024 • highlight • arXiv:2403.18821
36 citations
#515

Producing and Leveraging Online Map Uncertainty in Trajectory Prediction

Xunjiang Gu, Guanyu Song, Igor Gilitschenski et al.

CVPR 2024 • poster • arXiv:2403.16439
36 citations
#516

GenZI: Zero-Shot 3D Human-Scene Interaction Generation

Lei Li, Angela Dai

CVPR 2024 • poster • arXiv:2311.17737
36 citations
#517

Score-Guided Diffusion for 3D Human Recovery

Anastasis Stathopoulos, Ligong Han, Dimitris N. Metaxas

CVPR 2024 • poster • arXiv:2403.09623
36 citations
#518

Neural Redshift: Random Networks are not Random Functions

Damien Teney, Armand Nicolicioiu, Valentin Hartmann et al.

CVPR 2024 • poster • arXiv:2403.02241
36 citations
#519

Interactive Continual Learning: Fast and Slow Thinking

Biqing Qi, Xinquan Chen, Junqi Gao et al.

CVPR 2024 • poster • arXiv:2403.02628
35 citations
#520

SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding

Hao Li, Changyao Tian, Jie Shao et al.

CVPR 2025 • poster • arXiv:2412.09604
35 citations
#521

ICP-Flow: LiDAR Scene Flow Estimation with ICP

Yancong Lin, Holger Caesar

CVPR 2024 • poster • arXiv:2402.17351
35 citations
#522

Alchemist: Parametric Control of Material Properties with Diffusion Models

Prafull Sharma, Varun Jampani, Yuanzhen Li et al.

CVPR 2024 • poster • arXiv:2312.02970
35 citations
#523

Collaborating Foundation Models for Domain Generalized Semantic Segmentation

Yasser Benigmim, Subhankar Roy, Slim Essid et al.

CVPR 2024 • poster • arXiv:2312.09788
35 citations
#524

Towards General Visual-Linguistic Face Forgery Detection

Ke Sun, Shen Chen, Taiping Yao et al.

CVPR 2025 • poster • arXiv:2307.16545
35 citations
#525

RobustSAM: Segment Anything Robustly on Degraded Images

Wei-Ting Chen, Yu Jiet Vong, Sy-Yen Kuo et al.

CVPR 2024 • highlight • arXiv:2406.09627
35 citations
#526

Rethinking the Spatial Inconsistency in Classifier-Free Diffusion Guidance

Dazhong Shen, Guanglu Song, Zeyue Xue et al.

CVPR 2024 • poster • arXiv:2404.05384
35 citations
#527

Frequency Decoupling for Motion Magnification via Multi-Level Isomorphic Architecture

Fei Wang, Dan Guo, Kun Li et al.

CVPR 2024 • poster • arXiv:2403.07347
35 citations
#528

LiDAR4D: Dynamic Neural Fields for Novel Space-time View LiDAR Synthesis

Zehan Zheng, Fan Lu, Weiyi Xue et al.

CVPR 2024 • poster • arXiv:2404.02742
35 citations
#529

UnO: Unsupervised Occupancy Fields for Perception and Forecasting

Ben Agro, Quinlan Sykora, Sergio Casas et al.

CVPR 2024 • poster • arXiv:2406.08691
35 citations
#530

Pow3R: Empowering Unconstrained 3D Reconstruction with Camera and Scene Priors

Wonbong Jang, Philippe Weinzaepfel, Vincent Leroy et al.

CVPR 2025 • poster • arXiv:2503.17316
35 citations
#531

MNE-SLAM: Multi-Agent Neural SLAM for Mobile Robots

Tianchen Deng, Guole Shen, Chen Xun et al.

CVPR 2025 • poster
35 citations
#532

Prompting Hard or Hardly Prompting: Prompt Inversion for Text-to-Image Diffusion Models

Shweta Mahajan, Tanzila Rahman, Kwang Moo Yi et al.

CVPR 2024 • poster • arXiv:2312.12416
35 citations
#533

StreetCrafter: Street View Synthesis with Controllable Video Diffusion Models

Yunzhi Yan, Zhen Xu, Haotong Lin et al.

CVPR 2025 • poster • arXiv:2412.13188
35 citations
#534

Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation

Yunhao Ge, Xiaohui Zeng, Jacob Huffman et al.

CVPR 2024 • poster • arXiv:2404.19752
35 citations
#535

LLMDet: Learning Strong Open-Vocabulary Object Detectors under the Supervision of Large Language Models

Shenghao Fu, Qize Yang, Qijie Mo et al.

CVPR 2025 • highlight • arXiv:2501.18954
34 citations
#536

MonoNPHM: Dynamic Head Reconstruction from Monocular Videos

Simon Giebenhain, Tobias Kirschstein, Markos Georgopoulos et al.

CVPR 2024 • highlight • arXiv:2312.06740
34 citations
#537

High-fidelity Person-centric Subject-to-Image Synthesis

Yibin Wang, Weizhong Zhang, Jianwei Zheng et al.

CVPR 2024 • poster • arXiv:2311.10329
34 citations
#538

Progressive Semantic-Guided Vision Transformer for Zero-Shot Learning

Shiming Chen, Wenjin Hou, Salman Khan et al.

CVPR 2024 • poster • arXiv:2404.07713
34 citations
#539

ParaHome: Parameterizing Everyday Home Activities Towards 3D Generative Modeling of Human-Object Interactions

Jeonghwan Kim, Jisoo Kim, Jeonghyeon Na et al.

CVPR 2025 • poster • arXiv:2401.10232
34 citations
#540

AnySat: One Earth Observation Model for Many Resolutions, Scales, and Modalities

Guillaume Astruc, Nicolas Gonthier, Clement Mallet et al.

CVPR 2025 • highlight • arXiv:2412.14123
34 citations
#541

Watermark-embedded Adversarial Examples for Copyright Protection against Diffusion Models

Peifei Zhu, Tsubasa Takahashi, Hirokatsu Kataoka

CVPR 2024 • poster • arXiv:2404.09401
34 citations
#542

One Diffusion to Generate Them All

Duong H. Le, Tuan Pham, Sangho Lee et al.

CVPR 2025 • poster • arXiv:2411.16318
34 citations
#543

ADA-Track: End-to-End Multi-Camera 3D Multi-Object Tracking with Alternating Detection and Association

Shuxiao Ding, Lukas Schneider, Marius Cordts et al.

CVPR 2024 • poster • arXiv:2405.08909
34 citations
#544

CoralSCOP: Segment any COral Image on this Planet

Zheng Ziqiang, Liang Haixin, Binh-Son Hua et al.

CVPR 2024 • highlight
34 citations
#545

Unraveling Instance Associations: A Closer Look for Audio-Visual Segmentation

Yuanhong Chen, Yuyuan Liu, Hu Wang et al.

CVPR 2024 • poster • arXiv:2304.02970
34 citations
#546

GenNBV: Generalizable Next-Best-View Policy for Active 3D Reconstruction

Xiao Chen, Quanyi Li, Tai Wang et al.

CVPR 2024 • poster • arXiv:2402.16174
34 citations
#547

ExtDM: Distribution Extrapolation Diffusion Model for Video Prediction

Zhicheng Zhang, Junyao Hu, Wentao Cheng et al.

CVPR 2024 • poster
34 citations
#548

Active Generalized Category Discovery

Shijie Ma, Fei Zhu, Zhun Zhong et al.

CVPR 2024 • poster • arXiv:2403.04272
34 citations
#549

Video Super-Resolution Transformer with Masked Inter&Intra-Frame Attention

Xingyu Zhou, Leheng Zhang, Xiaorui Zhao et al.

CVPR 2024 • poster • arXiv:2401.06312
34 citations
#550

LION-FS: Fast & Slow Video-Language Thinker as Online Video Assistant

Wei Li, Bing Hu, Rui Shao et al.

CVPR 2025 • poster • arXiv:2503.03663
33 citations
#551

Simple Semantic-Aided Few-Shot Learning

Hai Zhang, Junzhe Xu, Shanlin Jiang et al.

CVPR 2024 • poster • arXiv:2311.18649
33 citations
#552

CoGS: Controllable Gaussian Splatting

Heng Yu, Joel Julin, Zoltán Á. Milacski et al.

CVPR 2024 • poster • arXiv:2312.05664
33 citations
#553

SPAR3D: Stable Point-Aware Reconstruction of 3D Objects from Single Images

Zixuan Huang, Mark Boss, Aaryaman Vasishta et al.

CVPR 2025 • poster • arXiv:2501.04689
33 citations
#554

Words or Vision: Do Vision-Language Models Have Blind Faith in Text?

Ailin Deng, Tri Cao, Zhirui Chen et al.

CVPR 2025 • poster • arXiv:2503.02199
33 citations
#555

Learning Object State Changes in Videos: An Open-World Perspective

Zihui Xue, Kumar Ashutosh, Kristen Grauman

CVPR 2024 • poster • arXiv:2312.11782
33 citations
#556

PartGen: Part-level 3D Generation and Reconstruction with Multi-view Diffusion Models

Minghao Chen, Roman Shapovalov, Iro Laina et al.

CVPR 2025 • highlight • arXiv:2412.18608
33 citations
#557

AutoAD III: The Prequel – Back to the Pixels

Tengda Han, Max Bain, Arsha Nagrani et al.

CVPR 2024 • poster • arXiv:2404.14412
33 citations
#558

Rethinking Generalizable Face Anti-spoofing via Hierarchical Prototype-guided Distribution Refinement in Hyperbolic Space

Chengyang Hu, Ke-Yue Zhang, Taiping Yao et al.

CVPR 2024 • highlight
32 citations
#559

MAS: Multi-view Ancestral Sampling for 3D Motion Generation Using 2D Diffusion

Roy Kapon, Guy Tevet, Daniel Cohen-Or et al.

CVPR 2024 • poster • arXiv:2310.14729
32 citations
#560

MonSter: Marry Monodepth to Stereo Unleashes Power

JunDa Cheng, Longliang Liu, Gangwei Xu et al.

CVPR 2025 • highlight
32 citations
#561

Towards Generalizable Multi-Object Tracking

Zheng Qin, Le Wang, Sanping Zhou et al.

CVPR 2024 • poster • arXiv:2406.00429
32 citations
#562

Towards a Simultaneous and Granular Identity-Expression Control in Personalized Face Generation

Renshuai Liu, Bowen Ma, Wei Zhang et al.

CVPR 2024 • highlight • arXiv:2401.01207
32 citations
#563

Diffusion Renderer: Neural Inverse and Forward Rendering with Video Diffusion Models

Ruofan Liang, Žan Gojčič, Huan Ling et al.

CVPR 2025 • poster
32 citations
#564

GreedyViG: Dynamic Axial Graph Construction for Efficient Vision GNNs

Mustafa Munir, William Avery, Md Mostafijur Rahman et al.

CVPR 2024 • poster • arXiv:2405.06849
32 citations
#565

REACTO: Reconstructing Articulated Objects from a Single Video

Chaoyue Song, Jiacheng Wei, Chuan-Sheng Foo et al.

CVPR 2024 • poster • arXiv:2404.11151
32 citations
#566

Physical Property Understanding from Language-Embedded Feature Fields

Albert J. Zhai, Yuan Shen, Emily Y. Chen et al.

CVPR 2024 • poster • arXiv:2404.04242
32 citations
#567

Generalizable Whole Slide Image Classification with Fine-Grained Visual-Semantic Interaction

Hao Li, Ying Chen, Yifei Chen et al.

CVPR 2024 • poster • arXiv:2402.19326
32 citations
#568

3D-HGS: 3D Half-Gaussian Splatting

Haolin Li, Jinyang Liu, Mario Sznaier et al.

CVPR 2025 • poster • arXiv:2406.02720
32 citations
#569

Stereo Anywhere: Robust Zero-Shot Deep Stereo Matching Even Where Either Stereo or Mono Fail

Luca Bartolomei, Fabio Tosi, Matteo Poggi et al.

CVPR 2025 • poster • arXiv:2412.04472
32 citations
#570

HarmonyView: Harmonizing Consistency and Diversity in One-Image-to-3D

Sangmin Woo, Byeongjun Park, Hyojun Go et al.

CVPR 2024 • poster • arXiv:2312.15980
32 citations
#571

Generative Gaussian Splatting for Unbounded 3D City Generation

Haozhe Xie, Zhaoxi Chen, Fangzhou Hong et al.

CVPR 2025 • poster • arXiv:2406.06526
32 citations
#572

Three Pillars Improving Vision Foundation Model Distillation for Lidar

Gilles Puy, Spyros Gidaris, Alexandre Boulch et al.

CVPR 2024 • poster • arXiv:2310.17504
32 citations
#573

A Dynamic Kernel Prior Model for Unsupervised Blind Image Super-Resolution

Zhixiong Yang, Jingyuan Xia, Shengxi Li et al.

CVPR 2024 • poster • arXiv:2404.15620
32 citations
#574

How Far Can We Compress Instant-NGP-Based NeRF?

Yihang Chen, Qianyi Wu, Mehrtash Harandi et al.

CVPR 2024 • poster • arXiv:2406.04101
32 citations
#575

OpenStreetView-5M: The Many Roads to Global Visual Geolocation

Guillaume Astruc, Nicolas Dufour, Ioannis Siglidis et al.

CVPR 2024 • poster • arXiv:2404.18873
32 citations
#576

AZ-NAS: Assembling Zero-Cost Proxies for Network Architecture Search

Junghyup Lee, Bumsub Ham

CVPR 2024 • poster • arXiv:2403.19232
32 citations
#577

Transductive Zero-Shot and Few-Shot CLIP

Ségolène Martin, Yunshi Huang, Fereshteh Shakeri et al.

CVPR 2024 • highlight • arXiv:2405.18437
32 citations
#578

LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos

Tiantian Geng, Jinrui Zhang, Qingni Wang et al.

CVPR 2025 • poster • arXiv:2411.19772
32 citations
#579

Fantastic Animals and Where to Find Them: Segment Any Marine Animal with Dual SAM

Pingping Zhang, Tianyu Yan, Yang Liu et al.

CVPR 2024 • highlight • arXiv:2404.04996
32 citations
#580

Inversion-Free Image Editing with Language-Guided Diffusion Models

Sihan Xu, Yidong Huang, Jiayi Pan et al.

CVPR 2024 • poster
32 citations
#581

Make It Count: Text-to-Image Generation with an Accurate Number of Objects

Lital Binyamin, Yoad Tewel, Hilit Segev et al.

CVPR 2025 • poster • arXiv:2406.10210
32 citations
#582

SoftVQ-VAE: Efficient 1-Dimensional Continuous Tokenizer

Hao Chen, Ze Wang, Xiang Li et al.

CVPR 2025 • poster • arXiv:2412.10958
32 citations
#583

VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling

Zeyue Tian, Zhaoyang Liu, Ruibin Yuan et al.

CVPR 2025 • poster • arXiv:2406.04321
32 citations
#584

Material Palette: Extraction of Materials from a Single Image

Ivan Lopes, Fabio Pizzati, Raoul de Charette

CVPR 2024 • poster • arXiv:2311.17060
31 citations
#585

Open-World Human-Object Interaction Detection via Multi-modal Prompts

Jie Yang, Bingliang Li, Ailing Zeng et al.

CVPR 2024 • poster • arXiv:2406.07221
31 citations
#586

ExACT: Language-guided Conceptual Reasoning and Uncertainty Estimation for Event-based Action Recognition and More

Jiazhou Zhou, Xu Zheng, Yuanhuiyi Lyu et al.

CVPR 2024 • highlight • arXiv:2403.12534
31 citations
#587

PREGO: Online Mistake Detection in PRocedural EGOcentric Videos

Alessandro Flaborea, Guido M. D'Amely di Melendugno et al.

CVPR 2024 • poster • arXiv:2404.01933
31 citations
#588

Unveiling the Power of Audio-Visual Early Fusion Transformers with Dense Interactions through Masked Modeling

Shentong Mo, Pedro Morgado

CVPR 2024 • poster • arXiv:2312.01017
31 citations
#589

Your Large Vision-Language Model Only Needs A Few Attention Heads For Visual Grounding

Seil Kang, Jinyeong Kim, Junhyeok Kim et al.

CVPR 2025 • highlight • arXiv:2503.06287
31 citations
#590

Track4Gen: Teaching Video Diffusion Models to Track Points Improves Video Generation

Hyeonho Jeong, Chun-Hao P. Huang, Jong Chul Ye et al.

CVPR 2025 • poster • arXiv:2412.06016
31 citations
#591

Contextrast: Contextual Contrastive Learning for Semantic Segmentation

Changki Sung, Wanhee Kim, Jungho An et al.

CVPR 2024 • poster • arXiv:2404.10633
31 citations
#592

Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction

Rui Qian, Shuangrui Ding, Xiaoyi Dong et al.

CVPR 2025 • poster • arXiv:2501.03218
31 citations
#593

Complexity Experts are Task-Discriminative Learners for Any Image Restoration

Eduard Zamfir, Zongwei Wu, Nancy Mehta et al.

CVPR 2025 • poster • arXiv:2411.18466
31 citations
#594

Atlantis: Enabling Underwater Depth Estimation with Stable Diffusion

Fan Zhang, Shaodi You, Yu Li et al.

CVPR 2024 • highlight • arXiv:2312.12471
31 citations
#595

CAT: Exploiting Inter-Class Dynamics for Domain Adaptive Object Detection

Mikhail Kennerley, Jian-Gang Wang, Bharadwaj Veeravalli et al.

CVPR 2024 • poster • arXiv:2403.19278
31 citations
#596

SpecNeRF: Gaussian Directional Encoding for Specular Reflections

Li Ma, Vasu Agrawal, Haithem Turki et al.

CVPR 2024 • highlight • arXiv:2312.13102
31 citations
#597

ATP-LLaVA: Adaptive Token Pruning for Large Vision Language Models

Xubing Ye, Yukang Gan, Yixiao Ge et al.

CVPR 2025 • poster • arXiv:2412.00447
31 citations
#598

FlowIE: Efficient Image Enhancement via Rectified Flow

Yixuan Zhu, Wenliang Zhao, Ao Li et al.

CVPR 2024 • poster • arXiv:2406.00508
31 citations
#599

InterHandGen: Two-Hand Interaction Generation via Cascaded Reverse Diffusion

Jihyun Lee, Shunsuke Saito, Giljoo Nam et al.

CVPR 2024 • poster • arXiv:2403.17422
30 citations
#600

Spanning Training Progress: Temporal Dual-Depth Scoring (TDDS) for Enhanced Dataset Pruning

Xin Zhang, Jiawei Du, Weiying Xie et al.

CVPR 2024 • poster • arXiv:2311.13613
30 citations