Most Cited CVPR "face disturbance" Papers

5,589 papers found • Page 2 of 28

#201

VoCo: A Simple-yet-Effective Volume Contrastive Learning Framework for 3D Medical Image Analysis

Linshan Wu, Jia-Xin Zhuang, Hao Chen

CVPR 2024arXiv:2402.17300
76
citations
#202

MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception

Yiran Qin, Enshen Zhou, Qichang Liu et al.

CVPR 2024arXiv:2312.07472
76
citations
#203

Transcending the Limit of Local Window: Advanced Super-Resolution Transformer with Adaptive Token Dictionary

Leheng Zhang, Yawei Li, Xingyu Zhou et al.

CVPR 2024arXiv:2401.08209
76
citations
#204

SAI3D: Segment Any Instance in 3D Scenes

Yingda Yin, Yuzheng Liu, Yang Xiao et al.

CVPR 2024arXiv:2312.11557
76
citations
#205

Dynamic Graph Representation with Knowledge-aware Attention for Histopathology Whole Slide Image Analysis

Jiawen Li, Yuxuan Chen, Hongbo Chu et al.

CVPR 2024arXiv:2403.07719
75
citations
#206

GROUNDHOG: Grounding Large Language Models to Holistic Segmentation

Yichi Zhang, Ziqiao Ma, Xiaofeng Gao et al.

CVPR 2024arXiv:2402.16846
75
citations
#207

Paint-it: Text-to-Texture Synthesis via Deep Convolutional Texture Map Optimization and Physically-Based Rendering

Kim Youwang, Tae-Hyun Oh, Gerard Pons-Moll

CVPR 2024arXiv:2312.11360
75
citations
#208

Structure-Aware Sparse-View X-ray 3D Reconstruction

Yuanhao Cai, Jiahao Wang, Alan L. Yuille et al.

CVPR 2024arXiv:2311.10959
75
citations
#209

MMVU: Measuring Expert-Level Multi-Discipline Video Understanding

Yilun Zhao, Lujing Xie, Haowei Zhang et al.

CVPR 2025arXiv:2501.12380
74
citations
#210

Morphological Prototyping for Unsupervised Slide Representation Learning in Computational Pathology

Andrew Song, Richard J. Chen, Tong Ding et al.

CVPR 2024arXiv:2405.11643
74
citations
#211

Grounding Everything: Emerging Localization Properties in Vision-Language Transformers

Walid Bousselham, Felix Petersen, Vittorio Ferrari et al.

CVPR 2024arXiv:2312.00878
74
citations
#212

COTR: Compact Occupancy TRansformer for Vision-based 3D Occupancy Prediction

Qihang Ma, Xin Tan, Yanyun Qu et al.

CVPR 2024arXiv:2312.01919
73
citations
#213

Multimodal Autoregressive Pre-training of Large Vision Encoders

Enrico Fini, Mustafa Shukor, Xiujun Li et al.

CVPR 2025highlightarXiv:2411.14402
73
citations
#214

LLaFS: When Large Language Models Meet Few-Shot Segmentation

Lanyun Zhu, Tianrun Chen, Deyi Ji et al.

CVPR 2024arXiv:2311.16926
73
citations
#215

ConsistNet: Enforcing 3D Consistency for Multi-view Images Diffusion

Jiayu Yang, Ziang Cheng, Yunfei Duan et al.

CVPR 2024arXiv:2310.10343
72
citations
#216

MoCha-Stereo: Motif Channel Attention Network for Stereo Matching

Ziyang Chen, Wei Long, He Yao et al.

CVPR 2024arXiv:2404.06842
72
citations
#217

CoSeR: Bridging Image and Language for Cognitive Super-Resolution

Haoze Sun, Wenbo Li, Jianzhuang Liu et al.

CVPR 2024arXiv:2311.16512
71
citations
#218

From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations

Evonne Ng, Javier Romero, Timur Bagautdinov et al.

CVPR 2024arXiv:2401.01885
71
citations
#219

Depth Information Assisted Collaborative Mutual Promotion Network for Single Image Dehazing

Yafei Zhang, Shen Zhou, Huafeng Li

CVPR 2024arXiv:2403.01105
71
citations
#220

GAvatar: Animatable 3D Gaussian Avatars with Implicit Mesh Learning

Ye Yuan, Xueting Li, Yangyi Huang et al.

CVPR 2024highlightarXiv:2312.11461
70
citations
#221

UniReal: Universal Image Generation and Editing via Learning Real-world Dynamics

Xi Chen, Zhifei Zhang, He Zhang et al.

CVPR 2025highlightarXiv:2412.07774
70
citations
#222

VideoDPO: Omni-Preference Alignment for Video Diffusion Generation

Runtao Liu, Haoyu Wu, Zheng Ziqiang et al.

CVPR 2025arXiv:2412.14167
70
citations
#223

Text Prompt with Normality Guidance for Weakly Supervised Video Anomaly Detection

Zhiwei Yang, Jing Liu, Peng Wu

CVPR 2024arXiv:2404.08531
70
citations
#224

MCD: Diverse Large-Scale Multi-Campus Dataset for Robot Perception

Thien-Minh Nguyen, Shenghai Yuan, Thien Nguyen et al.

CVPR 2024highlightarXiv:2403.11496
70
citations
#225

SiTH: Single-view Textured Human Reconstruction with Image-Conditioned Diffusion

Hsuan-I Ho, Jie Song, Otmar Hilliges

CVPR 2024arXiv:2311.15855
69
citations
#226

OpenBias: Open-set Bias Detection in Text-to-Image Generative Models

Moreno D&#x27, Incà, Elia Peruzzo et al.

CVPR 2024highlightarXiv:2404.07990
69
citations
#227

Free3D: Consistent Novel View Synthesis without 3D Representation

Chuanxia Zheng, Andrea Vedaldi

CVPR 2024arXiv:2312.04551
68
citations
#228

Optimizing Diffusion Noise Can Serve As Universal Motion Priors

Korrawe Karunratanakul, Konpat Preechakul, Emre Aksan et al.

CVPR 2024arXiv:2312.11994
68
citations
#229

OmniGlue: Generalizable Feature Matching with Foundation Model Guidance

Hanwen Jiang, Arjun Karpur, Bingyi Cao et al.

CVPR 2024arXiv:2405.12979
68
citations
#230

Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training

Luo, Xue Yang, Wenhan Dou et al.

CVPR 2025arXiv:2410.08202
68
citations
#231

Adaptive Keyframe Sampling for Long Video Understanding

Xi Tang, Jihao Qiu, Lingxi Xie et al.

CVPR 2025arXiv:2502.21271
68
citations
#232

Towards Realistic Scene Generation with LiDAR Diffusion Models

Haoxi Ran, Vitor Guizilini, Yue Wang

CVPR 2024arXiv:2404.00815
68
citations
#233

BadCLIP: Trigger-Aware Prompt Learning for Backdoor Attacks on CLIP

Jiawang Bai, Kuofeng Gao, Shaobo Min et al.

CVPR 2024arXiv:2311.16194
68
citations
#234

One-Minute Video Generation with Test-Time Training

Jiarui Xu, Shihao Han, Karan Dalal et al.

CVPR 2025arXiv:2504.05298
67
citations
#235

Scaling Laws for Data Filtering— Data Curation cannot be Compute Agnostic

Sachin Goyal, Pratyush Maini, Zachary Lipton et al.

CVPR 2024arXiv:2404.07177
67
citations
#236

DiffMorpher: Unleashing the Capability of Diffusion Models for Image Morphing

Kaiwen Zhang, Yifan Zhou, Xudong XU et al.

CVPR 2024arXiv:2312.07409
66
citations
#237

OmniSeg3D: Omniversal 3D Segmentation via Hierarchical Contrastive Learning

Haiyang Ying, Yixuan Yin, Jinzhi Zhang et al.

CVPR 2024arXiv:2311.11666
66
citations
#238

FINER: Flexible Spectral-bias Tuning in Implicit NEural Representation by Variable-periodic Activation Functions

Zhen Liu, Hao Zhu, Qi Zhang et al.

CVPR 2024arXiv:2312.02434
66
citations
#239

Let's Think Outside the Box: Exploring Leap-of-Thought in Large Language Models with Creative Humor Generation

Shanshan Zhong, Zhongzhan Huang, Shanghua Gao et al.

CVPR 2024arXiv:2312.02439
65
citations
#240

Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding

Duo Zheng, Shijia Huang, Liwei Wang

CVPR 2025arXiv:2412.00493
65
citations
#241

Improved Zero-Shot Classification by Adapting VLMs with Text Descriptions

Oindrila Saha, Grant Horn, Subhransu Maji

CVPR 2024arXiv:2401.02460
65
citations
#242

SIDA: Social Media Image Deepfake Detection, Localization and Explanation with Large Multimodal Model

Zhenglin Huang, Jinwei Hu, Yiwei He et al.

CVPR 2025arXiv:2412.04292
64
citations
#243

Open-Vocabulary Video Anomaly Detection

Peng Wu, Xuerong Zhou, Guansong Pang et al.

CVPR 2024arXiv:2311.07042
64
citations
#244

MonoCD: Monocular 3D Object Detection with Complementary Depths

Longfei Yan, Pei Yan, Shengzhou Xiong et al.

CVPR 2024arXiv:2404.03181
64
citations
#245

Decoupling Static and Hierarchical Motion Perception for Referring Video Segmentation

Shuting He, Henghui Ding

CVPR 2024arXiv:2404.03645
64
citations
#246

SkillDiffuser: Interpretable Hierarchical Planning via Skill Abstractions in Diffusion-Based Task Execution

Zhixuan Liang, Yao Mu, Hengbo Ma et al.

CVPR 2024arXiv:2312.11598
64
citations
#247

UniScene: Unified Occupancy-centric Driving Scene Generation

Bohan Li, Jiazhe Guo, Hongsi Liu et al.

CVPR 2025arXiv:2412.05435
64
citations
#248

NeRF On-the-go: Exploiting Uncertainty for Distractor-free NeRFs in the Wild

Weining Ren, Zihan Zhu, Boyang Sun et al.

CVPR 2024arXiv:2405.18715
64
citations
#249

Text Is MASS: Modeling as Stochastic Embedding for Text-Video Retrieval

Jiamian Wang, Guohao Sun, Pichao Wang et al.

CVPR 2024highlightarXiv:2403.17998
63
citations
#250

DreamMatcher: Appearance Matching Self-Attention for Semantically-Consistent Text-to-Image Personalization

Jisu Nam, Heesu Kim, DongJae Lee et al.

CVPR 2024arXiv:2402.09812
63
citations
#251

Focus on Your Instruction: Fine-grained and Multi-instruction Image Editing by Attention Modulation

guo, Tianwei Lin

CVPR 2024arXiv:2312.10113
63
citations
#252

Video Interpolation with Diffusion Models

Siddhant Jain, Daniel Watson, Aleksander Holynski et al.

CVPR 2024arXiv:2404.01203
63
citations
#253

VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence

Yuchao Gu, Yipin Zhou, Bichen Wu et al.

CVPR 2024arXiv:2312.02087
63
citations
#254

RandAR: Decoder-only Autoregressive Visual Generation in Random Orders

Ziqi Pang, Tianyuan Zhang, Fujun Luan et al.

CVPR 2025arXiv:2412.01827
63
citations
#255

IMPRINT: Generative Object Compositing by Learning Identity-Preserving Representation

Yizhi Song, Zhifei Zhang, Zhe Lin et al.

CVPR 2024arXiv:2403.10701
63
citations
#256

Source-Free Domain Adaptation with Frozen Multimodal Foundation Model

Song Tang, Wenxin Su, Mao Ye et al.

CVPR 2024arXiv:2311.16510
62
citations
#257

GPT4Point: A Unified Framework for Point-Language Understanding and Generation

Zhangyang Qi, Ye Fang, Zeyi Sun et al.

CVPR 2024highlightarXiv:2312.02980
62
citations
#258

DiffSHEG: A Diffusion-Based Approach for Real-Time Speech-driven Holistic 3D Expression and Gesture Generation

Junming Chen, Yunfei Liu, Jianan Wang et al.

CVPR 2024arXiv:2401.04747
62
citations
#259

Koala: Key Frame-Conditioned Long Video-LLM

Reuben Tan, Ximeng Sun, Ping Hu et al.

CVPR 2024highlightarXiv:2404.04346
62
citations
#260

SF3D: Stable Fast 3D Mesh Reconstruction with UV-unwrapping and Illumination Disentanglement

Mark Boss, Zixuan Huang, Aaryaman Vasishta et al.

CVPR 2025arXiv:2408.00653
62
citations
#261

SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Models

Yongting Zhang, Lu Chen, Guodong Zheng et al.

CVPR 2025arXiv:2406.12030
61
citations
#262

Open3DSG: Open-Vocabulary 3D Scene Graphs from Point Clouds with Queryable Objects and Open-Set Relationships

Sebastian Koch, Narunas Vaskevicius, Mirco Colosi et al.

CVPR 2024arXiv:2402.12259
61
citations
#263

StableAnimator: High-Quality Identity-Preserving Human Image Animation

Shuyuan Tu, Zhen Xing, Xintong Han et al.

CVPR 2025arXiv:2411.17697
61
citations
#264

A Closer Look at the Few-Shot Adaptation of Large Vision-Language Models

Julio Silva-Rodríguez, Sina Hajimiri, Ismail Ben Ayed et al.

CVPR 2024arXiv:2312.12730
61
citations
#265

PEEKABOO: Interactive Video Generation via Masked-Diffusion

Yash Jain, Anshul Nasery, Vibhav Vineet et al.

CVPR 2024arXiv:2312.07509
61
citations
#266

DiffMOT: A Real-time Diffusion-based Multiple Object Tracker with Non-linear Prediction

Weiyi Lv, Yuhang Huang, NING Zhang et al.

CVPR 2024arXiv:2403.02075
61
citations
#267

HPNet: Dynamic Trajectory Forecasting with Historical Prediction Attention

Xiaolong Tang, Meina Kan, Shiguang Shan et al.

CVPR 2024arXiv:2404.06351
61
citations
#268

pix2gestalt: Amodal Segmentation by Synthesizing Wholes

Ege Ozguroglu, Ruoshi Liu, Dídac Surís et al.

CVPR 2024highlightarXiv:2401.14398
61
citations
#269

ControlRoom3D: Room Generation using Semantic Proxy Rooms

Jonas Schult, Sam Tsai, Lukas Höllein et al.

CVPR 2024arXiv:2312.05208
60
citations
#270

Unleashing the Potential of SAM for Medical Adaptation via Hierarchical Decoding

Zhiheng Cheng, Qingyue Wei, Hongru Zhu et al.

CVPR 2024arXiv:2403.18271
60
citations
#271

DePT: Decoupled Prompt Tuning

Ji Zhang, Shihan Wu, Lianli Gao et al.

CVPR 2024arXiv:2309.07439
60
citations
#272

Go-with-the-Flow: Motion-Controllable Video Diffusion Models Using Real-Time Warped Noise

Ryan Burgert, Yuancheng Xu, Wenqi Xian et al.

CVPR 2025arXiv:2501.08331
59
citations
#273

DIFIX3D+: Improving 3D Reconstructions with Single-Step Diffusion Models

Jay Zhangjie Wu, Yuxuan Zhang, Haithem Turki et al.

CVPR 2025arXiv:2503.01774
59
citations
#274

PerceptionGPT: Effectively Fusing Visual Perception into LLM

Renjie Pi, Lewei Yao, Jiahui Gao et al.

CVPR 2024highlightarXiv:2311.06612
59
citations
#275

Point Cloud Pre-training with Diffusion Models

xiao zheng, Xiaoshui Huang, Guofeng Mei et al.

CVPR 2024arXiv:2311.14960
59
citations
#276

Zero-TPrune: Zero-Shot Token Pruning through Leveraging of the Attention Graph in Pre-Trained Transformers

Hongjie Wang, Bhishma Dedhia, Niraj Jha

CVPR 2024arXiv:2305.17328
59
citations
#277

Driving Everywhere with Large Language Model Policy Adaptation

Boyi Li, Yue Wang, Jiageng Mao et al.

CVPR 2024arXiv:2402.05932
59
citations
#278

Stereo4D: Learning How Things Move in 3D from Internet Stereo Videos

Linyi Jin, Richard Tucker, Zhengqi Li et al.

CVPR 2025arXiv:2412.09621
58
citations
#279

Text2HOI: Text-guided 3D Motion Generation for Hand-Object Interaction

Junuk Cha, Jihyeon Kim, Jae Shin Yoon et al.

CVPR 2024arXiv:2404.00562
58
citations
#280

Seamless Human Motion Composition with Blended Positional Encodings

German Barquero, Sergio Escalera, Cristina Palmero

CVPR 2024arXiv:2402.15509
58
citations
#281

RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness

Tianyu Yu, Haoye Zhang, Qiming Li et al.

CVPR 2025highlightarXiv:2405.17220
58
citations
#282

MUSt3R: Multi-view Network for Stereo 3D Reconstruction

Yohann Cabon, Lucas Stoffl, Leonid Antsfeld et al.

CVPR 2025highlightarXiv:2503.01661
57
citations
#283

Align3R: Aligned Monocular Depth Estimation for Dynamic Videos

Edward LOO, Tianyu HUANG, Peng Li et al.

CVPR 2025highlightarXiv:2412.03079
57
citations
#284

FedAS: Bridging Inconsistency in Personalized Federated Learning

Xiyuan Yang, Wenke Huang, Mang Ye

CVPR 2024
57
citations
#285

Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis

Yanzuo Lu, Manlin Zhang, Jinhua Ma et al.

CVPR 2024highlightarXiv:2402.18078
57
citations
#286

Neural Parametric Gaussians for Monocular Non-Rigid Object Reconstruction

Devikalyan Das, Christopher Wewer, Raza Yunus et al.

CVPR 2024arXiv:2312.01196
57
citations
#287

Apollo: An Exploration of Video Understanding in Large Multimodal Models

Orr Zohar, Xiaohan Wang, Yann Dubois et al.

CVPR 2025arXiv:2412.10360
55
citations
#288

Lodge: A Coarse to Fine Diffusion Network for Long Dance Generation Guided by the Characteristic Dance Primitives

Ronghui Li, Yuxiang Zhang, Yachao Zhang et al.

CVPR 2024arXiv:2403.10518
55
citations
#289

Stable Flow: Vital Layers for Training-Free Image Editing

Omri Avrahami, Or Patashnik, Ohad Fried et al.

CVPR 2025arXiv:2411.14430
55
citations
#290

Self-Distilled Masked Auto-Encoders are Efficient Video Anomaly Detectors

Nicolae Ristea, Florinel Croitoru, Radu Tudor Ionescu et al.

CVPR 2024arXiv:2306.12041
55
citations
#291

Harnessing the Power of MLLMs for Transferable Text-to-Image Person ReID

Wentao Tan, Changxing Ding, Jiayu Jiang et al.

CVPR 2024arXiv:2405.04940
55
citations
#292

Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation

Zhiwu Qing, Shiwei Zhang, Jiayu Wang et al.

CVPR 2024arXiv:2312.04483
55
citations
#293

Q-DiT: Accurate Post-Training Quantization for Diffusion Transformers

Lei Chen, Yuan Meng, Chen Tang et al.

CVPR 2025arXiv:2406.17343
54
citations
#294

Wonderland: Navigating 3D Scenes from a Single Image

Hanwen Liang, Junli Cao, Vidit Goel et al.

CVPR 2025arXiv:2412.12091
54
citations
#295

Rethinking Diffusion Model for Multi-Contrast MRI Super-Resolution

Guangyuan Li, Chen Rao, Juncheng Mo et al.

CVPR 2024arXiv:2404.04785
54
citations
#296

VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection

Songhao Han, Wei Huang, Hairong Shi et al.

CVPR 2025arXiv:2411.14794
54
citations
#297

Generalized Large-Scale Data Condensation via Various Backbone and Statistical Matching

Shitong Shao, Zeyuan Yin, Muxin Zhou et al.

CVPR 2024highlightarXiv:2311.17950
54
citations
#298

Can Language Beat Numerical Regression? Language-Based Multimodal Trajectory Prediction

Inhwan Bae, Junoh Lee, Hae-Gon Jeon

CVPR 2024arXiv:2403.18447
54
citations
#299

Image Restoration by Denoising Diffusion Models with Iteratively Preconditioned Guidance

Tomer Garber, Tom Tirer

CVPR 2024arXiv:2312.16519
54
citations
#300

GoalFlow: Goal-Driven Flow Matching for Multimodal Trajectories Generation in End-to-End Autonomous Driving

Zebin Xing, Xingyu Zhang, Yang Hu et al.

CVPR 2025arXiv:2503.05689
54
citations
#301

Text2Loc: 3D Point Cloud Localization from Natural Language

Yan Xia, Letian Shi, Zifeng Ding et al.

CVPR 2024arXiv:2311.15977
54
citations
#302

3DGUT: Enabling Distorted Cameras and Secondary Rays in Gaussian Splatting

Qi Wu, Janick Martinez Esturo, Ashkan Mirzaei et al.

CVPR 2025arXiv:2412.12507
54
citations
#303

EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation

Rang Meng, Xingyu Zhang, Yuming Li et al.

CVPR 2025arXiv:2411.10061
54
citations
#304

SecondPose: SE(3)-Consistent Dual-Stream Feature Fusion for Category-Level Pose Estimation

Yamei Chen, Yan Di, Guangyao Zhai et al.

CVPR 2024arXiv:2311.11125
54
citations
#305

RTMO: Towards High-Performance One-Stage Real-Time Multi-Person Pose Estimation

Peng Lu, Tao Jiang, Yining Li et al.

CVPR 2024arXiv:2312.07526
54
citations
#306

ReconDreamer: Crafting World Models for Driving Scene Reconstruction via Online Restoration

Chaojun Ni, Guosheng Zhao, Xiaofeng Wang et al.

CVPR 2025arXiv:2411.19548
54
citations
#307

MemFlow: Optical Flow Estimation and Prediction with Memory

Qiaole Dong, Yanwei Fu

CVPR 2024arXiv:2404.04808
54
citations
#308

WiLoR: End-to-end 3D Hand Localization and Reconstruction in-the-wild

Rolandos Alexandros Potamias, Jinglei Zhang, Jiankang Deng et al.

CVPR 2025arXiv:2409.12259
53
citations
#309

Text-Image Alignment for Diffusion-Based Perception

Neehar Kondapaneni, Markus Marks, Manuel Knott et al.

CVPR 2024arXiv:2310.00031
53
citations
#310

FakeInversion: Learning to Detect Images from Unseen Text-to-Image Models by Inverting Stable Diffusion

George Cazenavette, Avneesh Sud, Thomas Leung et al.

CVPR 2024arXiv:2406.08603
53
citations
#311

Task Singular Vectors: Reducing Task Interference in Model Merging

Antonio Andrea Gargiulo, Donato Crisostomi, Maria Sofia Bucarelli et al.

CVPR 2025arXiv:2412.00081
53
citations
#312

Multiple Object Tracking as ID Prediction

Ruopeng Gao, Ji Qi, Limin Wang

CVPR 2025arXiv:2403.16848
53
citations
#313

GoMAvatar: Efficient Animatable Human Modeling from Monocular Video Using Gaussians-on-Mesh

Jing Wen, Xiaoming Zhao, Jason Ren et al.

CVPR 2024arXiv:2404.07991
53
citations
#314

A Recipe for Scaling up Text-to-Video Generation with Text-free Videos

Xiang Wang, Shiwei Zhang, Hangjie Yuan et al.

CVPR 2024arXiv:2312.15770
53
citations
#315

Goku: Flow Based Video Generative Foundation Models

Shoufa Chen, Chongjian GE, Yuqi Zhang et al.

CVPR 2025highlightarXiv:2502.04896
53
citations
#316

Visual In-Context Prompting

Feng Li, Qing Jiang, Hao Zhang et al.

CVPR 2024arXiv:2311.13601
52
citations
#317

Dinomaly: The Less Is More Philosophy in Multi-Class Unsupervised Anomaly Detection

Jia Guo, Shuai Lu, Weihang Zhang et al.

CVPR 2025arXiv:2405.14325
52
citations
#318

SPOC: Imitating Shortest Paths in Simulation Enables Effective Navigation and Manipulation in the Real World

Kiana Ehsani, Tanmay Gupta, Rose Hendrix et al.

CVPR 2024arXiv:2312.02976
52
citations
#319

FaceTalk: Audio-Driven Motion Diffusion for Neural Parametric Head Models

Shivangi Aneja, Justus Thies, Angela Dai et al.

CVPR 2024arXiv:2312.08459
52
citations
#320

AvatarGPT: All-in-One Framework for Motion Understanding Planning Generation and Beyond

Zixiang Zhou, Yu Wan, Baoyuan Wang

CVPR 2024
52
citations
#321

A Semi-supervised Nighttime Dehazing Baseline with Spatial-Frequency Aware and Realistic Brightness Constraint

Xiaofeng Cong, Jie Gui, Jing Zhang et al.

CVPR 2024arXiv:2403.18548
52
citations
#322

PointOBB: Learning Oriented Object Detection via Single Point Supervision

Junwei Luo, Xue Yang, Yi Yu et al.

CVPR 2024arXiv:2311.14757
52
citations
#323

Enhancing Multimodal Cooperation via Sample-level Modality Valuation

Yake Wei, Ruoxuan Feng, Zihe Wang et al.

CVPR 2024arXiv:2309.06255
51
citations
#324

Accelerating Diffusion Sampling with Optimized Time Steps

Shuchen Xue, Zhaoqiang Liu, Fei Chen et al.

CVPR 2024arXiv:2402.17376
51
citations
#325

End-to-End Temporal Action Detection with 1B Parameters Across 1000 Frames

Shuming Liu, Chenlin Zhang, Chen Zhao et al.

CVPR 2024arXiv:2311.17241
51
citations
#326

MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World

Yining Hong, Zishuo Zheng, Peihao Chen et al.

CVPR 2024arXiv:2401.08577
51
citations
#327

MVHumanNet: A Large-scale Dataset of Multi-view Daily Dressing Human Captures

Zhangyang Xiong, Chenghong Li, Kenkun Liu et al.

CVPR 2024arXiv:2312.02963
51
citations
#328

Describing Differences in Image Sets with Natural Language

Lisa Dunlap, Yuhui Zhang, Xiaohan Wang et al.

CVPR 2024arXiv:2312.02974
51
citations
#329

CFPL-FAS: Class Free Prompt Learning for Generalizable Face Anti-spoofing

Ajian Liu, Shuai Xue, Gan Jianwen et al.

CVPR 2024highlightarXiv:2403.14333
51
citations
#330

Bilateral Propagation Network for Depth Completion

Jie Tang, Fei-Peng Tian, Boshi An et al.

CVPR 2024arXiv:2403.11270
51
citations
#331

Implicit Discriminative Knowledge Learning for Visible-Infrared Person Re-Identification

kaijie ren, Lei Zhang

CVPR 2024arXiv:2403.11708
51
citations
#332

Consistent3D: Towards Consistent High-Fidelity Text-to-3D Generation with Deterministic Sampling Prior

Zike Wu, Pan Zhou, YI Xuanyu et al.

CVPR 2024arXiv:2401.09050
51
citations
#333

DiffusionLight: Light Probes for Free by Painting a Chrome Ball

Pakkapon Phongthawee, Worameth Chinchuthakun, Nontaphat Sinsunthithet et al.

CVPR 2024arXiv:2312.09168
51
citations
#334

Jack of All Tasks Master of Many: Designing General-Purpose Coarse-to-Fine Vision-Language Model

Shraman Pramanick, Guangxing Han, Rui Hou et al.

CVPR 2024highlightarXiv:2312.12423
50
citations
#335

Discovering and Mitigating Visual Biases through Keyword Explanation

Younghyun Kim, Sangwoo Mo, Minkyu Kim et al.

CVPR 2024highlightarXiv:2301.11104
50
citations
#336

Few-Shot Object Detection with Foundation Models

Guangxing Han, Ser-Nam Lim

CVPR 2024
50
citations
#337

Matching Anything by Segmenting Anything

Siyuan Li, Lei Ke, Martin Danelljan et al.

CVPR 2024highlightarXiv:2406.04221
50
citations
#338

Towards Scalable 3D Anomaly Detection and Localization: A Benchmark via 3D Anomaly Synthesis and A Self-Supervised Learning Network

wenqiao Li, Xiaohao Xu, Yao Gu et al.

CVPR 2024arXiv:2311.14897
50
citations
#339

Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification

Pingping Zhang, Yuhao Wang, Yang Liu et al.

CVPR 2024arXiv:2403.10254
49
citations
#340

MM-Narrator: Narrating Long-form Videos with Multimodal In-Context Learning

Chaoyi Zhang, Kevin Lin, Zhengyuan Yang et al.

CVPR 2024highlightarXiv:2311.17435
49
citations
#341

On the Test-Time Zero-Shot Generalization of Vision-Language Models: Do We Really Need Prompt Learning?

Maxime Zanella, Ismail Ben Ayed

CVPR 2024arXiv:2405.02266
49
citations
#342

Knowledge-Enhanced Dual-stream Zero-shot Composed Image Retrieval

Yucheng Suo, Fan Ma, Linchao Zhu et al.

CVPR 2024arXiv:2403.16005
49
citations
#343

SubT-MRS Dataset: Pushing SLAM Towards All-weather Environments

Shibo Zhao, Yuanjun Gao, Tianhao Wu et al.

CVPR 2024arXiv:2307.07607
49
citations
#344

FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation

Shuai Yang, Yifan Zhou, Ziwei Liu et al.

CVPR 2024arXiv:2403.12962
49
citations
#345

You See it, You Got it: Learning 3D Creation on Pose-Free Videos at Scale

Baorui Ma, Huachen Gao, Haoge Deng et al.

CVPR 2025highlightarXiv:2412.06699
49
citations
#346

Neural Implicit Representation for Building Digital Twins of Unknown Articulated Objects

Yijia Weng, Bowen Wen, Jonathan Tremblay et al.

CVPR 2024arXiv:2404.01440
49
citations
#347

VCoder: Versatile Vision Encoders for Multimodal Large Language Models

Jitesh Jain, Jianwei Yang, Humphrey Shi

CVPR 2024arXiv:2312.14233
48
citations
#348

SocialCounterfactuals: Probing and Mitigating Intersectional Social Biases in Vision-Language Models with Counterfactual Examples

Phillip Howard, Avinash Madasu, Tiep Le et al.

CVPR 2024arXiv:2312.00825
48
citations
#349

DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models

Saeed Ranjbar Alvar, Gursimran Singh, Mohammad Akbari et al.

CVPR 2025arXiv:2503.02175
48
citations
#350

DAP: A Dynamic Adversarial Patch for Evading Person Detectors

Amira Guesmi, Ruitian Ding, Muhammad Abdullah Hanif et al.

CVPR 2024arXiv:2305.11618
48
citations
#351

Neural Markov Random Field for Stereo Matching

Tongfan Guan, Chen Wang, Yun-Hui Liu

CVPR 2024arXiv:2403.11193
48
citations
#352

MaskClustering: View Consensus based Mask Graph Clustering for Open-Vocabulary 3D Instance Segmentation

Mi Yan, Jiazhao Zhang, Yan Zhu et al.

CVPR 2024arXiv:2401.07745
48
citations
#353

Generating Human Motion in 3D Scenes from Text Descriptions

Zhi Cen, Huaijin Pi, Sida Peng et al.

CVPR 2024arXiv:2405.07784
48
citations
#354

Language-driven All-in-one Adverse Weather Removal

Hao Yang, Liyuan Pan, Yan Yang et al.

CVPR 2024arXiv:2312.01381
48
citations
#355

MatFuse: Controllable Material Generation with Diffusion Models

Giuseppe Vecchio, Renato Sortino, Simone Palazzo et al.

CVPR 2024arXiv:2308.11408
47
citations
#356

OMG: Towards Open-vocabulary Motion Generation via Mixture of Controllers

Han Liang, Jiacheng Bao, Ruichi Zhang et al.

CVPR 2024arXiv:2312.08985
47
citations
#357

Mosaic-SDF for 3D Generative Models

Lior Yariv, Omri Puny, Oran Gafni et al.

CVPR 2024arXiv:2312.09222
47
citations
#358

JeDi: Joint-Image Diffusion Models for Finetuning-Free Personalized Text-to-Image Generation

Yu Zeng, Vishal M. Patel, Haochen Wang et al.

CVPR 2024arXiv:2407.06187
47
citations
#359

One-Prompt to Segment All Medical Images

Wu, Min Xu

CVPR 2024arXiv:2305.10300
47
citations
#360

SDDGR: Stable Diffusion-based Deep Generative Replay for Class Incremental Object Detection

JUNSU KIM, Hoseong Cho, Jihyeon Kim et al.

CVPR 2024highlightarXiv:2402.17323
47
citations
#361

Diff-Plugin: Revitalizing Details for Diffusion-based Low-level Tasks

Yuhao Liu, Zhanghan Ke, Fang Liu et al.

CVPR 2024arXiv:2403.00644
47
citations
#362

Digital Life Project: Autonomous 3D Characters with Social Intelligence

Zhongang Cai, Jianping Jiang, Zhongfei Qing et al.

CVPR 2024arXiv:2312.04547
46
citations
#363

PatchFusion: An End-to-End Tile-Based Framework for High-Resolution Monocular Metric Depth Estimation

Zhenyu Li, Shariq Bhat, Peter Wonka

CVPR 2024arXiv:2312.02284
46
citations
#364

Grounded Question-Answering in Long Egocentric Videos

Shangzhe Di, Weidi Xie

CVPR 2024arXiv:2312.06505
46
citations
#365

Dora: Sampling and Benchmarking for 3D Shape Variational Auto-Encoders

Rui Chen, Jianfeng Zhang, Yixun Liang et al.

CVPR 2025arXiv:2412.17808
46
citations
#366

VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis

Enric Corona, Andrei Zanfir, Eduard Gabriel Bazavan et al.

CVPR 2025arXiv:2403.08764
46
citations
#367

M-LLM Based Video Frame Selection for Efficient Video Understanding

Kai Hu, Feng Gao, Xiaohan Nie et al.

CVPR 2025arXiv:2502.19680
46
citations
#368

LayoutVLM: Differentiable Optimization of 3D Layout via Vision-Language Models

Fan-Yun Sun, Weiyu Liu, Siyi Gu et al.

CVPR 2025arXiv:2412.02193
46
citations
#369

SocialCircle: Learning the Angle-based Social Interaction Representation for Pedestrian Trajectory Prediction

Conghao Wong, Beihao Xia, Ziqian Zou et al.

CVPR 2024arXiv:2310.05370
45
citations
#370

OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations

Linke Ouyang, Yuan Qu, Hongbin Zhou et al.

CVPR 2025arXiv:2412.07626
45
citations
#371

OmniManip: Towards General Robotic Manipulation via Object-Centric Interaction Primitives as Spatial Constraints

Mingjie Pan, Jiyao Zhang, Tianshu Wu et al.

CVPR 2025highlightarXiv:2501.03841
45
citations
#372

Multimodal Industrial Anomaly Detection by Crossmodal Feature Mapping

Alex Costanzino, Pierluigi Zama Ramirez, Giuseppe Lisanti et al.

CVPR 2024arXiv:2312.04521
45
citations
#373

Pre-trained Vision and Language Transformers Are Few-Shot Incremental Learners

Keon Hee Park, Kyungwoo Song, Gyeong-Moon Park

CVPR 2024arXiv:2404.02117
45
citations
#374

SimLingo: Vision-Only Closed-Loop Autonomous Driving with Language-Action Alignment

Katrin Renz, Long Chen, Elahe Arani et al.

CVPR 2025highlightarXiv:2503.09594
45
citations
#375

DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation

Minghong Cai, Xiaodong Cun, Xiaoyu Li et al.

CVPR 2025arXiv:2412.18597
45
citations
#376

Point Segment and Count: A Generalized Framework for Object Counting

Zhizhong Huang, Mingliang Dai, Yi Zhang et al.

CVPR 2024arXiv:2311.12386
45
citations
#377

DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection

Lewei Yao, Renjie Pi, Jianhua Han et al.

CVPR 2024arXiv:2404.09216
45
citations
#378

CLOVA: A Closed-LOop Visual Assistant with Tool Usage and Update

Zhi Gao, Yuntao Du., Xintong Zhang et al.

CVPR 2024arXiv:2312.10908
45
citations
#379

Improving Image Restoration through Removing Degradations in Textual Representations

Jingbo Lin, Zhilu Zhang, Yuxiang Wei et al.

CVPR 2024arXiv:2312.17334
45
citations
#380

LEAP-VO: Long-term Effective Any Point Tracking for Visual Odometry

Weirong Chen, Le Chen, Rui Wang et al.

CVPR 2024arXiv:2401.01887
45
citations
#381

Pixel-level and Semantic-level Adjustable Super-resolution: A Dual-LoRA Approach

Lingchen Sun, Rongyuan Wu, Zhiyuan Ma et al.

CVPR 2025arXiv:2412.03017
45
citations
#382

GaussianWorld: Gaussian World Model for Streaming 3D Occupancy Prediction

Sicheng Zuo, Wenzhao Zheng, Yuanhui Huang et al.

CVPR 2025arXiv:2412.10373
45
citations
#383

Collaborative Semantic Occupancy Prediction with Hybrid Feature Fusion in Connected Automated Vehicles

Rui Song, Chenwei Liang, Hu Cao et al.

CVPR 2024arXiv:2402.07635
45
citations
#384

EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions

Kai Chen, Yunhao Gou, Runhui Huang et al.

CVPR 2025arXiv:2409.18042
44
citations
#385

Prompting Depth Anything for 4K Resolution Accurate Metric Depth Estimation

Haotong Lin, Sida Peng, Jingxiao Chen et al.

CVPR 2025arXiv:2412.14015
44
citations
#386

Accurate Spatial Gene Expression Prediction by Integrating Multi-Resolution Features

Youngmin Chung, Ji Hun Ha, Kyeong Chan Im et al.

CVPR 2024arXiv:2403.07592
44
citations
#387

One Prompt Word is Enough to Boost Adversarial Robustness for Pre-trained Vision-Language Models

Lin Li, Haoyan Guan, Jianing Qiu et al.

CVPR 2024arXiv:2403.01849
44
citations
#388

Bridging Remote Sensors with Multisensor Geospatial Foundation Models

Boran Han, Shuai Zhang, Xingjian Shi et al.

CVPR 2024arXiv:2404.01260
44
citations
#389

S2MAE: A Spatial-Spectral Pretraining Foundation Model for Spectral Remote Sensing Data

Xuyang Li, Danfeng Hong, Jocelyn Chanussot

CVPR 2024
44
citations
#390

Towards Surveillance Video-and-Language Understanding: New Dataset Baselines and Challenges

Tongtong Yuan, Xuange Zhang, Kun Liu et al.

CVPR 2024arXiv:2309.13925
44
citations
#391

MADTP: Multimodal Alignment-Guided Dynamic Token Pruning for Accelerating Vision-Language Transformer

Jianjian Cao, Peng Ye, Shengze Li et al.

CVPR 2024arXiv:2403.02991
44
citations
#392

LightIt: Illumination Modeling and Control for Diffusion Models

Peter Kocsis, Kalyan Sunkavalli, Julien Philip et al.

CVPR 2024arXiv:2403.10615
44
citations
#393

Point2RBox: Combine Knowledge from Synthetic Visual Patterns for End-to-end Oriented Object Detection with Single Point Supervision

Yi Yu, Xue Yang, Qingyun Li et al.

CVPR 2024arXiv:2311.14758
43
citations
#394

LEAD: Learning Decomposition for Source-free Universal Domain Adaptation

Sanqing Qu, Tianpei Zou, Lianghua He et al.

CVPR 2024arXiv:2403.03421
43
citations
#395

Posterior Distillation Sampling

Juil Koo, Chanho Park, Minhyuk Sung

CVPR 2024arXiv:2311.13831
43
citations
#396

MET3R: Measuring Multi-View Consistency in Generated Images

Mohammad Asim, Christopher Wewer, Thomas Wimmer et al.

CVPR 2025arXiv:2501.06336
43
citations
#397

DAVE - A Detect-and-Verify Paradigm for Low-Shot Counting

Jer Pelhan, Alan Lukezic, Vitjan Zavrtanik et al.

CVPR 2024arXiv:2404.16622
43
citations
#398

Boosting Object Detection with Zero-Shot Day-Night Domain Adaptation

Zhipeng Du, Miaojing Shi, Jiankang Deng

CVPR 2024arXiv:2312.01220
43
citations
#399

Efficient Multi-scale Network with Learnable Discrete Wavelet Transform for Blind Motion Deblurring

Xin Gao, Tianheng Qiu, Xinyu Zhang et al.

CVPR 2024arXiv:2401.00027
43
citations
#400

4D-DRESS: A 4D Dataset of Real-World Human Clothing With Semantic Annotations

Wenbo Wang, Hsuan-I Ho, Chen Guo et al.

CVPR 2024highlightarXiv:2404.18630
43
citations