Most Cited CVPR "movingai benchmark" Papers

5,589 papers found • Page 7 of 28

#1201

USE: Universal Segment Embeddings for Open-Vocabulary Image Segmentation

Xiaoqi Wang, Wenbin He, Xiwei Xuan et al.

CVPR 2024 • poster • arXiv:2406.05271
13 citations
#1202

UniHuman: A Unified Model For Editing Human Images in the Wild

Nannan Li, Qing Liu, Krishna Kumar Singh et al.

CVPR 2024 • poster • arXiv:2312.14985
13 citations
#1203

ClearSight: Visual Signal Enhancement for Object Hallucination Mitigation in Multimodal Large Language Models

Hao Yin, Guangzong Si, Zilei Wang

CVPR 2025 • poster • arXiv:2503.13107
13 citations
#1204

Scaling Inference Time Compute for Diffusion Models

Nanye Ma, Shangyuan Tong, Haolin Jia et al.

CVPR 2025 • highlight
13 citations
#1205

FedBiP: Heterogeneous One-Shot Federated Learning with Personalized Latent Diffusion Models

Haokun Chen, Hang Li, Yao Zhang et al.

CVPR 2025 • poster • arXiv:2410.04810
13 citations
#1206

HRAvatar: High-Quality and Relightable Gaussian Head Avatar

Dongbin Zhang, Yunfei Liu, Lijian Lin et al.

CVPR 2025 • poster • arXiv:2503.08224
13 citations
#1207

Scenario Dreamer: Vectorized Latent Diffusion for Generating Driving Simulation Environments

Luke Rowe, Roger Girgis, Anthony Gosselin et al.

CVPR 2025 • poster • arXiv:2503.22496
13 citations
#1208

Partial-to-Partial Shape Matching with Geometric Consistency

Viktoria Ehm, Maolin Gao, Paul Roetzer et al.

CVPR 2024 • poster • arXiv:2404.12209
13 citations
#1209

Lift3D: Zero-Shot Lifting of Any 2D Vision Model to 3D

Mukund Varma T, Peihao Wang, Zhiwen Fan et al.

CVPR 2024 • poster • arXiv:2403.18922
13 citations
#1210

ECVC: Exploiting Non-Local Correlations in Multiple Frames for Contextual Video Compression

Wei Jiang, Junru Li, Kai Zhang et al.

CVPR 2025 • poster • arXiv:2410.09706
13 citations
#1211

How to Handle Sketch-Abstraction in Sketch-Based Image Retrieval?

Subhadeep Koley, Ayan Kumar Bhunia, Aneeshan Sain et al.

CVPR 2024 • poster • arXiv:2403.07203
13 citations
#1212

SportsSloMo: A New Benchmark and Baselines for Human-centric Video Frame Interpolation

Jiaben Chen, Huaizu Jiang

CVPR 2024 • poster • arXiv:2308.16876
13 citations
#1213

Synthetic Data is an Elegant GIFT for Continual Vision-Language Models

Bin Wu, Wuxuan Shi, Jinqiao Wang et al.

CVPR 2025 • poster • arXiv:2503.04229
13 citations
#1214

MVSAnywhere: Zero-Shot Multi-View Stereo

Sergio Izquierdo, Mohamed Sayed, Michael Firman et al.

CVPR 2025 • poster • arXiv:2503.22430
13 citations
#1215

R-TPT: Improving Adversarial Robustness of Vision-Language Models through Test-Time Prompt Tuning

Lijun Sheng, Jian Liang, Zilei Wang et al.

CVPR 2025 • poster • arXiv:2504.11195
13 citations
#1216

CVT-xRF: Contrastive In-Voxel Transformer for 3D Consistent Radiance Fields from Sparse Inputs

Yingji Zhong, Lanqing Hong, Zhenguo Li et al.

CVPR 2024 • poster • arXiv:2403.16885
13 citations
#1217

Novel Class Discovery for Ultra-Fine-Grained Visual Categorization

Qi Jia, Yaqi Cai, Qi Jia et al.

CVPR 2024 • highlight • arXiv:2405.06283
13 citations
#1218

3D Multi-frame Fusion for Video Stabilization

Zhan Peng, Xinyi Ye, Weiyue Zhao et al.

CVPR 2024 • poster • arXiv:2404.12887
13 citations
#1219

PH-Net: Semi-Supervised Breast Lesion Segmentation via Patch-wise Hardness

Siyao Jiang, Huisi Wu, Junyang Chen et al.

CVPR 2024 • poster
13 citations
#1220

Generative Photography: Scene-Consistent Camera Control for Realistic Text-to-Image Synthesis

Yu Yuan, Xijun Wang, Yichen Sheng et al.

CVPR 2025 • highlight • arXiv:2412.02168
13 citations
#1221

VISCO: Benchmarking Fine-Grained Critique and Correction Towards Self-Improvement in Visual Reasoning

Xueqing Wu, Yuheng Ding, Bingxuan Li et al.

CVPR 2025 • poster • arXiv:2412.02172
13 citations
#1222

Personalized Preference Fine-tuning of Diffusion Models

Meihua Dang, Anikait Singh, Linqi Zhou et al.

CVPR 2025 • poster • arXiv:2501.06655
13 citations
#1223

HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large Vision-Language Models

Runhui Huang, Xinpeng Ding, Chunwei Wang et al.

CVPR 2025 • poster • arXiv:2407.08706
13 citations
#1224

LQMFormer: Language-aware Query Mask Transformer for Referring Image Segmentation

Nisarg Shah, Vibashan VS, Vishal M. Patel

CVPR 2024 • poster
13 citations
#1225

MultiPly: Reconstruction of Multiple People from Monocular Video in the Wild

Zeren Jiang, Chen Guo, Manuel Kaufmann et al.

CVPR 2024 • poster • arXiv:2406.01595
13 citations
#1226

AG-VPReID: A Challenging Large-Scale Benchmark for Aerial-Ground Video-based Person Re-Identification

Huy Nguyen, Kien Nguyen Thanh, Akila Pemasiri et al.

CVPR 2025 • poster • arXiv:2503.08121
13 citations
#1227

DRAWER: Digital Reconstruction and Articulation With Environment Realism

Hongchi Xia, Entong Su, Marius Memmel et al.

CVPR 2025 • poster • arXiv:2504.15278
13 citations
#1228

ShotAdapter: Text-to-Multi-Shot Video Generation with Diffusion Models

Ozgur Kara, Krishna Kumar Singh, Feng Liu et al.

CVPR 2025 • poster • arXiv:2505.07652
13 citations
#1229

Single-View Scene Point Cloud Human Grasp Generation

Yan-Kang Wang, Chengyi Xing, Yi-Lin Wei et al.

CVPR 2024 • poster • arXiv:2404.15815
13 citations
#1230

UniNet: A Contrastive Learning-guided Unified Framework with Feature Selection for Anomaly Detection

Shun Wei, Jielin Jiang, Xiaolong Xu

CVPR 2025 • poster
13 citations
#1231

ASIGN: An Anatomy-aware Spatial Imputation Graphic Network for 3D Spatial Transcriptomics

Junchao Zhu, Ruining Deng, Tianyuan Yao et al.

CVPR 2025 • poster • arXiv:2412.03026
13 citations
#1232

Move Anything with Layered Scene Diffusion

Jiawei Ren, Mengmeng Xu, Jui-Chieh Wu et al.

CVPR 2024 • poster • arXiv:2404.07178
13 citations
#1233

F3Loc: Fusion and Filtering for Floorplan Localization

Changan Chen, Rui Wang, Christoph Vogel et al.

CVPR 2024 • highlight
13 citations
#1234

3D Neural Edge Reconstruction

Lei Li, Songyou Peng, Zehao Yu et al.

CVPR 2024 • poster • arXiv:2405.19295
13 citations
#1235

Image Over Text: Transforming Formula Recognition Evaluation with Character Detection Matching

Bin Wang, Fan Wu, Linke Ouyang et al.

CVPR 2025 • poster • arXiv:2409.03643
13 citations
#1236

RoDLA: Benchmarking the Robustness of Document Layout Analysis Models

Yufan Chen, Jiaming Zhang, Kunyu Peng et al.

CVPR 2024 • poster • arXiv:2403.14442
13 citations
#1237

Representing Part-Whole Hierarchies in Foundation Models by Learning Localizability Composability and Decomposability from Anatomy via Self Supervision

Mohammad Reza Hosseinzadeh Taher, Michael Gotway, Jianming Liang

CVPR 2024 • poster
13 citations
#1238

SketchINR: A First Look into Sketches as Implicit Neural Representations

Hmrishav Bandyopadhyay, Ayan Kumar Bhunia, Pinaki Nath Chowdhury et al.

CVPR 2024 • poster • arXiv:2403.09344
13 citations
#1239

Dissecting and Mitigating Diffusion Bias via Mechanistic Interpretability

Yingdong Shi, Changming Li, Yifan Wang et al.

CVPR 2025 • poster • arXiv:2503.20483
13 citations
#1240

Unlearning through Knowledge Overwriting: Reversible Federated Unlearning via Selective Sparse Adapter

Zhengyi Zhong, Weidong Bao, Ji Wang et al.

CVPR 2025 • poster • arXiv:2502.20709
13 citations
#1241

JRDB-Social: A Multifaceted Robotic Dataset for Understanding of Context and Dynamics of Human Interactions Within Social Groups

Simindokht Jahangard, Zhixi Cai, Shiki Wen et al.

CVPR 2024 • poster • arXiv:2404.04458
13 citations
#1242

Event-based Video Super-Resolution via State Space Models

Zeyu Xiao, Xinchao Wang

CVPR 2025 • poster
13 citations
#1243

The Power of Context: How Multimodality Improves Image Super-Resolution

Kangfu Mei, Vishal M. Patel, Mojtaba Sahraee-Ardakan et al.

CVPR 2025 • poster • arXiv:2503.14503
12 citations
#1244

Completion as Enhancement: A Degradation-Aware Selective Image Guided Network for Depth Completion

Zhiqiang Yan, Zhengxue Wang, Kun Wang et al.

CVPR 2025 • poster • arXiv:2412.19225
12 citations
#1245

MVPortrait: Text-Guided Motion and Emotion Control for Multi-view Vivid Portrait Animation

Yukang Lin, Hokit Fung, Jianjin Xu et al.

CVPR 2025 • poster • arXiv:2503.19383
12 citations
#1246

NexusGS: Sparse View Synthesis with Epipolar Depth Priors in 3D Gaussian Splatting

Yulong Zheng, Zicheng Jiang, Shengfeng He et al.

CVPR 2025 • highlight • arXiv:2503.18794
12 citations
#1247

Spatial-Temporal Graph Diffusion Policy with Kinematic Modeling for Bimanual Robotic Manipulation

Qi Lv, Hao Li, Xiang Deng et al.

CVPR 2025 • poster • arXiv:2503.10743
12 citations
#1248

Discriminative Sample-Guided and Parameter-Efficient Feature Space Adaptation for Cross-Domain Few-Shot Learning

Rashindrie Perera, Saman Halgamuge

CVPR 2024 • poster • arXiv:2403.04492
12 citations
#1249

DualTalk: Dual-Speaker Interaction for 3D Talking Head Conversations

Ziqiao Peng, Yanbo Fan, Haoyu Wu et al.

CVPR 2025 • poster • arXiv:2505.18096
12 citations
#1250

Scaling Vision Pre-Training to 4K Resolution

Baifeng Shi, Boyi Li, Han Cai et al.

CVPR 2025 • highlight • arXiv:2503.19903
12 citations
#1251

MoDec-GS: Global-to-Local Motion Decomposition and Temporal Interval Adjustment for Compact Dynamic 3D Gaussian Splatting

Sangwoon Kwak, Joonsoo Kim, Jun Young Jeong et al.

CVPR 2025 • poster • arXiv:2501.03714
12 citations
#1252

DRiVE: Diffusion-based Rigging Empowers Generation of Versatile and Expressive Characters

Mingze Sun, Junting Dong, Junhao Chen et al.

CVPR 2025 • poster • arXiv:2411.17423
12 citations
#1253

Prototype-Based Image Prompting for Weakly Supervised Histopathological Image Segmentation

Qingchen Tang, Lei Fan, Maurice Pagnucco et al.

CVPR 2025 • poster • arXiv:2503.12068
12 citations
#1254

Mr. DETR: Instructive Multi-Route Training for Detection Transformers

Chang-Bin Zhang, Yujie Zhong, Kai Han

CVPR 2025 • poster
12 citations
#1255

Federated Online Adaptation for Deep Stereo

Matteo Poggi, Fabio Tosi

CVPR 2024 • poster • arXiv:2405.14873
12 citations
#1256

Learning Hazing to Dehazing: Towards Realistic Haze Generation for Real-World Image Dehazing

Ruiyi Wang, Yushuo Zheng, Zicheng Zhang et al.

CVPR 2025 • poster • arXiv:2503.19262
12 citations
#1257

Generative Powers of Ten

Xiaojuan Wang, Janne Kontkanen, Brian Curless et al.

CVPR 2024 • highlight • arXiv:2312.02149
12 citations
#1258

HOIGen-1M: A Large-scale Dataset for Human-Object Interaction Video Generation

Kun Liu, Qi Liu, Xinchen Liu et al.

CVPR 2025 • poster • arXiv:2503.23715
12 citations
#1259

VideoICL: Confidence-based Iterative In-context Learning for Out-of-Distribution Video Understanding

Kangsan Kim, Geon Park, Youngwan Lee et al.

CVPR 2025 • poster • arXiv:2412.02186
12 citations
#1260

Emergent Open-Vocabulary Semantic Segmentation from Off-the-shelf Vision-Language Models

Luo Jiayun, Siddhesh Khandelwal, Leonid Sigal et al.

CVPR 2024 • poster • arXiv:2311.17095
12 citations
#1261

Accelerating Neural Field Training via Soft Mining

Shakiba Kheradmand, Daniel Rebain, Gopal Sharma et al.

CVPR 2024 • poster • arXiv:2312.00075
12 citations
#1262

Unexplored Faces of Robustness and Out-of-Distribution: Covariate Shifts in Environment and Sensor Domains

Eunsu Baek, Keondo Park, Ji-yoon Kim et al.

CVPR 2024 • poster • arXiv:2404.15882
12 citations
#1263

Through-The-Mask: Mask-based Motion Trajectories for Image-to-Video Generation

Guy Yariv, Yuval Kirstain, Amit Zohar et al.

CVPR 2025 • poster • arXiv:2501.03059
12 citations
#1264

PSHuman: Photorealistic Single-image 3D Human Reconstruction using Cross-Scale Multiview Diffusion and Explicit Remeshing

Peng Li, Wangguandong Zheng, Yuan Liu et al.

CVPR 2025 • poster • arXiv:2409.10141
12 citations
#1265

Unsupervised Gaze Representation Learning from Multi-view Face Images

Yiwei Bao, Feng Lu

CVPR 2024 • poster
12 citations
#1266

ReasonGrounder: LVLM-Guided Hierarchical Feature Splatting for Open-Vocabulary 3D Visual Grounding and Reasoning

Zhenyang Liu, Yikai Wang, Sixiao Zheng et al.

CVPR 2025 • poster • arXiv:2503.23297
12 citations
#1267

VisionArena: 230k Real World User-VLM Conversations with Preference Labels

Christopher Chou, Lisa Dunlap, Wei-Lin Chiang et al.

CVPR 2025 • poster • arXiv:2412.08687
12 citations
#1268

SmartEraser: Remove Anything from Images using Masked-Region Guidance

Longtao Jiang, Zhendong Wang, Jianmin Bao et al.

CVPR 2025 • poster • arXiv:2501.08279
12 citations
#1269

Rethinking Training for De-biasing Text-to-Image Generation: Unlocking the Potential of Stable Diffusion

Eunji Kim, Siwon Kim, Minjun Park et al.

CVPR 2025 • poster • arXiv:2408.12692
12 citations
#1270

HierarQ: Task-Aware Hierarchical Q-Former for Enhanced Video Understanding

Shehreen Azad, Vibhav Vineet, Yogesh S. Rawat

CVPR 2025 • poster • arXiv:2503.08585
12 citations
#1271

AKiRa: Augmentation Kit on Rays for Optical Video Generation

Xi Wang, Robin Courant, Marc Christie et al.

CVPR 2025 • poster • arXiv:2412.14158
12 citations
#1272

STEREO: A Two-Stage Framework for Adversarially Robust Concept Erasing from Text-to-Image Diffusion Models

Koushik Srivatsan, Fahad Shamshad, Muzammal Naseer et al.

CVPR 2025 • highlight • arXiv:2408.16807
12 citations
#1273

Correcting Diffusion Generation through Resampling

Yujian Liu, Yang Zhang, Tommi Jaakkola et al.

CVPR 2024 • highlight • arXiv:2312.06038
12 citations
#1274

EasyHOI: Unleashing the Power of Large Models for Reconstructing Hand-Object Interactions in the Wild

Yumeng Liu, Xiaoxiao Long, Zemin Yang et al.

CVPR 2025 • poster • arXiv:2411.14280
12 citations
#1275

Unsupervised Foundation Model-Agnostic Slide-Level Representation Learning

Tim Lenz, Peter Neidlinger, Marta Ligero et al.

CVPR 2025 • poster • arXiv:2411.13623
12 citations
#1276

Image Generation Diversity Issues and How to Tame Them

Mischa Dombrowski, Weitong Zhang, Hadrien Reynaud et al.

CVPR 2025 • poster • arXiv:2411.16171
12 citations
#1277

Universal Novelty Detection Through Adaptive Contrastive Learning

Hossein Mirzaei, Mojtaba Nafez, Mohammad Jafari et al.

CVPR 2024 • poster • arXiv:2408.10798
12 citations
#1278

Total-Decom: Decomposed 3D Scene Reconstruction with Minimal Interaction

Xiaoyang Lyu, Chirui Chang, Peng Dai et al.

CVPR 2024 • highlight • arXiv:2403.19314
12 citations
#1279

RoomTour3D: Geometry-Aware Video-Instruction Tuning for Embodied Navigation

Mingfei Han, Liang Ma, Kamila Zhumakhanova et al.

CVPR 2025 • poster • arXiv:2412.08591
12 citations
#1280

Robust Overfitting Does Matter: Test-Time Adversarial Purification With FGSM

Linyu Tang, Lei Zhang

CVPR 2024 • poster • arXiv:2403.11448
12 citations
#1281

CoMM: A Coherent Interleaved Image-Text Dataset for Multimodal Understanding and Generation

Wei Chen, Lin Li, Yongqi Yang et al.

CVPR 2025 • highlight • arXiv:2406.10462
12 citations
#1282

Uncertainty-guided Perturbation for Image Super-Resolution Diffusion Model

Leheng Zhang, Weiyi You, Kexuan Shi et al.

CVPR 2025 • poster • arXiv:2503.18512
12 citations
#1283

DexHandDiff: Interaction-aware Diffusion Planning for Adaptive Dexterous Manipulation

Zhixuan Liang, Yao Mu, Yixiao Wang et al.

CVPR 2025 • poster • arXiv:2411.18562
12 citations
#1284

Patient-Level Anatomy Meets Scanning-Level Physics: Personalized Federated Low-Dose CT Denoising Empowered by Large Language Model

Ziyuan Yang, Yingyu Chen, Zhiwen Wang et al.

CVPR 2025 • poster • arXiv:2503.00908
12 citations
#1285

MG-MotionLLM: A Unified Framework for Motion Comprehension and Generation across Multiple Granularities

Bizhu Wu, Jinheng Xie, Keming Shen et al.

CVPR 2025 • poster • arXiv:2504.02478
12 citations
#1286

Boosting Adversarial Training via Fisher-Rao Norm-based Regularization

Xiangyu Yin, Wenjie Ruan

CVPR 2024 • poster • arXiv:2403.17520
12 citations
#1287

MotionPro: A Precise Motion Controller for Image-to-Video Generation

Zhongwei Zhang, Fuchen Long, Zhaofan Qiu et al.

CVPR 2025 • poster • arXiv:2505.20287
12 citations
#1288

Asymmetric Masked Distillation for Pre-Training Small Foundation Models

Zhiyu Zhao, Bingkun Huang, Sen Xing et al.

CVPR 2024 • poster • arXiv:2311.03149
12 citations
#1289

Weakly Supervised Monocular 3D Detection with a Single-View Image

Xueying Jiang, Sheng Jin, Lewei Lu et al.

CVPR 2024 • poster • arXiv:2402.19144
12 citations
#1290

VinTAGe: Joint Video and Text Conditioning for Holistic Audio Generation

Saksham Singh Kushwaha, Yapeng Tian

CVPR 2025 • poster • arXiv:2412.10768
12 citations
#1291

Improving Bird's Eye View Semantic Segmentation by Task Decomposition

Tianhao Zhao, Yongcan Chen, Yu Wu et al.

CVPR 2024 • poster • arXiv:2404.01925
12 citations
#1292

TexOct: Generating Textures of 3D Models with Octree-based Diffusion

Jialun Liu, Chenming Wu, Xinqi Liu et al.

CVPR 2024 • poster
12 citations
#1293

EgoLM: Multi-Modal Language Model of Egocentric Motions

Fangzhou Hong, Vladimir Guzov, Hyo Jin Kim et al.

CVPR 2025 • poster • arXiv:2409.18127
12 citations
#1294

SALAD: Skeleton-aware Latent Diffusion for Text-driven Motion Generation and Editing

Seokhyeon Hong, Chaelin Kim, Serin Yoon et al.

CVPR 2025 • poster • arXiv:2503.13836
12 citations
#1295

Discover and Mitigate Multiple Biased Subgroups in Image Classifiers

Zeliang Zhang, Mingqian Feng, Zhiheng Li et al.

CVPR 2024 • poster • arXiv:2403.12777
12 citations
#1296

Zero-Shot Monocular Scene Flow Estimation in the Wild

Yiqing Liang, Abhishek Badki, Hang Su et al.

CVPR 2025 • poster • arXiv:2501.10357
12 citations
#1297

CuVLER: Enhanced Unsupervised Object Discoveries through Exhaustive Self-Supervised Transformers

Shahaf Arica, Or Rubin, Sapir Gershov et al.

CVPR 2024 • poster • arXiv:2403.07700
12 citations
#1298

Functional Diffusion

Biao Zhang, Peter Wonka

CVPR 2024 • poster • arXiv:2311.15435
12 citations
#1299

PointInfinity: Resolution-Invariant Point Diffusion Models

Zixuan Huang, Justin Johnson, Shoubhik Debnath et al.

CVPR 2024 • poster • arXiv:2404.03566
12 citations
#1300

PairAug: What Can Augmented Image-Text Pairs Do for Radiology?

Yutong Xie, Qi Chen, Sinuo Wang et al.

CVPR 2024 • poster • arXiv:2404.04960
12 citations
#1301

Towards a Universal Synthetic Video Detector: From Face or Background Manipulations to Fully AI-Generated Content

Rohit Kundu, Hao Xiong, Vishal Mohanty et al.

CVPR 2025 • poster • arXiv:2412.12278
12 citations
#1302

METASCENES: Towards Automated Replica Creation for Real-world 3D Scans

Huangyue Yu, Baoxiong Jia, Yixin Chen et al.

CVPR 2025 • poster • arXiv:2505.02388
12 citations
#1303

The Illusion of Unlearning: The Unstable Nature of Machine Unlearning in Text-to-Image Diffusion Models

Naveen George, Karthik Nandan Dasaraju, Rutheesh Reddy Chittepu et al.

CVPR 2025 • poster
12 citations
#1304

MVGenMaster: Scaling Multi-View Generation from Any Image via 3D Priors Enhanced Diffusion Model

Chenjie Cao, Chaohui Yu, Shang Liu et al.

CVPR 2025 • poster • arXiv:2411.16157
12 citations
#1305

StarGen: A Spatiotemporal Autoregression Framework with Video Diffusion Model for Scalable and Controllable Scene Generation

Shangjin Zhai, Zhichao Ye, Jialin Liu et al.

CVPR 2025 • poster • arXiv:2501.05763
12 citations
#1306

Learning Dynamic Tetrahedra for High-Quality Talking Head Synthesis

Zicheng Zhang, Ruobing Zheng, Bonan Li et al.

CVPR 2024 • poster • arXiv:2402.17364
12 citations
#1307

StraightPCF: Straight Point Cloud Filtering

Dasith de Silva Edirimuni, Xuequan Lu, Gang Li et al.

CVPR 2024 • poster • arXiv:2405.08322
11 citations
#1308

Quantifying Task Priority for Multi-Task Optimization

Wooseong Jeong, Kuk-Jin Yoon

CVPR 2024 • poster • arXiv:2406.02996
11 citations
#1309

OmniStyle: Filtering High Quality Style Transfer Data at Scale

Ye Wang, Ruiqi Liu, Jiang Lin et al.

CVPR 2025 • poster • arXiv:2505.14028
11 citations
#1310

Finsler-Laplace-Beltrami Operators with Application to Shape Analysis

Simon Weber, Thomas Dagès, Maolin Gao et al.

CVPR 2024 • poster • arXiv:2404.03999
11 citations
#1311

FluidNexus: 3D Fluid Reconstruction and Prediction from a Single Video

Yue Gao, Hong-Xing Yu, Bo Zhu et al.

CVPR 2025 • poster • arXiv:2503.04720
11 citations
#1312

Latent Space Super-Resolution for Higher-Resolution Image Generation with Diffusion Models

Jinho Jeong, Sangmin Han, Jinwoo Kim et al.

CVPR 2025 • poster • arXiv:2503.18446
11 citations
#1313

4DGC: Rate-Aware 4D Gaussian Compression for Efficient Streamable Free-Viewpoint Video

Qiang Hu, Zihan Zheng, Houqiang Zhong et al.

CVPR 2025 • poster • arXiv:2503.18421
11 citations
#1314

TaoAvatar: Real-Time Lifelike Full-Body Talking Avatars for Augmented Reality via 3D Gaussian Splatting

Jianchuan Chen, Jingchuan Hu, Gaige Wang et al.

CVPR 2025 • highlight • arXiv:2503.17032
11 citations
#1315

nnWNet: Rethinking the Use of Transformers in Biomedical Image Segmentation and Calling for a Unified Evaluation Benchmark

Yanfeng Zhou, Lingrui Li, Le Lu et al.

CVPR 2025 • poster
11 citations
#1316

Rectified Diffusion Guidance for Conditional Generation

Mengfei Xia, Nan Xue, Yujun Shen et al.

CVPR 2025 • poster • arXiv:2410.18737
11 citations
#1317

From Activation to Initialization: Scaling Insights for Optimizing Neural Fields

Hemanth Saratchandran, Sameera Ramasinghe, Simon Lucey

CVPR 2024 • highlight • arXiv:2403.19205
11 citations
#1318

ROCKET-1: Mastering Open-World Interaction with Visual-Temporal Context Prompting

Shaofei Cai, Zihao Wang, Kewei Lian et al.

CVPR 2025 • poster • arXiv:2410.17856
11 citations
#1319

Physical Backdoor: Towards Temperature-based Backdoor Attacks in the Physical World

Wen Yin, Jian Lou, Pan Zhou et al.

CVPR 2024 • poster • arXiv:2404.19417
11 citations
#1320

From Words to Structured Visuals: A Benchmark and Framework for Text-to-Diagram Generation and Editing

Jingxuan Wei, Cheng Tan, Qi Chen et al.

CVPR 2025 • highlight • arXiv:2411.11916
11 citations
#1321

LoRACLR: Contrastive Adaptation for Customization of Diffusion Models

Enis Simsar, Thomas Hofmann, Federico Tombari et al.

CVPR 2025 • poster • arXiv:2412.09622
11 citations
#1322

VladVA: Discriminative Fine-tuning of LVLMs

Yassine Ouali, Adrian Bulat, Alexandros Xenos et al.

CVPR 2025 • poster • arXiv:2412.04378
11 citations
#1323

SharpDepth: Sharpening Metric Depth Predictions Using Diffusion Distillation

Duc-Hai Pham, Tung Do, Phong Nguyen et al.

CVPR 2025 • poster • arXiv:2411.18229
11 citations
#1324

ExpertAF: Expert Actionable Feedback from Video

Kumar Ashutosh, Tushar Nagarajan, Georgios Pavlakos et al.

CVPR 2025 • poster • arXiv:2408.00672
11 citations
#1325

Semantic and Sequential Alignment for Referring Video Object Segmentation

Feiyu Pan, Hao Fang, Fangkai Li et al.

CVPR 2025 • poster
11 citations
#1326

CDMAD: Class-Distribution-Mismatch-Aware Debiasing for Class-Imbalanced Semi-Supervised Learning

Hyuck Lee, Heeyoung Kim

CVPR 2024 • poster • arXiv:2403.10391
11 citations
#1327

SwiftEdit: Lightning Fast Text-Guided Image Editing via One-Step Diffusion

Trong-Tung Nguyen, Quang Nguyen, Khoi Nguyen et al.

CVPR 2025 • poster • arXiv:2412.04301
11 citations
#1328

MoEE: Mixture of Emotion Experts for Audio-Driven Portrait Animation

Huaize Liu, WenZhang Sun, Donglin Di et al.

CVPR 2025 • poster • arXiv:2501.01808
11 citations
#1329

Data-Efficient Unsupervised Interpolation Without Any Intermediate Frame for 4D Medical Images

JungEun Kim, Hangyul Yoon, Geondo Park et al.

CVPR 2024 • poster • arXiv:2404.01464
11 citations
#1330

SF2T: Self-supervised Fragment Finetuning of Video-LLMs for Fine-Grained Understanding

Yangliu Hu, Zikai Song, Na Feng et al.

CVPR 2025 • poster • arXiv:2504.07745
11 citations
#1331

VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step

Hanyang Wang, Fangfu Liu, Jiawei Chi et al.

CVPR 2025 • highlight • arXiv:2504.01956
11 citations
#1332

ICT: Image-Object Cross-Level Trusted Intervention for Mitigating Object Hallucination in Large Vision-Language Models

Junzhe Chen, Tianshu Zhang, Shiyu Huang et al.

CVPR 2025 • poster • arXiv:2411.15268
11 citations
#1333

GenPC: Zero-shot Point Cloud Completion via 3D Generative Priors

An Li, Zhe Zhu, Mingqiang Wei

CVPR 2025 • poster • arXiv:2502.19896
11 citations
#1334

NeRFPrior: Learning Neural Radiance Field as a Prior for Indoor Scene Reconstruction

Wenyuan Zhang, Emily Yue-ting Jia, Junsheng Zhou et al.

CVPR 2025 • highlight • arXiv:2503.18361
11 citations
#1335

GaussianUDF: Inferring Unsigned Distance Functions through 3D Gaussian Splatting

Shujuan Li, Yu-Shen Liu, Zhizhong Han

CVPR 2025 • highlight • arXiv:2503.19458
11 citations
#1336

Consistent and Controllable Image Animation with Motion Diffusion Models

Xin Ma, Yaohui Wang, Gengyun Jia et al.

CVPR 2025 • poster • arXiv:2407.15642
11 citations
#1337

SpiritSight Agent: Advanced GUI Agent with One Look

Zhiyuan Huang, Ziming Cheng, Junting Pan et al.

CVPR 2025 • poster • arXiv:2503.03196
11 citations
#1338

SIGNeRF: Scene Integrated Generation for Neural Radiance Fields

Jan-Niklas Dihlmann, Andreas Engelhardt, Hendrik Lensch

CVPR 2024 • poster • arXiv:2401.01647
11 citations
#1339

One-for-More: Continual Diffusion Model for Anomaly Detection

Xiaofan Li, Xin Tan, Zhuo Chen et al.

CVPR 2025 • poster • arXiv:2502.19848
11 citations
#1340

SoundingActions: Learning How Actions Sound from Narrated Egocentric Videos

Changan Chen, Kumar Ashutosh, Rohit Girdhar et al.

CVPR 2024 • poster • arXiv:2404.05206
11 citations
#1341

Horizon-GS: Unified 3D Gaussian Splatting for Large-Scale Aerial-to-Ground Scenes

Lihan Jiang, Kerui Ren, Mulin Yu et al.

CVPR 2025 • poster • arXiv:2412.01745
11 citations
#1342

NEAT: Distilling 3D Wireframes from Neural Attraction Fields

Nan Xue, Bin Tan, Yuxi Xiao et al.

CVPR 2024 • poster • arXiv:2307.10206
11 citations
#1343

Instance Tracking in 3D Scenes from Egocentric Videos

Yunhan Zhao, Haoyu Ma, Shu Kong et al.

CVPR 2024 • poster • arXiv:2312.04117
11 citations
#1344

GaussianSpa: An “Optimizing-Sparsifying” Simplification Framework for Compact and High-Quality 3D Gaussian Splatting

Yangming Zhang, Wenqi Jia, Wei Niu et al.

CVPR 2025 • poster • arXiv:2411.06019
11 citations
#1345

Data Synthesis with Diverse Styles for Face Recognition via 3DMM-Guided Diffusion

Yuxi Mi, Zhizhou Zhong, Yuge Huang et al.

CVPR 2025 • poster • arXiv:2504.00430
11 citations
#1346

Bridging the Gap Between End-to-End and Two-Step Text Spotting

Mingxin Huang, Hongliang Li, Yuliang Liu et al.

CVPR 2024 • poster • arXiv:2404.04624
11 citations
#1347

Multi-view Reconstruction via SfM-guided Monocular Depth Estimation

Haoyu Guo, He Zhu, Sida Peng et al.

CVPR 2025 • poster • arXiv:2503.14483
11 citations
#1348

DropoutGS: Dropping Out Gaussians for Better Sparse-view Rendering

Yexing Xu, Longguang Wang, Minglin Chen et al.

CVPR 2025 • poster • arXiv:2504.09491
11 citations
#1349

HEIE: MLLM-Based Hierarchical Explainable AIGC Image Implausibility Evaluator

Fan Yang, Ru Zhen, Jianing Wang et al.

CVPR 2025 • poster • arXiv:2411.17261
11 citations
#1350

BlockDance: Reuse Structurally Similar Spatio-Temporal Features to Accelerate Diffusion Transformers

Hui Zhang, Tingwei Gao, Jie Shao et al.

CVPR 2025 • poster • arXiv:2503.15927
11 citations
#1351

Adapter Merging with Centroid Prototype Mapping for Scalable Class-Incremental Learning

Takuma Fukuda, Hiroshi Kera, Kazuhiko Kawamoto

CVPR 2025 • poster • arXiv:2412.18219
11 citations
#1352

Flattening the Parent Bias: Hierarchical Semantic Segmentation in the Poincaré Ball

Simon Weber, Barış Zöngür, Nikita Araslanov et al.

CVPR 2024 • poster • arXiv:2404.03778
11 citations
#1353

LoRASculpt: Sculpting LoRA for Harmonizing General and Specialized Knowledge in Multimodal Large Language Models

Jian Liang, Wenke Huang, Guancheng Wan et al.

CVPR 2025 • poster • arXiv:2503.16843
11 citations
#1354

DeIL: Direct-and-Inverse CLIP for Open-World Few-Shot Learning

Shuai Shao, Yu Bai, Yan Wang et al.

CVPR 2024 • poster
11 citations
#1355

BlobGEN-Vid: Compositional Text-to-Video Generation with Blob Video Representations

Weixi Feng, Chao Liu, Sifei Liu et al.

CVPR 2025 • poster • arXiv:2501.07647
11 citations
#1356

Lifting Motion to the 3D World via 2D Diffusion

Jiaman Li, Karen Liu, Jiajun Wu

CVPR 2025 • highlight • arXiv:2411.18808
11 citations
#1357

Instant Adversarial Purification with Adversarial Consistency Distillation

Chun Tong Lei, Hon Ming Yam, Zhongliang Guo et al.

CVPR 2025 • poster • arXiv:2408.17064
11 citations
#1358

DREAM: Diffusion Rectification and Estimation-Adaptive Models

Jinxin Zhou, Tianyu Ding, Tianyi Chen et al.

CVPR 2024 • poster • arXiv:2312.00210
11 citations
#1359

Causal Composition Diffusion Model for Closed-loop Traffic Generation

Haohong Lin, Xin Huang, Tung Phan-Minh et al.

CVPR 2025 • poster • arXiv:2412.17920
11 citations
#1360

PeerAiD: Improving Adversarial Distillation from a Specialized Peer Tutor

Jaewon Jung, Hongsun Jang, Jaeyong Song et al.

CVPR 2024 • poster • arXiv:2403.06668
11 citations
#1361

EVolSplat: Efficient Volume-based Gaussian Splatting for Urban View Synthesis

Sheng Miao, Jiaxin Huang, Dongfeng Bai et al.

CVPR 2025 • poster • arXiv:2503.20168
11 citations
#1362

MaskINT: Video Editing via Interpolative Non-autoregressive Masked Transformers

Haoyu Ma, Shahin Mahdizadehaghdam, Bichen Wu et al.

CVPR 2024 • poster • arXiv:2312.12468
11 citations
#1363

Privacy-Preserving Optics for Enhancing Protection in Face De-Identification

Jhon Lopez, Carlos Hinojosa, Henry Arguello et al.

CVPR 2024 • poster • arXiv:2404.00777
11 citations
#1364

Self-Adaptive Reality-Guided Diffusion for Artifact-Free Super-Resolution

Qingping Zheng, Ling Zheng, Yuanfan Guo et al.

CVPR 2024 • poster • arXiv:2403.16643
11 citations
#1365

RaCFormer: Towards High-Quality 3D Object Detection via Query-based Radar-Camera Fusion

Xiaomeng Chu, Jiajun Deng, Guoliang You et al.

CVPR 2025 • poster • arXiv:2412.12725
11 citations
#1366

Audio-Visual Instance Segmentation

Ruohao Guo, Xianghua Ying, Yaru Chen et al.

CVPR 2025 • poster • arXiv:2310.18709
11 citations
#1367

NoT: Federated Unlearning via Weight Negation

Yasser Khalil, Leo Maxime Brunswic, Soufiane Lamghari et al.

CVPR 2025 • poster • arXiv:2503.05657
11 citations
#1368

DifIISR: A Diffusion Model with Gradient Guidance for Infrared Image Super-Resolution

Xingyuan Li, Zirui Wang, Yang Zou et al.

CVPR 2025 • poster • arXiv:2503.01187
11 citations
#1369

Toward Real-world BEV Perception: Depth Uncertainty Estimation via Gaussian Splatting

Shu-Wei Lu, Yi-Hsuan Tsai, Yi-Ting Chen

CVPR 2025 • poster • arXiv:2504.01957
11 citations
#1370

FSFM: A Generalizable Face Security Foundation Model via Self-Supervised Facial Representation Learning

Gaojian Wang, Feng Lin, Tong Wu et al.

CVPR 2025 • poster • arXiv:2412.12032
11 citations
#1371

BiPer: Binary Neural Networks using a Periodic Function

Edwin Vargas, Claudia Correa, Carlos Hinojosa et al.

CVPR 2024 • poster • arXiv:2404.01278
11 citations
#1372

TopoCellGen: Generating Histopathology Cell Topology with a Diffusion Model

Meilong Xu, Saumya Gupta, Xiaoling Hu et al.

CVPR 2025 • poster • arXiv:2412.06011
11 citations
#1373

Towards Understanding and Improving Adversarial Robustness of Vision Transformers

Samyak Jain, Tanima Dutta

CVPR 2024 • poster
11 citations
#1374

UltrAvatar: A Realistic Animatable 3D Avatar Diffusion Model with Authenticity Guided Textures

Mingyuan Zhou, Rakib Hyder, Ziwei Xuan et al.

CVPR 2024 • poster • arXiv:2401.11078
11 citations
#1375

Detecting Backdoor Attacks in Federated Learning via Direction Alignment Inspection

Jiahao Xu, Zikai Zhang, Rui Hu

CVPR 2025 • highlight • arXiv:2503.07978
11 citations
#1376

SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories

Muzhi Zhu, Yuzhuo Tian, Hao Chen et al.

CVPR 2025 • poster • arXiv:2503.08625
11 citations
#1377

Robust Depth Enhancement via Polarization Prompt Fusion Tuning

Kei Ikemura, Yiming Huang, Felix Heide et al.

CVPR 2024 • poster • arXiv:2404.04318
11 citations
#1378

Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities

Yiyuan Zhang, Xiaohan Ding, Kaixiong Gong et al.

CVPR 2024 • poster • arXiv:2401.14405
11 citations
#1379

Tra-MoE: Learning Trajectory Prediction Model from Multiple Domains for Adaptive Policy Conditioning

Jiange Yang, Haoyi Zhu, Yating Wang et al.

CVPR 2025 • poster • arXiv:2411.14519
11 citations
#1380

Taming Video Diffusion Prior with Scene-Grounding Guidance for 3D Gaussian Splatting from Sparse Inputs

Yingji Zhong, Zhihao Li, Dave Zhenyu Chen et al.

CVPR 2025 • highlight • arXiv:2503.05082
11 citations
#1381

MBQ: Modality-Balanced Quantization for Large Vision-Language Models

Shiyao Li, Yingchun Hu, Xuefei Ning et al.

CVPR 2025 • poster • arXiv:2412.19509
10 citations
#1382

HIIF: Hierarchical Encoding based Implicit Image Function for Continuous Super-resolution

Yuxuan Jiang, Ho Man Kwan, Jasmine Peng et al.

CVPR 2025 • poster • arXiv:2412.03748
10 citations
#1383

Single Mesh Diffusion Models with Field Latents for Texture Generation

Thomas W. Mitchel, Carlos Esteves, Ameesh Makadia

CVPR 2024 • poster • arXiv:2312.09250
10 citations
#1384

Spectrum AUC Difference (SAUCD): Human-aligned 3D Shape Evaluation

Tianyu Luan, Zhong Li, Lele Chen et al.

CVPR 2024 • poster • arXiv:2403.01619
10 citations
#1385

Tuning the Frequencies: Robust Training for Sinusoidal Neural Networks

Tiago Novello, Diana Aldana Moreno, André Araujo et al.

CVPR 2025 • highlight • arXiv:2407.21121
10 citations
#1386

Hybrid Global-Local Representation with Augmented Spatial Guidance for Zero-Shot Referring Image Segmentation

Ting Liu, Siyuan Li

CVPR 2025 • poster • arXiv:2504.00356
10 citations
#1387

Point2RBox-v2: Rethinking Point-supervised Oriented Object Detection with Spatial Layout Among Instances

Yi Yu, Botao Ren, Peiyuan Zhang et al.

CVPR 2025 • poster • arXiv:2502.04268
10 citations
#1388

Task-Agnostic Guided Feature Expansion for Class-Incremental Learning

Bowen Zheng, Da-Wei Zhou, Han-Jia Ye et al.

CVPR 2025 • poster • arXiv:2503.00823
10 citations
#1389

Towards Automated Movie Trailer Generation

Dawit Argaw Argaw, Mattia Soldan, Alejandro Pardo et al.

CVPR 2024 • poster • arXiv:2404.03477
10 citations
#1390

Splatter-360: Generalizable 360 Gaussian Splatting for Wide-baseline Panoramic Images

Zheng Chen, Chenming Wu, Zhelun Shen et al.

CVPR 2025 • poster
10 citations
#1391

Mind The Edge: Refining Depth Edges in Sparsely-Supervised Monocular Depth Estimation

Lior Talker, Aviad Cohen, Erez Yosef et al.

CVPR 2024 • poster • arXiv:2212.05315
10 citations
#1392

ContextSeg: Sketch Semantic Segmentation by Querying the Context with Attention

Jiawei Wang, Changjian Li

CVPR 2024 • poster • arXiv:2311.16682
10 citations
#1393

Disentangled Pre-training for Human-Object Interaction Detection

Zhuolong Li, Xingao Li, Changxing Ding et al.

CVPR 2024 • poster • arXiv:2404.01725
10 citations
#1394

ASAP: Advancing Semantic Alignment Promotes Multi-Modal Manipulation Detecting and Grounding

Zhenxing Zhang, Yaxiong Wang, Lechao Cheng et al.

CVPR 2025 • poster • arXiv:2412.12718
10 citations
#1395

Integrating Efficient Optimal Transport and Functional Maps For Unsupervised Shape Correspondence Learning

Tung Le, Khai Nguyen, Shanlin Sun et al.

CVPR 2024 • poster • arXiv:2403.01781
10 citations
#1396

GUI-Xplore: Empowering Generalizable GUI Agents with One Exploration

Yuchen Sun, Shanhui Zhao, Tao Yu et al.

CVPR 2025 • poster • arXiv:2503.17709
10 citations
#1397

SHiNe: Semantic Hierarchy Nexus for Open-vocabulary Object Detection

Mingxuan Liu, Tyler Hayes, Elisa Ricci et al.

CVPR 2024 • highlight • arXiv:2405.10053
10 citations
#1398

FedMIA: An Effective Membership Inference Attack Exploiting "All for One" Principle in Federated Learning

Gongxi Zhu, Donghao Li, Hanlin Gu et al.

CVPR 2025 • poster
10 citations
#1399

MemoNav: Working Memory Model for Visual Navigation

Hongxin Li, Zeyu Wang, Xu Yang et al.

CVPR 2024 • highlight • arXiv:2402.19161
10 citations
#1400

Unsigned Orthogonal Distance Fields: An Accurate Neural Implicit Representation for Diverse 3D Shapes

YuJie Lu, Long Wan, Nayu Ding et al.

CVPR 2024 • poster • arXiv:2403.01414
10 citations