Most Cited CVPR "sliding window inference" Papers

5,589 papers found • Page 12 of 28

#2201

InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption

Tiehan Fan, Kepan Nan, Rui Xie et al.

CVPR 2025 • arXiv:2412.09283
14
citations
#2202

LAFS: Landmark-based Facial Self-supervised Learning for Face Recognition

Zhonglin Sun, Chen Feng, Ioannis Patras et al.

CVPR 2024 • arXiv:2403.08161
14
citations
#2203

Arc2Avatar: Generating Expressive 3D Avatars from a Single Image via ID Guidance

Dimitrios Gerogiannis, Foivos Paraperas Papantoniou, Rolandos Alexandros Potamias et al.

CVPR 2025 • arXiv:2501.05379
14
citations
#2204

Unifying Automatic and Interactive Matting with Pretrained ViTs

Zixuan Ye, Wenze Liu, He Guo et al.

CVPR 2024
14
citations
#2205

DualTalk: Dual-Speaker Interaction for 3D Talking Head Conversations

Ziqiao Peng, Yanbo Fan, Haoyu Wu et al.

CVPR 2025 • arXiv:2505.18096
14
citations
#2206

X-3D: Explicit 3D Structure Modeling for Point Cloud Recognition

Shuofeng Sun, Yongming Rao, Jiwen Lu et al.

CVPR 2024 • arXiv:2404.15010
14
citations
#2207

Prototype-Based Image Prompting for Weakly Supervised Histopathological Image Segmentation

Qingchen Tang, Lei Fan, Maurice Pagnucco et al.

CVPR 2025 • arXiv:2503.12068
14
citations
#2208

Bootstrapping Autonomous Driving Radars with Self-Supervised Learning

Yiduo Hao, Sohrab Madani, Junfeng Guan et al.

CVPR 2024 • arXiv:2312.04519
14
citations
#2209

ChatGarment: Garment Estimation, Generation and Editing via Large Language Models

Siyuan Bian, Chenghao Xu, Yuliang Xiu et al.

CVPR 2025 • arXiv:2412.17811
14
citations
#2210

MultiPly: Reconstruction of Multiple People from Monocular Video in the Wild

Zeren Jiang, Chen Guo, Manuel Kaufmann et al.

CVPR 2024 • arXiv:2406.01595
14
citations
#2211

L2B: Learning to Bootstrap Robust Models for Combating Label Noise

Yuyin Zhou, Xianhang Li, Fengze Liu et al.

CVPR 2024 • arXiv:2202.04291
14
citations
#2212

In2SET: Intra-Inter Similarity Exploiting Transformer for Dual-Camera Compressive Hyperspectral Imaging

Xin Wang, Lizhi Wang, Xiangtian Ma et al.

CVPR 2024 • arXiv:2312.13319
14
citations
#2213

Alias-Free Latent Diffusion Models: Improving Fractional Shift Equivariance of Diffusion Latent Space

Yifan Zhou, Zeqi Xiao, Shuai Yang et al.

CVPR 2025 • arXiv:2503.09419
14
citations
#2214

Uni4D: Unifying Visual Foundation Models for 4D Modeling from a Single Video

David Yifan Yao, Albert J. Zhai, Shenlong Wang

CVPR 2025 • highlight • arXiv:2503.21761
14
citations
#2215

From Words to Structured Visuals: A Benchmark and Framework for Text-to-Diagram Generation and Editing

Jingxuan Wei, Cheng Tan, Qi Chen et al.

CVPR 2025 • highlight • arXiv:2411.11916
14
citations
#2216

Causal-CoG: A Causal-Effect Look at Context Generation for Boosting Multi-modal Language Models

Shitian Zhao, Zhuowan Li, Yadong Lu et al.

CVPR 2024 • highlight • arXiv:2312.06685
14
citations
#2217

Unseen Visual Anomaly Generation

Han Sun, Yunkang Cao, Hao Dong et al.

CVPR 2025 • arXiv:2406.01078
14
citations
#2218

Adversarial Backdoor Attack by Naturalistic Data Poisoning on Trajectory Prediction in Autonomous Driving

Mozhgan Pourkeshavarz, Mohammad Sabokrou, Amir Rasouli

CVPR 2024 • arXiv:2306.15755
14
citations
#2219

DiffSal: Joint Audio and Video Learning for Diffusion Saliency Prediction

Junwen Xiong, Peng Zhang, Tao You et al.

CVPR 2024 • arXiv:2403.01226
14
citations
#2220

MotionStone: Decoupled Motion Intensity Modulation with Diffusion Transformer for Image-to-Video Generation

Shuwei Shi, Biao Gong, Xi Chen et al.

CVPR 2025 • arXiv:2412.05848
14
citations
#2221

DRiVE: Diffusion-based Rigging Empowers Generation of Versatile and Expressive Characters

Mingze Sun, Junting Dong, Junhao Chen et al.

CVPR 2025 • arXiv:2411.17423
14
citations
#2222

MMSum: A Dataset for Multimodal Summarization and Thumbnail Generation of Videos

Jielin Qiu, Jiacheng Zhu, William Han et al.

CVPR 2024 • highlight • arXiv:2306.04216
14
citations
#2223

Blur2Blur: Blur Conversion for Unsupervised Image Deblurring on Unknown Domains

Bang-Dang Pham, Phong Tran, Anh Tran et al.

CVPR 2024 • arXiv:2403.16205
14
citations
#2224

PlatoNeRF: 3D Reconstruction in Plato's Cave via Single-View Two-Bounce Lidar

Tzofi Klinghoffer, Xiaoyu Xiang, Siddharth Somasundaram et al.

CVPR 2024 • arXiv:2312.14239
14
citations
#2225

HOIGen-1M: A Large-scale Dataset for Human-Object Interaction Video Generation

Kun Liu, Qi Liu, Xinchen Liu et al.

CVPR 2025 • arXiv:2503.23715
14
citations
#2226

Learning to Sample Effective and Diverse Prompts for Text-to-Image Generation

Taeyoung Yun, Dinghuai Zhang, Jinkyoo Park et al.

CVPR 2025 • arXiv:2502.11477
14
citations
#2227

Retrieving Semantics from the Deep: an RAG Solution for Gesture Synthesis

M. Hamza Mughal, Rishabh Dabral, Merel CJ Scholman et al.

CVPR 2025 • arXiv:2412.06786
14
citations
#2228

USE: Universal Segment Embeddings for Open-Vocabulary Image Segmentation

Xiaoqi Wang, Wenbin He, Xiwei Xuan et al.

CVPR 2024 • arXiv:2406.05271
14
citations
#2229

Memory-based Adapters for Online 3D Scene Perception

Xiuwei Xu, Chong Xia, Ziwei Wang et al.

CVPR 2024 • arXiv:2403.06974
14
citations
#2230

Estimating Noisy Class Posterior with Part-level Labels for Noisy Label Learning

Rui Zhao, Bin Shi, Jianfei Ruan et al.

CVPR 2024 • arXiv:2405.05714
14
citations
#2231

Unified Entropy Optimization for Open-Set Test-Time Adaptation

Zhengqing Gao, Xu-Yao Zhang, Cheng-Lin Liu

CVPR 2024 • arXiv:2404.06065
14
citations
#2232

Hyperbolic Learning with Synthetic Captions for Open-World Detection

Fanjie Kong, Yanbei Chen, Jiarui Cai et al.

CVPR 2024 • arXiv:2404.05016
14
citations
#2233

HandBooster: Boosting 3D Hand-Mesh Reconstruction by Conditional Synthesis and Sampling of Hand-Object Interactions

Hao Xu, Li Haipeng, Yinqiao Wang et al.

CVPR 2024 • arXiv:2403.18575
14
citations
#2234

Scaling Inference Time Compute for Diffusion Models

Nanye Ma, Shangyuan Tong, Haolin Jia et al.

CVPR 2025 • highlight
14
citations
#2235

LiMoE: Mixture of LiDAR Representation Learners from Automotive Scenes

Xiang Xu, Lingdong Kong, Hui Shuai et al.

CVPR 2025 • arXiv:2501.04004
14
citations
#2236

3D Neural Edge Reconstruction

Lei Li, Songyou Peng, Zehao Yu et al.

CVPR 2024 • arXiv:2405.19295
14
citations
#2237

ArticulatedGS: Self-supervised Digital Twin Modeling of Articulated Objects using 3D Gaussian Splatting

Guo Junfu, Yu Xin, Gaoyi Liu et al.

CVPR 2025 • arXiv:2503.08135
14
citations
#2238

GREAT: Geometry-Intention Collaborative Inference for Open-Vocabulary 3D Object Affordance Grounding

Yawen Shao, Wei Zhai, Yuhang Yang et al.

CVPR 2025 • arXiv:2411.19626
14
citations
#2239

Few-shot Learner Parameterization by Diffusion Time-steps

Zhongqi Yue, Pan Zhou, Richang Hong et al.

CVPR 2024 • arXiv:2403.02649
14
citations
#2240

OCAI: Improving Optical Flow Estimation by Occlusion and Consistency Aware Interpolation

Jisoo Jeong, Hong Cai, Risheek Garrepalli et al.

CVPR 2024 • arXiv:2403.18092
14
citations
#2241

RadSimReal: Bridging the Gap Between Synthetic and Real Data in Radar Object Detection With Simulation

Oded Bialer, Yuval Haitman

CVPR 2024 • arXiv:2404.18150
14
citations
#2242

Closely Interactive Human Reconstruction with Proxemics and Physics-Guided Adaption

Buzhen Huang, Chen Li, Chongyang Xu et al.

CVPR 2024 • arXiv:2404.11291
14
citations
#2243

Neural Directional Encoding for Efficient and Accurate View-Dependent Appearance Modeling

Liwen Wu, Sai Bi, Zexiang Xu et al.

CVPR 2024 • highlight • arXiv:2405.14847
14
citations
#2244

Design2GarmentCode: Turning Design Concepts to Tangible Garments Through Program Synthesis

Feng Zhou, Ruiyang Liu, Chen Liu et al.

CVPR 2025 • arXiv:2412.08603
14
citations
#2245

MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling

Jian Yang, Dacheng Yin, Yizhou Zhou et al.

CVPR 2025 • arXiv:2410.10798
14
citations
#2246

SportsHHI: A Dataset for Human-Human Interaction Detection in Sports Videos

Tao Wu, Runyu He, Gangshan Wu et al.

CVPR 2024 • arXiv:2404.04565
14
citations
#2247

UniHuman: A Unified Model For Editing Human Images in the Wild

Nannan Li, Qing Liu, Krishna Kumar Singh et al.

CVPR 2024 • arXiv:2312.14985
14
citations
#2248

SceneFactor: Factored Latent 3D Diffusion for Controllable 3D Scene Generation

Aleksei Bokhovkin, Quan Meng, Shubham Tulsiani et al.

CVPR 2025 • arXiv:2412.01801
14
citations
#2249

Region-Based Representations Revisited

Michal Shlapentokh-Rothman, Ansel Blume, Yao Xiao et al.

CVPR 2024 • arXiv:2402.02352
14
citations
#2250

Learning Structure-from-Motion with Graph Attention Networks

Lucas Brynte, José Pedro Iglesias, Carl Olsson et al.

CVPR 2024 • arXiv:2308.15984
14
citations
#2251

SVDinsTN: A Tensor Network Paradigm for Efficient Structure Search from Regularized Modeling Perspective

Yu-Bang Zheng, Xile Zhao, Junhua Zeng et al.

CVPR 2024 • highlight • arXiv:2305.14912
14
citations
#2252

Open-Canopy: Towards Very High Resolution Forest Monitoring

Fajwel Fogel, Yohann Perron, Nikola Besic et al.

CVPR 2025 • highlight • arXiv:2407.09392
14
citations
#2253

PH-Net: Semi-Supervised Breast Lesion Segmentation via Patch-wise Hardness

Siyao Jiang, Huisi Wu, Junyang Chen et al.

CVPR 2024
13
citations
#2254

MicroDiffusion: Implicit Representation-Guided Diffusion for 3D Reconstruction from Limited 2D Microscopy Projections

Mude Hui, Zihao Wei, Hongru Zhu et al.

CVPR 2024 • arXiv:2403.10815
13
citations
#2255

Chain of Attack: On the Robustness of Vision-Language Models Against Transfer-Based Adversarial Attacks

Peng Xie, Yequan Bie, Jianda Mao et al.

CVPR 2025 • arXiv:2411.15720
13
citations
#2256

OST: Refining Text Knowledge with Optimal Spatio-Temporal Descriptor for General Video Recognition

Tongjia Chen, Hongshan Yu, Zhengeng Yang et al.

CVPR 2024 • arXiv:2312.00096
13
citations
#2257

Change3D: Revisiting Change Detection and Captioning from A Video Modeling Perspective

Duowang Zhu, Xiaohu Huang, Haiyan Huang et al.

CVPR 2025 • highlight • arXiv:2503.18803
13
citations
#2258

SportsSloMo: A New Benchmark and Baselines for Human-centric Video Frame Interpolation

Jiaben Chen, Huaizu Jiang

CVPR 2024 • arXiv:2308.16876
13
citations
#2259

Lift3D: Zero-Shot Lifting of Any 2D Vision Model to 3D

Mukund Varma T, Peihao Wang, Zhiwen Fan et al.

CVPR 2024 • arXiv:2403.18922
13
citations
#2260

Pathways on the Image Manifold: Image Editing via Video Generation

Noam Rotstein, Gal Yona, Daniel Silver et al.

CVPR 2025 • arXiv:2411.16819
13
citations
#2261

SC-Tune: Unleashing Self-Consistent Referential Comprehension in Large Vision Language Models

Tongtian Yue, Jie Cheng, Longteng Guo et al.

CVPR 2024 • arXiv:2403.13263
13
citations
#2262

CAD: Photorealistic 3D Generation via Adversarial Distillation

Ziyu Wan, Despoina Paschalidou, Ian Huang et al.

CVPR 2024 • arXiv:2312.06663
13
citations
#2263

SmartEraser: Remove Anything from Images using Masked-Region Guidance

Longtao Jiang, Zhendong Wang, Jianmin Bao et al.

CVPR 2025 • arXiv:2501.08279
13
citations
#2264

AuraFusion360: Augmented Unseen Region Alignment for Reference-based 360° Unbounded Scene Inpainting

Chung-Ho Wu, Yang-Jung Chen, Ying-Huan Chen et al.

CVPR 2025 • arXiv:2502.05176
13
citations
#2265

RORem: Training a Robust Object Remover with Human-in-the-Loop

Ruibin Li, Tao Yang, Song Guo et al.

CVPR 2025 • arXiv:2501.00740
13
citations
#2266

Rethinking Training for De-biasing Text-to-Image Generation: Unlocking the Potential of Stable Diffusion

Eunji Kim, Siwon Kim, Minjun Park et al.

CVPR 2025 • arXiv:2408.12692
13
citations
#2267

TiNO-Edit: Timestep and Noise Optimization for Robust Diffusion-Based Image Editing

Sherry X. Chen, Yaron Vaxman, Elad Ben Baruch et al.

CVPR 2024 • arXiv:2404.11120
13
citations
#2268

F3Loc: Fusion and Filtering for Floorplan Localization

Changan Chen, Rui Wang, Christoph Vogel et al.

CVPR 2024 • highlight
13
citations
#2269

Object Recognition as Next Token Prediction

Kaiyu Yue, Bor-Chun Chen, Jonas Geiping et al.

CVPR 2024 • highlight • arXiv:2312.02142
13
citations
#2270

Brain Decodes Deep Nets

Huzheng Yang, James Gee, Jianbo Shi

CVPR 2024 • highlight • arXiv:2312.01280
13
citations
#2271

Consistent and Controllable Image Animation with Motion Diffusion Models

Xin Ma, Yaohui Wang, Gengyun Jia et al.

CVPR 2025 • arXiv:2407.15642
13
citations
#2272

Unsupervised Foundation Model-Agnostic Slide-Level Representation Learning

Tim Lenz, Peter Neidlinger, Marta Ligero et al.

CVPR 2025 • arXiv:2411.13623
13
citations
#2273

ProAPO: Progressively Automatic Prompt Optimization for Visual Classification

Xiangyan Qu, Gaopeng Gou, Jiamin Zhuang et al.

CVPR 2025 • arXiv:2502.19844
13
citations
#2274

Event-based Video Super-Resolution via State Space Models

Zeyu Xiao, Xinchao Wang

CVPR 2025
13
citations
#2275

Single-View Scene Point Cloud Human Grasp Generation

Yan-Kang Wang, Chengyi Xing, Yi-Lin Wei et al.

CVPR 2024 • arXiv:2404.15815
13
citations
#2276

DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation

Jianzong Wu, Chao Tang, Jingbo Wang et al.

CVPR 2025 • arXiv:2412.07589
13
citations
#2277

Generating Enhanced Negatives for Training Language-Based Object Detectors

Shiyu Zhao, Long Zhao, Vijay Kumar BG et al.

CVPR 2024 • arXiv:2401.00094
13
citations
#2278

Real-World Mobile Image Denoising Dataset with Efficient Baselines

Roman Flepp, Andrey Ignatov, Radu Timofte et al.

CVPR 2024
13
citations
#2279

MM-OR: A Large Multimodal Operating Room Dataset for Semantic Understanding of High-Intensity Surgical Environments

Ege Özsoy, Chantal Pellegrini, Tobias Czempiel et al.

CVPR 2025 • arXiv:2503.02579
13
citations
#2280

Image Generation Diversity Issues and How to Tame Them

Mischa Dombrowski, Weitong Zhang, Hadrien Reynaud et al.

CVPR 2025 • arXiv:2411.16171
13
citations
#2281

Generative Photography: Scene-Consistent Camera Control for Realistic Text-to-Image Synthesis

Yu Yuan, Xijun Wang, Yichen Sheng et al.

CVPR 2025 • highlight • arXiv:2412.02168
13
citations
#2282

CuVLER: Enhanced Unsupervised Object Discoveries through Exhaustive Self-Supervised Transformers

Shahaf Arica, Or Rubin, Sapir Gershov et al.

CVPR 2024 • arXiv:2403.07700
13
citations
#2283

Segment Any Motion in Videos

Nan Huang, Wenzhao Zheng, Chenfeng Xu et al.

CVPR 2025 • arXiv:2503.22268
13
citations
#2284

Self-Adaptive Reality-Guided Diffusion for Artifact-Free Super-Resolution

Qingping Zheng, Ling Zheng, Yuanfan Guo et al.

CVPR 2024 • arXiv:2403.16643
13
citations
#2285

Move Anything with Layered Scene Diffusion

Jiawei Ren, Mengmeng Xu, Jui-Chieh Wu et al.

CVPR 2024 • arXiv:2404.07178
13
citations
#2286

Hearing Anything Anywhere

Mason Wang, Ryosuke Sawata, Samuel Clarke et al.

CVPR 2024 • arXiv:2406.07532
13
citations
#2287

Zero-Shot Monocular Scene Flow Estimation in the Wild

Yiqing Liang, Abhishek Badki, Hang Su et al.

CVPR 2025 • arXiv:2501.10357
13
citations
#2288

ShotAdapter: Text-to-Multi-Shot Video Generation with Diffusion Models

Ozgur Kara, Krishna Kumar Singh, Feng Liu et al.

CVPR 2025 • arXiv:2505.07652
13
citations
#2289

VISCO: Benchmarking Fine-Grained Critique and Correction Towards Self-Improvement in Visual Reasoning

Xueqing Wu, Yuheng Ding, Bingxuan Li et al.

CVPR 2025 • arXiv:2412.02172
13
citations
#2290

Temporally Consistent Object-Centric Learning by Contrasting Slots

Anna Manasyan, Maximilian Seitzer, Filip Radovic et al.

CVPR 2025 • arXiv:2412.14295
13
citations
#2291

Causal Composition Diffusion Model for Closed-loop Traffic Generation

Haohong Lin, Xin Huang, Tung Phan-Minh et al.

CVPR 2025 • arXiv:2412.17920
13
citations
#2292

MotionPro: A Precise Motion Controller for Image-to-Video Generation

Zhongwei Zhang, Fuchen Long, Zhaofan Qiu et al.

CVPR 2025 • arXiv:2505.20287
13
citations
#2293

UniNet: A Contrastive Learning-guided Unified Framework with Feature Selection for Anomaly Detection

Shun Wei, Jielin Jiang, Xiaolong Xu

CVPR 2025
13
citations
#2294

RoDLA: Benchmarking the Robustness of Document Layout Analysis Models

Yufan Chen, Jiaming Zhang, Kunyu Peng et al.

CVPR 2024 • arXiv:2403.14442
13
citations
#2295

3D Building Reconstruction from Monocular Remote Sensing Images with Multi-level Supervisions

Weijia Li, Haote Yang, Zhenghao Hu et al.

CVPR 2024 • arXiv:2404.04823
13
citations
#2296

Instant Adversarial Purification with Adversarial Consistency Distillation

Chun Tong Lei, Hon Ming Yam, Zhongliang Guo et al.

CVPR 2025 • arXiv:2408.17064
13
citations
#2297

Privacy-Preserving Optics for Enhancing Protection in Face De-Identification

Jhon Lopez, Carlos Hinojosa, Henry Arguello et al.

CVPR 2024 • arXiv:2404.00777
13
citations
#2298

Towards More Unified In-context Visual Understanding

Dianmo Sheng, Dongdong Chen, Zhentao Tan et al.

CVPR 2024 • arXiv:2312.02520
13
citations
#2299

Model Inversion Robustness: Can Transfer Learning Help?

Sy-Tuyen Ho, Koh Jun Hao, Keshigeyan Chandrasegaran et al.

CVPR 2024 • arXiv:2405.05588
13
citations
#2300

METASCENES: Towards Automated Replica Creation for Real-world 3D Scans

Huangyue Yu, Baoxiong Jia, Yixin Chen et al.

CVPR 2025 • arXiv:2505.02388
13
citations
#2301

Representing Part-Whole Hierarchies in Foundation Models by Learning Localizability Composability and Decomposability from Anatomy via Self Supervision

Mohammad Reza Hosseinzadeh Taher, Michael Gotway, Jianming Liang

CVPR 2024
13
citations
#2302

DifIISR: A Diffusion Model with Gradient Guidance for Infrared Image Super-Resolution

Xingyuan Li, Zirui Wang, Yang Zou et al.

CVPR 2025 • arXiv:2503.01187
13
citations
#2303

Towards a Universal Synthetic Video Detector: From Face or Background Manipulations to Fully AI-Generated Content

Rohit Kundu, Hao Xiong, Vishal Mohanty et al.

CVPR 2025 • arXiv:2412.12278
13
citations
#2304

LQMFormer: Language-aware Query Mask Transformer for Referring Image Segmentation

Nisarg Shah, Vibashan VS, Vishal M. Patel

CVPR 2024
13
citations
#2305

Image Over Text: Transforming Formula Recognition Evaluation with Character Detection Matching

Bin Wang, Fan Wu, Linke Ouyang et al.

CVPR 2025 • arXiv:2409.03643
13
citations
#2306

Audio-Visual Instance Segmentation

Ruohao Guo, Xianghua Ying, Yaru Chen et al.

CVPR 2025 • arXiv:2310.18709
13
citations
#2307

Effective Video Mirror Detection with Inconsistent Motion Cues

Alex Warren, Ke Xu, Jiaying Lin et al.

CVPR 2024
13
citations
#2308

From Zero to Detail: Deconstructing Ultra-High-Definition Image Restoration from Progressive Spectral Perspective

Chen Zhao, Zhizhou Chen, Yunzhe Xu et al.

CVPR 2025 • arXiv:2503.13165
13
citations
#2309

Progressive Focused Transformer for Single Image Super-Resolution

Wei Long, Xingyu Zhou, Leheng Zhang et al.

CVPR 2025 • arXiv:2503.20337
13
citations
#2310

SeMoLi: What Moves Together Belongs Together

Jenny Seidenschwarz, Aljoša Ošep, Francesco Ferroni et al.

CVPR 2024 • arXiv:2402.19463
13
citations
#2311

3D Multi-frame Fusion for Video Stabilization

Zhan Peng, Xinyi Ye, Weiyue Zhao et al.

CVPR 2024 • arXiv:2404.12887
13
citations
#2312

Federated Online Adaptation for Deep Stereo

Matteo Poggi, Fabio Tosi

CVPR 2024 • arXiv:2405.14873
13
citations
#2313

PoNQ: a Neural QEM-based Mesh Representation

Nissim Maruani, Maks Ovsjanikov, Pierre Alliez et al.

CVPR 2024 • arXiv:2403.12870
13
citations
#2314

HierarQ: Task-Aware Hierarchical Q-Former for Enhanced Video Understanding

Shehreen Azad, Vibhav Vineet, Yogesh S. Rawat

CVPR 2025 • arXiv:2503.08585
13
citations
#2315

MoDec-GS: Global-to-Local Motion Decomposition and Temporal Interval Adjustment for Compact Dynamic 3D Gaussian Splatting

Sangwoon Kwak, Joonsoo Kim, Jun Young Jeong et al.

CVPR 2025 • arXiv:2501.03714
13
citations
#2316

Boosting Adversarial Training via Fisher-Rao Norm-based Regularization

Xiangyu Yin, Wenjie Ruan

CVPR 2024 • arXiv:2403.17520
13
citations
#2317

Completion as Enhancement: A Degradation-Aware Selective Image Guided Network for Depth Completion

Zhiqiang Yan, Zhengxue Wang, Kun Wang et al.

CVPR 2025 • arXiv:2412.19225
13
citations
#2318

DropoutGS: Dropping Out Gaussians for Better Sparse-view Rendering

Yexing Xu, Longguang Wang, Minglin Chen et al.

CVPR 2025 • arXiv:2504.09491
13
citations
#2319

TANGO: Training-free Embodied AI Agents for Open-world Tasks

Filippo Ziliotto, Tommaso Campari, Luciano Serafini et al.

CVPR 2025 • arXiv:2412.10402
13
citations
#2320

ASIGN: An Anatomy-aware Spatial Imputation Graphic Network for 3D Spatial Transcriptomics

Junchao Zhu, Ruining Deng, Tianyuan Yao et al.

CVPR 2025 • arXiv:2412.03026
13
citations
#2321

GDA: Generalized Diffusion for Robust Test-time Adaptation

Yun-Yun Tsai, Fu-Chen Chen, Albert Chen et al.

CVPR 2024 • arXiv:2404.00095
13
citations
#2322

Flattening the Parent Bias: Hierarchical Semantic Segmentation in the Poincaré Ball

Simon Weber, Barış Zöngür, Nikita Araslanov et al.

CVPR 2024 • arXiv:2404.03778
13
citations
#2323

ContextSeg: Sketch Semantic Segmentation by Querying the Context with Attention

Jiawei Wang, Changjian Li

CVPR 2024 • arXiv:2311.16682
13
citations
#2324

MAFA: Managing False Negatives for Vision-Language Pre-training

Jaeseok Byun, Dohoon Kim, Taesup Moon

CVPR 2024 • arXiv:2312.06112
13
citations
#2325

Align and Aggregate: Compositional Reasoning with Video Alignment and Answer Aggregation for Video Question-Answering

Zhaohe Liao, Jiangtong Li, Li Niu et al.

CVPR 2024 • arXiv:2407.03008
13
citations
#2326

StarGen: A Spatiotemporal Autoregression Framework with Video Diffusion Model for Scalable and Controllable Scene Generation

Shangjin Zhai, Zhichao Ye, Jialin Liu et al.

CVPR 2025 • arXiv:2501.05763
13
citations
#2327

OHTA: One-shot Hand Avatar via Data-driven Implicit Priors

Xiaozheng Zheng, Chao Wen, Zhuo Su et al.

CVPR 2024 • arXiv:2402.18969
13
citations
#2328

Spatial-Temporal Graph Diffusion Policy with Kinematic Modeling for Bimanual Robotic Manipulation

Qi Lv, Hao Li, Xiang Deng et al.

CVPR 2025 • arXiv:2503.10743
13
citations
#2329

FlashEval: Towards Fast and Accurate Evaluation of Text-to-image Diffusion Generative Models

Lin Zhao, Tianchen Zhao, Zinan Lin et al.

CVPR 2024 • arXiv:2403.16379
13
citations
#2330

MobilePortrait: Real-Time One-Shot Neural Head Avatars on Mobile Devices

Jianwen Jiang, Gaojie Lin, Zhengkun Rong et al.

CVPR 2025 • arXiv:2407.05712
13
citations
#2331

POPEN: Preference-Based Optimization and Ensemble for LVLM-Based Reasoning Segmentation

Lanyun Zhu, Tianrun Chen, Qianxiong Xu et al.

CVPR 2025 • arXiv:2504.00640
13
citations
#2332

CoLLM: A Large Language Model for Composed Image Retrieval

Chuong Huynh, Jinyu Yang, Ashish Tawari et al.

CVPR 2025 • arXiv:2503.19910
13
citations
#2333

Emergent Open-Vocabulary Semantic Segmentation from Off-the-shelf Vision-Language Models

Luo Jiayun, Siddhesh Khandelwal, Leonid Sigal et al.

CVPR 2024 • arXiv:2311.17095
13
citations
#2334

CVT-xRF: Contrastive In-Voxel Transformer for 3D Consistent Radiance Fields from Sparse Inputs

Yingji Zhong, Lanqing Hong, Zhenguo Li et al.

CVPR 2024 • arXiv:2403.16885
13
citations
#2335

HIIF: Hierarchical Encoding based Implicit Image Function for Continuous Super-resolution

Yuxuan Jiang, Ho Man Kwan, Jasmine Peng et al.

CVPR 2025 • arXiv:2412.03748
13
citations
#2336

ObjectMover: Generative Object Movement with Video Prior

Xin Yu, Tianyu Wang, Soo Ye Kim et al.

CVPR 2025 • arXiv:2503.08037
13
citations
#2337

AG-VPReID: A Challenging Large-Scale Benchmark for Aerial-Ground Video-based Person Re-Identification

Huy Nguyen, Kien Nguyen Thanh, Akila Pemasiri et al.

CVPR 2025 • arXiv:2503.08121
13
citations
#2338

Learning Hazing to Dehazing: Towards Realistic Haze Generation for Real-World Image Dehazing

Ruiyi Wang, Yushuo Zheng, Zicheng Zhang et al.

CVPR 2025 • arXiv:2503.19262
13
citations
#2339

An Empirical Study of the Generalization Ability of Lidar 3D Object Detectors to Unseen Domains

George Eskandar

CVPR 2024 • arXiv:2402.17562
13
citations
#2340

DefMamba: Deformable Visual State Space Model

Leiye Liu, Miao Zhang, Jihao Yin et al.

CVPR 2025 • arXiv:2504.05794
13
citations
#2341

ReasonGrounder: LVLM-Guided Hierarchical Feature Splatting for Open-Vocabulary 3D Visual Grounding and Reasoning

Zhenyang Liu, Yikai Wang, Sixiao Zheng et al.

CVPR 2025 • arXiv:2503.23297
12
citations
#2342

Enhancing Visual Continual Learning with Language-Guided Supervision

Bolin Ni, Hongbo Zhao, Chenghao Zhang et al.

CVPR 2024 • arXiv:2403.16124
12
citations
#2343

ACL: Activating Capability of Linear Attention for Image Restoration

Yubin Gu, Yuan Meng, Jiayi Ji et al.

CVPR 2025
12
citations
#2344

Discover and Mitigate Multiple Biased Subgroups in Image Classifiers

Zeliang Zhang, Mingqian Feng, Zhiheng Li et al.

CVPR 2024 • arXiv:2403.12777
12
citations
#2345

Physical Backdoor: Towards Temperature-based Backdoor Attacks in the Physical World

Wen Yin, Jian Lou, Pan Zhou et al.

CVPR 2024 • arXiv:2404.19417
12
citations
#2346

Know Your Neighbors: Improving Single-View Reconstruction via Spatial Vision-Language Reasoning

Rui Li, Tobias Fischer, Mattia Segu et al.

CVPR 2024 • arXiv:2404.03658
12
citations
#2347

Layer- and Timestep-Adaptive Differentiable Token Compression Ratios for Efficient Diffusion Transformers

Haoran You, Connelly Barnes, Yuqian Zhou et al.

CVPR 2025 • arXiv:2412.16822
12
citations
#2348

Localized Concept Erasure for Text-to-Image Diffusion Models Using Training-Free Gated Low-Rank Adaptation

Byung Hyun Lee, Sungjin Lim, Se Young Chun

CVPR 2025 • arXiv:2503.12356
12
citations
#2349

Towards Training-free Anomaly Detection with Vision and Language Foundation Models

Jinjin Zhang, Guodong Wang, Yizhou Jin et al.

CVPR 2025 • arXiv:2503.18325
12
citations
#2350

Instance Tracking in 3D Scenes from Egocentric Videos

Yunhan Zhao, Haoyu Ma, Shu Kong et al.

CVPR 2024 • arXiv:2312.04117
12
citations
#2351

BFANet: Revisiting 3D Semantic Segmentation with Boundary Feature Analysis

Weiguang Zhao, Rui Zhang, Qiufeng Wang et al.

CVPR 2025 • arXiv:2503.12539
12
citations
#2352

Adapter Merging with Centroid Prototype Mapping for Scalable Class-Incremental Learning

Takuma Fukuda, Hiroshi Kera, Kazuhiko Kawamoto

CVPR 2025 • arXiv:2412.18219
12
citations
#2353

Unexplored Faces of Robustness and Out-of-Distribution: Covariate Shifts in Environment and Sensor Domains

Eunsu Baek, Keondo Park, Ji-yoon Kim et al.

CVPR 2024 • arXiv:2404.15882
12
citations
#2354

Post-pre-training for Modality Alignment in Vision-Language Foundation Models

Shin'ya Yamaguchi, Dewei Feng, Sekitoshi Kanai et al.

CVPR 2025 • arXiv:2504.12717
12
citations
#2355

Neuro-3D: Towards 3D Visual Decoding from EEG Signals

Zhanqiang Guo, Jiamin Wu, Yonghao Song et al.

CVPR 2025 • arXiv:2411.12248
12
citations
#2356

PointInfinity: Resolution-Invariant Point Diffusion Models

Zixuan Huang, Justin Johnson, Shoubhik Debnath et al.

CVPR 2024 • arXiv:2404.03566
12
citations
#2357

Improving Bird's Eye View Semantic Segmentation by Task Decomposition

Tianhao Zhao, Yongcan Chen, Yu Wu et al.

CVPR 2024 • arXiv:2404.01925
12
citations
#2358

Robust Overfitting Does Matter: Test-Time Adversarial Purification With FGSM

Linyu Tang, Lei Zhang

CVPR 2024 • arXiv:2403.11448
12
citations
#2359

ScribbleLight: Single Image Indoor Relighting with Scribbles

Jun Myeong Choi, Annie N. Wang, Pieter Peers et al.

CVPR 2025 • arXiv:2411.17696
12
citations
#2360

DriveGEN: Generalized and Robust 3D Detection in Driving via Controllable Text-to-Image Diffusion Generation

Hongbin Lin, Zilu Guo, Yifan Zhang et al.

CVPR 2025 • arXiv:2503.11122
12
citations
#2361

EgoPressure: A Dataset for Hand Pressure and Pose Estimation in Egocentric Vision

Yiming Zhao, Taein Kwon, Paul Streli et al.

CVPR 2025 • highlight • arXiv:2409.02224
12
citations
#2362

OmniStyle: Filtering High Quality Style Transfer Data at Scale

Ye Wang, Ruiqi Liu, Jiang Lin et al.

CVPR 2025 • arXiv:2505.14028
12
citations
#2363

Multi-Attribute Interactions Matter for 3D Visual Grounding

Can Xu, Yuehui Han, Rui Xu et al.

CVPR 2024
12
citations
#2364

Online Video Understanding: OVBench and VideoChat-Online

Zhenpeng Huang, Xinhao Li, Jiaqi Li et al.

CVPR 2025 • arXiv:2501.00584
12
citations
#2365

CoMM: A Coherent Interleaved Image-Text Dataset for Multimodal Understanding and Generation

Wei Chen, Lin Li, Yongqi Yang et al.

CVPR 2025 • highlight • arXiv:2406.10462
12
citations
#2366

CDMAD: Class-Distribution-Mismatch-Aware Debiasing for Class-Imbalanced Semi-Supervised Learning

Hyuck Lee, Heeyoung Kim

CVPR 2024 • arXiv:2403.10391
12
citations
#2367

BlockDance: Reuse Structurally Similar Spatio-Temporal Features to Accelerate Diffusion Transformers

Hui Zhang, Tingwei Gao, Jie Shao et al.

CVPR 2025 • arXiv:2503.15927
12
citations
#2368

Defense Against Adversarial Attacks on No-Reference Image Quality Models with Gradient Norm Regularization

Yujia Liu, Chenxi Yang, Dingquan Li et al.

CVPR 2024 • arXiv:2403.11397
12
citations
#2369

RoomTour3D: Geometry-Aware Video-Instruction Tuning for Embodied Navigation

Mingfei Han, Liang Ma, Kamila Zhumakhanova et al.

CVPR 2025 • arXiv:2412.08591
12
citations
#2370

NoT: Federated Unlearning via Weight Negation

Yasser Khalil, Leo Maxime Brunswic, Soufiane Lamghari et al.

CVPR 2025 • arXiv:2503.05657
12
citations
#2371

Correcting Diffusion Generation through Resampling

Yujian Liu, Yang Zhang, Tommi Jaakkola et al.

CVPR 2024 • highlight • arXiv:2312.06038
12
citations
#2372

Horizon-GS: Unified 3D Gaussian Splatting for Large-Scale Aerial-to-Ground Scenes

Lihan Jiang, Kerui Ren, Mulin Yu et al.

CVPR 2025 • arXiv:2412.01745
12
citations
#2373

Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities

Yiyuan Zhang, Xiaohan Ding, Kaixiong Gong et al.

CVPR 2024 • arXiv:2401.14405
12
citations
#2374

BIMBA: Selective-Scan Compression for Long-Range Video Question Answering

Md Mohaiminul Islam, Tushar Nagarajan, Huiyu Wang et al.

CVPR 2025 • arXiv:2503.09590
12
citations
#2375

Asymmetric Masked Distillation for Pre-Training Small Foundation Models

Zhiyu Zhao, Bingkun Huang, Sen Xing et al.

CVPR 2024 • arXiv:2311.03149
12
citations
#2376

Low-power Continuous Remote Behavioral Localization with Event Cameras

Friedhelm Hamann, Suman Ghosh, Ignacio Juarez Martinez et al.

CVPR 2024 • arXiv:2312.03799
12
citations
#2377

Discriminative Sample-Guided and Parameter-Efficient Feature Space Adaptation for Cross-Domain Few-Shot Learning

Rashindrie Perera, Saman Halgamuge

CVPR 2024 • arXiv:2403.04492
12
citations
#2378

Weakly Supervised Monocular 3D Detection with a Single-View Image

Xueying Jiang, Sheng Jin, Lewei Lu et al.

CVPR 2024 • arXiv:2402.19144
12
citations
#2379

NexusGS: Sparse View Synthesis with Epipolar Depth Priors in 3D Gaussian Splatting

Yulong Zheng, Zicheng Jiang, Shengfeng He et al.

CVPR 2025 • highlight • arXiv:2503.18794
12
citations
#2380

DemoCaricature: Democratising Caricature Generation with a Rough Sketch

Dar-Yen Chen, Ayan Kumar Bhunia, Subhadeep Koley et al.

CVPR 2024 • arXiv:2312.04364
12
citations
#2381

MG-MotionLLM: A Unified Framework for Motion Comprehension and Generation across Multiple Granularities

Bizhu Wu, Jinheng Xie, Keming Shen et al.

CVPR 2025 • arXiv:2504.02478
12
citations
#2382

Seeing the Unseen: Visual Common Sense for Semantic Placement

Ram Ramrakhya, Aniruddha Kembhavi, Dhruv Batra et al.

CVPR 2024 • arXiv:2401.07770
12
citations
#2383

Attentive Illumination Decomposition Model for Multi-Illuminant White Balancing

Dongyoung Kim, Jinwoo Kim, Junsang Yu et al.

CVPR 2024 • arXiv:2402.18277
12
citations
#2384

Universal Novelty Detection Through Adaptive Contrastive Learning

Hossein Mirzaei, Mojtaba Nafez, Mohammad Jafari et al.

CVPR 2024 • arXiv:2408.10798
12
citations
#2385

D3T: Distinctive Dual-Domain Teacher Zigzagging Across RGB-Thermal Gap for Domain-Adaptive Object Detection

Dinh Phat Do, Taehoon Kim, Jaemin Na et al.

CVPR 2024 • arXiv:2403.09359
12
citations
#2386

LoRASculpt: Sculpting LoRA for Harmonizing General and Specialized Knowledge in Multimodal Large Language Models

Jian Liang, Wenke Huang, Guancheng Wan et al.

CVPR 2025 • arXiv:2503.16843
12
citations
#2387

DocVLM: Make Your VLM an Efficient Reader

Mor Shpigel Nacson, Aviad Aberdam, Roy Ganz et al.

CVPR 2025 • arXiv:2412.08746
12
citations
#2388

Crab: A Unified Audio-Visual Scene Understanding Model with Explicit Cooperation

Henghui Du, Guangyao Li, Chang Zhou et al.

CVPR 2025 • arXiv:2503.13068
12
citations
#2389

EgoLM: Multi-Modal Language Model of Egocentric Motions

Fangzhou Hong, Vladimir Guzov, Hyo Jin Kim et al.

CVPR 2025 • arXiv:2409.18127
12
citations
#2390

One-for-More: Continual Diffusion Model for Anomaly Detection

Xiaofan Li, Xin Tan, Zhuo Chen et al.

CVPR 2025 • arXiv:2502.19848
12
citations
#2391

Total-Decom: Decomposed 3D Scene Reconstruction with Minimal Interaction

Xiaoyang Lyu, Chirui Chang, Peng Dai et al.

CVPR 2024 • highlight • arXiv:2403.19314
12
citations
#2392

Image Neural Field Diffusion Models

Yinbo Chen, Oliver Wang, Richard Zhang et al.

CVPR 2024 • highlight • arXiv:2406.07480
12
citations
#2393

Data-Efficient Multimodal Fusion on a Single GPU

Noël Vouitsis, Zhaoyan Liu, Satya Krishna Gorti et al.

CVPR 2024 • highlight • arXiv:2312.10144
12
citations
#2394

Mitigating Noisy Correspondence by Geometrical Structure Consistency Learning

Zihua Zhao, Mengxi Chen, Tianjie Dai et al.

CVPR 2024 • arXiv:2405.16996
12
citations
#2395

Finding Lottery Tickets in Vision Models via Data-driven Spectral Foresight Pruning

Leonardo Iurada, Marco Ciccone, Tatiana Tommasi

CVPR 2024 • arXiv:2406.01820
12
citations
#2396

RaCFormer: Towards High-Quality 3D Object Detection via Query-based Radar-Camera Fusion

Xiaomeng Chu, Jiajun Deng, Guoliang You et al.

CVPR 2025 • arXiv:2412.12725
12
citations
#2397

Toward Real-world BEV Perception: Depth Uncertainty Estimation via Gaussian Splatting

Shu-Wei Lu, Yi-Hsuan Tsai, Yi-Ting Chen

CVPR 2025 • arXiv:2504.01957
12
citations
#2398

Stacking Brick by Brick: Aligned Feature Isolation for Incremental Face Forgery Detection

Jikang Cheng, Zhiyuan Yan, Ying Zhang et al.

CVPR 2025 • arXiv:2411.11396
12
citations
#2399

FluidNexus: 3D Fluid Reconstruction and Prediction from a Single Video

Yue Gao, Hong-Xing Yu, Bo Zhu et al.

CVPR 2025 • arXiv:2503.04720
12
citations
#2400

FocusMAE: Gallbladder Cancer Detection from Ultrasound Videos with Focused Masked Autoencoders

Soumen Basu, Mayuna Gupta, Chetan Madan et al.

CVPR 2024 • arXiv:2403.08848
12
citations