Most Cited CVPR "autoregressive action modeling" Papers

5,589 papers found • Page 17 of 28

#3201

SG-PGM: Partial Graph Matching Network with Semantic Geometric Fusion for 3D Scene Graph Alignment and Its Downstream Tasks

Yaxu Xie, Alain Pagani, Didier Stricker

CVPR 2024arXiv:2403.19474
7
citations
#3202

Epistemic Uncertainty Quantification For Pre-Trained Neural Networks

Hanjing Wang, Qiang Ji

CVPR 2024arXiv:2404.10124
7
citations
#3203

Federated Learning with Domain Shift Eraser

Zheng Wang, Zihui Wang, Zheng Wang et al.

CVPR 2025arXiv:2503.13063
7
citations
#3204

Coeff-Tuning: A Graph Filter Subspace View for Tuning Attention-Based Large Models

Zichen Miao, WEI CHEN, Qiang Qiu

CVPR 2025highlightarXiv:2503.18337
7
citations
#3205

Can Generative Video Models Help Pose Estimation?

Ruojin Cai, Jason Y. Zhang, Philipp Henzler et al.

CVPR 2025highlightarXiv:2412.16155
7
citations
#3206

The Devil is in the Prompts: Retrieval-Augmented Prompt Optimization for Text-to-Video Generation

Bingjie Gao, Xinyu Gao, Xiaoxue Wu et al.

CVPR 2025arXiv:2504.11739
7
citations
#3207

DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model

Lirui Zhao, Yue Yang, Kaipeng Zhang et al.

CVPR 2024arXiv:2404.01342
7
citations
#3208

SuperPrimitive: Scene Reconstruction at a Primitive Level

Kirill Mazur, Gwangbin Bae, Andrew J. Davison

CVPR 2024arXiv:2312.05889
7
citations
#3209

Gaussian Splatting for Efficient Satellite Image Photogrammetry

Luca Savant Aira, Gabriele Facciolo, Thibaud Ehret

CVPR 2025arXiv:2412.13047
7
citations
#3210

FLAME: Frozen Large Language Models Enable Data-Efficient Language-Image Pre-training

Anjia Cao, Xing Wei, Zhiheng Ma

CVPR 2025arXiv:2411.11927
7
citations
#3211

RealEdit: Reddit Edits As a Large-scale Empirical Dataset for Image Transformations

Peter Sushko, Ayana Bharadwaj, Zhi Yang Lim et al.

CVPR 2025arXiv:2502.03629
7
citations
#3212

FactCheXcker: Mitigating Measurement Hallucinations in Chest X-ray Report Generation Models

Alice Heiman, Xiaoman Zhang, Emma Chen et al.

CVPR 2025arXiv:2411.18672
7
citations
#3213

Detail-Preserving Latent Diffusion for Stable Shadow Removal

Jiamin Xu, Yuxin Zheng, Zelong Li et al.

CVPR 2025arXiv:2412.17630
7
citations
#3214

Image is All You Need to Empower Large-scale Diffusion Models for In-Domain Generation

Pu Cao, Feng Zhou, Lu Yang et al.

CVPR 2025arXiv:2312.08195
7
citations
#3215

Classifier-Free Guidance Inside the Attraction Basin May Cause Memorization

Anubhav Jain, Yuya Kobayashi, Takashi Shibuya et al.

CVPR 2025arXiv:2411.16738
7
citations
#3216

Makeup Prior Models for 3D Facial Makeup Estimation and Applications

Xingchao Yang, Takafumi Taketomi, Yuki Endo et al.

CVPR 2024arXiv:2403.17761
7
citations
#3217

Video Recognition in Portrait Mode

Mingfei Han, Linjie Yang, Xiaojie Jin et al.

CVPR 2024arXiv:2312.13746
7
citations
#3218

Theoretical Insights in Model Inversion Robustness and Conditional Entropy Maximization for Collaborative Inference Systems

Song Xia, Yi Yu, Wenhan Yang et al.

CVPR 2025highlightarXiv:2503.00383
7
citations
#3219

PICD: Versatile Perceptual Image Compression with Diffusion Rendering

Tongda Xu, Jiahao Li, Bin Li et al.

CVPR 2025arXiv:2505.05853
7
citations
#3220

FAM Diffusion: Frequency and Attention Modulation for High-Resolution Image Generation with Stable Diffusion

Haosen Yang, Adrian Bulat, Isma Hadji et al.

CVPR 2025arXiv:2411.18552
7
citations
#3221

APHQ-ViT: Post-Training Quantization with Average Perturbation Hessian Based Reconstruction for Vision Transformers

Zhuguanyu Wu, Jiayi Zhang, Jiaxin Chen et al.

CVPR 2025arXiv:2504.02508
7
citations
#3222

Grounding 3D Object Affordance with Language Instructions, Visual Observations and Interactions

He Zhu, Quyu Kong, Kechun Xu et al.

CVPR 2025arXiv:2504.04744
7
citations
#3223

Progress-Aware Video Frame Captioning

Zihui Xue, Joungbin An, Xitong Yang et al.

CVPR 2025arXiv:2412.02071
7
citations
#3224

Efficient Stitchable Task Adaptation

Haoyu He, Zizheng Pan, Jing Liu et al.

CVPR 2024arXiv:2311.17352
7
citations
#3225

Polarization Wavefront Lidar: Learning Large Scene Reconstruction from Polarized Wavefronts

Dominik Scheuble, Chenyang Lei, Mario Bijelic et al.

CVPR 2024arXiv:2406.03461
7
citations
#3226

Geometric Knowledge-Guided Localized Global Distribution Alignment for Federated Learning

Yanbiao Ma, Wei Dai, Wenke Huang et al.

CVPR 2025arXiv:2503.06457
7
citations
#3227

Physics-Aware Hand-Object Interaction Denoising

Haowen Luo, Yunze Liu, Li Yi

CVPR 2024arXiv:2405.11481
7
citations
#3228

Temporal Alignment-Free Video Matching for Few-shot Action Recognition

SuBeen Lee, WonJun Moon, Hyun Seok Seong et al.

CVPR 2025arXiv:2504.05956
7
citations
#3229

UNIC-Adapter: Unified Image-instruction Adapter with Multi-modal Transformer for Image Generation

Lunhao Duan, Shanshan Zhao, Wenjun Yan et al.

CVPR 2025arXiv:2412.18928
7
citations
#3230

ROICtrl: Boosting Instance Control for Visual Generation

Yuchao Gu, Yipin Zhou, Yunfan Ye et al.

CVPR 2025arXiv:2411.17949
7
citations
#3231

U-VAP: User-specified Visual Appearance Personalization via Decoupled Self Augmentation

You Wu, Kean Liu, Xiaoyue Mi et al.

CVPR 2024arXiv:2403.20231
7
citations
#3232

DiET-GS: Diffusion Prior and Event Stream-Assisted Motion Deblurring 3D Gaussian Splatting

Seungjun Lee, Gim Hee Lee

CVPR 2025arXiv:2503.24210
7
citations
#3233

Image Quality Assessment: From Human to Machine Preference

Chunyi Li, Yuan Tian, Xiaoyue Ling et al.

CVPR 2025highlightarXiv:2503.10078
7
citations
#3234

Zero-Shot Novel View and Depth Synthesis with Multi-View Geometric Diffusion

Vitor Guizilini, Muhammad Zubair Irshad, Dian Chen et al.

CVPR 2025arXiv:2501.18804
7
citations
#3235

FOCUS: Knowledge-enhanced Adaptive Visual Compression for Few-shot Whole Slide Image Classification

Zhengrui Guo, Conghao Xiong, Jiabo MA et al.

CVPR 2025arXiv:2411.14743
7
citations
#3236

UNOPose: Unseen Object Pose Estimation with an Unposed RGB-D Reference Image

Xingyu Liu, Gu Wang, Ruida Zhang et al.

CVPR 2025arXiv:2411.16106
7
citations
#3237

ModeSeq: Taming Sparse Multimodal Motion Prediction with Sequential Mode Modeling

Zikang Zhou, Hengjian Zhou, Haibo Hu et al.

CVPR 2025arXiv:2411.11911
7
citations
#3238

StdGEN: Semantic-Decomposed 3D Character Generation from Single Images

Yuze He, Yanning Zhou, Wang Zhao et al.

CVPR 2025arXiv:2411.05738
7
citations
#3239

Fun with Flags: Robust Principal Directions via Flag Manifolds

Tolga Birdal, Nathan Mankovich

CVPR 2024arXiv:2401.04071
7
citations
#3240

Towards a Perceptual Evaluation Framework for Lighting Estimation

Justine Giroux, Mohammad Reza Karimi Dastjerdi, Yannick Hold-Geoffroy et al.

CVPR 2024arXiv:2312.04334
7
citations
#3241

MV-Adapter: Multimodal Video Transfer Learning for Video Text Retrieval

bowen zhang, Xiaojie Jin, Weibo Gong et al.

CVPR 2024arXiv:2301.07868
7
citations
#3242

ArtiScene: Language-Driven Artistic 3D Scene Generation Through Image Intermediary

Zeqi Gu, Yin Cui, Max Li et al.

CVPR 2025arXiv:2506.00742
7
citations
#3243

MOS: Modeling Object-Scene Associations in Generalized Category Discovery

Zhengyuan Peng, Jinpeng Ma, Zhimin Sun et al.

CVPR 2025arXiv:2503.12035
7
citations
#3244

Real-time High-fidelity Gaussian Human Avatars with Position-based Interpolation of Spatially Distributed MLPs

Youyi Zhan, Tianjia Shao, Yin Yang et al.

CVPR 2025highlightarXiv:2504.12909
7
citations
#3245

Reconstructing Humans with a Biomechanically Accurate Skeleton

Yan Xia, Xiaowei Zhou, Etienne Vouga et al.

CVPR 2025arXiv:2503.21751
7
citations
#3246

Spectral and Polarization Vision: Spectro-polarimetric Real-world Dataset

Yujin Jeon, Eunsue Choi, Youngchan Kim et al.

CVPR 2024highlightarXiv:2311.17396
7
citations
#3247

PEER Pressure: Model-to-Model Regularization for Single Source Domain Generalization

Dongkyu Cho, Inwoo Hwang, Sanghack Lee

CVPR 2025arXiv:2505.12745
6
citations
#3248

Parametric Point Cloud Completion for Polygonal Surface Reconstruction

Zhaiyu Chen, Yuqing Wang, Liangliang Nan et al.

CVPR 2025arXiv:2503.08363
6
citations
#3249

FluxSpace: Disentangled Semantic Editing in Rectified Flow Models

Yusuf Dalva, Kavana Venkatesh, Pinar Yanardag

CVPR 2025
6
citations
#3250

Good, Cheap, and Fast: Overfitted Image Compression with Wasserstein Distortion

Jona Ballé, Luca Versari, Emilien Dupont et al.

CVPR 2025highlightarXiv:2412.00505
6
citations
#3251

EVOS: Efficient Implicit Neural Training via EVOlutionary Selector

Weixiang Zhang, Shuzhao Xie, Chengwei Ren et al.

CVPR 2025arXiv:2412.10153
6
citations
#3252

4DTAM: Non-Rigid Tracking and Mapping via Dynamic Surface Gaussians

Hidenobu Matsuki, Gwangbin Bae, Andrew J. Davison

CVPR 2025arXiv:2505.22859
6
citations
#3253

Narrating the Video: Boosting Text-Video Retrieval via Comprehensive Utilization of Frame-Level Captions

Chan Hur, Jeong-hun Hong, Dong-hun Lee et al.

CVPR 2025arXiv:2503.05186
6
citations
#3254

Scene-Centric Unsupervised Panoptic Segmentation

Oliver Hahn, Christoph Reich, Nikita Araslanov et al.

CVPR 2025highlightarXiv:2504.01955
6
citations
#3255

HandOS: 3D Hand Reconstruction in One Stage

Xingyu Chen, Zhuheng Song, Xiaoke Jiang et al.

CVPR 2025arXiv:2412.01537
6
citations
#3256

RainyGS: Efficient Rain Synthesis with Physically-Based Gaussian Splatting

Qiyu Dai, Xingyu Ni, Qianfan Shen et al.

CVPR 2025arXiv:2503.21442
6
citations
#3257

BF-STVSR: B-Splines and Fourier---Best Friends for High Fidelity Spatial-Temporal Video Super-Resolution

Eunjin Kim, HYEONJIN KIM, Kyong Hwan Jin et al.

CVPR 2025arXiv:2501.11043
6
citations
#3258

UIBDiffusion: Universal Imperceptible Backdoor Attack for Diffusion Models

Yuning Han, Bingyin Zhao, Rui Chu et al.

CVPR 2025highlightarXiv:2412.11441
6
citations
#3259

COAP: Memory-Efficient Training with Correlation-Aware Gradient Projection

Jinqi Xiao, Shen Sang, Tiancheng Zhi et al.

CVPR 2025arXiv:2412.00071
6
citations
#3260

Memories of Forgotten Concepts

Matan Rusanovsky, Shimon Malnick, Amir Jevnisek et al.

CVPR 2025highlightarXiv:2412.00782
6
citations
#3261

Augmented Deep Contexts for Spatially Embedded Video Coding

Yifan Bian, Chuanbo Tang, Li Li et al.

CVPR 2025highlightarXiv:2505.05309
6
citations
#3262

Neuro-Symbolic Evaluation of Text-to-Video Models using Formal Verification

S P Sharan, Minkyu Choi, Sahil Shah et al.

CVPR 2025arXiv:2411.16718
6
citations
#3263

IMFine: 3D Inpainting via Geometry-guided Multi-view Refinement

Zhihao Shi, Dong Huo, Yuhongze Zhou et al.

CVPR 2025arXiv:2503.04501
6
citations
#3264

RePerformer: Immersive Human-centric Volumetric Videos from Playback to Photoreal Reperformance

Yuheng Jiang, Zhehao Shen, Chengcheng Guo et al.

CVPR 2025arXiv:2503.12242
6
citations
#3265

Relation3D : Enhancing Relation Modeling for Point Cloud Instance Segmentation

Edward LOO, Jiacheng Deng

CVPR 2025arXiv:2506.17891
6
citations
#3266

Golden Cudgel Network for Real-Time Semantic Segmentation

Guoyu Yang, Yuan Wang, Daming Shi et al.

CVPR 2025arXiv:2503.03325
6
citations
#3267

Logits DeConfusion with CLIP for Few-Shot Learning

Shuo Li, Fang Liu, Zehua Hao et al.

CVPR 2025arXiv:2504.12104
6
citations
#3268

RoboSense: Large-scale Dataset and Benchmark for Egocentric Robot Perception and Navigation in Crowded and Unstructured Environments

Haisheng Su, Feixiang Song, CONG MA et al.

CVPR 2025arXiv:2408.15503
6
citations
#3269

InteractionMap: Improving Online Vectorized HDMap Construction with Interaction

Kuang Wu, Chuan Yang, Zhanbin Li

CVPR 2025arXiv:2503.21659
6
citations
#3270

Effective Cloud Removal for Remote Sensing Images by an Improved Mean-Reverting Denoising Model with Elucidated Design Space

Yi Liu, Wengen Li, Jihong Guan et al.

CVPR 2025arXiv:2503.23717
6
citations
#3271

Nonisotropic Gaussian Diffusion for Realistic 3D Human Motion Prediction

Cecilia Curreli, Dominik Muhle, Abhishek Saroha et al.

CVPR 2025arXiv:2501.06035
6
citations
#3272

ARKit LabelMaker: A New Scale for Indoor 3D Scene Understanding

Guangda Ji, Silvan Weder, Francis Engelmann et al.

CVPR 2025arXiv:2410.13924
6
citations
#3273

Zero-Shot Image Restoration Using Few-Step Guidance of Consistency Models (and Beyond)

Tomer Garber, Tom Tirer

CVPR 2025arXiv:2412.20596
6
citations
#3274

Realistic Test-Time Adaptation of Vision-Language Models

Maxime Zanella, Clément Fuchs, Christophe De Vleeschouwer et al.

CVPR 2025highlightarXiv:2501.03729
6
citations
#3275

Masked Point-Entity Contrast for Open-Vocabulary 3D Scene Understanding

Yan Wang, Baoxiong Jia, Ziyu Zhu et al.

CVPR 2025arXiv:2504.19500
6
citations
#3276

Hyperbolic Safety-Aware Vision-Language Models

Tobia Poppi, Tejaswi Kasarla, Pascal Mettes et al.

CVPR 2025highlightarXiv:2503.12127
6
citations
#3277

3D Occupancy Prediction with Low-Resolution Queries via Prototype-aware View Transformation

Gyeongrok Oh, Sung June Kim, Heeju Ko et al.

CVPR 2025arXiv:2503.15185
6
citations
#3278

DualPM: Dual Posed-Canonical Point Maps for 3D Shape and Pose Reconstruction

Ben Kaye, Tomas Jakab, Shangzhe Wu et al.

CVPR 2025highlightarXiv:2412.04464
6
citations
#3279

Recurrence-Enhanced Vision-and-Language Transformers for Robust Multimodal Document Retrieval

Davide Caffagni, Sara Sarto, Marcella Cornia et al.

CVPR 2025arXiv:2503.01980
6
citations
#3280

Robust 3D Shape Reconstruction in Zero-Shot from a Single Image in the Wild

Junhyeong Cho, Kim Youwang, Hunmin Yang et al.

CVPR 2025arXiv:2403.14539
6
citations
#3281

Multitwine: Multi-Object Compositing with Text and Layout Control

Gemma Canet Tarrés, Zhe Lin, Zhifei Zhang et al.

CVPR 2025highlightarXiv:2502.05165
6
citations
#3282

GaussHDR: High Dynamic Range Gaussian Splatting via Learning Unified 3D and 2D Local Tone Mapping

Jinfeng Liu, Lingtong Kong, Bo Li et al.

CVPR 2025arXiv:2503.10143
6
citations
#3283

SaMam: Style-aware State Space Model for Arbitrary Image Style Transfer

Hongda Liu, Longguang Wang, Ye Zhang et al.

CVPR 2025highlightarXiv:2503.15934
6
citations
#3284

Advancing Semantic Future Prediction through Multimodal Visual Sequence Transformers

Efstathios Karypidis, Ioannis Kakogeorgiou, Spyros Gidaris et al.

CVPR 2025arXiv:2501.08303
6
citations
#3285

Show and Tell: Visually Explainable Deep Neural Nets via Spatially-Aware Concept Bottleneck Models

Itay Benou, Tammy Riklin Raviv

CVPR 2025highlightarXiv:2502.20134
6
citations
#3286

HybridGS: Decoupling Transients and Statics with 2D and 3D Gaussian Splatting

Jingyu Lin, Jiaqi Gu, Lubin Fan et al.

CVPR 2025arXiv:2412.03844
6
citations
#3287

FIMA-Q: Post-Training Quantization for Vision Transformers by Fisher Information Matrix Approximation

Zhuguanyu Wu, Shihe Wang, Jiayi Zhang et al.

CVPR 2025highlightarXiv:2506.11543
6
citations
#3288

OpenMIBOOD: Open Medical Imaging Benchmarks for Out-Of-Distribution Detection

Max Gutbrod, David Rauber, Danilo Weber Nunes et al.

CVPR 2025arXiv:2503.16247
6
citations
#3289

From Sparse to Dense: Camera Relocalization with Scene-Specific Detector from Feature Gaussian Splatting

Zhiwei Huang, Hailin Yu, Yichun Shentu et al.

CVPR 2025arXiv:2503.19358
6
citations
#3290

Blurred LiDAR for Sharper 3D: Robust Handheld 3D Scanning with Diffuse LiDAR and RGB

Nikhil Behari, Aaron Young, Siddharth Somasundaram et al.

CVPR 2025highlightarXiv:2411.19474
6
citations
#3291

AutoURDF: Unsupervised Robot Modeling from Point Cloud Frames Using Cluster Registration

Jiong Lin, Lechen Zhang, Kwansoo Lee et al.

CVPR 2025arXiv:2412.05507
6
citations
#3292

GIFStream: 4D Gaussian-based Immersive Video with Feature Stream

Hao Li, Sicheng Li, Xiang Gao et al.

CVPR 2025arXiv:2505.07539
6
citations
#3293

Functionality Understanding and Segmentation in 3D Scenes

Jaime Corsetti, Francesco Giuliari, Alice Fasoli et al.

CVPR 2025highlightarXiv:2411.16310
6
citations
#3294

NVComposer: Boosting Generative Novel View Synthesis with Multiple Sparse and Unposed Images

Lingen Li, Zhaoyang Zhang, Yaowei Li et al.

CVPR 2025arXiv:2412.03517
6
citations
#3295

Multi-View Pose-Agnostic Change Localization with Zero Labels

Chamuditha Jayanga Galappaththige, Jason Lai, Lloyd Windrim et al.

CVPR 2025arXiv:2412.03911
6
citations
#3296

ShowHowTo: Generating Scene-Conditioned Step-by-Step Visual Instructions

Tomas Soucek, Prajwal Gatti, Michael Wray et al.

CVPR 2025arXiv:2412.01987
6
citations
#3297

Hearing Anywhere in Any Environment

Xiulong Liu, Anurag Kumar, Paul Calamia et al.

CVPR 2025arXiv:2504.10746
6
citations
#3298

GuardSplat: Efficient and Robust Watermarking for 3D Gaussian Splatting

Zixuan Chen, Guangcong Wang, Jiahao Zhu et al.

CVPR 2025arXiv:2411.19895
6
citations
#3299

CountLLM: Towards Generalizable Repetitive Action Counting via Large Language Model

Ziyu Yao, Xuxin Cheng, Zhiqi Huang et al.

CVPR 2025arXiv:2503.17690
6
citations
#3300

LaTexBlend: Scaling Multi-concept Customized Generation with Latent Textual Blending

Jian Jin, Zhenbo Yu, Yang Shen et al.

CVPR 2025highlightarXiv:2503.06956
6
citations
#3301

Point Clouds Meets Physics: Dynamic Acoustic Field Fitting Network for Point Cloud Understanding

Changshuo Wang, Shuting He, Xiang Fang et al.

CVPR 2025
6
citations
#3302

MATCHA: Towards Matching Anything

Fei Xue, Sven Elflein, Laura Leal-Taixe et al.

CVPR 2025highlight
6
citations
#3303

PartRM: Modeling Part-Level Dynamics with Large Cross-State Reconstruction Model

Mingju Gao, Yike Pan, Huan-ang Gao et al.

CVPR 2025arXiv:2503.19913
6
citations
#3304

AlphaPre: Amplitude-Phase Disentanglement Model for Precipitation Nowcasting

Kenghong Lin, Baoquan Zhang, Demin Yu et al.

CVPR 2025
6
citations
#3305

DefectFill: Realistic Defect Generation with Inpainting Diffusion Model for Visual Inspection

Jaewoo Song, Daemin Park, Kanghyun Baek et al.

CVPR 2025highlightarXiv:2503.13985
6
citations
#3306

From Elements to Design: A Layered Approach for Automatic Graphic Design Composition

Jiawei Lin, Shizhao Sun, Danqing Huang et al.

CVPR 2025arXiv:2412.19712
6
citations
#3307

CoMatcher: Multi-View Collaborative Feature Matching

Jintao Zhang, Zimin Xia, Mingyue Dong et al.

CVPR 2025arXiv:2504.01872
6
citations
#3308

Lifelong Knowledge Editing for Vision Language Models with Low-Rank Mixture-of-Experts

Qizhou Chen, Chengyu Wang, Dakan Wang et al.

CVPR 2025arXiv:2411.15432
6
citations
#3309

ReWind: Understanding Long Videos with Instructed Learnable Memory

Anxhelo Diko, Tinghuai Wang, Wassim Swaileh et al.

CVPR 2025arXiv:2411.15556
6
citations
#3310

Distilled Prompt Learning for Incomplete Multimodal Survival Prediction

Yingxue Xu, Fengtao ZHOU, Chenyu Zhao et al.

CVPR 2025arXiv:2503.01653
6
citations
#3311

Imputation-free and Alignment-free: Incomplete Multi-view Clustering Driven by Consensus Semantic Learning

yuzhuo dai, Jiaqi Jin, Zhibin Dong et al.

CVPR 2025arXiv:2505.11182
6
citations
#3312

TSAM: Temporal SAM Augmented with Multimodal Prompts for Referring Audio-Visual Segmentation

Abduljalil Radman, Jorma Laaksonen

CVPR 2025
6
citations
#3313

On Denoising Walking Videos for Gait Recognition

Dongyang Jin, Chao Fan, Jingzhe Ma et al.

CVPR 2025arXiv:2505.18582
6
citations
#3314

Weakly Supervised Temporal Action Localization via Dual-Prior Collaborative Learning Guided by Multimodal Large Language Models

Quan Zhang, Jinwei Fang, Rui Yuan et al.

CVPR 2025arXiv:2411.08466
6
citations
#3315

VasTSD: Learning 3D Vascular Tree-state Space Diffusion Model for Angiography Synthesis

Zhifeng Wang, Renjiao Yi, Xin Wen et al.

CVPR 2025arXiv:2503.12758
6
citations
#3316

Towards Realistic Example-based Modeling via 3D Gaussian Stitching

Xinyu Gao, Ziyi Yang, Bingchen Gong et al.

CVPR 2025arXiv:2408.15708
6
citations
#3317

Cross-Modal Interactive Perception Network with Mamba for Lung Tumor Segmentation in PET-CT Images

Jie Mei, Chenyu Lin, Yu Qiu et al.

CVPR 2025arXiv:2503.17261
6
citations
#3318

Towards In-the-wild 3D Plane Reconstruction from a Single Image

Jiachen Liu, Rui Yu, Sili Chen et al.

CVPR 2025highlightarXiv:2506.02493
6
citations
#3319

SparseAlign: a Fully Sparse Framework for Cooperative Object Detection

Yunshuang Yuan, Yan Xia, Daniel Cremers et al.

CVPR 2025arXiv:2503.12982
6
citations
#3320

KAC: Kolmogorov-Arnold Classifier for Continual Learning

Yusong Hu, Zichen Liang, Fei Yang et al.

CVPR 2025highlightarXiv:2503.21076
6
citations
#3321

Dense-SfM: Structure from Motion with Dense Consistent Matching

JongMin Lee, Sungjoo Yoo

CVPR 2025arXiv:2501.14277
6
citations
#3322

Prompt-CAM: Making Vision Transformers Interpretable for Fine-Grained Analysis

Arpita Chowdhury, Dipanjyoti Paul, Zheda Mai et al.

CVPR 2025arXiv:2501.09333
6
citations
#3323

Large Self-Supervised Models Bridge the Gap in Domain Adaptive Object Detection

Marc-Antoine Lavoie, Anas Mahmoud, Steven L. Waslander

CVPR 2025arXiv:2503.23220
6
citations
#3324

CDI: Copyrighted Data Identification in Diffusion Models

Jan Dubiński, Antoni Kowalczuk, Franziska Boenisch et al.

CVPR 2025arXiv:2411.12858
6
citations
#3325

Learning from Streaming Video with Orthogonal Gradients

Tengda Han, Dilara Gokay, Joseph Heyward et al.

CVPR 2025arXiv:2504.01961
6
citations
#3326

Motion Modes: What Could Happen Next?

Karran Pandey, Yannick Hold-Geoffroy, Matheus Gadelha et al.

CVPR 2025arXiv:2412.00148
6
citations
#3327

Rotation-Equivariant Self-Supervised Method in Image Denoising

Hanze Liu, Jiahong Fu, Qi Xie et al.

CVPR 2025arXiv:2505.19618
6
citations
#3328

ReNeg: Learning Negative Embedding with Reward Guidance

Xiaomin Li, yixuan liu, Takashi Isobe et al.

CVPR 2025highlightarXiv:2412.19637
6
citations
#3329

Improving Gaussian Splatting with Localized Points Management

Haosen Yang, Chenhao Zhang, Wenqing Wang et al.

CVPR 2025highlightarXiv:2406.04251
6
citations
#3330

VELOCITI: Benchmarking Video-Language Compositional Reasoning with Strict Entailment

Darshana Saravanan, Varun Gupta, Darshan Singh S et al.

CVPR 2025arXiv:2406.10889
6
citations
#3331

CamFreeDiff: Camera-free Image to Panorama Generation with Diffusion Model

Xiaoding Yuan, Shitao Tang, Kejie Li et al.

CVPR 2025arXiv:2407.07174
6
citations
#3332

EchoWorld: Learning Motion-Aware World Models for Echocardiography Probe Guidance

Yang Yue, Yulin Wang, Haojun Jiang et al.

CVPR 2025arXiv:2504.13065
6
citations
#3333

Uncertain Multimodal Intention and Emotion Understanding in the Wild

Qu Yang, QingHongYa Shi, Tongxin Wang et al.

CVPR 2025
6
citations
#3334

CLIP Under the Microscope: A Fine-Grained Analysis of Multi-Object Representation

Reza Abbasi, Ali Nazari, Aminreza Sefid et al.

CVPR 2025arXiv:2502.19842
6
citations
#3335

Order-Robust Class Incremental Learning: Graph-Driven Dynamic Similarity Grouping

Guannan Lai, Yujie Li, Xiangkun Wang et al.

CVPR 2025arXiv:2502.20032
6
citations
#3336

Self-Learning Hyperspectral and Multispectral Image Fusion via Adaptive Residual Guided Subspace Diffusion Model

Jian Zhu, He Wang, Yang Xu et al.

CVPR 2025arXiv:2505.11800
6
citations
#3337

Lift3D Policy: Lifting 2D Foundation Models for Robust 3D Robotic Manipulation

Yueru Jia, Jiaming Liu, Sixiang Chen et al.

CVPR 2025
6
citations
#3338

Exploring Simple Open-Vocabulary Semantic Segmentation

Zihang Lai

CVPR 2025arXiv:2401.12217
6
citations
#3339

Exploring Visual Vulnerabilities via Multi-Loss Adversarial Search for Jailbreaking Vision-Language Models

Shuyang Hao, Bryan Hooi, Jun Liu et al.

CVPR 2025arXiv:2411.18000
6
citations
#3340

SimMotionEdit: Text-Based Human Motion Editing with Motion Similarity Prediction

Zhengyuan Li, Kai Cheng, Anindita Ghosh et al.

CVPR 2025arXiv:2503.18211
6
citations
#3341

One is Plenty: A Polymorphic Feature Interpreter for Immutable Heterogeneous Collaborative Perception

Yuchen Xia, Quan Yuan, Guiyang Luo et al.

CVPR 2025arXiv:2411.16799
6
citations
#3342

Denoising Functional Maps: Diffusion Models for Shape Correspondence

Aleksei Zhuravlev, Zorah Lähner, Vladislav Golyanik

CVPR 2025arXiv:2503.01845
6
citations
#3343

Language-Guided Audio-Visual Learning for Long-Term Sports Assessment

Huangbiao Xu, Xiao Ke, Huanqi Wu et al.

CVPR 2025
6
citations
#3344

Training-free Dense-Aligned Diffusion Guidance for Modular Conditional Image Synthesis

Zixuan Wang, DUO PENG, Feng Chen et al.

CVPR 2025arXiv:2504.01515
6
citations
#3345

Attend to Not Attended: Structure-then-Detail Token Merging for Post-training DiT Acceleration

Haipeng Fang, Sheng Tang, Juan Cao et al.

CVPR 2025arXiv:2505.11707
6
citations
#3346

Decoupling Fine Detail and Global Geometry for Compressed Depth Map Super-Resolution

Huan Zheng, Wencheng Han, Jianbing Shen

CVPR 2025arXiv:2411.03239
6
citations
#3347

Patch Matters: Training-free Fine-grained Image Caption Enhancement via Local Perception

ruotian peng, Haiying He, Yake Wei et al.

CVPR 2025arXiv:2504.06666
6
citations
#3348

Towards RAW Object Detection in Diverse Conditions

Zhong-Yu Li, Xin Jin, Bo-Yuan Sun et al.

CVPR 2025highlightarXiv:2411.15678
6
citations
#3349

DeRS: Towards Extremely Efficient Upcycled Mixture-of-Experts Models

Yongqi Huang, Peng Ye, Chenyu Huang et al.

CVPR 2025arXiv:2503.01359
6
citations
#3350

TokenMotion: Decoupled Motion Control via Token Disentanglement for Human-centric Video Generation

Ruineng Li, Daitao Xing, Huiming Sun et al.

CVPR 2025arXiv:2504.08181
6
citations
#3351

GarmentPile: Point-Level Visual Affordance Guided Retrieval and Adaptation for Cluttered Garments Manipulation

Ruihai Wu, Ziyu Zhu, Yuran Wang et al.

CVPR 2025arXiv:2503.09243
6
citations
#3352

UniPre3D: Unified Pre-training of 3D Point Cloud Models with Cross-Modal Gaussian Splatting

Ziyi Wang, Yanran Zhang, Jie Zhou et al.

CVPR 2025arXiv:2506.09952
6
citations
#3353

MambaVLT: Time-Evolving Multimodal State Space Model for Vision-Language Tracking

Xinqi Liu, Li Zhou, Zikun Zhou et al.

CVPR 2025highlightarXiv:2411.15459
6
citations
#3354

D^2iT: Dynamic Diffusion Transformer for Accurate Image Generation

Weinan Jia, Mengqi Huang, Nan Chen et al.

CVPR 2025
6
citations
#3355

Satellite Observations Guided Diffusion Model for Accurate Meteorological States at Arbitrary Resolution

Siwei Tu, Ben Fei, Weidong Yang et al.

CVPR 2025highlightarXiv:2502.07814
6
citations
#3356

Unified Dense Prediction of Video Diffusion

Lehan Yang, Lu Qi, Xiangtai Li et al.

CVPR 2025arXiv:2503.09344
6
citations
#3357

A Comprehensive Study of Decoder-Only LLMs for Text-to-Image Generation

Andrew Z Wang, Songwei Ge, Tero Karras et al.

CVPR 2025arXiv:2506.08210
6
citations
#3358

AdaMMS: Model Merging for Heterogeneous Multimodal Large Language Models with Unsupervised Coefficient Optimization

Yiyang Du, Xiaochen Wang, Chi Chen et al.

CVPR 2025arXiv:2503.23733
6
citations
#3359

SDGOCC: Semantic and Depth-Guided Bird's-Eye View Transformation for 3D Multimodal Occupancy Prediction

ZaiPeng Duan, Xuzhong Hu, Pei An et al.

CVPR 2025arXiv:2507.17083
6
citations
#3360

Seeking Consistent Flat Minima for Better Domain Generalization via Refining Loss Landscapes

Aodi Li, Liansheng Zhuang, Xiao Long et al.

CVPR 2025arXiv:2412.13573
6
citations
#3361

Detect-and-Guide: Self-regulation of Diffusion Models for Safe Text-to-Image Generation via Guideline Token Optimization

Feifei Li, Mi Zhang, Yiming Sun et al.

CVPR 2025arXiv:2503.15197
6
citations
#3362

Beyond Local Sharpness: Communication-Efficient Global Sharpness-aware Minimization for Federated Learning

Debora Caldarola, Pietro Cagnasso, Barbara Caputo et al.

CVPR 2025arXiv:2412.03752
6
citations
#3363

Navigating Image Restoration with VAR’s Distribution Alignment Prior

Siyang Wang, Naishan Zheng, Jie Huang et al.

CVPR 2025arXiv:2412.21063
6
citations
#3364

UVGS: Reimagining Unstructured 3D Gaussian Splatting using UV Mapping

Aashish Rai, Dilin Wang, Mihir Jain et al.

CVPR 2025arXiv:2502.01846
6
citations
#3365

StyleSSP: Sampling StartPoint Enhancement for Training-free Diffusion-based Method for Style Transfer

ruojun xu, Weijie Xi, Xiaodi Wang et al.

CVPR 2025highlightarXiv:2501.11319
6
citations
#3366

Generative Sparse-View Gaussian Splatting

Hanyang Kong, Xingyi Yang, Xinchao Wang

CVPR 2025
6
citations
#3367

Accelerating Diffusion Transformer via Increment-Calibrated Caching with Channel-Aware Singular Value Decomposition

Zhiyuan Chen, Keyi Li, Yifan Jia et al.

CVPR 2025arXiv:2505.05829
6
citations
#3368

Stable-SCore: A Stable Registration-based Framework for 3D Shape Correspondence

Haolin Liu, Xiaohang Zhan, Zizheng Yan et al.

CVPR 2025arXiv:2503.21766
6
citations
#3369

RANGE: Retrieval Augmented Neural Fields for Multi-Resolution Geo-Embeddings

Aayush Dhakal, Srikumar Sastry, Subash Khanal et al.

CVPR 2025arXiv:2502.19781
6
citations
#3370

Rethinking Spiking Self-Attention Mechanism: Implementing α-XNOR Similarity Calculation in Spiking Transformers

Yichen Xiao, Shuai Wang, Dehao Zhang et al.

CVPR 2025
6
citations
#3371

Ego4o: Egocentric Human Motion Capture and Understanding from Multi-Modal Input

Jian Wang, Rishabh Dabral, Diogo Luvizon et al.

CVPR 2025arXiv:2504.08449
6
citations
#3372

PanDA: Towards Panoramic Depth Anything with Unlabeled Panoramas and Mobius Spatial Augmentation

Zidong Cao, Jinjing Zhu, Weiming Zhang et al.

CVPR 2025arXiv:2406.13378
6
citations
#3373

TCFG: Tangential Damping Classifier-free Guidance

Mingi Kwon, Shin seong Kim, Jaeseok Jeong et al.

CVPR 2025arXiv:2503.18137
6
citations
#3374

Real-IAD D³: A Real-World 2D/Pseudo-3D/3D Dataset for Industrial Anomaly Detection

wenbing zhu, Lidong Wang, Ziqing Zhou et al.

CVPR 2025
6
citations
#3375

DVHGNN: Multi-Scale Dilated Vision HGNN for Efficient Vision Recognition

Caoshuo Li, Tanzhe Li, Xiaobin Hu et al.

CVPR 2025arXiv:2503.14867
6
citations
#3376

Semantic Library Adaptation: LoRA Retrieval and Fusion for Open-Vocabulary Semantic Segmentation

Reza Qorbani, Gianluca Villani, Theodoros Panagiotakopoulos et al.

CVPR 2025arXiv:2503.21780
6
citations
#3377

DynFocus: Dynamic Cooperative Network Empowers LLMs with Video Understanding

Yudong Han, Qingpei Guo, Liyuan Pan et al.

CVPR 2025arXiv:2411.12355
6
citations
#3378

Keyframe-Guided Creative Video Inpainting

Yuwei Guo, Ceyuan Yang, Anyi Rao et al.

CVPR 2025
6
citations
#3379

POT: Prototypical Optimal Transport for Weakly Supervised Semantic Segmentation

Jian Wang, Tianhong Dai, Bingfeng Zhang et al.

CVPR 2025
6
citations
#3380

3D-MVP: 3D Multiview Pretraining for Manipulation

Shengyi Qian, Kaichun Mo, Valts Blukis et al.

CVPR 2025
6
citations
#3381

Visual Persona: Foundation Model for Full-Body Human Customization

Jisu Nam, Soowon Son, Zhan Xu et al.

CVPR 2025arXiv:2503.15406
6
citations
#3382

Improving the Transferability of Adversarial Attacks on Face Recognition with Diverse Parameters Augmentation

Fengfan Zhou, Bangjie Yin, Hefei Ling et al.

CVPR 2025arXiv:2411.15555
6
citations
#3383

MIRE: Matched Implicit Neural Representations

Dhananjaya Jayasundara, Heng Zhao, Demetrio Labate et al.

CVPR 2025
6
citations
#3384

U-Know-DiffPAN: An Uncertainty-aware Knowledge Distillation Diffusion Framework with Details Enhancement for PAN-Sharpening

Sungpyo Kim, Jeonghyeok Do, Jaehyup Lee et al.

CVPR 2025arXiv:2412.06243
6
citations
#3385

Chat-based Person Retrieval via Dialogue-Refined Cross-Modal Alignment

Yang Bai, Yucheng Ji, Min Cao et al.

CVPR 2025
6
citations
#3386

Eval3D: Interpretable and Fine-grained Evaluation for 3D Generation

Shivam Duggal, Yushi Hu, Oscar Michel et al.

CVPR 2025arXiv:2504.18509
6
citations
#3387

Unsupervised Template-assisted Point Cloud Shape Correspondence Network

Jiacheng Deng, Jiahao Lu, Tianzhu Zhang

CVPR 2024arXiv:2403.16412
6
citations
#3388

In-Context Matting

He Guo, Zixuan Ye, Zhiguo Cao et al.

CVPR 2024highlightarXiv:2403.15789
6
citations
#3389

S-DyRF: Reference-Based Stylized Radiance Fields for Dynamic Scenes

Xingyi Li, Zhiguo Cao, Yizheng Wu et al.

CVPR 2024arXiv:2403.06205
6
citations
#3390

Cross-view and Cross-pose Completion for 3D Human Understanding

Matthieu Armando, Salma Galaaoui, Fabien Baradel et al.

CVPR 2024arXiv:2311.09104
6
citations
#3391

CoG-DQA: Chain-of-Guiding Learning with Large Language Models for Diagram Question Answering

Shaowei Wang, Lingling Zhang, Longji Zhu et al.

CVPR 2024
6
citations
#3392

GLID: Pre-training a Generalist Encoder-Decoder Vision Model

Jihao Liu, Jinliang Zheng, Yu Liu et al.

CVPR 2024arXiv:2404.07603
6
citations
#3393

CNC-Net: Self-Supervised Learning for CNC Machining Operations

Mohsen Yavartanoo, Sangmin Hong, Reyhaneh Neshatavar et al.

CVPR 2024arXiv:2312.09925
6
citations
#3394

Flexible Depth Completion for Sparse and Varying Point Densities

Jinhyung Park, Yu-Jhe Li, Kris Kitani

CVPR 2024
6
citations
#3395

Bayesian Exploration of Pre-trained Models for Low-shot Image Classification

Yibo Miao, Yu lei, Feng Zhou et al.

CVPR 2024arXiv:2404.00312
6
citations
#3396

3D-Aware Face Editing via Warping-Guided Latent Direction Learning

Yuhao Cheng, Zhuo Chen, Xingyu Ren et al.

CVPR 2024
6
citations
#3397

CMA: A Chromaticity Map Adapter for Robust Detection of Screen-Recapture Document Images

Changsheng Chen, Liangwei Lin, Yongqi Chen et al.

CVPR 2024
6
citations
#3398

OpticalDR: A Deep Optical Imaging Model for Privacy-Protective Depression Recognition

Yuchen Pan, Junjun Jiang, Kui Jiang et al.

CVPR 2024arXiv:2402.18786
6
citations
#3399

Fully Geometric Panoramic Localization

Junho Kim, Jiwon Jeong, Young Min Kim

CVPR 2024arXiv:2403.19904
6
citations
#3400

Learning to Rank Patches for Unbiased Image Redundancy Reduction

Yang Luo, Zhineng Chen, Peng Zhou et al.

CVPR 2024arXiv:2404.00680
6
citations