Most Cited CVPR "autoregressive visual generation" Papers

5,589 papers found • Page 27 of 28

#5201

PERSE: Personalized 3D Generative Avatars from A Single Portrait

Hyunsoo Cha, Inhee Lee, Hanbyul Joo

CVPR 2025posterarXiv:2412.21206
#5202

Robust Audio-Visual Segmentation via Audio-Guided Visual Convergent Alignment

Chen Liu, Peike Li, Liying Yang et al.

CVPR 2025posterarXiv:2503.12847
#5203

RENO: Real-Time Neural Compression for 3D LiDAR Point Clouds

Kang You, Tong Chen, Dandan Ding et al.

CVPR 2025posterarXiv:2503.12382
#5204

Sketchy Bounding-box Supervision for 3D Instance Segmentation

qian deng, Le Hui, Jin Xie et al.

CVPR 2025posterarXiv:2505.16399
#5205

Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention

Wenbin An, Feng Tian, Sicong Leng et al.

CVPR 2025posterarXiv:2406.12718
#5206

Diffusion Self-Distillation for Zero-Shot Customized Image Generation

Shengqu Cai, Eric Ryan Chan, Yunzhi Zhang et al.

CVPR 2025posterarXiv:2411.18616
#5207

RASP: Revisiting 3D Anamorphic Art for Shadow-Guided Packing of Irregular Objects

Soumyaratna Debnath, Ashish Tiwari, Kaustubh Sadekar et al.

CVPR 2025posterarXiv:2504.02465
#5208

EnvPoser: Environment-aware Realistic Human Motion Estimation from Sparse Observations with Uncertainty Modeling

Songpengcheng Xia, Yu Zhang, Zhuo Su et al.

CVPR 2025posterarXiv:2412.10235
#5209

Exploring Scene Affinity for Semi-Supervised LiDAR Semantic Segmentation

Chuandong Liu, Xingxing Weng, Shuguo Jiang et al.

CVPR 2025posterarXiv:2408.11280
#5210

Revisiting Audio-Visual Segmentation with Vision-Centric Transformer

Shaofei Huang, Rui Ling, Tianrui Hui et al.

CVPR 2025posterarXiv:2506.23623
#5211

MaskGaussian: Adaptive 3D Gaussian Representation from Probabilistic Masks

Yifei Liu, Zhihang Zhong, Yifan Zhan et al.

CVPR 2025posterarXiv:2412.20522
#5212

Forensics Adapter: Adapting CLIP for Generalizable Face Forgery Detection

Xinjie Cui, Yuezun Li, Ao Luo et al.

CVPR 2025poster
#5213

Less is More: Efficient Image Vectorization with Adaptive Parameterization

Kaibo Zhao, Liang Bao, Yufei Li et al.

CVPR 2025poster
#5214

Steering Away from Harm: An Adaptive Approach to Defending Vision Language Model Against Jailbreaks

Han Wang, Gang Wang, Huan Zhang

CVPR 2025posterarXiv:2411.16721
#5215

PEER Pressure: Model-to-Model Regularization for Single Source Domain Generalization

Dongkyu Cho, Inwoo Hwang, Sanghack Lee

CVPR 2025posterarXiv:2505.12745
#5216

Co-Speech Gesture Video Generation with Implicit Motion-Audio Entanglement

Xinjie Li, Ziyi Chen, Xinlu Yu et al.

CVPR 2025poster
#5217

Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation

Chengyue Wu, Xiaokang Chen, Zhiyu Wu et al.

CVPR 2025posterarXiv:2410.13848
#5218

FATE: Full-head Gaussian Avatar with Textural Editing from Monocular Video

Jiawei Zhang, Zijian Wu, Zhiyang Liang et al.

CVPR 2025posterarXiv:2411.15604
#5219

Similarity-Guided Layer-Adaptive Vision Transformer for UAV Tracking

chaocan xue, Bineng Zhong, Qihua Liang et al.

CVPR 2025posterarXiv:2503.06625
#5220

Unsupervised Template-assisted Point Cloud Shape Correspondence Network

Jiacheng Deng, Jiahao Lu, Tianzhu Zhang

CVPR 2024posterarXiv:2403.16412
#5221

Unified Dense Prediction of Video Diffusion

Lehan Yang, Lu Qi, Xiangtai Li et al.

CVPR 2025posterarXiv:2503.09344
#5222

Joint Scheduling of Causal Prompts and Tasks for Multi-Task Learning

Chaoyang Li, Jianyang Qin, Jinhao Cui et al.

CVPR 2025poster
#5223

Spectral and Polarization Vision: Spectro-polarimetric Real-world Dataset

Yujin Jeon, Eunsue Choi, Youngchan Kim et al.

CVPR 2024highlightarXiv:2311.17396
#5224

Efficient Model Stealing Defense with Noise Transition Matrix

Dong-Dong Wu, Chilin Fu, Weichang Wu et al.

CVPR 2024poster
#5225

HOIAnimator: Generating Text-prompt Human-object Animations using Novel Perceptive Diffusion Models

Wenfeng Song, Xinyu Zhang, Shuai Li et al.

CVPR 2024poster
#5226

MV-Adapter: Multimodal Video Transfer Learning for Video Text Retrieval

bowen zhang, Xiaojie Jin, Weibo Gong et al.

CVPR 2024posterarXiv:2301.07868
#5227

Is Your World Simulator a Good Story Presenter? A Consecutive Events-Based Benchmark for Future Long Video Generation

Yiping Wang, Xuehai He, Kuan Wang et al.

CVPR 2025posterarXiv:2412.16211
#5228

Diffusion Models Without Attention

Jing Nathan Yan, Jiatao Gu, Alexander Rush

CVPR 2024posterarXiv:2311.18257
#5229

DynScene: Scalable Generation of Dynamic Robotic Manipulation Scenes for Embodied AI

Sangmin Lee, Sungyong Park, Heewon Kim

CVPR 2025poster
#5230

HDQMF: Holographic Feature Decomposition Using Quantum Algorithms

Prathyush Poduval, Zhuowen Zou, Mohsen Imani

CVPR 2024poster
#5231

DrivingGaussian: Composite Gaussian Splatting for Surrounding Dynamic Autonomous Driving Scenes

Xiaoyu Zhou, Zhiwei Lin, Xiaojun Shan et al.

CVPR 2024posterarXiv:2312.07920
#5232

DefMamba: Deformable Visual State Space Model

Leiye Liu, Miao Zhang, Jihao Yin et al.

CVPR 2025posterarXiv:2504.05794
#5233

H-ViT: A Hierarchical Vision Transformer for Deformable Image Registration

Morteza Ghahremani, Mohammad Khateri, Bailiang Jian et al.

CVPR 2024highlight
#5234

SceneFactor: Factored Latent 3D Diffusion for Controllable 3D Scene Generation

Aleksei Bokhovkin, Quan Meng, Shubham Tulsiani et al.

CVPR 2025posterarXiv:2412.01801
#5235

VideoGEM: Training-free Action Grounding in Videos

Felix Vogel, Walid Bousselham, Anna Kukleva et al.

CVPR 2025posterarXiv:2503.20348
#5236

Going Beyond Multi-Task Dense Prediction with Synergy Embedding Models

Huimin Huang, Yawen Huang, Lanfen Lin et al.

CVPR 2024poster
#5237

ProReflow: Progressive Reflow with Decomposed Velocity

Lei Ke, Haohang Xu, Xuefei Ning et al.

CVPR 2025posterarXiv:2503.04824
#5238

MR-VNet: Media Restoration using Volterra Networks

Siddharth Roheda, Amit Unde, Loay Rashid

CVPR 2024poster
#5239

OmniParser: A Unified Framework for Text Spotting Key Information Extraction and Table Recognition

Jianqiang Wan, Sibo Song, Wenwen Yu et al.

CVPR 2024posterarXiv:2403.19128
#5240

PortraitBooth: A Versatile Portrait Model for Fast Identity-preserved Personalization

Xu Peng, Junwei Zhu, Boyuan Jiang et al.

CVPR 2024posterarXiv:2312.06354
#5241

Event-Equalized Dense Video Captioning

Kangyi Wu, Pengna Li, Jingwen Fu et al.

CVPR 2025poster
#5242

MULAN: A Multi Layer Annotated Dataset for Controllable Text-to-Image Generation

Petru-Daniel Tudosiu, Yongxin Yang, Shifeng Zhang et al.

CVPR 2024posterarXiv:2404.02790
#5243

Dr.Hair: Reconstructing Scalp-Connected Hair Strands without Pre-Training via Differentiable Rendering of Line Segments

Yusuke Takimoto, Hikari Takehara, Hiroyuki Sato et al.

CVPR 2024highlightarXiv:2403.17496
#5244

CroSel: Cross Selection of Confident Pseudo Labels for Partial-Label Learning

Shiyu Tian, Hongxin Wei, Yiqun Wang et al.

CVPR 2024posterarXiv:2303.10365
#5245

PTM-VQA: Efficient Video Quality Assessment Leveraging Diverse PreTrained Models from the Wild

Kun Yuan, Hongbo Liu, Mading Li et al.

CVPR 2024posterarXiv:2405.17765
#5246

GazeGene: Large-scale Synthetic Gaze Dataset with 3D Eyeball Annotations

Yiwei Bao, Zhiming Wang, Feng Lu

CVPR 2025poster
#5247

Mudslide: A Universal Nuclear Instance Segmentation Method

Jun Wang

CVPR 2024highlight
#5248

Feature Information Driven Position Gaussian Distribution Estimation for Tiny Object Detection

Jinghao Bian, Mingtao Feng, Weisheng Dong et al.

CVPR 2025poster
#5249

PRaDA: Projective Radial Distortion Averaging

Daniil Sinitsyn, Linus Härenstam-Nielsen, Daniel Cremers

CVPR 2025posterarXiv:2504.16499
#5250

Collaborative Learning of Anomalies with Privacy (CLAP) for Unsupervised Video Anomaly Detection: A New Baseline

Anas Al-lahham, Muhammad Zaigham Zaheer, Nurbek Tastan et al.

CVPR 2024posterarXiv:2404.00847
#5251

Cache Me if You Can: Accelerating Diffusion Models through Block Caching

Felix Wimbauer, Bichen Wu, Edgar Schoenfeld et al.

CVPR 2024posterarXiv:2312.03209
#5252

Rewrite the Stars

Xu Ma, Xiyang Dai, Yue Bai et al.

CVPR 2024posterarXiv:2403.19967
#5253

Virtual Immunohistochemistry Staining for Histological Images Assisted by Weakly-supervised Learning

Jiahan Li, Jiuyang Dong, Shenjin Huang et al.

CVPR 2024poster
#5254

Structure from Collision

Takuhiro Kaneko

CVPR 2025highlightarXiv:2505.21335
#5255

3DiffTection: 3D Object Detection with Geometry-Aware Diffusion Features

Chenfeng Xu, Huan Ling, Sanja Fidler et al.

CVPR 2024posterarXiv:2311.04391
#5256

Model Adaptation for Time Constrained Embodied Control

Jaehyun Song, Minjong Yoo, Honguk Woo

CVPR 2024posterarXiv:2406.11128
#5257

Inference-Scale Complexity in ANN-SNN Conversion for High-Performance and Low-Power Applications

Tong Bu, Maohua Li, Zhaofei Yu

CVPR 2025posterarXiv:2409.03368
#5258

Language-Guided Salient Object Ranking

Fang Liu, Yuhao Liu, Ke Xu et al.

CVPR 2025poster
#5259

DiverGen: Improving Instance Segmentation by Learning Wider Data Distribution with More Diverse Generative Data

Chengxiang Fan, Muzhi Zhu, Hao Chen et al.

CVPR 2024posterarXiv:2405.10185
#5260

SPAD: Spatially Aware Multi-View Diffusers

Yash Kant, Aliaksandr Siarohin, Ziyi Wu et al.

CVPR 2024poster
#5261

SCE-MAE: Selective Correspondence Enhancement with Masked Autoencoder for Self-Supervised Landmark Estimation

Kejia Yin, Varshanth Rao, Ruowei Jiang et al.

CVPR 2024posterarXiv:2405.18322
#5262

DiffPerformer: Iterative Learning of Consistent Latent Guidance for Diffusion-based Human Video Generation

Chenyang Wang, Zerong Zheng, Tao Yu et al.

CVPR 2024poster
#5263

SparseOcc: Rethinking Sparse Latent Representation for Vision-Based Semantic Occupancy Prediction

Pin Tang, Zhongdao Wang, Guoqing Wang et al.

CVPR 2024posterarXiv:2404.09502
#5264

Evolving High-Quality Rendering and Reconstruction in a Unified Framework with Contribution-Adaptive Regularization

You Shen, Zhipeng Zhang, Xinyang Li et al.

CVPR 2025posterarXiv:2503.00881
#5265

Beyond First-Order Tweedie: Solving Inverse Problems using Latent Diffusion

Litu Rout, Yujia Chen, Abhishek Kumar et al.

CVPR 2024posterarXiv:2312.00852
#5266

Unsupervised Video Domain Adaptation with Masked Pre-Training and Collaborative Self-Training

Arun Reddy, William Paul, Corban Rivera et al.

CVPR 2024posterarXiv:2312.02914
#5267

MedUnifier: Unifying Vision-and-Language Pre-training on Medical Data with Vision Generation Task using Discrete Visual Representations

Ziyang Zhang, Yang Yu, Yucheng Chen et al.

CVPR 2025posterarXiv:2503.01019
#5268

Image is All You Need to Empower Large-scale Diffusion Models for In-Domain Generation

Pu Cao, Feng Zhou, Lu Yang et al.

CVPR 2025posterarXiv:2312.08195
#5269

RCBEVDet: Radar-camera Fusion in Bird's Eye View for 3D Object Detection

Zhiwei Lin, Zhe Liu, Zhongyu Xia et al.

CVPR 2024posterarXiv:2403.16440
#5270

FineParser: A Fine-grained Spatio-temporal Action Parser for Human-centric Action Quality Assessment

Jinglin Xu, Sibo Yin, Guohao Zhao et al.

CVPR 2024posterarXiv:2405.06887
#5271

SceneFun3D: Fine-Grained Functionality and Affordance Understanding in 3D Scenes

Alexandros Delitzas, Ayça Takmaz, Federico Tombari et al.

CVPR 2024poster
#5272

MAPLM: A Real-World Large-Scale Vision-Language Benchmark for Map and Traffic Scene Understanding

Xu Cao, Tong Zhou, Yunsheng Ma et al.

CVPR 2024poster
#5273

Do Vision and Language Encoders Represent the World Similarly?

Mayug Maniparambil, Raiymbek Akshulakov, YASSER ABDELAZIZ DAHOU DJILALI et al.

CVPR 2024posterarXiv:2401.05224
#5274

Beyond Generation: A Diffusion-based Low-level Feature Extractor for Detecting AI-generated Images

Nan Zhong, Haoyu Chen, Yiran Xu et al.

CVPR 2025poster
#5275

S2D-LFE: Sparse-to-Dense Light Field Event Generation

Yutong Liu, Wenming Weng, Yueyi Zhang et al.

CVPR 2025poster
#5276

Weakly Supervised Point Cloud Semantic Segmentation via Artificial Oracle

Hyeokjun Kweon, Jihun Kim, Kuk-Jin Yoon

CVPR 2024poster
#5277

Customize your NeRF: Adaptive Source Driven 3D Scene Editing via Local-Global Iterative Training

Runze He, Shaofei Huang, Xuecheng Nie et al.

CVPR 2024posterarXiv:2312.01663
#5278

Construct to Associate: Cooperative Context Learning for Domain Adaptive Point Cloud Segmentation

Guangrui Li

CVPR 2024poster
#5279

Auto MC-Reward: Automated Dense Reward Design with Large Language Models for Minecraft

Hao Li, Xue Yang, Zhaokai Wang et al.

CVPR 2024posterarXiv:2312.09238
#5280

Wavelet-based Fourier Information Interaction with Frequency Diffusion Adjustment for Underwater Image Restoration

Chen Zhao, Weiling Cai, Chenyu Dong et al.

CVPR 2024posterarXiv:2311.16845
#5281

Generating Content for HDR Deghosting from Frequency View

Tao Hu, Qingsen Yan, Yuankai Qi et al.

CVPR 2024posterarXiv:2404.00849
#5282

Open-Canopy: Towards Very High Resolution Forest Monitoring

Fajwel Fogel, Yohann PERRON, Nikola Besic et al.

CVPR 2025highlightarXiv:2407.09392
#5283

Direct2.5: Diverse Text-to-3D Generation via Multi-view 2.5D Diffusion

Yuanxun Lu, Jingyang Zhang, Shiwei Li et al.

CVPR 2024posterarXiv:2311.15980
#5284

G3Flow: Generative 3D Semantic Flow for Pose-aware and Generalizable Object Manipulation

Tianxing Chen, Yao Mu, Zhixuan Liang et al.

CVPR 2025posterarXiv:2411.18369
#5285

Not All Prompts Are Secure: A Switchable Backdoor Attack Against Pre-trained Vision Transfomers

Sheng Yang, Jiawang Bai, Kuofeng Gao et al.

CVPR 2024poster
#5286

LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding Reasoning and Planning

Sijin Chen, Xin Chen, Chi Zhang et al.

CVPR 2024poster
#5287

DAGSM: Disentangled Avatar Generation with GS-enhanced Mesh

Jingyu Zhuang, Di Kang, Linchao Bao et al.

CVPR 2025posterarXiv:2411.15205
#5288

GenTron: Diffusion Transformers for Image and Video Generation

Shoufa Chen, Mengmeng Xu, Jiawei Ren et al.

CVPR 2024posterarXiv:2312.04557
#5289

Map-Relative Pose Regression for Visual Re-Localization

Shuai Chen, Tommaso Cavallari, Victor Adrian Prisacariu et al.

CVPR 2024highlightarXiv:2404.09884
#5290

Gradient-based Parameter Selection for Efficient Fine-Tuning

Zhi Zhang, Qizhe Zhang, Zijun Gao et al.

CVPR 2024posterarXiv:2312.10136
#5291

Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis

Willi Menapace, Aliaksandr Siarohin, Ivan Skorokhodov et al.

CVPR 2024highlightarXiv:2402.14797
#5292

Backpropagation-free Network for 3D Test-time Adaptation

YANSHUO WANG, Ali Cheraghian, Zeeshan Hayder et al.

CVPR 2024posterarXiv:2403.18442
#5293

TransNeXt: Robust Foveal Visual Perception for Vision Transformers

Dai Shi

CVPR 2024posterarXiv:2311.17132
#5294

MTADiffusion: Mask Text Alignment Diffusion Model for Object Inpainting

jun huang, Ting Liu, Yihang Wu et al.

CVPR 2025posterarXiv:2506.23482
#5295

ShiftwiseConv: Small Convolutional Kernel with Large Kernel Effect

Dachong Li, li li, zhuangzhuang chen et al.

CVPR 2025posterarXiv:2401.12736
#5296

Reversing Flow for Image Restoration

Haina Qin, Wenyang Luo, Bing Li et al.

CVPR 2025posterarXiv:2506.16961
#5297

InstructDiffusion: A Generalist Modeling Interface for Vision Tasks

Zigang Geng, Binxin Yang, Tiankai Hang et al.

CVPR 2024posterarXiv:2309.03895
#5298

HPL-ESS: Hybrid Pseudo-Labeling for Unsupervised Event-based Semantic Segmentation

Linglin Jing, Yiming Ding, Yunpeng Gao et al.

CVPR 2024posterarXiv:2403.16788
#5299

Promptable Behaviors: Personalizing Multi-Objective Rewards from Human Preferences

Minyoung Hwang, Luca Weihs, Chanwoo Park et al.

CVPR 2024posterarXiv:2312.09337
#5300

PolarNeXt: Rethink Instance Segmentation with Polar Representation

Jiacheng Sun, Xinghong Zhou, Yiqiang Wu et al.

CVPR 2025poster
#5301

Fourier Priors-Guided Diffusion for Zero-Shot Joint Low-Light Enhancement and Deblurring

Xiaoqian Lv, Shengping Zhang, Chenyang Wang et al.

CVPR 2024poster
#5302

Towards General Robustness Verification of MaxPool-based Convolutional Neural Networks via Tightening Linear Approximation

Yuan Xiao, Shiqing Ma, Juan Zhai et al.

CVPR 2024posterarXiv:2406.00699
#5303

RichDreamer: A Generalizable Normal-Depth Diffusion Model for Detail Richness in Text-to-3D

Lingteng Qiu, Guanying Chen, Xiaodong Gu et al.

CVPR 2024highlightarXiv:2311.16918
#5304

Implicit Bias Injection Attacks against Text-to-Image Diffusion Models

Huayang Huang, Xiangye Jin, Jiaxu Miao et al.

CVPR 2025posterarXiv:2504.01819
#5305

Robust Synthetic-to-Real Transfer for Stereo Matching

Jiawei Zhang, Jiahe Li, Lei Huang et al.

CVPR 2024posterarXiv:2403.07705
#5306

Quaffure: Real-Time Quasi-Static Neural Hair Simulation

Tuur Stuyck, Gene Wei-Chin Lin, Egor Larionov et al.

CVPR 2025posterarXiv:2412.10061
#5307

Understanding and Improving Source-free Domain Adaptation from a Theoretical Perspective

Yu Mitsuzumi, Akisato Kimura, Hisashi Kashima

CVPR 2024poster
#5308

From Isolated Islands to Pangea: Unifying Semantic Space for Human Action Understanding

Yonglu Li, Xiaoqian Wu, Xinpeng Liu et al.

CVPR 2024highlightarXiv:2304.00553
#5309

Label Shift Meets Online Learning: Ensuring Consistent Adaptation with Universal Dynamic Regret

Yucong Dai, Shilin Gu, Ruidong Fan et al.

CVPR 2025highlight
#5310

LowRankOcc: Tensor Decomposition and Low-Rank Recovery for Vision-based 3D Semantic Occupancy Prediction

Linqing Zhao, Xiuwei Xu, Ziwei Wang et al.

CVPR 2024poster
#5311

4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models

Wanhua Li, Renping Zhou, Jiawei Zhou et al.

CVPR 2025posterarXiv:2503.10437
#5312

Overcoming Generic Knowledge Loss with Selective Parameter Update

Wenxuan Zhang, Paul Janson, Rahaf Aljundi et al.

CVPR 2024posterarXiv:2308.12462
#5313

Phoenix: A Motion-based Self-Reflection Framework for Fine-grained Robotic Action Correction

Wenke Xia, Ruoxuan Feng, Dong Wang et al.

CVPR 2025posterarXiv:2504.14588
#5314

ROLL: Robust Noisy Pseudo-label Learning for Multi-View Clustering with Noisy Correspondence

Yuan Sun, Yongxiang Li, Zhenwen Ren et al.

CVPR 2025highlight
#5315

Boosting Domain Incremental Learning: Selecting the Optimal Parameters is All You Need

Qiang Wang, Xiang Song, Yuhang He et al.

CVPR 2025posterarXiv:2505.23744
#5316

BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning

Ruyang Liu, Chen Li, Yixiao Ge et al.

CVPR 2024posterarXiv:2309.15785
#5317

Video Frame Interpolation via Direct Synthesis with the Event-based Reference

Yuhan Liu, Yongjian Deng, Hao Chen et al.

CVPR 2024poster
#5318

Lane2Seq: Towards Unified Lane Detection via Sequence Generation

Kunyang Zhou

CVPR 2024posterarXiv:2402.17172
#5319

CorrMatch: Label Propagation via Correlation Matching for Semi-Supervised Semantic Segmentation

Bo-Yuan Sun, Yuqi Yang, Le Zhang et al.

CVPR 2024posterarXiv:2306.04300
#5320

Rethinking Boundary Discontinuity Problem for Oriented Object Detection

Hang Xu, Xinyuan Liu, Haonan Xu et al.

CVPR 2024posterarXiv:2305.10061
#5321

Tiled Diffusion

Or Madar, Ohad Fried

CVPR 2025posterarXiv:2412.15185
#5322

Bridging Viewpoint Gaps: Geometric Reasoning Boosts Semantic Correspondence

Qiyang Qian, Hansheng Chen, Masayoshi Tomizuka et al.

CVPR 2025poster
#5323

MCNet: Rethinking the Core Ingredients for Accurate and Efficient Homography Estimation

Haokai Zhu, Si-Yuan Cao, Jianxin Hu et al.

CVPR 2024poster
#5324

SeqAfford: Sequential 3D Affordance Reasoning via Multimodal Large Language Model

Chunlin Yu, Hanqing Wang, Ye Shi et al.

CVPR 2025posterarXiv:2412.01550
#5325

UniDepth: Universal Monocular Metric Depth Estimation

Luigi Piccinelli, Yung-Hsu Yang, Christos Sakaridis et al.

CVPR 2024highlightarXiv:2403.18913
#5326

Diffusion Model Alignment Using Direct Preference Optimization

Bram Wallace, Meihua Dang, Rafael Rafailov et al.

CVPR 2024posterarXiv:2311.12908
#5327

SD4Match: Learning to Prompt Stable Diffusion Model for Semantic Matching

Xinghui Li, Jingyi Lu, Kai Han et al.

CVPR 2024posterarXiv:2310.17569
#5328

Uncertainty-Guided Never-Ending Learning to Drive

Lei Lai, Eshed Ohn-Bar, Sanjay Arora et al.

CVPR 2024poster
#5329

Feedback-Guided Autonomous Driving

Jimuyang Zhang, Zanming Huang, Arijit Ray et al.

CVPR 2024highlight
#5330

Small Steps and Level Sets: Fitting Neural Surface Models with Point Guidance

Chamin Hewa Koneputugodage, Yizhak Ben-Shabat, Dylan Campbell et al.

CVPR 2024poster
#5331

Adapt or Perish: Adaptive Sparse Transformer with Attentive Feature Refinement for Image Restoration

Shihao Zhou, Duosheng Chen, Jinshan Pan et al.

CVPR 2024poster
#5332

3DGStream: On-the-Fly Training of 3D Gaussians for Efficient Streaming of Photo-Realistic Free-Viewpoint Videos

Jiakai Sun, Han Jiao, Guangyuan Li et al.

CVPR 2024highlightarXiv:2403.01444
#5333

LTM: Lightweight Textured Mesh Extraction and Refinement of Large Unbounded Scenes for Efficient Storage and Real-time Rendering

Jaehoon Choi, Rajvi Shah, Qinbo Li et al.

CVPR 2024poster
#5334

Asynchronous Collaborative Graph Representation for Frames and Events

Dianze Li, Jianing Li, Xu Liu et al.

CVPR 2025poster
#5335

Geometry Transfer for Stylizing Radiance Fields

Hyunyoung Jung, Seonghyeon Nam, Nikolaos Sarafianos et al.

CVPR 2024posterarXiv:2402.00863
#5336

3D Human Pose Perception from Egocentric Stereo Videos

Hiroyasu Akada, Jian Wang, Vladislav Golyanik et al.

CVPR 2024highlightarXiv:2401.00889
#5337

Theory-Inspired Deep Multi-View Multi-Label Learning with Incomplete Views and Noisy Labels

Quanjiang Li, Tingjin Luo, Jiahui Liao

CVPR 2025poster
#5338

QN-Mixer: A Quasi-Newton MLP-Mixer Model for Sparse-View CT Reconstruction

Ishak Ayad, Nicolas Larue, Mai K. Nguyen

CVPR 2024posterarXiv:2402.17951
#5339

Check Locate Rectify: A Training-Free Layout Calibration System for Text-to-Image Generation

Biao Gong, Siteng Huang, Yutong Feng et al.

CVPR 2024poster
#5340

Prompt3D: Random Prompt Assisted Weakly-Supervised 3D Object Detection

Xiaohong Zhang, Huisheng Ye, Jingwen Li et al.

CVPR 2024poster
#5341

Boosting Self-Supervision for Single-View Scene Completion via Knowledge Distillation

Keonhee Han, Dominik Muhle, Felix Wimbauer et al.

CVPR 2024posterarXiv:2404.07933
#5342

Move-in-2D: 2D-Conditioned Human Motion Generation

Hsin-Ping Huang, Yang Zhou, Jui-Hsien Wang et al.

CVPR 2025posterarXiv:2412.13185
#5343

Volumetric Environment Representation for Vision-Language Navigation

Liu, Wenguan Wang, Yi Yang

CVPR 2024highlightarXiv:2403.14158
#5344

CrossKD: Cross-Head Knowledge Distillation for Object Detection

JiaBao Wang, yuming chen, Zhaohui Zheng et al.

CVPR 2024posterarXiv:2306.11369
#5345

Continual-MAE: Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation

Jiaming Liu, Ran Xu, Senqiao Yang et al.

CVPR 2024posterarXiv:2312.12480
#5346

TiNO-Edit: Timestep and Noise Optimization for Robust Diffusion-Based Image Editing

Sherry X. Chen, Yaron Vaxman, Elad Ben Baruch et al.

CVPR 2024posterarXiv:2404.11120
#5347

Leveraging Camera Triplets for Efficient and Accurate Structure-from-Motion

Lalit Manam, Venu Madhav Govindu

CVPR 2024poster
#5348

CG-HOI: Contact-Guided 3D Human-Object Interaction Generation

Christian Diller, Angela Dai

CVPR 2024posterarXiv:2311.16097
#5349

Conformal Prediction and MLLM aided Uncertainty Quantification in Scene Graph Generation

Sayak Nag, Udita Ghosh, Calvin-Khang Ta et al.

CVPR 2025posterarXiv:2503.13947
#5350

Improving Semi-Supervised Semantic Segmentation with Sliced-Wasserstein Feature Alignment and Uniformity

Chen Yi Lu, Kasra Derakhshandeh, Somali Chaterji

CVPR 2025poster
#5351

Is Vanilla MLP in Neural Radiance Field Enough for Few-shot View Synthesis?

Hanxin Zhu, Tianyu He, Xin Li et al.

CVPR 2024posterarXiv:2403.06092
#5352

Resurrecting Old Classes with New Data for Exemplar-Free Continual Learning

Dipam Goswami, Albin Soutif, Yuyang Liu et al.

CVPR 2024posterarXiv:2405.19074
#5353

DIEM: Decomposition-Integration Enhancing Multimodal Insights

Xinyi Jiang, Guoming Wang, Junhao Guo et al.

CVPR 2024poster
#5354

Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters

Jiazuo Yu, Yunzhi Zhuge, Lu Zhang et al.

CVPR 2024posterarXiv:2403.11549
#5355

Hierarchical Adaptive Filtering Network for Text Image Specular Highlight Removal

Zhi Jiang, Jingbo Hu, Ling Zhang et al.

CVPR 2025poster
#5356

HOI-M^3: Capture Multiple Humans and Objects Interaction within Contextual Environment

Juze Zhang, Jingyan Zhang, Zining Song et al.

CVPR 2024highlight
#5357

CORES: Convolutional Response-based Score for Out-of-distribution Detection

Keke Tang, Chao Hou, Weilong Peng et al.

CVPR 2024poster
#5358

Equivariant Multi-Modality Image Fusion

Zixiang Zhao, Haowen Bai, Jiangshe Zhang et al.

CVPR 2024posterarXiv:2305.11443
#5359

FreePCA: Integrating Consistency Information across Long-short Frames in Training-free Long Video Generation via Principal Component Analysis

Jiangtong Tan, Hu Yu, Jie Huang et al.

CVPR 2025highlightarXiv:2505.01172
#5360

PDF: A Probability-Driven Framework for Open World 3D Point Cloud Semantic Segmentation

Jinfeng Xu, Siyuan Yang, Xianzhi Li et al.

CVPR 2024posterarXiv:2404.00979
#5361

NeISF: Neural Incident Stokes Field for Geometry and Material Estimation

Chenhao Li, Taishi Ono, Takeshi Uemori et al.

CVPR 2024highlightarXiv:2311.13187
#5362

PromptKD: Unsupervised Prompt Distillation for Vision-Language Models

Zheng Li, Xiang Li, xinyi fu et al.

CVPR 2024posterarXiv:2403.02781
#5363

HERA: Hybrid Explicit Representation for Ultra-Realistic Head Avatars

Hongrui Cai, Yuting Xiao, Xuan Wang et al.

CVPR 2025poster
#5364

DeMatch: Deep Decomposition of Motion Field for Two-View Correspondence Learning

Shihua Zhang, Zizhuo Li, Yuan Gao et al.

CVPR 2024poster
#5365

Domain Gap Embeddings for Generative Dataset Augmentation

Yinong Oliver Wang, Younjoon Chung, Chen Henry Wu et al.

CVPR 2024poster
#5366

Domain-Agnostic Mutual Prompting for Unsupervised Domain Adaptation

Zhekai Du, Xinyao Li, Fengling Li et al.

CVPR 2024posterarXiv:2403.02899
#5367

TransLoc4D: Transformer-based 4D Radar Place Recognition

Guohao Peng, Heshan Li, Yangyang Zhao et al.

CVPR 2024poster
#5368

Higher-order Relational Reasoning for Pedestrian Trajectory Prediction

Sungjune Kim, Hyung-gun Chi, Hyerin Lim et al.

CVPR 2024poster
#5369

Learn to Rectify the Bias of CLIP for Unsupervised Semantic Segmentation

Jingyun Wang, Guoliang Kang

CVPR 2024posterarXiv:2408.06747
#5370

Layer- and Timestep-Adaptive Differentiable Token Compression Ratios for Efficient Diffusion Transformers

Haoran You, Connelly Barnes, Yuqian Zhou et al.

CVPR 2025posterarXiv:2412.16822
#5371

SGSST: Scaling Gaussian Splatting Style Transfer

Bruno Galerne, Jianling WANG, Lara Raad et al.

CVPR 2025poster
#5372

Unified Medical Lesion Segmentation via Self-referring Indicator

Shijie Chang, Xiaoqi Zhao, Lihe Zhang et al.

CVPR 2025poster
#5373

Absolute Pose from One or Two Scaled and Oriented Features

Jonathan Ventura, Zuzana Kukelova, Torsten Sattler et al.

CVPR 2024highlight
#5374

Draw Step by Step: Reconstructing CAD Construction Sequences from Point Clouds via Multimodal Diffusion.

Weijian Ma, Shuaiqi Chen, Yunzhong Lou et al.

CVPR 2024poster
#5375

DSGG: Dense Relation Transformer for an End-to-end Scene Graph Generation

Zeeshan Hayder, Xuming He

CVPR 2024posterarXiv:2403.14886
#5376

Generative Modeling of Class Probability for Multi-Modal Representation Learning

JungKyoo Shin, Bumsoo Kim, Eunwoo Kim

CVPR 2025highlightarXiv:2503.17417
#5377

Open-Vocabulary 3D Semantic Segmentation with Foundation Models

Li Jiang, Shaoshuai Shi, Bernt Schiele

CVPR 2024highlight
#5378

Training Vision Transformers for Semi-Supervised Semantic Segmentation

Xinting Hu, Li Jiang, Bernt Schiele

CVPR 2024poster
#5379

APSeg: Auto-Prompt Network for Cross-Domain Few-Shot Semantic Segmentation

Weizhao He, Yang Zhang, Wei Zhuo et al.

CVPR 2024posterarXiv:2406.08372
#5380

Design2Cloth: 3D Cloth Generation from 2D Masks

Jiali Zheng, Rolandos Alexandros Potamias, Stefanos Zafeiriou

CVPR 2024posterarXiv:2404.02686
#5381

S-DyRF: Reference-Based Stylized Radiance Fields for Dynamic Scenes

Xingyi Li, Zhiguo Cao, Yizheng Wu et al.

CVPR 2024posterarXiv:2403.06205
#5382

SatSynth: Augmenting Image-Mask Pairs through Diffusion Models for Aerial Semantic Segmentation

Aysim Toker, Marvin Eisenberger, Daniel Cremers et al.

CVPR 2024posterarXiv:2403.16605
#5383

Dual-Consistency Model Inversion for Non-Exemplar Class Incremental Learning

Zihuan Qiu, Yi Xu, Fanman Meng et al.

CVPR 2024poster
#5384

GRAE-3DMOT: Geometry Relation-Aware Encoder for Online 3D Multi-Object Tracking

Hyunseop Kim, Hyo-Jun Lee, Yonguk Lee et al.

CVPR 2025poster
#5385

DS-NeRV: Implicit Neural Video Representation with Decomposed Static and Dynamic Codes

Hao Yan, Zhihui Ke, Xiaobo Zhou et al.

CVPR 2024posterarXiv:2403.15679
#5386

Rolling Shutter Correction with Intermediate Distortion Flow Estimation

Mingdeng Cao, Sidi Yang, Yujiu Yang et al.

CVPR 2024posterarXiv:2404.06350
#5387

Towards Transferable Targeted 3D Adversarial Attack in the Physical World

Yao Huang, Yinpeng Dong, Shouwei Ruan et al.

CVPR 2024posterarXiv:2312.09558
#5388

Hybrid Functional Maps for Crease-Aware Non-Isometric Shape Matching

Lennart Bastian, Yizheng Xie, Nassir Navab et al.

CVPR 2024posterarXiv:2312.03678
#5389

Class Tokens Infusion for Weakly Supervised Semantic Segmentation

Sung-Hoon Yoon, Hoyong Kwon, Hyeonseong Kim et al.

CVPR 2024poster
#5390

SFOD: Spiking Fusion Object Detector

Yimeng Fan, Wei Zhang, Changsong Liu et al.

CVPR 2024posterarXiv:2403.15192
#5391

AnyDoor: Zero-shot Object-level Image Customization

Xi Chen, Lianghua Huang, Yu Liu et al.

CVPR 2024posterarXiv:2307.09481
#5392

Gazing at Rewards: Eye Movements as a Lens into Human and AI Decision-Making in Hybrid Visual Foraging

Bo Wang, Dingwei Tan, Yen-Ling Kuo et al.

CVPR 2025posterarXiv:2411.09176
#5393

SeD: Semantic-Aware Discriminator for Image Super-Resolution

Bingchen Li, Xin Li, Hanxin Zhu et al.

CVPR 2024posterarXiv:2402.19387
#5394

InstanceDiffusion: Instance-level Control for Image Generation

XuDong Wang, Trevor Darrell, Sai Saketh Rambhatla et al.

CVPR 2024posterarXiv:2402.03290
#5395

Robust Emotion Recognition in Context Debiasing

Dingkang Yang, Kun Yang, Mingcheng Li et al.

CVPR 2024posterarXiv:2403.05963
#5396

RefPose: Leveraging Reference Geometric Correspondences for Accurate 6D Pose Estimation of Unseen Objects

Jaeguk Kim, Jaewoo Park, Keuntek Lee et al.

CVPR 2025posterarXiv:2505.10841
#5397

Improving Training Efficiency of Diffusion Models via Multi-Stage Framework and Tailored Multi-Decoder Architecture

Huijie Zhang, Yifu Lu, Ismail Alkhouri et al.

CVPR 2024poster
#5398

Balancing Act: Distribution-Guided Debiasing in Diffusion Models

Rishubh Parihar, Abhijnya Bhat, Abhipsa Basu et al.

CVPR 2024posterarXiv:2402.18206
#5399

Sieve: Multimodal Dataset Pruning using Image Captioning Models

Anas Mahmoud, Mostafa Elhoushi, Amro Abbas et al.

CVPR 2024posterarXiv:2310.02110
#5400

Not All Voxels Are Equal: Hardness-Aware Semantic Scene Completion with Self-Distillation

Song Wang, Jiawei Yu, Wentong Li et al.

CVPR 2024posterarXiv:2404.11958