Most Cited CVPR &quot;utxo-based design&quot; Papers

CVPR 2024posterarXiv:2403.11380

#2002

Boosting Order-Preserving and Transferability for Neural Architecture Search: a Joint Architecture Refined Search and Fine-tuning Approach

Beichen Zhang, Xiaoxing Wang, Xiaohan Qin et al.

CVPR 2025highlightarXiv:2503.13985

#2003

DefectFill: Realistic Defect Generation with Inpainting Diffusion Model for Visual Inspection

Jaewoo Song, Daemin Park, Kanghyun Baek et al.

CVPR 2025posterarXiv:2506.08210

#2004

A Comprehensive Study of Decoder-Only LLMs for Text-to-Image Generation

Andrew Z Wang, Songwei Ge, Tero Karras et al.

CVPR 2024posterarXiv:2403.16258

#2005

Laplacian-guided Entropy Model in Neural Codec with Blur-dissipated Synthesis

Atefeh Khoshkhahtinat, Ali Zafari, Piyush Mehta et al.

CVPR 2025posterarXiv:2501.11043

#2006

BF-STVSR: B-Splines and Fourier---Best Friends for High Fidelity Spatial-Temporal Video Super-Resolution

Eunjin Kim, HYEONJIN KIM, Kyong Hwan Jin et al.

CVPR 2025posterarXiv:2503.20172

#2007

Guiding Human-Object Interactions with Rich Geometry and Relations

Mengqing Xue, Yifei Liu, Ling Guo et al.

CVPR 2025highlightarXiv:2501.11319

#2008

StyleSSP: Sampling StartPoint Enhancement for Training-free Diffusion-based Method for Style Transfer

ruojun xu, Weijie Xi, Xiaodi Wang et al.

#2009

Generative Sparse-View Gaussian Splatting

Hanyang Kong, Xingyi Yang, Xinchao Wang

CVPR 2025posterarXiv:2503.22201

#2010

Multi-modal Knowledge Distillation-based Human Trajectory Forecasting

Jaewoo Jeong, Seohee Lee, Daehee Park et al.

CVPR 2025posterarXiv:2507.17083

#2011

SDGOCC: Semantic and Depth-Guided Bird's-Eye View Transformation for 3D Multimodal Occupancy Prediction

ZaiPeng Duan, Xuzhong Hu, Pei An et al.

CVPR 2024posterarXiv:2404.15263

#2012

Multi-Session SLAM with Differentiable Wide-Baseline Pose Optimization

Lahav Lipson, Jia Deng

#2013

AlphaPre: Amplitude-Phase Disentanglement Model for Precipitation Nowcasting

Kenghong Lin, Baoquan Zhang, Demin Yu et al.

CVPR 2025posterarXiv:2503.19913

#2014

PartRM: Modeling Part-Level Dynamics with Large Cross-State Reconstruction Model

Mingju Gao, Yike Pan, Huan-ang Gao et al.

CVPR 2025highlightarXiv:2502.07814

#2015

Satellite Observations Guided Diffusion Model for Accurate Meteorological States at Arbitrary Resolution

Siwei Tu, Ben Fei, Weidong Yang et al.

CVPR 2025posterarXiv:2503.13063

#2016

Federated Learning with Domain Shift Eraser

Zheng Wang, Zihui Wang, Zheng Wang et al.

CVPR 2024posterarXiv:2403.18469

#2017

Density-guided Translator Boosts Synthetic-to-Real Unsupervised Domain Adaptive Segmentation of 3D Point Clouds

Zhimin Yuan, Wankang Zeng, Yanfei Su et al.

#2018

D^2iT: Dynamic Diffusion Transformer for Accurate Image Generation

Weinan Jia, Mengqi Huang, Nan Chen et al.

CVPR 2025posterarXiv:2407.13772

#2019

GroupMamba: Efficient Group-Based Visual State Space Model

Abdelrahman Shaker, Syed Talal Wasim, Salman Khan et al.

CVPR 2025posterarXiv:2504.04744

#2020

Grounding 3D Object Affordance with Language Instructions, Visual Observations and Interactions

He Zhu, Quyu Kong, Kechun Xu et al.

CVPR 2025highlightarXiv:2504.12284

#2021

How Do I Do That? Synthesizing 3D Hand Motion and Contacts for Everyday Interactions

Aditya Prakash, Benjamin E Lundell, Dmitry Andreychuk et al.

CVPR 2024posterarXiv:2403.17638

#2022

Learning with Unreliability: Fast Few-shot Voxel Radiance Fields with Relative Geometric Consistency

Xu Yingjie, Bangzhen Liu, Hao Tang et al.

CVPR 2024posterarXiv:2210.05248

#2023

Self-supervised Debiasing Using Low Rank Regularization

Geon Yeong Park, Chanyong Jung, Sangmin Lee et al.

CVPR 2025posterarXiv:2410.15980

#2024

Learning from Neighbors: Category Extrapolation for Long-Tail Learning

Shizhen Zhao, Xin Wen, Jiahui Liu et al.

CVPR 2025posterarXiv:2505.22079

#2025

Bringing CLIP to the Clinic: Dynamic Soft Labels and Negation-Aware Learning for Medical Analysis

Hanbin Ko, Chang Min Park

#2026

Audio-Visual Semantic Graph Network for Audio-Visual Event Localization

Liang Liu, Shuaiyong Li, Yongqiang Zhu

#2027

Detect Any Mirrors: Boosting Learning Reliability on Large-Scale Unlabeled Data with an Iterative Data Engine

Zhaohu Xing, Lihao Liu, Yijun Yang et al.

#2028

AniMo: Species-Aware Model for Text-Driven Animal Motion Generation

Xuan Wang, Kai Ruan, Xing Zhang et al.

CVPR 2025posterarXiv:2412.07696

#2029

SimVS: Simulating World Inconsistencies for Robust View Synthesis

Alex Trevithick, Roni Paiss, Philipp Henzler et al.

CVPR 2024posterarXiv:2404.12322

#2030

Generalizable Face Landmarking Guided by Conditional Face Warping

Jiayi Liang, Haotian Liu, Hongteng Xu et al.

CVPR 2025highlightarXiv:2505.21943

#2031

Point-to-Region Loss for Semi-Supervised Point-Based Crowd Counting

Wei Lin, Chenyang ZHAO, Antoni B. Chan

CVPR 2024posterarXiv:2404.00252

#2032

Learned Scanpaths Aid Blind Panoramic Video Quality Assessment

Kanglong FAN, Wen Wen, Mu Li et al.

CVPR 2024posterarXiv:2310.12153

#2033

Probabilistic Sampling of Balanced K-Means using Adiabatic Quantum Computing

Jan-Nico Zaech, Martin Danelljan, Tolga Birdal et al.

CVPR 2025posterarXiv:2503.09243

#2034

GarmentPile: Point-Level Visual Affordance Guided Retrieval and Adaptation for Cluttered Garments Manipulation

Ruihai Wu, Ziyu Zhu, Yuran Wang et al.

CVPR 2025posterarXiv:2504.03639

#2035

Shape My Moves: Text-Driven Shape-Aware Synthesis of Human Motions

Ting-Hsuan Liao, Yi Zhou, Yu Shen et al.

#2036

Zero-shot 3D Question Answering via Voxel-based Dynamic Token Compression

Hsiang-Wei Huang, Fu-Chen Chen, Wenhao Chai et al.

#2037

EdgeDiff: Edge-aware Diffusion Network for Building Reconstruction from Point Clouds

Yujun Liu, Ruisheng Wang, Shangfeng Huang et al.

#2038

Boosting Adversarial Transferability through Augmentation in Hypothesis Space

Yu Guo, Weiquan Liu, Qingshan Xu et al.

#2039

HeMoRa: Unsupervised Heuristic Consensus Sampling for Robust Point Cloud Registration

Shaocheng Yan, Yiming Wang, Kaiyan Zhao et al.

CVPR 2024posterarXiv:2311.18695

#2040

Seg2Reg: Differentiable 2D Segmentation to 1D Regression Rendering for 360 Room Layout Reconstruction

Cheng Sun, Wei-En Tai, Yu-Lin Shih et al.

CVPR 2025posterarXiv:2412.16915

#2041

FADA: Fast Diffusion Avatar Synthesis with Mixed-Supervised Multi-CFG Distillation

Tianyun Zhong, Chao Liang, Jianwen Jiang et al.

CVPR 2024posterarXiv:2211.14309

#2042

FutureHuman3D: Forecasting Complex Long-Term 3D Human Behavior from Video Observations

Christian Diller, Thomas Funkhouser, Angela Dai

CVPR 2025posterarXiv:2503.16572

#2043

Efficient ANN-Guided Distillation: Aligning Rate-based Features of Spiking Neural Networks through Hybrid Block-wise Replacement

Shu Yang, Chengting Yu, Lei Liu et al.

CVPR 2025posterarXiv:2504.06827

#2044

IAAO: Interactive Affordance Learning for Articulated Objects in 3D Environments

Can Zhang, Gim Hee Lee

CVPR 2025posterarXiv:2503.20824

#2045

Exploiting Temporal State Space Sharing for Video Semantic Segmentation

Hesham Syed, Yun Liu, Guolei Sun et al.

CVPR 2025highlightarXiv:2411.15678

#2046

Towards RAW Object Detection in Diverse Conditions

Zhong-Yu Li, Xin Jin, Bo-Yuan Sun et al.

CVPR 2025posterarXiv:2412.16645

#2047

Complementary Advantages: Exploiting Cross-Field Frequency Correlation for NIR-Assisted Image Denoising

Yuchen Wang, Hongyuan Wang, Lizhi Wang et al.

CVPR 2024posterarXiv:2403.10335

#2048

NECA: Neural Customizable Human Avatar

Junjin Xiao, Qing Zhang, Zhan Xu et al.

CVPR 2025posterarXiv:2502.19894

#2049

High-Fidelity Relightable Monocular Portrait Animation with Lighting-Controllable Video Diffusion Model

Mingtao Guo, Guanyu Xing, Yanli Liu

CVPR 2025posterarXiv:2411.19756

#2050

DeSplat: Decomposed Gaussian Splatting for Distractor-Free Rendering

Yihao Wang, Marcus Klasson, Matias Turkulainen et al.

CVPR 2025posterarXiv:2501.01589

#2051

D^3-Human: Dynamic Disentangled Digital Human from Monocular Video

Honghu Chen, Bo Peng, Yunfan Tao et al.

CVPR 2025posterarXiv:2504.01515

#2052

Training-free Dense-Aligned Diffusion Guidance for Modular Conditional Image Synthesis

Zixuan Wang, DUO PENG, Feng Chen et al.

CVPR 2024posterarXiv:2308.10638

#2053

SCULPT: Shape-Conditioned Unpaired Learning of Pose-dependent Clothed and Textured Human Meshes

Soubhik Sanyal, Partha Ghosh, Jinlong Yang et al.

CVPR 2025posterarXiv:2411.12951

#2054

On the Consistency of Video Large Language Models in Temporal Comprehension

Minjoon Jung, Junbin Xiao, Byoung-Tak Zhang et al.

CVPR 2025highlightarXiv:2504.05046

#2055

MotionPRO: Exploring the Role of Pressure in Human MoCap and Beyond

Shenghao Ren, Yi Lu, Jiayi Huang et al.

#2056

A Polarization-Aided Transformer for Image Deblurring via Motion Vector Decomposition

Duosheng Chen, Shihao Zhou, Jinshan Pan et al.

CVPR 2025highlight

#2057

ShapeWalk: Compositional Shape Editing Through Language-Guided Chains

Habib Slim, Mohamed Elhoseiny

CVPR 2025highlightarXiv:2504.05838

#2058

Mind the Trojan Horse: Image Prompt Adapter Enabling Scalable and Deceptive Jailbreaking

Junxi Chen, Junhao Dong, Xiaohua Xie

CVPR 2024posterarXiv:2404.19384

#2059

Pseudo Label Refinery for Unsupervised Domain Adaptation on Cross-dataset 3D Object Detection

Zhanwei Zhang, Minghao Chen, Shuai Xiao et al.

CVPR 2025posterarXiv:2411.16799

#2060

One is Plenty: A Polymorphic Feature Interpreter for Immutable Heterogeneous Collaborative Perception

Yuchen Xia, Quan Yuan, Guiyang Luo et al.

CVPR 2025posterarXiv:2503.12745

#2061

ProtoDepth: Unsupervised Continual Depth Completion with Prototypes

Patrick Rim, Hyoungseob Park, Suchisrit Gangopadhyay et al.

CVPR 2025posterarXiv:2503.00591

#2062

AesthetiQ: Enhancing Graphic Layout Design via Aesthetic-Aware Preference Alignment of Multi-modal Large Language Models

Sohan Patnaik, Rishabh Jain, Balaji Krishnamurthy et al.

CVPR 2024posterarXiv:2405.07481

#2063

Text Grouping Adapter: Adapting Pre-trained Text Detector for Layout Analysis

Tianci Bi, Xiaoyi Zhang, Zhizheng Zhang et al.

CVPR 2025posterarXiv:2504.10857

#2064

ZeroGrasp: Zero-Shot Shape Reconstruction Enabled Robotic Grasping

Shun Iwase, Muhammad Zubair Irshad, Katherine Liu et al.

CVPR 2025highlightarXiv:2412.20651

#2065

Latent Drifting in Diffusion Models for Counterfactual Medical Image Synthesis

Yousef Yeganeh, Ioannis Charisiadis, Marta Hasny et al.

CVPR 2025posterarXiv:2502.06029

#2066

DiTASK: Multi-Task Fine-Tuning with Diffeomorphic Transformations

Krishna Sri Ipsit Mantri, Carola-Bibiane Schönlieb, Bruno Ribeiro et al.

CVPR 2025posterarXiv:2505.07539

#2067

GIFStream: 4D Gaussian-based Immersive Video with Feature Stream

Hao Li, Sicheng Li, Xiang Gao et al.

CVPR 2024highlightarXiv:2405.02581

#2068

Stationary Representations: Optimally Approximating Compatibility and Implications for Improved Model Replacements

Niccolò Biondi, Federico Pernici, Simone Ricci et al.

CVPR 2025posterarXiv:2505.11800

#2069

Self-Learning Hyperspectral and Multispectral Image Fusion via Adaptive Residual Guided Subspace Diffusion Model

Jian Zhu, He Wang, Yang Xu et al.

CVPR 2024posterarXiv:2404.02585

#2070

Unsegment Anything by Simulating Deformation

Jiahao Lu, Xingyi Yang, Xinchao Wang

CVPR 2024posterarXiv:2403.06846

#2071

DiaLoc: An Iterative Approach to Embodied Dialog Localization

Chao Zhang, Mohan Li, Ignas Budvytis et al.

CVPR 2024posterarXiv:2311.14097

#2072

ACT-Diffusion: Efficient Adversarial Consistency Training for One-step Diffusion Models

Fei Kong, Jinhao Duan, Lichao Sun et al.

CVPR 2025posterarXiv:2504.07095

#2073

Neural Motion Simulator Pushing the Limit of World Models in Reinforcement Learning

Chenjie Hao, Weyl Lu, Yifan Xu et al.

CVPR 2024posterarXiv:2403.17761

#2074

Makeup Prior Models for 3D Facial Makeup Estimation and Applications

Xingchao Yang, Takafumi Taketomi, Yuki Endo et al.

CVPR 2025posterarXiv:2406.16321

#2075

Mosaic of Modalities: A Comprehensive Benchmark for Multimodal Graph Learning

Jing Zhu, Yuhang Zhou, Shengyi Qian et al.

CVPR 2025posterarXiv:2506.07865

#2076

FreeGave: 3D Physics Learning from Dynamic Videos by Gaussian Velocity

Jinxi Li, Ziyang Song, Siyuan Zhou et al.

CVPR 2025posterarXiv:2405.16414

#2077

Robust Message Embedding via Attention Flow-Based Steganography

Huayuan Ye, Shenzhuo Zhang, Shiqi Jiang et al.

#2078

Intensity-Robust Autofocus for Spike Camera

Changqing Su, Zhiyuan Ye, Yongsheng Xiao et al.

CVPR 2025highlightarXiv:2410.10604

#2079

Multi-modal Vision Pre-training for Medical Image Analysis

Shaohao Rui, Lingzhi Chen, Zhenyu Tang et al.

CVPR 2025highlightarXiv:2412.03937

#2080

AIpparel: A Multimodal Foundation Model for Digital Garments

Kiyohiro Nakayama, Jan Ackermann, Timur Levent Kesdogan et al.

CVPR 2025posterarXiv:2503.20418

#2081

ITA-MDT: Image-Timestep-Adaptive Masked Diffusion Transformer Framework for Image-Based Virtual Try-On

Ji Woo Hong, Tri Ton, Trung X. Pham et al.

#2082

UHD-processer: Unified UHD Image Restoration with Progressive Frequency Learning and Degradation-aware Prompts

Yidi Liu, Dong Li, Xueyang Fu et al.

CVPR 2025posterarXiv:2503.00063

#2083

NoPain: No-box Point Cloud Attack via Optimal Transport Singular Boundary

Zezeng Li, Xiaoyu Du, Na Lei et al.

#2084

Revisiting Source-Free Domain Adaptation: Insights into Representativeness, Generalization, and Variety

Ronghang Zhu, Mengxuan Hu, Weiming Zhuang et al.

CVPR 2025posterarXiv:2505.02148

#2085

Spotting the Unexpected (STU): A 3D LiDAR Dataset for Anomaly Segmentation in Autonomous Driving

Alexey Nekrasov, Malcolm Burdorf, Stewart Worrall et al.

CVPR 2025posterarXiv:2411.16173

#2086

SALOVA: Segment-Augmented Long Video Assistant for Targeted Retrieval and Routing in Long-Form Video Analysis

Junho Kim, Hyunjun Kim, Hosu Lee et al.

CVPR 2025highlightarXiv:2412.19637

#2087

ReNeg: Learning Negative Embedding with Reward Guidance

Xiaomin Li, yixuan liu, Takashi Isobe et al.

CVPR 2025posterarXiv:2504.17813

#2088

CLOC: Contrastive Learning for Ordinal Classification with Multi-Margin N-pair Loss

Dileepa Pitawela, Gustavo Carneiro, Hsiang-Ting Chen

CVPR 2025posterarXiv:2412.01537

#2089

HandOS: 3D Hand Reconstruction in One Stage

Xingyu Chen, Zhuheng Song, Xiaoke Jiang et al.

CVPR 2025posterarXiv:2503.21659

#2090

InteractionMap: Improving Online Vectorized HDMap Construction with Interaction

Kuang Wu, Chuan Yang, Zhanbin Li

CVPR 2025posterarXiv:2408.15503

#2091

RoboSense: Large-scale Dataset and Benchmark for Egocentric Robot Perception and Navigation in Crowded and Unstructured Environments

Haisheng Su, Feixiang Song, CONG MA et al.

CVPR 2025posterarXiv:2503.12077

#2092

V-Stylist: Video Stylization via Collaboration and Reflection of MLLM Agents

Zhengrong Yue, Shaobin Zhuang, Kunchang Li et al.

#2093

Binarized Neural Network for Multi-spectral Image Fusion

Junming Hou, Xiaoyu Chen, Ran Ran et al.

#2094

Enhancing Testing-Time Robustness for Trusted Multi-View Classification in the Wild

Wei Liu, Yufei Chen, Xiaodong Yue

CVPR 2024posterarXiv:2401.10219

#2095

Edit One for All: Interactive Batch Image Editing

Thao Nguyen, Utkarsh Ojha, Yuheng Li et al.

#2096

Pos3R: 6D Pose Estimation for Unseen Objects Made Easy

Weijian Deng, Dylan Campbell, Chunyi Sun et al.

CVPR 2025posterarXiv:2412.01792

#2097

CTRL-D: Controllable Dynamic 3D Scene Editing with Personalized 2D Diffusion

Kai He, Chin-Hsuan Wu, Igor Gilitschenski

CVPR 2025posterarXiv:2503.04565

#2098

Omnidirectional Multi-Object Tracking

Kai Luo, Hao Shi, Sheng Wu et al.

CVPR 2024posterarXiv:2403.11812

#2099

Aerial Lifting: Neural Urban Semantic and Building Instance Lifting from Aerial Imagery

Yuqi Zhang, Guanying Chen, Jiaxing Chen et al.

CVPR 2025posterarXiv:2506.10966

#2100

GENMANIP: LLM-driven Simulation for Generalizable Instruction-Following Manipulation

Ning Gao, Yilun Chen, Shuai Yang et al.

CVPR 2025posterarXiv:2501.08303

#2101

Advancing Semantic Future Prediction through Multimodal Visual Sequence Transformers

Efstathios Karypidis, Ioannis Kakogeorgiou, Spyros Gidaris et al.

#2102

Efficient Hyperparameter Optimization with Adaptive Fidelity Identification

Jiantong Jiang, Zeyi Wen, Atif Mansoor et al.

CVPR 2025posterarXiv:2412.10084

#2103

ProbeSDF: Light Field Probes For Neural Surface Reconstruction

Briac Toussaint, Diego Thomas, Jean-Sébastien Franco

#2104

Dynamic Stereotype Theory Induced Micro-expression Recognition with Oriented Deformation

Bohao Zhang, Xuejiao Wang, Changbo Wang et al.

CVPR 2025posterarXiv:2503.19191

#2105

FDS: Frequency-Aware Denoising Score for Text-Guided Latent Diffusion Image Editing

Yufan Ren, Zicong Jiang, Tong Zhang et al.

CVPR 2025posterarXiv:2503.02101

#2106

Generalized Diffusion Detector: Mining Robust Features from Diffusion Models for Domain-Generalized Detection

Boyong He, Yuxiang Ji, Qianwen Ye et al.

CVPR 2025posterarXiv:2503.18695

#2107

OCRT: Boosting Foundation Models in the Open World with Object-Concept-Relation Triad

Luyao Tang, Chaoqi Chen, Yuxuan Yuan et al.

CVPR 2025posterarXiv:2412.07767

#2108

Learning Visual Generative Priors without Text

Shuailei Ma, Kecheng Zheng, Ying Wei et al.

CVPR 2025highlightarXiv:2412.06767

#2109

MAtCha Gaussians: Atlas of Charts for High-Quality Geometry and Photorealism From Sparse Views

Antoine Guédon, Tomoki Ichikawa, Kohei Yamashita et al.

#2110

HUSH: Holistic Panoramic 3D Scene Understanding using Spherical Harmonics

Jongsung Lee, HARIN PARK, Byeong-Uk Lee et al.

CVPR 2024posterarXiv:2311.16304

#2111

Robust Self-calibration of Focal Lengths from the Fundamental Matrix

Viktor Kocur, Daniel Kyselica, Zuzana Kukelova

CVPR 2025posterarXiv:2502.19739

#2112

LUCAS: Layered Universal Codec Avatars

Di Liu, Teng Deng, Giljoo Nam et al.

CVPR 2025posterarXiv:2503.15404

#2113

Improving Adversarial Transferability on Vision Transformers via Forward Propagation Refinement

Yuchen Ren, Zhengyu Zhao, Chenhao Lin et al.

CVPR 2024highlightarXiv:2401.13296

#2114

Visual Objectification in Films: Towards a New AI Task for Video Interpretation

Julie Tores, Lucile Sassatelli, Hui-Yin Wu et al.

#2115

Detection-Friendly Nonuniformity Correction: A Union Framework for Infrared UAV Target Detection

Houzhang Fang, Xiaolin Wang, Zengyang Li et al.

CVPR 2025highlight

CVPR 2024posterarXiv:2403.20031

#2116

A Unified Framework for Human-centric Point Cloud Video Understanding

Yiteng Xu, Kecheng Ye, xiao han et al.

CVPR 2025posterarXiv:2503.18359

#2117

Context-Enhanced Memory-Refined Transformer for Online Action Detection

Zhanzhong Pang, Fadime Sener, Angela Yao

CVPR 2025highlightarXiv:2503.09962

#2118

Modeling Thousands of Human Annotators for Generalizable Text-to-Image Person Re-identification

Jiayu Jiang, Changxing Ding, Wentao Tan et al.

CVPR 2025posterarXiv:2503.12042

#2119

Prosody-Enhanced Acoustic Pre-training and Acoustic-Disentangled Prosody Adapting for Movie Dubbing

Zhedong Zhang, Liang Li, Chenggang Yan et al.

#2120

On the Zero-shot Adversarial Robustness of Vision-Language Models: A Truly Zero-shot and Training-free Approach

Baoshun Tong, Hanjiang Lai, Yan Pan et al.

CVPR 2025posterarXiv:2412.18565

#2121

3DEnhancer: Consistent Multi-View Diffusion for 3D Enhancement

Yihang Luo, Shangchen Zhou, Yushi Lan et al.

CVPR 2025posterarXiv:2412.01244

#2122

Concept Replacer: Replacing Sensitive Concepts in Diffusion Models via Precision Localization

lingyun zhang, Yu Xie, Yanwei Fu et al.

#2123

Understanding Fine-tuning CLIP for Open-vocabulary Semantic Segmentation in Hyperbolic Space

Zelin Peng, Zhengqin Xu, Zhilin Zeng et al.

CVPR 2025posterarXiv:2504.04156

#2124

CoMBO: Conflict Mitigation via Branched Optimization for Class Incremental Segmentation

Kai Fang, Anqi Zhang, Guangyu Gao et al.

CVPR 2025highlightarXiv:2503.04459

#2125

Question-Aware Gaussian Experts for Audio-Visual Question Answering

Hongyeob Kim, Inyoung Jung, Dayoon Suh et al.

CVPR 2025posterarXiv:2502.19754

#2126

Finding Local Diffusion Schrödinger Bridge using Kolmogorov-Arnold Network

Xingyu Qiu, Mengying Yang, Xinghua Ma et al.

CVPR 2025posterarXiv:2503.01309

#2127

OnlineAnySeg: Online Zero-Shot 3D Segmentation by Visual Foundation Model Guided 2D Mask Merging

Yijie Tang, Jiazhao Zhang, Yuqing Lan et al.

CVPR 2025posterarXiv:2506.18557

#2128

Object-aware Sound Source Localization via Audio-Visual Scene Understanding

Sung Jin Um, Dongjin Kim, Sangmin Lee et al.

CVPR 2025posterarXiv:2406.13378

#2129

PanDA: Towards Panoramic Depth Anything with Unlabeled Panoramas and Mobius Spatial Augmentation

Zidong Cao, Jinjing Zhu, Weiming Zhang et al.

CVPR 2025posterarXiv:2408.07790

#2130

Cropper: Vision-Language Model for Image Cropping through In-Context Learning

Seung Hyun Lee, Jijun jiang, Yiran Xu et al.

CVPR 2025posterarXiv:2503.22168

#2131

Spatial Transport Optimization by Repositioning Attention Map for Training-Free Text-to-Image Synthesis

Woojung Han, Yeonkyung Lee, Chanyoung Kim et al.

CVPR 2025posterarXiv:2503.19377

#2132

Interpretable Generative Models through Post-hoc Concept Bottlenecks

Akshay R. Kulkarni, Ge Yan, Chung-En Sun et al.

CVPR 2025posterarXiv:2503.01653

#2133

Distilled Prompt Learning for Incomplete Multimodal Survival Prediction

Yingxue Xu, Fengtao ZHOU, Chenyu Zhao et al.

CVPR 2025posterarXiv:2504.08449

#2134

Ego4o: Egocentric Human Motion Capture and Understanding from Multi-Modal Input

Jian Wang, Rishabh Dabral, Diogo Luvizon et al.

CVPR 2025posterarXiv:2411.15231

#2135

IterIS: Iterative Inference-Solving Alignment for LoRA Merging

Hongxu chen, Zhen Wang, Runshi Li et al.

CVPR 2024posterarXiv:2401.14349

#2136

Learning to Navigate Efficiently and Precisely in Real Environments

Guillaume Bono, Hervé Poirier, Leonid Antsfeld et al.

CVPR 2025posterarXiv:2411.18654

#2137

AToM: Aligning Text-to-Motion Model at Event-Level with GPT-4Vision Reward

Haonan Han, Xiangzuo Wu, Huan Liao et al.

CVPR 2025posterarXiv:2412.02993

#2138

EchoONE: Segmenting Multiple Echocardiography Planes in One Model

Jiongtong Hu, Wei Zhuo, Jun Cheng et al.

CVPR 2025posterarXiv:2503.17080

#2139

Seeing What Matters: Empowering CLIP with Patch Generation-to-Selection

Gensheng Pei, Tao Chen, Yujia Wang et al.

CVPR 2025posterarXiv:2503.24382

#2140

Free360: Layered Gaussian Splatting for Unbounded 360-Degree View Synthesis from Extremely Sparse and Unposed Views

Chong Bao, Xiyu Zhang, Zehao Yu et al.

CVPR 2025posterarXiv:2502.20208

#2141

4Deform: Neural Surface Deformation for Robust Shape Interpolation

Lu Sang, Zehranaz Canfes, Dongliang Cao et al.

CVPR 2025posterarXiv:2503.05484

#2142

DecoupledGaussian: Object-Scene Decoupling for Physics-Based Interaction

Miaowei Wang, Yibo Zhang, Rui Ma et al.

CVPR 2025posterarXiv:2505.01428

#2143

Multi-party Collaborative Attention Control for Image Customization

Han Yang, Chuanguang Yang, Qiuli Wang et al.

CVPR 2025posterarXiv:2310.11439

#2144

From Alexnet to Transformers: Measuring the Non-linearity of Deep Neural Networks with Affine Optimal Transport

Quentin Bouniot, Ievgen Redko, Anton Mallasto et al.

CVPR 2025highlightarXiv:2503.18682

#2145

Hardware-Rasterized Ray-Based Gaussian Splatting

Samuel Rota Bulò, Lorenzo Porzi, Nemanja Bartolovic et al.

CVPR 2025posterarXiv:2404.14414

#2146

Removing Reflections from RAW Photos

Eric Kee, Adam Pikielny, Kevin Blackburn-Matzen et al.

CVPR 2025posterarXiv:2411.15432

#2147

Lifelong Knowledge Editing for Vision Language Models with Low-Rank Mixture-of-Experts

Qizhou Chen, Chengyu Wang, Dakan Wang et al.

CVPR 2024posterarXiv:2406.18540

#2148

Fully Exploiting Every Real Sample: SuperPixel Sample Gradient Model Stealing

Yunlong Zhao, Xiaoheng Deng, Yijing Liu et al.

#2149

Learning Heterogeneous Tissues with Mixture of Experts for Gigapixel Whole Slide Images

Junxian Wu, Minheng Chen, Xinyi Ke et al.

CVPR 2025posterarXiv:2310.14356

#2150

Semantic and Expressive Variations in Image Captions Across Languages

Andre Ye, Sebastin Santy, Jena D. Hwang et al.

CVPR 2025highlightarXiv:2412.11441

#2151

UIBDiffusion: Universal Imperceptible Backdoor Attack for Diffusion Models

Yuning Han, Bingyin Zhao, Rui Chu et al.

CVPR 2025posterarXiv:2504.02244

#2152

SocialGesture: Delving into Multi-person Gesture Understanding

Xu Cao, Pranav Virupaksha, Wenqi Jia et al.

#2153

TFCustom: Customized Image Generation with Time-Aware Frequency Feature Guidance

Mushui Liu, Dong She, Qihan Huang et al.

CVPR 2025highlight

CVPR 2025posterarXiv:2412.01316

#2154

Long Video Diffusion Generation with Segmented Cross-Attention and Content-Rich Video Data Curation

Xin Yan, Yuxuan Cai, Qiuyue Wang et al.

CVPR 2024posterarXiv:2405.14934

#2155

Universal Robustness via Median Randomized Smoothing for Real-World Super-Resolution

Zakariya Chaouai, Mohamed Tamaazousti

CVPR 2025posterarXiv:2412.15050

#2156

Uni-Renderer: Unifying Rendering and Inverse Rendering Via Dual Stream Diffusion

ZhiFei Chen, Tianshuo Xu, Wenhang Ge et al.

CVPR 2025posterarXiv:2411.15553

#2157

Improving Transferable Targeted Attacks with Feature Tuning Mixup

Kaisheng Liang, Xuelong Dai, Yanjie Li et al.

CVPR 2025posterarXiv:2412.19712

#2158

From Elements to Design: A Layered Approach for Automatic Graphic Design Composition

Jiawei Lin, Shizhao Sun, Danqing Huang et al.

CVPR 2025posterarXiv:2503.13957

#2159

DiffVsgg: Diffusion-Driven Online Video Scene Graph Generation

Mu Chen, Liulei Li, Wenguan Wang et al.

#2160

Open-World Objectness Modeling Unifies Novel Object Detection

Shan Zhang, Yao Ni, Jinhao Du et al.

#2161

Flow-Guided Online Stereo Rectification for Wide Baseline Stereo

Anush Kumar, Fahim Mannan, Omid Hosseini Jafari et al.

CVPR 2025posterarXiv:2505.04109

#2162

One2Any: One-Reference 6D Pose Estimation for Any Object

Mengya Liu, Siyuan Li, Ajad Chhatkuli et al.

#2163

JTD-UAV: MLLM-Enhanced Joint Tracking and Description Framework for Anti-UAV Systems

Yifan Wang, Jian Zhao, Zhaoxin Fan et al.

CVPR 2025posterarXiv:2412.16158

#2164

HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding

Chenxin Tao, Shiqian Su, Xizhou Zhu et al.

CVPR 2025posterarXiv:2412.05278

#2165

Birth and Death of a Rose

Chen Geng, Yunzhi Zhang, Shangzhe Wu et al.

CVPR 2025highlightarXiv:2502.20126

#2166

FlexiDiT: Your Diffusion Transformer Can Easily Generate High-Quality Samples with Less Compute

Sotiris Anagnostidis, Gregor Bachmann, Yeongmin Kim et al.

CVPR 2024posterarXiv:2405.19646

#2167

FaceLift: Semi-supervised 3D Facial Landmark Localization

David Ferman, Pablo Garrido, Gaurav Bharaj

CVPR 2024posterarXiv:2403.07560

#2168

Unleashing Network Potentials for Semantic Scene Completion

Fengyun Wang, Qianru Sun, Dong Zhang et al.

#2169

Adversarial Domain Prompt Tuning and Generation for Single Domain Generalization

Zhipeng Xu, De Cheng, XINYANG JIANG et al.

CVPR 2025posterarXiv:2504.18856

#2170

Multi-Resolution Pathology-Language Pre-training Model with Text-Guided Visual Representation

Shahad Albastaki, Anabia Sohail, IYYAKUTTI IYAPPAN GANAPATHI et al.

CVPR 2024posterarXiv:2312.02136

#2171

BerfScene: Bev-conditioned Equivariant Radiance Fields for Infinite 3D Scene Generation

Qihang Zhang, Yinghao Xu, Yujun Shen et al.

CVPR 2024posterarXiv:2404.05675

#2172

Normalizing Flows on the Product Space of SO(3) Manifolds for Probabilistic Human Pose Modeling

Olaf Dünkel, Tim Salzmann, Florian Pfaff

#2173

NightAdapter: Learning a Frequency Adapter for Generalizable Night-time Scene Segmentation

Qi Bi, Jingjun Yi, Huimin Huang et al.

CVPR 2025posterarXiv:2411.09998

#2174

Adaptive Non-Uniform Timestep Sampling for Accelerating Diffusion Model Training

Myunsoo Kim, Donghyeon Ki, Seong-Woong Shim et al.

CVPR 2024posterarXiv:2404.02242

#2175

Towards Robust 3D Pose Transfer with Adversarial Learning

Haoyu Chen, Hao Tang, Ehsan Adeli et al.

CVPR 2025posterarXiv:2503.18150

#2176

LongDiff: Training-Free Long Video Generation in One Go

Zhuoling Li, Hossein Rahmani, Qiuhong Ke et al.

#2177

ABC-Former: Auxiliary Bimodal Cross-domain Transformer with Interactive Channel Attention for White Balance

Yu-Cheng Chiu, GUAN-RONG CHEN, Zihao Chen et al.

#2178

Projecting Trackable Thermal Patterns for Dynamic Computer Vision

Mark Sheinin, Aswin C. Sankaranarayanan, Srinivasa G. Narasimhan

CVPR 2025posterarXiv:2411.07765

#2179

Novel View Synthesis with Pixel-Space Diffusion Models

Noam Elata, Bahjat Kawar, Yaron Ostrovsky-Berman et al.

CVPR 2025highlightarXiv:2412.03968

#2180

Exact: Exploring Space-Time Perceptive Clues for Weakly Supervised Satellite Image Time Series Semantic Segmentation

Hao Zhu, Yan Zhu, Jiayu Xiao et al.

CVPR 2025posterarXiv:2503.23733

#2181

AdaMMS: Model Merging for Heterogeneous Multimodal Large Language Models with Unsupervised Coefficient Optimization

Yiyang Du, Xiaochen Wang, Chi Chen et al.

CVPR 2024posterarXiv:2308.12831

#2182

EFormer: Enhanced Transformer towards Semantic-Contour Features of Foreground for Portraits Matting

Zitao Wang, Qiguang Miao, Yue Xi et al.

CVPR 2024posterarXiv:2403.16937

#2183

Hyperspherical Classification with Dynamic Label-to-Prototype Assignment

Mohammad Saadabadi Saadabadi, Ali Dabouei, Sahar Rahimi Malakshan et al.

CVPR 2025posterarXiv:2505.06218

#2184

Let Humanoids Hike! Integrative Skill Development on Complex Trails

Kwan-Yee Lin, Stella X. Yu

CVPR 2025highlightarXiv:2504.10676

#2185

H-MoRe: Learning Human-centric Motion Representation for Action Analysis

Zhanbo Huang, Xiaoming Liu, Yu Kong

CVPR 2025posterarXiv:2503.21150

#2186

The Devil is in Low-Level Features for Cross-Domain Few-Shot Segmentation

Yuhan Liu, Yixiong Zou, Yuhua Li et al.

CVPR 2025posterarXiv:2503.18055

#2187

PolarFree: Polarization-based Reflection-Free Imaging

Mingde Yao, Menglu Wang, King Man Tam et al.

CVPR 2025posterarXiv:2503.04006

#2188

DSV-LFS: Unifying LLM-Driven Semantic Cues with Visual Features for Robust Few-Shot Segmentation

Amin Karimi, Charalambos Poullis

CVPR 2024posterarXiv:2403.19811

#2189

X-MIC: Cross-Modal Instance Conditioning for Egocentric Action Generalization

Anna Kukleva, Fadime Sener, Edoardo Remelli et al.

CVPR 2024posterarXiv:2405.19819

#2190

Gated Fields: Learning Scene Reconstruction from Gated Videos

Andrea Ramazzina, Stefanie Walz, Pragyan Dahal et al.

#2191

Simplification Is All You Need against Out-of-Distribution Overconfidence

Keke Tang, Chao Hou, Weilong Peng et al.

CVPR 2025highlightarXiv:2503.15005

#2192

Universal Scene Graph Generation

Shengqiong Wu, Hao Fei, Tat-seng Chua

CVPR 2025posterarXiv:2411.13549

#2193

Generating 3D-Consistent Videos from Unposed Internet Photos

Gene Chou, Kai Zhang, Sai Bi et al.

CVPR 2024posterarXiv:2403.19220

#2194

GeoAuxNet: Towards Universal 3D Representation Learning for Multi-sensor Point Clouds

Shengjun Zhang, Xin Fei, Yueqi Duan

CVPR 2025posterarXiv:2412.05161

#2195

DNF: Unconditional 4D Generation with Dictionary-based Neural Fields

Xinyi Zhang, Naiqi Li, Angela Dai

CVPR 2025posterarXiv:2503.06277

#2196

STiL: Semi-supervised Tabular-Image Learning for Comprehensive Task-Relevant Information Exploration in Multimodal Classification

Siyi Du, Xinzhe Luo, Declan ORegan et al.

CVPR 2025posterarXiv:2403.10344

#2197

ViiNeuS: Volumetric Initialization for Implicit Neural Surface Reconstruction of Urban Scenes with Limited Image Overlap

Hala Djeghim, Nathan Piasco, Moussab Bennehar et al.

CVPR 2025posterarXiv:2504.01019

#2198

MixerMDM: Learnable Composition of Human Motion Diffusion Models

Pablo Ruiz-Ponce, German Barquero, Cristina Palmero et al.

CVPR 2024posterarXiv:2406.04961

#2199

Multiplane Prior Guided Few-Shot Aerial Scene Rendering

Zihan Gao, Licheng Jiao, Lingling Li et al.

CVPR 2024posterarXiv:2404.00524

#2200

TexVocab: Texture Vocabulary-conditioned Human Avatars

Yuxiao Liu, Zhe Li, Yebin Liu et al.