Most Cited CVPR "semantic relationships" Papers

5,589 papers found • Page 10 of 28

#1801

Text-Enhanced Data-free Approach for Federated Class-Incremental Learning

Minh-Tuan Tran, Trung Le, Xuan-May Le et al.

CVPR 2024arXiv:2403.14101
19
citations
#1802

CXPMRG-Bench: Pre-training and Benchmarking for X-ray Medical Report Generation on CheXpert Plus Dataset

Xiao Wang, Fuling Wang, Yuehang Li et al.

CVPR 2025arXiv:2410.00379
19
citations
#1803

DriveGPT4-V2: Harnessing Large Language Model Capabilities for Enhanced Closed-Loop Autonomous Driving

Zhenhua Xu, Yan Bai, Yujia Zhang et al.

CVPR 2025highlight
19
citations
#1804

Hash3D: Training-free Acceleration for 3D Generation

Xingyi Yang, Songhua Liu, Xinchao Wang

CVPR 2025arXiv:2404.06091
19
citations
#1805

Overload: Latency Attacks on Object Detection for Edge Devices

Erh-Chung Chen, Pin-Yu Chen, I-Hsin Chung et al.

CVPR 2024arXiv:2304.05370
19
citations
#1806

ClearSight: Visual Signal Enhancement for Object Hallucination Mitigation in Multimodal Large Language Models

Hao Yin, Guangzong Si, Zilei Wang

CVPR 2025arXiv:2503.13107
19
citations
#1807

A Stealthy Wrongdoer: Feature-Oriented Reconstruction Attack against Split Learning

Xiaoyang Xu, Mengda Yang, Wenzhe Yi et al.

CVPR 2024arXiv:2405.04115
19
citations
#1808

Attribute-Guided Pedestrian Retrieval: Bridging Person Re-ID with Internal Attribute Variability

Yan Huang, Zhang Zhang, Qiang Wu et al.

CVPR 2024
19
citations
#1809

Around the World in 80 Timesteps: A Generative Approach to Global Visual Geolocation

Nicolas Dufour, Vicky Kalogeiton, David Picard et al.

CVPR 2025arXiv:2412.06781
19
citations
#1810

Modality-Agnostic Structural Image Representation Learning for Deformable Multi-Modality Medical Image Registration

Tony C. W. MOK, Zi Li, Yunhao Bai et al.

CVPR 2024highlightarXiv:2402.18933
19
citations
#1811

Parameter Efficient Fine-tuning via Cross Block Orchestration for Segment Anything Model

Zelin Peng, Zhengqin Xu, Zhilin Zeng et al.

CVPR 2024arXiv:2311.17112
19
citations
#1812

Text-Guided Variational Image Generation for Industrial Anomaly Detection and Segmentation

Mingyu Lee, Jongwon Choi

CVPR 2024arXiv:2403.06247
18
citations
#1813

3DSFLabelling: Boosting 3D Scene Flow Estimation by Pseudo Auto-labelling

Chaokang Jiang, Guangming Wang, Jiuming Liu et al.

CVPR 2024arXiv:2402.18146
18
citations
#1814

WaveFace: Authentic Face Restoration with Efficient Frequency Recovery

Yunqi Miao, Jiankang Deng, Jungong Han

CVPR 2024arXiv:2403.12760
18
citations
#1815

T4P: Test-Time Training of Trajectory Prediction via Masked Autoencoder and Actor-specific Token Memory

Daehee Park, Jaeseok Jeong, Sung-Hoon Yoon et al.

CVPR 2024arXiv:2403.10052
18
citations
#1816

FINECAPTION: Compositional Image Captioning Focusing on Wherever You Want at Any Granularity

Hang Hua, Qing Liu, Lingzhi Zhang et al.

CVPR 2025arXiv:2411.15411
18
citations
#1817

MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models

Sanjoy Chowdhury, Sayan Nag, Joseph K J et al.

CVPR 2024highlightarXiv:2406.04673
18
citations
#1818

Unveiling Visual Perception in Language Models: An Attention Head Analysis Approach

Jing Bi, Lianggong Bruce Wen, Zhang Liu et al.

CVPR 2025arXiv:2412.18108
18
citations
#1819

FreePoint: Unsupervised Point Cloud Instance Segmentation

Zhikai Zhang, Jian Ding, Li Jiang et al.

CVPR 2024arXiv:2305.06973
18
citations
#1820

Text-Guided 3D Face Synthesis - From Generation to Editing

Yunjie Wu, Yapeng Meng, Zhipeng Hu et al.

CVPR 2024arXiv:2312.00375
18
citations
#1821

DyBluRF: Dynamic Neural Radiance Fields from Blurry Monocular Video

Huiqiang Sun, Xingyi Li, Liao Shen et al.

CVPR 2024arXiv:2403.10103
18
citations
#1822

Blur-aware Spatio-temporal Sparse Transformer for Video Deblurring

Huicong Zhang, Haozhe Xie, Hongxun Yao

CVPR 2024arXiv:2406.07551
18
citations
#1823

LED: A Large-scale Real-world Paired Dataset for Event Camera Denoising

Yuxing Duan

CVPR 2024arXiv:2405.19718
18
citations
#1824

Data Valuation and Detections in Federated Learning

Wenqian Li, Shuran Fu, Fengrui Zhang et al.

CVPR 2024arXiv:2311.05304
18
citations
#1825

Decompositional Neural Scene Reconstruction with Generative Diffusion Prior

Junfeng Ni, Yu Liu, Ruijie Lu et al.

CVPR 2025arXiv:2503.14830
18
citations
#1826

Digital Twin Catalog: A Large-Scale Photorealistic 3D Object Digital Twin Dataset

Zhao Dong, Ka chen, Zhaoyang Lv et al.

CVPR 2025highlightarXiv:2504.08541
18
citations
#1827

AssistGUI: Task-Oriented PC Graphical User Interface Automation

Difei Gao, Lei Ji, Zechen Bai et al.

CVPR 2024
18
citations
#1828

X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion Model

Lingmin Ran, Xiaodong Cun, Jia-Wei Liu et al.

CVPR 2024arXiv:2312.02238
18
citations
#1829

Rethinking the Representation in Federated Unsupervised Learning with Non-IID Data

Xinting Liao, Weiming Liu, Chaochao Chen et al.

CVPR 2024arXiv:2403.16398
18
citations
#1830

Instance-aware Exploration-Verification-Exploitation for Instance ImageGoal Navigation

Xiaohan Lei, Min Wang, Wengang Zhou et al.

CVPR 2024arXiv:2402.17587
18
citations
#1831

Revisiting Adversarial Training Under Long-Tailed Distributions

Xinli Yue, Ningping Mou, Qian Wang et al.

CVPR 2024arXiv:2403.10073
18
citations
#1832

RTracker: Recoverable Tracking via PN Tree Structured Memory

Yuqing Huang, Xin Li, Zikun Zhou et al.

CVPR 2024arXiv:2403.19242
18
citations
#1833

Morphable Diffusion: 3D-Consistent Diffusion for Single-image Avatar Creation

Xiyi Chen, Marko Mihajlovic, Shaofei Wang et al.

CVPR 2024arXiv:2401.04728
18
citations
#1834

A Simple and Effective Point-based Network for Event Camera 6-DOFs Pose Relocalization

Hongwei Ren, Jiadong Zhu, Yue Zhou et al.

CVPR 2024arXiv:2403.19412
18
citations
#1835

IDEA: Inverted Text with Cooperative Deformable Aggregation for Multi-modal Object Re-Identification

Yuhao Wang, Yongfeng Lv, Pingping Zhang et al.

CVPR 2025arXiv:2503.10324
18
citations
#1836

Understanding Video Transformers via Universal Concept Discovery

Matthew Kowal, Achal Dave, Rares Andrei Ambrus et al.

CVPR 2024highlightarXiv:2401.10831
18
citations
#1837

Spatiotemporal Skip Guidance for Enhanced Video Diffusion Sampling

Junha Hyung, Kinam Kim, Susung Hong et al.

CVPR 2025arXiv:2411.18664
18
citations
#1838

Choose What You Need: Disentangled Representation Learning for Scene Text Recognition Removal and Editing

Boqiang Zhang, Hongtao Xie, Zuan Gao et al.

CVPR 2024arXiv:2405.04377
18
citations
#1839

MagicArticulate: Make Your 3D Models Articulation-Ready

Chaoyue Song, Jianfeng Zhang, Xiu Li et al.

CVPR 2025arXiv:2502.12135
18
citations
#1840

Beyond Average: Individualized Visual Scanpath Prediction

Xianyu Chen, Ming Jiang, Qi Zhao

CVPR 2024arXiv:2404.12235
18
citations
#1841

SFOD: Spiking Fusion Object Detector

Yimeng Fan, Wei Zhang, Changsong Liu et al.

CVPR 2024arXiv:2403.15192
18
citations
#1842

Ref-GS: Directional Factorization for 2D Gaussian Splatting

Youjia Zhang, Anpei Chen, Yumin Wan et al.

CVPR 2025arXiv:2412.00905
18
citations
#1843

What Do You See in Vehicle? Comprehensive Vision Solution for In-Vehicle Gaze Estimation

Yihua Cheng, Yaning Zhu, Zongji Wang et al.

CVPR 2024arXiv:2403.15664
18
citations
#1844

INFP: Audio-Driven Interactive Head Generation in Dyadic Conversations

Yongming Zhu, Longhao Zhang, Zhengkun Rong et al.

CVPR 2025arXiv:2412.04037
18
citations
#1845

Unveiling the Mist over 3D Vision-Language Understanding: Object-centric Evaluation with Chain-of-Analysis

Jiangyong Huang, Baoxiong Jia, Yan Wang et al.

CVPR 2025arXiv:2503.22420
18
citations
#1846

Reason-before-Retrieve: One-Stage Reflective Chain-of-Thoughts for Training-Free Zero-Shot Composed Image Retrieval

Yuanmin Tang, Jue Zhang, Xiaoting Qin et al.

CVPR 2025highlightarXiv:2412.11077
18
citations
#1847

Spiking Transformer with Spatial-Temporal Attention

Donghyun Lee, Yuhang Li, Youngeun Kim et al.

CVPR 2025arXiv:2409.19764
18
citations
#1848

GigaHands: A Massive Annotated Dataset of Bimanual Hand Activities

Rao Fu, Dingxi Zhang, Alex Jiang et al.

CVPR 2025highlightarXiv:2412.04244
18
citations
#1849

UniPose: A Unified Multimodal Framework for Human Pose Comprehension, Generation and Editing

Yiheng Li, RuiBing Hou, Hong Chang et al.

CVPR 2025highlightarXiv:2411.16781
18
citations
#1850

PO3AD: Predicting Point Offsets toward Better 3D Point Cloud Anomaly Detection

Jianan Ye, Weiguang Zhao, Xi Yang et al.

CVPR 2025arXiv:2412.12617
18
citations
#1851

HAVE-FUN: Human Avatar Reconstruction from Few-Shot Unconstrained Images

Xihe Yang, Xingyu Chen, Daiheng Gao et al.

CVPR 2024arXiv:2311.15672
18
citations
#1852

Self-Training Large Language Models for Improved Visual Program Synthesis With Visual Reinforcement

Zaid Khan, Vijay Kumar BG, Samuel Schulter et al.

CVPR 2024arXiv:2404.04627
18
citations
#1853

MRFP: Learning Generalizable Semantic Segmentation from Sim-2-Real with Multi-Resolution Feature Perturbation

Sumanth Udupa, Prajwal Gurunath, Aniruddh Sikdar et al.

CVPR 2024arXiv:2311.18331
18
citations
#1854

Shadow Generation for Composite Image Using Diffusion Model

Qingyang Liu, Junqi You, Jian-Ting Wang et al.

CVPR 2024arXiv:2403.15234
18
citations
#1855

StyleStudio: Text-Driven Style Transfer with Selective Control of Style Elements

Mingkun Lei, Xue Song, Beier Zhu et al.

CVPR 2025arXiv:2412.08503
18
citations
#1856

PhysVLM: Enabling Visual Language Models to Understand Robotic Physical Reachability

Weijie Zhou, Manli Tao, Chaoyang Zhao et al.

CVPR 2025arXiv:2503.08481
18
citations
#1857

HiPose: Hierarchical Binary Surface Encoding and Correspondence Pruning for RGB-D 6DoF Object Pose Estimation

Yongliang Lin, Yongzhi Su, Praveen Nathan et al.

CVPR 2024arXiv:2311.12588
18
citations
#1858

FedMef: Towards Memory-efficient Federated Dynamic Pruning

Hong Huang, Weiming Zhuang, Chen Chen et al.

CVPR 2024arXiv:2403.14737
18
citations
#1859

Cloud-Device Collaborative Learning for Multimodal Large Language Models

Guanqun Wang, Jiaming Liu, Chenxuan Li et al.

CVPR 2024arXiv:2312.16279
18
citations
#1860

Cubify Anything: Scaling Indoor 3D Object Detection

Justin Lazarow, David Griffiths, Gefen Kohavi et al.

CVPR 2025highlightarXiv:2412.04458
18
citations
#1861

Exploring the Transferability of Visual Prompting for Multimodal Large Language Models

Yichi Zhang, Yinpeng Dong, Siyuan Zhang et al.

CVPR 2024highlightarXiv:2404.11207
18
citations
#1862

HumanDreamer: Generating Controllable Human-Motion Videos via Decoupled Generation

Boyuan Wang, Xiaofeng Wang, Chaojun Ni et al.

CVPR 2025arXiv:2503.24026
18
citations
#1863

DaReNeRF: Direction-aware Representation for Dynamic Scenes

Ange Lou, Benjamin Planche, Zhongpai Gao et al.

CVPR 2024arXiv:2403.02265
18
citations
#1864

Any6D: Model-free 6D Pose Estimation of Novel Object

Taeyeop Lee, Bowen Wen, Minjun Kang et al.

CVPR 2025arXiv:2503.18673
18
citations
#1865

TreeMeshGPT: Artistic Mesh Generation with Autoregressive Tree Sequencing

Stefan Lionar, Jiabin Liang, Gim Hee Lee

CVPR 2025arXiv:2503.11629
18
citations
#1866

LocLLM: Exploiting Generalizable Human Keypoint Localization via Large Language Model

Dongkai Wang, shiyu xuan, Shiliang Zhang

CVPR 2024highlightarXiv:2406.04659
18
citations
#1867

Immune: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment

Soumya Suvra Ghosal, Souradip Chakraborty, Vaibhav Singh et al.

CVPR 2025arXiv:2411.18688
18
citations
#1868

Towards 3D Vision with Low-Cost Single-Photon Cameras

Fangzhou Mu, Carter Sifferman, Sacha Jungerman et al.

CVPR 2024arXiv:2403.17801
18
citations
#1869

Towards Effective Usage of Human-Centric Priors in Diffusion Models for Text-based Human Image Generation

Junyan Wang, Zhenhong Sun, Stewart Tan et al.

CVPR 2024arXiv:2403.05239
18
citations
#1870

FaceChain-ImagineID: Freely Crafting High-Fidelity Diverse Talking Faces from Disentangled Audio

Chao Xu, Yang Liu, Jiazheng Xing et al.

CVPR 2024arXiv:2403.01901
18
citations
#1871

4Real-Video: Learning Generalizable Photo-Realistic 4D Video Diffusion

Chaoyang Wang, Peiye Zhuang, Tuan Duc Ngo et al.

CVPR 2025highlightarXiv:2412.04462
18
citations
#1872

CPGA: Coding Priors-Guided Aggregation Network for Compressed Video Quality Enhancement

Qiang Zhu, Jinhua Hao, Yukang Ding et al.

CVPR 2024arXiv:2403.10362
18
citations
#1873

PAIR Diffusion: A Comprehensive Multimodal Object-Level Image Editor

Vidit Goel, Elia Peruzzo, Yifan Jiang et al.

CVPR 2024
18
citations
#1874

Robust Synthetic-to-Real Transfer for Stereo Matching

Jiawei Zhang, Jiahe Li, Lei Huang et al.

CVPR 2024arXiv:2403.07705
18
citations
#1875

Customize your NeRF: Adaptive Source Driven 3D Scene Editing via Local-Global Iterative Training

Runze He, Shaofei Huang, Xuecheng Nie et al.

CVPR 2024arXiv:2312.01663
18
citations
#1876

DiffInDScene: Diffusion-based High-Quality 3D Indoor Scene Generation

Xiaoliang Ju, Zhaoyang Huang, Yijin Li et al.

CVPR 2024arXiv:2306.00519
18
citations
#1877

FaithDiff: Unleashing Diffusion Priors for Faithful Image Super-resolution

Junyang Chen, Jinshan Pan, Jiangxin Dong

CVPR 2025arXiv:2411.18824
18
citations
#1878

Video-Bench: Human-Aligned Video Generation Benchmark

Hui Han, Siyuan Li, Jiaqi Chen et al.

CVPR 2025arXiv:2504.04907
18
citations
#1879

The Language of Motion: Unifying Verbal and Non-verbal Language of 3D Human Motion

Changan Chen, Juze Zhang, Shrinidhi Kowshika Lakshmikanth et al.

CVPR 2025arXiv:2412.10523
18
citations
#1880

Vision-Language Gradient Descent-driven All-in-One Deep Unfolding Networks

Haijin Zeng, Xiangming Wang, Yongyong Chen et al.

CVPR 2025arXiv:2503.16930
18
citations
#1881

Interpreting Object-level Foundation Models via Visual Precision Search

Ruoyu Chen, Siyuan Liang, Jingzhi Li et al.

CVPR 2025highlightarXiv:2411.16198
18
citations
#1882

Rethinking the Evaluation Protocol of Domain Generalization

Han Yu, Xingxuan Zhang, Renzhe Xu et al.

CVPR 2024arXiv:2305.15253
18
citations
#1883

Object Detection using Event Camera: A MoE Heat Conduction based Detector and A New Benchmark Dataset

Xiao Wang, Yu Jin, Wentao Wu et al.

CVPR 2025arXiv:2412.06647
18
citations
#1884

Precise, Fast, and Low-cost Concept Erasure in Value Space: Orthogonal Complement Matters

Yuan Wang, Ouxiang Li, Tingting Mu et al.

CVPR 2025arXiv:2412.06143
18
citations
#1885

FAR: Flexible Accurate and Robust 6DoF Relative Camera Pose Estimation

Chris Rockwell, Nilesh Kulkarni, Linyi Jin et al.

CVPR 2024highlightarXiv:2403.03221
18
citations
#1886

DeiT-LT: Distillation Strikes Back for Vision Transformer Training on Long-Tailed Datasets

Harsh Rangwani, Pradipto Mondal, Mayank Mishra et al.

CVPR 2024arXiv:2404.02900
18
citations
#1887

Multiagent Multitraversal Multimodal Self-Driving: Open MARS Dataset

Yiming Li, Zhiheng Li, Nuo Chen et al.

CVPR 2024arXiv:2406.09383
18
citations
#1888

Towards Modern Image Manipulation Localization: A Large-Scale Dataset and Novel Methods

Chenfan Qu, Yiwu Zhong, Chongyu Liu et al.

CVPR 2024
18
citations
#1889

MANTA: A Large-Scale Multi-View and Visual-Text Anomaly Detection Dataset for Tiny Objects

Lei Fan, Dongdong Fan, Zhiguang Hu et al.

CVPR 2025arXiv:2412.04867
18
citations
#1890

Iterative Predictor-Critic Code Decoding for Real-World Image Dehazing

Jiayi Fu, Siyu Liu, Zikun Liu et al.

CVPR 2025arXiv:2503.13147
17
citations
#1891

HRAvatar: High-Quality and Relightable Gaussian Head Avatar

Dongbin Zhang, Yunfei Liu, Lijian Lin et al.

CVPR 2025arXiv:2503.08224
17
citations
#1892

PTT: Point-Trajectory Transformer for Efficient Temporal 3D Object Detection

Kuan-Chih Huang, Weijie Lyu, Ming-Hsuan Yang et al.

CVPR 2024arXiv:2312.08371
17
citations
#1893

The Scene Language: Representing Scenes with Programs, Words, and Embeddings

Yunzhi Zhang, Zizhang Li, Matt Zhou et al.

CVPR 2025highlightarXiv:2410.16770
17
citations
#1894

Discrete to Continuous: Generating Smooth Transition Poses from Sign Language Observations

Shengeng Tang, Jiayi He, Lechao Cheng et al.

CVPR 2025arXiv:2411.16810
17
citations
#1895

Align Before Adapt: Leveraging Entity-to-Region Alignments for Generalizable Video Action Recognition

Yifei Chen, Dapeng Chen, Ruijin Liu et al.

CVPR 2024arXiv:2311.15619
17
citations
#1896

Make-It-Animatable: An Efficient Framework for Authoring Animation-Ready 3D Characters

Zhiyang Guo, Jinxu Xiang, Kai Ma et al.

CVPR 2025highlightarXiv:2411.18197
17
citations
#1897

ANIM: Accurate Neural Implicit Model for Human Reconstruction from a single RGB-D Image

Marco Pesavento, Yuanlu Xu, Nikolaos Sarafianos et al.

CVPR 2024arXiv:2403.10357
17
citations
#1898

Efficiently Assemble Normalization Layers and Regularization for Federated Domain Generalization

Khiem Le, Tuan Long Ho, Cuong Do et al.

CVPR 2024arXiv:2403.15605
17
citations
#1899

Spatial457: A Diagnostic Benchmark for 6D Spatial Reasoning of Large Mutimodal Models

Xingrui Wang, Wufei Ma, Tiezheng Zhang et al.

CVPR 2025highlight
17
citations
#1900

Stratified Avatar Generation from Sparse Observations

Han Feng, Wenchao Ma, Quankai Gao et al.

CVPR 2024arXiv:2405.20786
17
citations
#1901

Lux Post Facto: Learning Portrait Performance Relighting with Conditional Video Diffusion and a Hybrid Dataset

Yiqun Mei, Mingming He, Li Ma et al.

CVPR 2025arXiv:2503.14485
17
citations
#1902

De-confounded Data-free Knowledge Distillation for Handling Distribution Shifts

Yuzheng Wang, Dingkang Yang, Zhaoyu Chen et al.

CVPR 2024arXiv:2403.19539
17
citations
#1903

Neural Visibility Field for Uncertainty-Driven Active Mapping

Shangjie Xue, Jesse Dill, Pranay Mathur et al.

CVPR 2024arXiv:2406.06948
17
citations
#1904

CycleINR: Cycle Implicit Neural Representation for Arbitrary-Scale Volumetric Super-Resolution of Medical Data

Wei Fang, Yuxing Tang, Heng Guo et al.

CVPR 2024arXiv:2404.04878
17
citations
#1905

Retraining-Free Model Quantization via One-Shot Weight-Coupling Learning

Chen Tang, Yuan Meng, Jiacheng Jiang et al.

CVPR 2024arXiv:2401.01543
17
citations
#1906

You Only Need Less Attention at Each Stage in Vision Transformers

Shuoxi Zhang, Hanpeng Liu, Stephen Lin et al.

CVPR 2024arXiv:2406.00427
17
citations
#1907

GSNeRF: Generalizable Semantic Neural Radiance Fields with Enhanced 3D Scene Understanding

Zi-Ting Chou, Sheng-Yu Huang, I-Jieh Liu et al.

CVPR 2024arXiv:2403.03608
17
citations
#1908

Depth Any Camera: Zero-Shot Metric Depth Estimation from Any Camera

Yuliang Guo, Sparsh Garg, S. Mahdi H. Miangoleh et al.

CVPR 2025arXiv:2501.02464
17
citations
#1909

Single-to-Dual-View Adaptation for Egocentric 3D Hand Pose Estimation

Ruicong Liu, Takehiko Ohkawa, Mingfang Zhang et al.

CVPR 2024arXiv:2403.04381
17
citations
#1910

Low-Resource Vision Challenges for Foundation Models

Yunhua Zhang, Hazel Doughty, Cees G. M. Snoek

CVPR 2024arXiv:2401.04716
17
citations
#1911

CustomListener: Text-guided Responsive Interaction for User-friendly Listening Head Generation

Xi Liu, Ying Guo, Cheng Zhen et al.

CVPR 2024arXiv:2403.00274
17
citations
#1912

Learning to Rematch Mismatched Pairs for Robust Cross-Modal Retrieval

Haochen Han, Qinghua Zheng, Guang Dai et al.

CVPR 2024arXiv:2403.05105
17
citations
#1913

HDRFlow: Real-Time HDR Video Reconstruction with Large Motions

Gangwei Xu, Yujin Wang, Jinwei Gu et al.

CVPR 2024arXiv:2403.03447
17
citations
#1914

Enhancing 3D Object Detection with 2D Detection-Guided Query Anchors

Haoxuanye Ji, Pengpeng Liang, Erkang Cheng

CVPR 2024arXiv:2403.06093
17
citations
#1915

De-Diffusion Makes Text a Strong Cross-Modal Interface

Chen Wei, Chenxi Liu, Siyuan Qiao et al.

CVPR 2024arXiv:2311.00618
17
citations
#1916

RoboGround: Robotic Manipulation with Grounded Vision-Language Priors

Haifeng Huang, Xinyi Chen, Yilun Chen et al.

CVPR 2025arXiv:2504.21530
17
citations
#1917

Beyond Image Super-Resolution for Image Recognition with Task-Driven Perceptual Loss

Jaeha Kim, Junghun Oh, Kyoung Mu Lee

CVPR 2024arXiv:2404.01692
17
citations
#1918

NeISF: Neural Incident Stokes Field for Geometry and Material Estimation

Chenhao Li, Taishi Ono, Takeshi Uemori et al.

CVPR 2024highlightarXiv:2311.13187
17
citations
#1919

Layout-Agnostic Scene Text Image Synthesis with Diffusion Models

Qilong Zhangli, Jindong Jiang, Di Liu et al.

CVPR 2024arXiv:2406.01062
17
citations
#1920

MambaIC: State Space Models for High-Performance Learned Image Compression

Fanhu Zeng, Hao Tang, Yihua Shao et al.

CVPR 2025arXiv:2503.12461
17
citations
#1921

PrEditor3D: Fast and Precise 3D Shape Editing

Ziya Erkoc, Can Gümeli, Chaoyang Wang et al.

CVPR 2025arXiv:2412.06592
17
citations
#1922

Geometry Transfer for Stylizing Radiance Fields

Hyunyoung Jung, Seonghyeon Nam, Nikolaos Sarafianos et al.

CVPR 2024arXiv:2402.00863
17
citations
#1923

EnvGS: Modeling View-Dependent Appearance with Environment Gaussian

Tao Xie, Xi Chen, Zhen Xu et al.

CVPR 2025arXiv:2412.15215
17
citations
#1924

HHMR: Holistic Hand Mesh Recovery by Enhancing the Multimodal Controllability of Graph Diffusion Models

Mengcheng Li, Hongwen Zhang, Yuxiang Zhang et al.

CVPR 2024highlightarXiv:2406.01334
17
citations
#1925

From-Ground-To-Objects: Coarse-to-Fine Self-supervised Monocular Depth Estimation of Dynamic Objects with Ground Contact Prior

Jaeho Moon, Juan Luis Gonzalez Bello, Byeongjun Kwon et al.

CVPR 2024arXiv:2312.10118
17
citations
#1926

Predicated Diffusion: Predicate Logic-Based Attention Guidance for Text-to-Image Diffusion Models

Kota Sueyoshi, Takashi Matsubara

CVPR 2024highlightarXiv:2311.16117
17
citations
#1927

C^2RV: Cross-Regional and Cross-View Learning for Sparse-View CBCT Reconstruction

Yiqun Lin, Jiewen Yang, hualiang wang et al.

CVPR 2024arXiv:2406.03902
17
citations
#1928

Lane2Seq: Towards Unified Lane Detection via Sequence Generation

Kunyang Zhou

CVPR 2024arXiv:2402.17172
17
citations
#1929

A Simple Data Augmentation for Feature Distribution Skewed Federated Learning

Yunlu Yan, Huazhu Fu, Yuexiang Li et al.

CVPR 2025arXiv:2306.09363
17
citations
#1930

UniRestore: Unified Perceptual and Task-Oriented Image Restoration Model Using Diffusion Prior

I-Hsiang Chen, Wei-Ting Chen, Yu-Wei Liu et al.

CVPR 2025highlightarXiv:2501.13134
17
citations
#1931

Condition-Aware Neural Network for Controlled Image Generation

Han Cai, Muyang Li, Qinsheng Zhang et al.

CVPR 2024arXiv:2404.01143
17
citations
#1932

A Unified Diffusion Framework for Scene-aware Human Motion Estimation from Sparse Signals

Jiangnan Tang, Jingya Wang, Kaiyang Ji et al.

CVPR 2024arXiv:2404.04890
17
citations
#1933

Differentiable Information Bottleneck for Deterministic Multi-view Clustering

Xiaoqiang Yan, Zhixiang Jin, Fengshou Han et al.

CVPR 2024arXiv:2403.15681
17
citations
#1934

Decompose-and-Compose: A Compositional Approach to Mitigating Spurious Correlation

Fahimeh Hosseini Noohdani, Parsa Hosseini, Aryan Yazdan Parast et al.

CVPR 2024arXiv:2402.18919
17
citations
#1935

SfmCAD: Unsupervised CAD Reconstruction by Learning Sketch-based Feature Modeling Operations

Pu Li, Jianwei Guo, HUIBIN LI et al.

CVPR 2024
17
citations
#1936

QDFormer: Towards Robust Audiovisual Segmentation in Complex Environments with Quantization-based Semantic Decomposition

Xiang Li, Jinglu Wang, Xiaohao Xu et al.

CVPR 2024arXiv:2310.00132
17
citations
#1937

Omni-ID: Holistic Identity Representation Designed for Generative Tasks

Guocheng Qian, Kuan-Chieh Wang, Or Patashnik et al.

CVPR 2025arXiv:2412.09694
17
citations
#1938

Adaptive VIO: Deep Visual-Inertial Odometry with Online Continual Learning

Youqi Pan, Wugen Zhou, Yingdian Cao et al.

CVPR 2024arXiv:2405.16754
17
citations
#1939

MMVP: A Multimodal MoCap Dataset with Vision and Pressure Sensors

He Zhang, Shenghao Ren, Haolei Yuan et al.

CVPR 2024arXiv:2403.17610
17
citations
#1940

AlignMamba: Enhancing Multimodal Mamba with Local and Global Cross-modal Alignment

Yan Li, Yifei Xing, Xiangyuan Lan et al.

CVPR 2025arXiv:2412.00833
17
citations
#1941

Unleashing the Potential of Multi-modal Foundation Models and Video Diffusion for 4D Dynamic Physical Scene Simulation

Zhuoman Liu, Weicai Ye, Yan Luximon et al.

CVPR 2025arXiv:2411.14423
17
citations
#1942

Jointly Training and Pruning CNNs via Learnable Agent Guidance and Alignment

Alireza Ganjdanesh, Shangqian Gao, Heng Huang

CVPR 2024arXiv:2403.19490
17
citations
#1943

DIRECT-3D: Learning Direct Text-to-3D Generation on Massive Noisy 3D Data

Qihao Liu, Yi Zhang, Song Bai et al.

CVPR 2024arXiv:2406.04322
17
citations
#1944

Systematic Comparison of Semi-supervised and Self-supervised Learning for Medical Image Classification

Zhe Huang, Ruijie Jiang, Shuchin Aeron et al.

CVPR 2024arXiv:2307.08919
17
citations
#1945

VideoDirector: Precise Video Editing via Text-to-Video Models

Yukun Wang, Longguang Wang, Zhiyuan Ma et al.

CVPR 2025arXiv:2411.17592
17
citations
#1946

Object Pose Estimation via the Aggregation of Diffusion Features

Tianfu Wang, Guosheng Hu, Hongguang Wang

CVPR 2024highlightarXiv:2403.18791
17
citations
#1947

Patient-Level Anatomy Meets Scanning-Level Physics: Personalized Federated Low-Dose CT Denoising Empowered by Large Language Model

Ziyuan Yang, Yingyu Chen, Zhiwen Wang et al.

CVPR 2025arXiv:2503.00908
17
citations
#1948

JarvisIR: Elevating Autonomous Driving Perception with Intelligent Image Restoration

yunlong lin, Zixu Lin, Haoyu Chen et al.

CVPR 2025arXiv:2504.04158
17
citations
#1949

Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion

Jiuhai Chen, Jianwei Yang, Haiping Wu et al.

CVPR 2025arXiv:2412.04424
16
citations
#1950

NeRF Director: Revisiting View Selection in Neural Volume Rendering

Wenhui Xiao, Rodrigo Santa Cruz, David Ahmedt-Aristizabal et al.

CVPR 2024arXiv:2406.08839
16
citations
#1951

SURE: SUrvey REcipes for building reliable and robust deep networks

Yuting Li, Yingyi Chen, Xuanlong Yu et al.

CVPR 2024arXiv:2403.00543
16
citations
#1952

Adapters Strike Back

Jan-Martin Steitz, Stefan Roth

CVPR 2024arXiv:2406.06820
16
citations
#1953

KD-DETR: Knowledge Distillation for Detection Transformer with Consistent Distillation Points Sampling

Yu Wang, Xin Li, Shengzhao Wen et al.

CVPR 2024arXiv:2211.08071
16
citations
#1954

MLLM-as-a-Judge for Image Safety without Human Labeling

Zhenting Wang, Shuming Hu, Shiyu Zhao et al.

CVPR 2025highlightarXiv:2501.00192
16
citations
#1955

UniGraspTransformer: Simplified Policy Distillation for Scalable Dexterous Robotic Grasping

Wenbo Wang, Fangyun Wei, Lei Zhou et al.

CVPR 2025arXiv:2412.02699
16
citations
#1956

Token Transformation Matters: Towards Faithful Post-hoc Explanation for Vision Transformer

Junyi Wu, Bin Duan, Weitai Kang et al.

CVPR 2024arXiv:2403.14552
16
citations
#1957

FrugalNeRF: Fast Convergence for Extreme Few-shot Novel View Synthesis without Learned Priors

Chin-Yang Lin, Chung-Ho Wu, Changhan Yeh et al.

CVPR 2025arXiv:2410.16271
16
citations
#1958

MaGGIe: Masked Guided Gradual Human Instance Matting

Chuong Huynh, Seoung Wug Oh, Abhinav Shrivastava et al.

CVPR 2024arXiv:2404.16035
16
citations
#1959

MonoInstance: Enhancing Monocular Priors via Multi-view Instance Alignment for Neural Rendering and Reconstruction

Wenyuan Zhang, Yixiao Yang, Han Huang et al.

CVPR 2025arXiv:2503.18363
16
citations
#1960

SynthLight: Portrait Relighting with Diffusion Model by Learning to Re-render Synthetic Faces

Sumit Chaturvedi, Mengwei Ren, Yannick Hold-Geoffroy et al.

CVPR 2025arXiv:2501.09756
16
citations
#1961

Accelerating Multimodal Large Language Models by Searching Optimal Vision Token Reduction

Shiyu Zhao, Zhenting Wang, Felix Juefei-Xu et al.

CVPR 2025arXiv:2412.00556
16
citations
#1962

PKU-DyMVHumans: A Multi-View Video Benchmark for High-Fidelity Dynamic Human Modeling

Xiaoyun Zheng, Liwei Liao, Xufeng Li et al.

CVPR 2024arXiv:2403.16080
16
citations
#1963

Multiple View Geometry Transformers for 3D Human Pose Estimation

Ziwei Liao, jialiang zhu, Chunyu Wang et al.

CVPR 2024arXiv:2311.10983
16
citations
#1964

EmoDubber: Towards High Quality and Emotion Controllable Movie Dubbing

Gaoxiang Cong, Jiadong Pan, Liang Li et al.

CVPR 2025highlightarXiv:2412.08988
16
citations
#1965

Synthetic-to-Real Self-supervised Robust Depth Estimation via Learning with Motion and Structure Priors

Weilong Yan, Ming Li, Li Haipeng et al.

CVPR 2025arXiv:2503.20211
16
citations
#1966

RSAR: Restricted State Angle Resolver and Rotated SAR Benchmark

Xin Zhang, Xue Yang, Yuxuan Li et al.

CVPR 2025arXiv:2501.04440
16
citations
#1967

Progressive Divide-and-Conquer via Subsampling Decomposition for Accelerated MRI

Chong Wang, Lanqing Guo, Yufei Wang et al.

CVPR 2024highlightarXiv:2403.10064
16
citations
#1968

ODM: A Text-Image Further Alignment Pre-training Approach for Scene Text Detection and Spotting

Chen Duan, Pei Fu, Shan Guo et al.

CVPR 2024arXiv:2403.00303
16
citations
#1969

Diff2Flow: Training Flow Matching Models via Diffusion Model Alignment

Johannes Schusterbauer, Ming Gui, Frank Fundel et al.

CVPR 2025arXiv:2506.02221
16
citations
#1970

Real-Time Simulated Avatar from Head-Mounted Sensors

Zhengyi Luo, Jinkun Cao, Rawal Khirodkar et al.

CVPR 2024highlightarXiv:2403.06862
16
citations
#1971

Quantization without Tears

Minghao Fu, Hao Yu, Jie Shao et al.

CVPR 2025arXiv:2411.13918
16
citations
#1972

Gaussian Shadow Casting for Neural Characters

Luis Bolanos, Shih-Yang Su, Helge Rhodin

CVPR 2024arXiv:2401.06116
16
citations
#1973

Video Prediction by Modeling Videos as Continuous Multi-Dimensional Processes

Gaurav Shrivastava, Abhinav Shrivastava

CVPR 2024
16
citations
#1974

ROS-SAM: High-Quality Interactive Segmentation for Remote Sensing Moving Object

Zhe Shan, Yang Liu, Lei Zhou et al.

CVPR 2025arXiv:2503.12006
16
citations
#1975

DyMO: Training-Free Diffusion Model Alignment with Dynamic Multi-Objective Scheduling

Xin Xie, Dong Gong

CVPR 2025arXiv:2412.00759
16
citations
#1976

Global-Local Tree Search in VLMs for 3D Indoor Scene Generation

Wei Deng, Mengshi Qi, Huadong Ma

CVPR 2025arXiv:2503.18476
16
citations
#1977

Hybrid Functional Maps for Crease-Aware Non-Isometric Shape Matching

Lennart Bastian, Yizheng Xie, Nassir Navab et al.

CVPR 2024arXiv:2312.03678
16
citations
#1978

Uncertainty-guided Perturbation for Image Super-Resolution Diffusion Model

Leheng Zhang, Weiyi You, Kexuan Shi et al.

CVPR 2025arXiv:2503.18512
16
citations
#1979

Interactive3D: Create What You Want by Interactive 3D Generation

Shaocong Dong, Lihe Ding, Zhanpeng Huang et al.

CVPR 2024arXiv:2404.16510
16
citations
#1980

Diversified and Personalized Multi-rater Medical Image Segmentation

Yicheng Wu, Xiangde Luo, Zhe Xu et al.

CVPR 2024highlightarXiv:2403.13417
16
citations
#1981

Reversible Decoupling Network for Single Image Reflection Removal

Hao Zhao, Mingjia Li, Qiming Hu et al.

CVPR 2025arXiv:2410.08063
16
citations
#1982

Commonsense Prototype for Outdoor Unsupervised 3D Object Detection

Hai Wu, Shijia Zhao, Xun Huang et al.

CVPR 2024arXiv:2404.16493
16
citations
#1983

LDP: Language-driven Dual-Pixel Image Defocus Deblurring Network

Hao Yang, Liyuan Pan, Yan Yang et al.

CVPR 2024arXiv:2307.09815
16
citations
#1984

Programmable Motion Generation for Open-Set Motion Control Tasks

Hanchao Liu, Xiaohang Zhan, Shaoli Huang et al.

CVPR 2024highlightarXiv:2405.19283
16
citations
#1985

Movie Weaver: Tuning-Free Multi-Concept Video Personalization with Anchored Prompts

Feng Liang, Haoyu Ma, Zecheng He et al.

CVPR 2025arXiv:2502.07802
16
citations
#1986

Learning Instance-Aware Correspondences for Robust Multi-Instance Point Cloud Registration in Cluttered Scenes

Zhiyuan Yu, Zheng Qin, lintao zheng et al.

CVPR 2024arXiv:2404.04557
16
citations
#1987

EmotiveTalk: Expressive Talking Head Generation through Audio Information Decoupling and Emotional Video Diffusion

Haotian Wang, Yuzhe Weng, Yueyan Li et al.

CVPR 2025arXiv:2411.16726
16
citations
#1988

Iterated Learning Improves Compositionality in Large Vision-Language Models

Chenhao Zheng, Jieyu Zhang, Aniruddha Kembhavi et al.

CVPR 2024arXiv:2404.02145
16
citations
#1989

CO-SPY: Combining Semantic and Pixel Features to Detect Synthetic Images by AI

Siyuan Cheng, Lingjuan Lyu, Zhenting Wang et al.

CVPR 2025arXiv:2503.18286
16
citations
#1990

What How and When Should Object Detectors Update in Continually Changing Test Domains?

Jayeon Yoo, Dongkwan Lee, Inseop Chung et al.

CVPR 2024arXiv:2312.08875
16
citations
#1991

Revisiting MAE Pre-training for 3D Medical Image Segmentation

Tassilo Wald, Constantin Ulrich, Stanislav Lukyanenko et al.

CVPR 2025highlightarXiv:2410.23132
16
citations
#1992

Mimir: Improving Video Diffusion Models for Precise Text Understanding

Shuai Tan, Biao Gong, Yutong Feng et al.

CVPR 2025arXiv:2412.03085
16
citations
#1993

Plug-and-Play Diffusion Distillation

Yi-Ting Hsiao, Siavash Khodadadeh, Kevin Duarte et al.

CVPR 2024arXiv:2406.01954
16
citations
#1994

Frozen Feature Augmentation for Few-Shot Image Classification

Andreas Bär, Neil Houlsby, Mostafa Dehghani et al.

CVPR 2024arXiv:2403.10519
16
citations
#1995

Multiview Aerial Visual RECognition (MAVREC): Can Multi-view Improve Aerial Visual Perception?

Aritra Dutta, Srijan Das, Jacob Nielsen et al.

CVPR 2024arXiv:2312.04548
16
citations
#1996

MonoDGP: Monocular 3D Object Detection with Decoupled-Query and Geometry-Error Priors

Fanqi Pu, Yifan Wang, Jiru Deng et al.

CVPR 2025arXiv:2410.19590
16
citations
#1997

Make Me a BNN: A Simple Strategy for Estimating Bayesian Uncertainty from Pre-trained Models

Gianni Franchi, Olivier Laurent, Maxence Leguéry et al.

CVPR 2024arXiv:2312.15297
16
citations
#1998

Efficient Meshflow and Optical Flow Estimation from Event Cameras

Xinglong Luo, Ao Luo, Zhengning Wang et al.

CVPR 2024
16
citations
#1999

DPMesh: Exploiting Diffusion Prior for Occluded Human Mesh Recovery

Yixuan Zhu, Ao Li, Yansong Tang et al.

CVPR 2024arXiv:2404.01424
16
citations
#2000

Unlocking Pre-trained Image Backbones for Semantic Image Synthesis

Tariq Berrada, Jakob Verbeek, camille couprie et al.

CVPR 2024arXiv:2312.13314
16
citations