Most Cited CVPR "asymmetrical reciprocity learning" Papers

5,589 papers found • Page 6 of 28

#1001

GenNBV: Generalizable Next-Best-View Policy for Active 3D Reconstruction

Xiao Chen, Quanyi Li, Tai Wang et al.

CVPR 2024arXiv:2402.16174
35
citations
#1002

MedM2G: Unifying Medical Multi-Modal Generation via Cross-Guided Diffusion with Visual Invariant

Chenlu Zhan, Gaoang Wang, Yu LIN et al.

CVPR 2024arXiv:2403.04290
35
citations
#1003

Collaborating Foundation Models for Domain Generalized Semantic Segmentation

Yasser Benigmim, Subhankar Roy, Slim Essid et al.

CVPR 2024arXiv:2312.09788
35
citations
#1004

Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation

Yunhao Ge, Xiaohui Zeng, Jacob Huffman et al.

CVPR 2024arXiv:2404.19752
35
citations
#1005

SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding

Hao Li, Changyao TIAN, Jie Shao et al.

CVPR 2025arXiv:2412.09604
35
citations
#1006

ICP-Flow: LiDAR Scene Flow Estimation with ICP

Yancong Lin, Holger Caesar

CVPR 2024arXiv:2402.17351
35
citations
#1007

CRKD: Enhanced Camera-Radar Object Detection with Cross-modality Knowledge Distillation

Lingjun Zhao, Jingyu Song, Katherine Skinner

CVPR 2024arXiv:2403.19104
35
citations
#1008

Generalizable Whole Slide Image Classification with Fine-Grained Visual-Semantic Interaction

Hao Li, Ying Chen, Yifei Chen et al.

CVPR 2024arXiv:2402.19326
35
citations
#1009

Do Vision and Language Encoders Represent the World Similarly?

Mayug Maniparambil, Raiymbek Akshulakov, YASSER ABDELAZIZ DAHOU DJILALI et al.

CVPR 2024arXiv:2401.05224
35
citations
#1010

Frequency Decoupling for Motion Magnification via Multi-Level Isomorphic Architecture

Fei Wang, Dan Guo, Kun Li et al.

CVPR 2024arXiv:2403.07347
35
citations
#1011

Alchemist: Parametric Control of Material Properties with Diffusion Models

Prafull Sharma, Varun Jampani, Yuanzhen Li et al.

CVPR 2024arXiv:2312.02970
35
citations
#1012

Open-World Human-Object Interaction Detection via Multi-modal Prompts

Jie Yang, Bingliang Li, Ailing Zeng et al.

CVPR 2024arXiv:2406.07221
35
citations
#1013

LiDAR4D: Dynamic Neural Fields for Novel Space-time View LiDAR Synthesis

Zehan Zheng, Fan Lu, Weiyi Xue et al.

CVPR 2024arXiv:2404.02742
35
citations
#1014

Matching 2D Images in 3D: Metric Relative Pose from Metric Correspondences

Axel Barroso-Laguna, Sowmya Munukutla, Victor Adrian Prisacariu et al.

CVPR 2024arXiv:2404.06337
35
citations
#1015

Code-as-Monitor: Constraint-aware Visual Programming for Reactive and Proactive Robotic Failure Detection

Enshen Zhou, Qi Su, Cheng Chi et al.

CVPR 2025arXiv:2412.04455
35
citations
#1016

SPOT: Self-Training with Patch-Order Permutation for Object-Centric Learning with Autoregressive Transformers

Ioannis Kakogeorgiou, Spyros Gidaris, Konstantinos Karantzalos et al.

CVPR 2024highlightarXiv:2312.00648
35
citations
#1017

MNE-SLAM: Multi-Agent Neural SLAM for Mobile Robots

Tianchen Deng, Guole Shen, Chen Xun et al.

CVPR 2025
35
citations
#1018

Pow3R: Empowering Unconstrained 3D Reconstruction with Camera and Scene Priors

Wonbong Jang, Philippe Weinzaepfel, Vincent Leroy et al.

CVPR 2025arXiv:2503.17316
35
citations
#1019

3D-Mem: 3D Scene Memory for Embodied Exploration and Reasoning

Yuncong Yang, Han Yang, Jiachen Zhou et al.

CVPR 2025arXiv:2411.17735
35
citations
#1020

HRVDA: High-Resolution Visual Document Assistant

Chaohu Liu, Kun Yin, Haoyu Cao et al.

CVPR 2024arXiv:2404.06918
35
citations
#1021

LTGC: Long-tail Recognition via Leveraging LLMs-driven Generated Content

Qihao Zhao, Yalun Dai, Hao Li et al.

CVPR 2024arXiv:2403.05854
35
citations
#1022

AnySat: One Earth Observation Model for Many Resolutions, Scales, and Modalities

Guillaume Astruc, Nicolas Gonthier, Clement Mallet et al.

CVPR 2025highlightarXiv:2412.14123
35
citations
#1023

How Far Can We Compress Instant-NGP-Based NeRF?

Yihang Chen, Qianyi Wu, Mehrtash Harandi et al.

CVPR 2024arXiv:2406.04101
34
citations
#1024

MLIP: Enhancing Medical Visual Representation with Divergence Encoder and Knowledge-guided Contrastive Learning

Zhe Li, Laurence Yang, Bocheng Ren et al.

CVPR 2024arXiv:2402.02045
34
citations
#1025

Learning Object State Changes in Videos: An Open-World Perspective

Zihui Xue, Kumar Ashutosh, Kristen Grauman

CVPR 2024arXiv:2312.11782
34
citations
#1026

AutoAD III: The Prequel – Back to the Pixels

Tengda Han, Max Bain, Arsha Nagrani et al.

CVPR 2024arXiv:2404.14412
34
citations
#1027

Words or Vision: Do Vision-Language Models Have Blind Faith in Text?

Ailin Deng, Tri Cao, Zhirui Chen et al.

CVPR 2025arXiv:2503.02199
34
citations
#1028

UniMix: Towards Domain Adaptive and Generalizable LiDAR Semantic Segmentation in Adverse Weather

Haimei Zhao, Jing Zhang, Zhuo Chen et al.

CVPR 2024arXiv:2404.05145
34
citations
#1029

WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model

Zongjian Li, Bin Lin, Yang Ye et al.

CVPR 2025arXiv:2411.17459
34
citations
#1030

Improved Implicit Neural Representation with Fourier Reparameterized Training

Kexuan Shi, Xingyu Zhou, Shuhang Gu

CVPR 2024arXiv:2401.07402
34
citations
#1031

FineParser: A Fine-grained Spatio-temporal Action Parser for Human-centric Action Quality Assessment

Jinglin Xu, Sibo Yin, Guohao Zhao et al.

CVPR 2024arXiv:2405.06887
34
citations
#1032

Troika: Multi-Path Cross-Modal Traction for Compositional Zero-Shot Learning

Siteng Huang, Biao Gong, Yutong Feng et al.

CVPR 2024arXiv:2303.15230
34
citations
#1033

Calibrating Multi-modal Representations: A Pursuit of Group Robustness without Annotations

Chenyu You, Yifei Min, Weicheng Dai et al.

CVPR 2024arXiv:2403.07241
34
citations
#1034

CoGS: Controllable Gaussian Splatting

Heng Yu, Joel Julin, Zoltán Á. Milacski et al.

CVPR 2024arXiv:2312.05664
34
citations
#1035

Generative Gaussian Splatting for Unbounded 3D City Generation

Haozhe Xie, Zhaoxi Chen, Fangzhou Hong et al.

CVPR 2025arXiv:2406.06526
34
citations
#1036

One Diffusion to Generate Them All

Duong H. Le, Tuan Pham, Sangho Lee et al.

CVPR 2025arXiv:2411.16318
34
citations
#1037

AIDE: An Automatic Data Engine for Object Detection in Autonomous Driving

Mingfu Liang, Jong-Chyi Su, Samuel Schulter et al.

CVPR 2024arXiv:2403.17373
34
citations
#1038

MonoNPHM: Dynamic Head Reconstruction from Monocular Videos

Simon Giebenhain, Tobias Kirschstein, Markos Georgopoulos et al.

CVPR 2024highlightarXiv:2312.06740
34
citations
#1039

InstaGen: Enhancing Object Detection by Training on Synthetic Dataset

Chengjian Feng, Yujie Zhong, Zequn Jie et al.

CVPR 2024arXiv:2402.05937
34
citations
#1040

Relightable and Animatable Neural Avatar from Sparse-View Video

Zhen Xu, Sida Peng, Chen Geng et al.

CVPR 2024highlightarXiv:2308.07903
34
citations
#1041

LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos

Tiantian Geng, Jinrui Zhang, Qingni Wang et al.

CVPR 2025arXiv:2411.19772
34
citations
#1042

Three Pillars Improving Vision Foundation Model Distillation for Lidar

Gilles Puy, Spyros Gidaris, Alexandre Boulch et al.

CVPR 2024arXiv:2310.17504
34
citations
#1043

Stereo Anywhere: Robust Zero-Shot Deep Stereo Matching Even Where Either Stereo or Mono Fail

Luca Bartolomei, Fabio Tosi, Matteo Poggi et al.

CVPR 2025arXiv:2412.04472
34
citations
#1044

StarVector: Generating Scalable Vector Graphics Code from Images and Text

Juan Rodriguez, Abhay Puri, Shubham Agarwal et al.

CVPR 2025arXiv:2312.11556
34
citations
#1045

Unraveling Instance Associations: A Closer Look for Audio-Visual Segmentation

Yuanhong Chen, Yuyuan Liu, Hu Wang et al.

CVPR 2024arXiv:2304.02970
34
citations
#1046

CoralSCOP: Segment any COral Image on this Planet

Zheng Ziqiang, Liang Haixin, Binh-Son Hua et al.

CVPR 2024highlight
34
citations
#1047

On the Content Bias in Fréchet Video Distance

Songwei Ge, Aniruddha Mahapatra, Gaurav Parmar et al.

CVPR 2024arXiv:2404.12391
34
citations
#1048

High-fidelity Person-centric Subject-to-Image Synthesis

Yibin Wang, Weizhong Zhang, Jianwei Zheng et al.

CVPR 2024arXiv:2311.10329
34
citations
#1049

Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts

Jialin Wu, Xia Hu, Yaqing Wang et al.

CVPR 2024highlightarXiv:2312.00968
34
citations
#1050

Simple Semantic-Aided Few-Shot Learning

Hai Zhang, Junzhe Xu, Shanlin Jiang et al.

CVPR 2024arXiv:2311.18649
34
citations
#1051

PRDP: Proximal Reward Difference Prediction for Large-Scale Reward Finetuning of Diffusion Models

Fei Deng, Qifei Wang, Wei Wei et al.

CVPR 2024arXiv:2402.08714
34
citations
#1052

ExtDM: Distribution Extrapolation Diffusion Model for Video Prediction

Zhicheng Zhang, Junyao Hu, Wentao Cheng et al.

CVPR 2024
34
citations
#1053

Video Super-Resolution Transformer with Masked Inter&Intra-Frame Attention

Xingyu Zhou, Leheng Zhang, Xiaorui Zhao et al.

CVPR 2024arXiv:2401.06312
34
citations
#1054

Active Generalized Category Discovery

Shijie Ma, Fei Zhu, Zhun Zhong et al.

CVPR 2024arXiv:2403.04272
34
citations
#1055

Revisiting Adversarial Training at Scale

Zeyu Wang, Xianhang li, Hongru Zhu et al.

CVPR 2024arXiv:2401.04727
34
citations
#1056

REACTO: Reconstructing Articulated Objects from a Single Video

Chaoyue Song, Jiacheng Wei, Chuan-Sheng Foo et al.

CVPR 2024arXiv:2404.11151
34
citations
#1057

On the Scalability of Diffusion-based Text-to-Image Generation

Hao Li, Yang Zou, Ying Wang et al.

CVPR 2024arXiv:2404.02883
34
citations
#1058

OpenStreetView-5M: The Many Roads to Global Visual Geolocation

Guillaume Astruc, Nicolas Dufour, Ioannis Siglidis et al.

CVPR 2024arXiv:2404.18873
34
citations
#1059

Watermark-embedded Adversarial Examples for Copyright Protection against Diffusion Models

Peifei Zhu, Tsubasa Takahashi, Hirokatsu Kataoka

CVPR 2024arXiv:2404.09401
34
citations
#1060

LION-FS: Fast & Slow Video-Language Thinker as Online Video Assistant

Wei Li, Bing Hu, Rui Shao et al.

CVPR 2025arXiv:2503.03663
33
citations
#1061

Language-driven Grasp Detection

An Dinh Vuong, Minh Nhat VU, Baoru Huang et al.

CVPR 2024arXiv:2406.09489
33
citations
#1062

RoHM: Robust Human Motion Reconstruction via Diffusion

Siwei Zhang, Bharat Lal Bhatnagar, Yuanlu Xu et al.

CVPR 2024arXiv:2401.08570
33
citations
#1063

Open-Vocabulary Semantic Segmentation with Image Embedding Balancing

Xiangheng Shan, Dongyue Wu, Guilin Zhu et al.

CVPR 2024arXiv:2406.09829
33
citations
#1064

Empowering LLMs to Understand and Generate Complex Vector Graphics

XiMing Xing, Juncheng Hu, Guotao Liang et al.

CVPR 2025arXiv:2412.11102
33
citations
#1065

VideoWorld: Exploring Knowledge Learning from Unlabeled Videos

Zhongwei Ren, Yunchao Wei, Xun Guo et al.

CVPR 2025arXiv:2501.09781
33
citations
#1066

DCEvo: Discriminative Cross-Dimensional Evolutionary Learning for Infrared and Visible Image Fusion

Jinyuan Liu, Bowei Zhang, Qingyun Mei et al.

CVPR 2025arXiv:2503.17673
33
citations
#1067

SPAR3D: Stable Point-Aware Reconstruction of 3D Objects from Single Images

Zixuan Huang, Mark Boss, Aaryaman Vasishta et al.

CVPR 2025arXiv:2501.04689
33
citations
#1068

World-consistent Video Diffusion with Explicit 3D Modeling

Qihang Zhang, Shuangfei Zhai, Miguel Ángel Bautista et al.

CVPR 2025highlightarXiv:2412.01821
33
citations
#1069

CPLIP: Zero-Shot Learning for Histopathology with Comprehensive Vision-Language Alignment

Sajid Javed, Arif Mahmood, IYYAKUTTI IYAPPAN GANAPATHI et al.

CVPR 2024arXiv:2406.05205
33
citations
#1070

Track4Gen: Teaching Video Diffusion Models to Track Points Improves Video Generation

Hyeonho Jeong, Chun-Hao P. Huang, Jong Chul Ye et al.

CVPR 2025arXiv:2412.06016
33
citations
#1071

G-HOP: Generative Hand-Object Prior for Interaction Reconstruction and Grasp Synthesis

Yufei Ye, Abhinav Gupta, Kris Kitani et al.

CVPR 2024arXiv:2404.12383
33
citations
#1072

Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning

Di Zhang, Jingdi Lei, Junxian Li et al.

CVPR 2025arXiv:2411.18203
33
citations
#1073

EgoLife: Towards Egocentric Life Assistant

Jingkang Yang, Shuai Liu, Hongming Guo et al.

CVPR 2025arXiv:2503.03803
33
citations
#1074

MV-MATH: Evaluating Multimodal Math Reasoning in Multi-Visual Contexts

Peijie Wang, Zhong-Zhi Li, Fei Yin et al.

CVPR 2025arXiv:2502.20808
33
citations
#1075

MAS: Multi-view Ancestral Sampling for 3D Motion Generation Using 2D Diffusion

Roy Kapon, Guy Tevet, Daniel Cohen-Or et al.

CVPR 2024arXiv:2310.14729
33
citations
#1076

3D-HGS: 3D Half-Gaussian Splatting

Haolin Li, Jinyang Liu, Mario Sznaier et al.

CVPR 2025arXiv:2406.02720
33
citations
#1077

StyleMaster: Stylize Your Video with Artistic Generation and Translation

Zixuan Ye, Huijuan Huang, Xintao Wang et al.

CVPR 2025arXiv:2412.07744
33
citations
#1078

Relaxed Contrastive Learning for Federated Learning

Seonguk Seo, Jinkyu Kim, Geeho Kim et al.

CVPR 2024arXiv:2401.04928
33
citations
#1079

Learning Spatial Adaptation and Temporal Coherence in Diffusion Models for Video Super-Resolution

Zhikai Chen, Fuchen Long, Zhaofan Qiu et al.

CVPR 2024arXiv:2403.17000
33
citations
#1080

Rethinking Generalizable Face Anti-spoofing via Hierarchical Prototype-guided Distribution Refinement in Hyperbolic Space

Chengyang Hu, Ke-Yue Zhang, Taiping Yao et al.

CVPR 2024highlight
33
citations
#1081

ExACT: Language-guided Conceptual Reasoning and Uncertainty Estimation for Event-based Action Recognition and More

Jiazhou Zhou, Xu Zheng, Yuanhuiyi Lyu et al.

CVPR 2024highlightarXiv:2403.12534
33
citations
#1082

Privacy-Preserving Face Recognition Using Trainable Feature Subtraction

Yuxi Mi, Zhizhou Zhong, Yuge Huang et al.

CVPR 2024arXiv:2403.12457
33
citations
#1083

VidHalluc: Evaluating Temporal Hallucinations in Multimodal Large Language Models for Video Understanding

Chaoyu Li, Eun Woo Im, Pooyan Fazli

CVPR 2025arXiv:2412.03735
33
citations
#1084

ConvoFusion: Multi-Modal Conversational Diffusion for Co-Speech Gesture Synthesis

Muhammad Hamza Mughal, Rishabh Dabral, Ikhsanul Habibie et al.

CVPR 2024arXiv:2403.17936
33
citations
#1085

Exploiting Inter-sample and Inter-feature Relations in Dataset Distillation

Wenxiao Deng, Wenbin Li, Tianyu Ding et al.

CVPR 2024arXiv:2404.00563
33
citations
#1086

HybridNeRF: Efficient Neural Rendering via Adaptive Volumetric Surfaces

Haithem Turki, Vasu Agrawal, Samuel Rota Bulò et al.

CVPR 2024highlightarXiv:2312.03160
33
citations
#1087

Make It Count: Text-to-Image Generation with an Accurate Number of Objects

Lital Binyamin, Yoad Tewel, Hilit Segev et al.

CVPR 2025arXiv:2406.10210
33
citations
#1088

PartGen: Part-level 3D Generation and Reconstruction with Multi-view Diffusion Models

Minghao Chen, Roman Shapovalov, Iro Laina et al.

CVPR 2025highlightarXiv:2412.18608
33
citations
#1089

AnySkill: Learning Open-Vocabulary Physical Skill for Interactive Agents

Jieming Cui, Tengyu Liu, Nian Liu et al.

CVPR 2024arXiv:2403.12835
33
citations
#1090

Dataset Distillation with Neural Characteristic Function: A Minmax Perspective

Shaobo Wang, Yicun Yang, Zhiyuan Liu et al.

CVPR 2025highlightarXiv:2502.20653
33
citations
#1091

Resurrecting Old Classes with New Data for Exemplar-Free Continual Learning

Dipam Goswami, Albin Soutif, Yuyang Liu et al.

CVPR 2024arXiv:2405.19074
33
citations
#1092

Segment and Caption Anything

Xiaoke Huang, Jianfeng Wang, Yansong Tang et al.

CVPR 2024arXiv:2312.00869
33
citations
#1093

SOK-Bench: A Situated Video Reasoning Benchmark with Aligned Open-World Knowledge

Andong Wang, Bo Wu, Sunli Chen et al.

CVPR 2024arXiv:2405.09713
33
citations
#1094

Dynamic Inertial Poser (DynaIP): Part-Based Motion Dynamics Learning for Enhanced Human Pose Estimation with Sparse Inertial Sensors

Yu Zhang, Songpengcheng Xia, Lei Chu et al.

CVPR 2024arXiv:2312.02196
33
citations
#1095

GreedyViG: Dynamic Axial Graph Construction for Efficient Vision GNNs

Mustafa Munir, William Avery, Md Mostafijur Rahman et al.

CVPR 2024arXiv:2405.06849
32
citations
#1096

A Dynamic Kernel Prior Model for Unsupervised Blind Image Super-Resolution

Zhixiong Yang, Jingyuan Xia, Shengxi Li et al.

CVPR 2024arXiv:2404.15620
32
citations
#1097

Transductive Zero-Shot and Few-Shot CLIP

Ségolène Martin, Yunshi HUANG, Fereshteh Shakeri et al.

CVPR 2024highlightarXiv:2405.18437
32
citations
#1098

CAT: Exploiting Inter-Class Dynamics for Domain Adaptive Object Detection

Mikhail Kennerley, Jian-Gang Wang, Bharadwaj Veeravalli et al.

CVPR 2024arXiv:2403.19278
32
citations
#1099

Cross-modal Information Flow in Multimodal Large Language Models

Zhi Zhang, Srishti Yadav, Fengze Han et al.

CVPR 2025arXiv:2411.18620
32
citations
#1100

WildGS-SLAM: Monocular Gaussian Splatting SLAM in Dynamic Environments

Jianhao Zheng, Zihan Zhu, Valentin Bieri et al.

CVPR 2025arXiv:2504.03886
32
citations
#1101

Inversion-Free Image Editing with Language-Guided Diffusion Models

Sihan Xu, Yidong Huang, Jiayi Pan et al.

CVPR 2024
32
citations
#1102

Self-Expansion of Pre-trained Models with Mixture of Adapters for Continual Learning

Huiyi Wang, Haodong Lu, Lina Yao et al.

CVPR 2025arXiv:2403.18886
32
citations
#1103

AZ-NAS: Assembling Zero-Cost Proxies for Network Architecture Search

Junghyup Lee, Bumsub Ham

CVPR 2024arXiv:2403.19232
32
citations
#1104

Diffusion Renderer: Neural Inverse and Forward Rendering with Video Diffusion Models

Ruofan Liang, Žan Gojčič, Huan Ling et al.

CVPR 2025
32
citations
#1105

Atlantis: Enabling Underwater Depth Estimation with Stable Diffusion

Fan Zhang, Shaodi You, Yu Li et al.

CVPR 2024highlightarXiv:2312.12471
32
citations
#1106

SnAG: Scalable and Accurate Video Grounding

Fangzhou Mu, Sicheng Mo, Yin Li

CVPR 2024arXiv:2404.02257
32
citations
#1107

Fantastic Animals and Where to Find Them: Segment Any Marine Animal with Dual SAM

Pingping Zhang, Tianyu Yan, Yang Liu et al.

CVPR 2024highlightarXiv:2404.04996
32
citations
#1108

Towards a Simultaneous and Granular Identity-Expression Control in Personalized Face Generation

Renshuai Liu, Bowen Ma, Wei Zhang et al.

CVPR 2024highlightarXiv:2401.01207
32
citations
#1109

Sieve: Multimodal Dataset Pruning using Image Captioning Models

Anas Mahmoud, Mostafa Elhoushi, Amro Abbas et al.

CVPR 2024arXiv:2310.02110
32
citations
#1110

OmniViD: A Generative Framework for Universal Video Understanding

Junke Wang, Dongdong Chen, Chong Luo et al.

CVPR 2024arXiv:2403.17935
32
citations
#1111

Closed-Loop Supervised Fine-Tuning of Tokenized Traffic Models

Zhejun Zhang, Peter Karkus, Maximilian Igl et al.

CVPR 2025arXiv:2412.05334
32
citations
#1112

Plug and Play Active Learning for Object Detection

Chenhongyi Yang, Lichao Huang, Elliot Crowley

CVPR 2024arXiv:2211.11612
32
citations
#1113

Instance-Adaptive and Geometric-Aware Keypoint Learning for Category-Level 6D Object Pose Estimation

Xiao Lin, Wenfei Yang, Yuan Gao et al.

CVPR 2024arXiv:2403.19527
32
citations
#1114

Towards Transferable Targeted 3D Adversarial Attack in the Physical World

Yao Huang, Yinpeng Dong, Shouwei Ruan et al.

CVPR 2024arXiv:2312.09558
32
citations
#1115

DarkIR: Robust Low-Light Image Restoration

Daniel Feijoo, Juan C. Benito, Alvaro Garcia et al.

CVPR 2025arXiv:2412.13443
32
citations
#1116

Color Shift Estimation-and-Correction for Image Enhancement

Yiyu Li, Ke Xu, Gerhard Hancke et al.

CVPR 2024arXiv:2405.17725
32
citations
#1117

ViT-Lens: Towards Omni-modal Representations

Stan Weixian Lei, Yixiao Ge, Kun Yi et al.

CVPR 2024arXiv:2311.16081
32
citations
#1118

Separate and Conquer: Decoupling Co-occurrence via Decomposition and Representation for Weakly Supervised Semantic Segmentation

Zhiwei Yang, Kexue Fu, Minghong Duan et al.

CVPR 2024arXiv:2402.18467
32
citations
#1119

Physical Property Understanding from Language-Embedded Feature Fields

Albert J. Zhai, Yuan Shen, Emily Y. Chen et al.

CVPR 2024arXiv:2404.04242
32
citations
#1120

VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling

Zeyue Tian, Zhaoyang Liu, Ruibin Yuan et al.

CVPR 2025arXiv:2406.04321
32
citations
#1121

A Simple Recipe for Language-guided Domain Generalized Segmentation

Mohammad Fahes, TUAN-HUNG VU, Andrei Bursuc et al.

CVPR 2024arXiv:2311.17922
32
citations
#1122

SpecNeRF: Gaussian Directional Encoding for Specular Reflections

Li Ma, Vasu Agrawal, Haithem Turki et al.

CVPR 2024highlightarXiv:2312.13102
32
citations
#1123

MonSter: Marry Monodepth to Stereo Unleashes Power

JunDa Cheng, Longliang Liu, Gangwei Xu et al.

CVPR 2025highlight
32
citations
#1124

Complexity Experts are Task-Discriminative Learners for Any Image Restoration

Eduard Zamfir, Zongwei Wu, Nancy Mehta et al.

CVPR 2025arXiv:2411.18466
32
citations
#1125

Towards Generalizable Multi-Object Tracking

Zheng Qin, Le Wang, Sanping Zhou et al.

CVPR 2024arXiv:2406.00429
32
citations
#1126

Towards the Uncharted: Density-Descending Feature Perturbation for Semi-supervised Semantic Segmentation

Xiaoyang Wang, Huihui Bai, Limin Yu et al.

CVPR 2024arXiv:2403.06462
32
citations
#1127

Single Domain Generalization for Crowd Counting

Zhuoxuan Peng, S.-H. Gary Chan

CVPR 2024arXiv:2403.09124
32
citations
#1128

HarmonyView: Harmonizing Consistency and Diversity in One-Image-to-3D

Sangmin Woo, byeongjun park, Hyojun Go et al.

CVPR 2024arXiv:2312.15980
32
citations
#1129

VOODOO 3D: Volumetric Portrait Disentanglement For One-Shot 3D Head Reenactment

Phong Tran, Egor Zakharov, Long Nhat Ho et al.

CVPR 2024arXiv:2312.04651
32
citations
#1130

TokenCompose: Text-to-Image Diffusion with Token-level Supervision

Zirui Wang, Zhizhou Sha, Zheng Ding et al.

CVPR 2024arXiv:2312.03626
32
citations
#1131

LLM4SGG: Large Language Models for Weakly Supervised Scene Graph Generation

Kibum Kim, Kanghoon Yoon, Jaehyeong Jeon et al.

CVPR 2024arXiv:2310.10404
32
citations
#1132

Material Palette: Extraction of Materials from a Single Image

Ivan Lopes, Fabio Pizzati, Raoul de Charette

CVPR 2024arXiv:2311.17060
32
citations
#1133

Rethinking the Objectives of Vector-Quantized Tokenizers for Image Synthesis

Yuchao Gu, Xintao Wang, Yixiao Ge et al.

CVPR 2024arXiv:2212.03185
32
citations
#1134

LT3SD: Latent Trees for 3D Scene Diffusion

Quan Meng, Lei Li, Matthias Nießner et al.

CVPR 2025arXiv:2409.08215
31
citations
#1135

AV-RIR: Audio-Visual Room Impulse Response Estimation

Anton Ratnarajah, Sreyan Ghosh, Sonal Kumar et al.

CVPR 2024arXiv:2312.00834
31
citations
#1136

Unified Language-driven Zero-shot Domain Adaptation

Senqiao Yang, Zhuotao Tian, Li Jiang et al.

CVPR 2024arXiv:2404.07155
31
citations
#1137

UniGarmentManip: A Unified Framework for Category-Level Garment Manipulation via Dense Visual Correspondence

Ruihai Wu, Haoran Lu, Yiyan Wang et al.

CVPR 2024arXiv:2405.06903
31
citations
#1138

Unveiling the Power of Audio-Visual Early Fusion Transformers with Dense Interactions through Masked Modeling

Shentong Mo, Pedro Morgado

CVPR 2024arXiv:2312.01017
31
citations
#1139

IRGS: Inter-Reflective Gaussian Splatting with 2D Gaussian Ray Tracing

Chun Gu, Xiaofei Wei, Zixuan Zeng et al.

CVPR 2025arXiv:2412.15867
31
citations
#1140

Modular Blind Video Quality Assessment

Wen Wen, Mu Li, Yabin ZHANG et al.

CVPR 2024arXiv:2402.19276
31
citations
#1141

VidBot: Learning Generalizable 3D Actions from In-the-Wild 2D Human Videos for Zero-Shot Robotic Manipulation

Hanzhi Chen, Boyang Sun, Anran Zhang et al.

CVPR 2025arXiv:2503.07135
31
citations
#1142

Think Twice Before Selection: Federated Evidential Active Learning for Medical Image Analysis with Domain Shifts

Jiayi Chen, Benteng Ma, Hengfei Cui et al.

CVPR 2024arXiv:2312.02567
31
citations
#1143

Building Bridges across Spatial and Temporal Resolutions: Reference-Based Super-Resolution via Change Priors and Conditional Diffusion Model

Runmin Dong, Shuai Yuan, Bin Luo et al.

CVPR 2024arXiv:2403.17460
31
citations
#1144

Contextrast: Contextual Contrastive Learning for Semantic Segmentation

Changki Sung, Wanhee Kim, Jungho An et al.

CVPR 2024arXiv:2404.10633
31
citations
#1145

VideoGLaMM : A Large Multimodal Model for Pixel-Level Visual Grounding in Videos

Shehan Munasinghe, Hanan Gani, Wenqi Zhu et al.

CVPR 2025arXiv:2411.04923
31
citations
#1146

NoiseCollage: A Layout-Aware Text-to-Image Diffusion Model Based on Noise Cropping and Merging

Takahiro Shirakawa, Seiichi Uchida

CVPR 2024arXiv:2403.03485
31
citations
#1147

Learning Background Prompts to Discover Implicit Knowledge for Open Vocabulary Object Detection

Jiaming Li, Jiacheng Zhang, Jichang Li et al.

CVPR 2024arXiv:2406.00510
31
citations
#1148

PhysGen3D: Crafting a Miniature Interactive World from a Single Image

Boyuan Chen, Hanxiao Jiang, Shaowei Liu et al.

CVPR 2025arXiv:2503.20746
31
citations
#1149

Exploring Intrinsic Normal Prototypes within a Single Image for Universal Anomaly Detection

Wei Luo, Yunkang Cao, Haiming Yao et al.

CVPR 2025arXiv:2503.02424
31
citations
#1150

FlowIE: Efficient Image Enhancement via Rectified Flow

Yixuan Zhu, Wenliang Zhao, Ao Li et al.

CVPR 2024arXiv:2406.00508
31
citations
#1151

Improving Semantic Correspondence with Viewpoint-Guided Spherical Maps

Octave Mariotti, Oisin Mac Aodha, Hakan Bilen

CVPR 2024arXiv:2312.13216
31
citations
#1152

It's All About Your Sketch: Democratising Sketch Control in Diffusion Models

Subhadeep Koley, Ayan Kumar Bhunia, Deeptanshu Sekhri et al.

CVPR 2024arXiv:2403.07234
31
citations
#1153

Cross-Domain Few-Shot Segmentation via Iterative Support-Query Correspondence Mining

Jiahao Nie, Yun Xing, Gongjie Zhang et al.

CVPR 2024arXiv:2401.08407
31
citations
#1154

Revisiting the Domain Shift and Sample Uncertainty in Multi-source Active Domain Transfer

Wenqiao Zhang, Zheqi Lv

CVPR 2024arXiv:2311.12905
31
citations
#1155

Spanning Training Progress: Temporal Dual-Depth Scoring (TDDS) for Enhanced Dataset Pruning

xin zhang, Jiawei Du, Weiying Xie et al.

CVPR 2024arXiv:2311.13613
31
citations
#1156

Training-Free Open-Vocabulary Segmentation with Offline Diffusion-Augmented Prototype Generation

Luca Barsellotti, Roberto Amoroso, Marcella Cornia et al.

CVPR 2024arXiv:2404.06542
31
citations
#1157

Towards Language-Driven Video Inpainting via Multimodal Large Language Models

Jianzong Wu, Xiangtai Li, Chenyang Si et al.

CVPR 2024arXiv:2401.10226
31
citations
#1158

UniVS: Unified and Universal Video Segmentation with Prompts as Queries

Minghan LI, Shuai Li, Xindong Zhang et al.

CVPR 2024arXiv:2402.18115
31
citations
#1159

Intraoperative 2D/3D Image Registration via Differentiable X-ray Rendering

Vivek Gopalakrishnan, Neel Dey, Polina Golland

CVPR 2024arXiv:2312.06358
31
citations
#1160

GaussTR: Foundation Model-Aligned Gaussian Transformer for Self-Supervised 3D Spatial Understanding

Haoyi Jiang, Liu Liu, Tianheng Cheng et al.

CVPR 2025arXiv:2412.13193
31
citations
#1161

Visual Agentic AI for Spatial Reasoning with a Dynamic API

Damiano Marsili, Rohun Agrawal, Yisong Yue et al.

CVPR 2025arXiv:2502.06787
31
citations
#1162

APSeg: Auto-Prompt Network for Cross-Domain Few-Shot Semantic Segmentation

Weizhao He, Yang Zhang, Wei Zhuo et al.

CVPR 2024arXiv:2406.08372
31
citations
#1163

Attention Calibration for Disentangled Text-to-Image Personalization

Yanbing Zhang, Mengping Yang, Qin Zhou et al.

CVPR 2024arXiv:2403.18551
31
citations
#1164

Diffusion-based Blind Text Image Super-Resolution

Yuzhe Zhang, jiawei zhang, Hao Li et al.

CVPR 2024arXiv:2312.08886
31
citations
#1165

PREGO: Online Mistake Detection in PRocedural EGOcentric Videos

Alessandro Flaborea, Guido M. D&amp, #x27 et al.

CVPR 2024arXiv:2404.01933
31
citations
#1166

Training Like a Medical Resident: Context-Prior Learning Toward Universal Medical Image Segmentation

Yunhe Gao

CVPR 2024arXiv:2306.02416
31
citations
#1167

InterHandGen: Two-Hand Interaction Generation via Cascaded Reverse Diffusion

Jihyun Lee, Shunsuke Saito, Giljoo Nam et al.

CVPR 2024arXiv:2403.17422
31
citations
#1168

Localization Is All You Evaluate: Data Leakage in Online Mapping Datasets and How to Fix It

Adam Lilja, Junsheng Fu, Erik Stenborg et al.

CVPR 2024arXiv:2312.06420
31
citations
#1169

AI-Face: A Million-Scale Demographically Annotated AI-Generated Face Dataset and Fairness Benchmark

Li Lin, Santosh Santosh, Mingyang Wu et al.

CVPR 2025arXiv:2406.00783
31
citations
#1170

PTQ4SAM: Post-Training Quantization for Segment Anything

Chengtao Lv, Hong Chen, Jinyang Guo et al.

CVPR 2024arXiv:2405.03144
31
citations
#1171

Can Biases in ImageNet Models Explain Generalization?

Paul Gavrikov, Janis Keuper

CVPR 2024arXiv:2404.01509
31
citations
#1172

Learning to Transform Dynamically for Better Adversarial Transferability

Rongyi Zhu, Zeliang Zhang, Susan Liang et al.

CVPR 2024arXiv:2405.14077
31
citations
#1173

Make-Your-Anchor: A Diffusion-based 2D Avatar Generation Framework

Ziyao Huang, Fan Tang, Yong Zhang et al.

CVPR 2024arXiv:2403.16510
30
citations
#1174

Data Poisoning based Backdoor Attacks to Contrastive Learning

Jinghuai Zhang, Hongbin Liu, Jinyuan Jia et al.

CVPR 2024arXiv:2211.08229
30
citations
#1175

Point2CAD: Reverse Engineering CAD Models from 3D Point Clouds

Yujia Liu, Anton Obukhov, Jan D. Wegner et al.

CVPR 2024highlightarXiv:2312.04962
30
citations
#1176

DeSiRe-GS: 4D Street Gaussians for Static-Dynamic Decomposition and Surface Reconstruction for Urban Driving Scenes

Chensheng Peng, Chengwei Zhang, Yixiao Wang et al.

CVPR 2025arXiv:2411.11921
30
citations
#1177

Attention-Driven Training-Free Efficiency Enhancement of Diffusion Models

Hongjie Wang, Difan Liu, Yan Kang et al.

CVPR 2024arXiv:2405.05252
30
citations
#1178

EAGLE: Eigen Aggregation Learning for Object-Centric Unsupervised Semantic Segmentation

Chanyoung Kim, Woojung Han, Dayun Ju et al.

CVPR 2024highlightarXiv:2403.01482
30
citations
#1179

Inst3D-LMM: Instance-Aware 3D Scene Understanding with Multi-modal Instruction Tuning

Hanxun Yu, Wentong Li, Song Wang et al.

CVPR 2025highlightarXiv:2503.00513
30
citations
#1180

A Conditional Denoising Diffusion Probabilistic Model for Point Cloud Upsampling

Wentao Qu, Yuantian Shao, Lingwu Meng et al.

CVPR 2024arXiv:2312.02719
30
citations
#1181

Frozen CLIP: A Strong Backbone for Weakly Supervised Semantic Segmentation

Bingfeng Zhang, Siyue Yu, Yunchao Wei et al.

CVPR 2024highlightarXiv:2406.11189
30
citations
#1182

SHAP-EDITOR: Instruction-Guided Latent 3D Editing in Seconds

Minghao Chen, Junyu Xie, Iro Laina et al.

CVPR 2024arXiv:2312.09246
30
citations
#1183

Beyond Text: Frozen Large Language Models in Visual Signal Comprehension

Lei Zhu, Fangyun Wei, Yanye Lu

CVPR 2024arXiv:2403.07874
30
citations
#1184

CoDi: Conditional Diffusion Distillation for Higher-Fidelity and Faster Image Generation

Kangfu Mei, Mauricio Delbracio, Hossein Talebi et al.

CVPR 2024arXiv:2310.01407
30
citations
#1185

CapHuman: Capture Your Moments in Parallel Universes

Chao Liang, Fan Ma, Linchao Zhu et al.

CVPR 2024arXiv:2402.00627
30
citations
#1186

LEOD: Label-Efficient Object Detection for Event Cameras

Ziyi Wu, Mathias Gehrig, Qing Lyu et al.

CVPR 2024arXiv:2311.17286
30
citations
#1187

FreeDrag: Feature Dragging for Reliable Point-based Image Editing

Pengyang Ling, Lin Chen, Pan Zhang et al.

CVPR 2024arXiv:2307.04684
30
citations
#1188

VP3D: Unleashing 2D Visual Prompt for Text-to-3D Generation

Yang Chen, Yingwei Pan, haibo yang et al.

CVPR 2024arXiv:2403.17001
30
citations
#1189

3D Convex Splatting: Radiance Field Rendering with 3D Smooth Convexes

Jan Held, Renaud Vandeghen, Abdullah J Hamdi et al.

CVPR 2025highlightarXiv:2411.14974
30
citations
#1190

K-LoRA: Unlocking Training-Free Fusion of Any Subject and Style LoRAs

Ziheng Ouyang, Zhen Li, Qibin Hou

CVPR 2025arXiv:2502.18461
30
citations
#1191

Universal Segmentation at Arbitrary Granularity with Language Instruction

Yong Liu, Cairong Zhang, Yitong Wang et al.

CVPR 2024arXiv:2312.01623
30
citations
#1192

EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering

Sheng Zhou, Junbin Xiao, Qingyun Li et al.

CVPR 2025arXiv:2502.07411
30
citations
#1193

3DiffTection: 3D Object Detection with Geometry-Aware Diffusion Features

Chenfeng Xu, Huan Ling, Sanja Fidler et al.

CVPR 2024arXiv:2311.04391
30
citations
#1194

Hourglass Tokenizer for Efficient Transformer-Based 3D Human Pose Estimation

Wenhao Li, Mengyuan Liu, Hong Liu et al.

CVPR 2024highlightarXiv:2311.12028
30
citations
#1195

3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination

Jianing "Jed" Yang, Xuweiyi Chen, Nikhil Madaan et al.

CVPR 2025arXiv:2406.05132
30
citations
#1196

Motion-adaptive Separable Collaborative Filters for Blind Motion Deblurring

Chengxu Liu, Xuan Wang, Xiangyu Xu et al.

CVPR 2024arXiv:2404.13153
30
citations
#1197

EventDance: Unsupervised Source-free Cross-modal Adaptation for Event-based Object Recognition

Xu Zheng, Addison, Lin Wang

CVPR 2024arXiv:2403.14082
30
citations
#1198

Continuous, Subject-Specific Attribute Control in T2I Models by Identifying Semantic Directions

Stefan Andreas Baumann, Felix Krause, Michael Neumayr et al.

CVPR 2025arXiv:2403.17064
30
citations
#1199

VideoCon: Robust Video-Language Alignment via Contrast Captions

Hritik Bansal, Yonatan Bitton, Idan Szpektor et al.

CVPR 2024arXiv:2311.10111
30
citations
#1200

Multi-Space Alignments Towards Universal LiDAR Segmentation

Youquan Liu, Lingdong Kong, Xiaoyang Wu et al.

CVPR 2024arXiv:2405.01538
30
citations