Most Cited CVPR "action region localization" Papers

CVPR 2024 arXiv:2403.08436

#3002

PFStorer: Personalized Face Restoration and Super-Resolution

Tuomas Varanka, Tapani Toivonen, Soumya Tripathy et al.

CVPR 2025 arXiv:2407.08027

#3003

Fish-Vista: A Multi-Purpose Dataset for Understanding & Identification of Traits from Images

Kazi Sajeed Mehrab, M. Maruf, Arka Daw et al.

CVPR 2024 arXiv:2406.06813

#3004

Stable Neighbor Denoising for Source-free Domain Adaptive Segmentation

Dong Zhao, Shuang Wang, Qi Zang et al.

CVPR 2025 arXiv:2503.18314

#3005

LoTUS: Large-Scale Machine Unlearning with a Taste of Uncertainty

Christoforos N. Spartalis, Theodoros Semertzidis, Efstratios Gavves et al.

#3006

Cross Initialization for Face Personalization of Text-to-Image Models

Lianyu Pang, Jian Yin, Haoran Xie et al.

CVPR 2025 arXiv:2506.13110

#3007

GS-2DGS: Geometrically Supervised 2DGS for Reflective Object Reconstruction

Jinguang Tong, Xuesong Li, Fahira Afzal Maken et al.

CVPR 2025 highlight arXiv:2411.17662

#3008

RoboPEPP: Vision-Based Robot Pose and Joint Angle Estimation through Embedding Predictive Pre-Training

Raktim Gautam Goswami, Prashanth Krishnamurthy, Yann LeCun et al.

CVPR 2025 arXiv:2503.10148

#3009

3D Student Splatting and Scooping

Jialin Zhu, Jiangbei Yue, Feixiang He et al.

CVPR 2024 highlight arXiv:2303.06346

#3010

3DInAction: Understanding Human Actions in 3D Point Clouds

Yizhak Ben-Shabat, Oren Shrout, Stephen Gould

CVPR 2025 arXiv:2503.01291

#3011

SemGeoMo: Dynamic Contextual Human Motion Generation with Semantic and Geometric Guidance

Peishan Cong, Ziyi Wang, Yuexin Ma et al.

CVPR 2025 highlight arXiv:2503.16964

#3012

DroneSplat: 3D Gaussian Splatting for Robust 3D Reconstruction from In-the-Wild Drone Imagery

Jiadong Tang, Yu Gao, Dianyi Yang et al.

CVPR 2024 arXiv:2311.11845

#3013

Entangled View-Epipolar Information Aggregation for Generalizable Neural Radiance Fields

Zhiyuan Min, Yawei Luo, Wei Yang et al.

#3014

ROD-MLLM: Towards More Reliable Object Detection in Multimodal Large Language Models

Heng Yin, Yuqiang Ren, Ke Yan et al.

CVPR 2025 arXiv:2412.09191

#3015

RAD: Region-Aware Diffusion Models for Image Inpainting

Sora Kim, Sungho Suh, Minsik Lee

CVPR 2025 arXiv:2503.12009

#3016

UniMamba: Unified Spatial-Channel Representation Learning with Group-Efficient Mamba for LiDAR-based 3D Object Detection

Xin Jin, Haisheng Su, Kai Liu et al.

CVPR 2025 arXiv:2411.17030

#3017

g3D-LF: Generalizable 3D-Language Feature Fields for Embodied Tasks

Zihan Wang, Gim Hee Lee

CVPR 2024 arXiv:2403.11113

#3018

Local-consistent Transformation Learning for Rotation-invariant Point Cloud Analysis

Yiyang Chen, Lunhao Duan, Shanshan Zhao et al.

CVPR 2024 arXiv:2405.03388

#3019

3D LiDAR Mapping in Dynamic Environments using a 4D Implicit Neural Representation

Xingguang Zhong, Yue Pan, Cyrill Stachniss et al.

CVPR 2024 arXiv:2406.19393

#3020

Looking 3D: Anomaly Detection with 2D-3D Alignment

Ankan Kumar Bhunia, Changjian Li, Hakan Bilen

CVPR 2024 arXiv:2402.18102

#3021

Passive Snapshot Coded Aperture Dual-Pixel RGB-D Imaging

Bhargav Ghanekar, Salman Siddique Khan, Pranav Sharma et al.

CVPR 2024 highlight arXiv:2404.03652

#3022

The More You See in 2D the More You Perceive in 3D

Xinyang Han, Zelin Gao, Angjoo Kanazawa et al.

CVPR 2025 arXiv:2411.16064

#3023

Multi-Granularity Class Prototype Topology Distillation for Class-Incremental Source-Free Unsupervised Domain Adaptation

Peihua Deng, Jiehua Zhang, Xichun Sheng et al.

CVPR 2025 arXiv:2312.05984

#3024

Accurate Differential Operators for Hybrid Neural Fields

Aditya Chetan, Guandao Yang, Zichen Wang et al.

CVPR 2024 arXiv:2403.16141

#3025

Entity-NeRF: Detecting and Removing Moving Entities in Urban Scenes

Takashi Otonari, Satoshi Ikehata, Kiyoharu Aizawa

CVPR 2025 arXiv:2501.12218

#3026

Exploring Temporally-Aware Features for Point Tracking

Inès Hyeonsu Kim, Seokju Cho, Gabriel Huang et al.

CVPR 2025 highlight arXiv:2502.15011

#3027

CrossOver: 3D Scene Cross-Modal Alignment

Sayan Deb Sarkar, Ondrej Miksik, Marc Pollefeys et al.

CVPR 2024 arXiv:2305.11288

#3028

Riemannian Multinomial Logistics Regression for SPD Neural Networks

Ziheng Chen, Yue Song, Gaowen Liu et al.

CVPR 2024 arXiv:2404.11139

#3029

GeoReF: Geometric Alignment Across Shape Variation for Category-level Object Pose Refinement

Linfang Zheng, Tze Ho Elden Tse, Chen Wang et al.

CVPR 2025 highlight arXiv:2503.07026

#3030

Erase Diffusion: Empowering Object Removal Through Calibrating Diffusion Pathways

Yi Liu, Hao Zhou, Benlei Cui et al.

CVPR 2025 highlight arXiv:2412.01052

#3031

CRISP: Object Pose and Shape Estimation with Test-Time Adaptation

Jingnan Shi, Rajat Talak, Harry Zhang et al.

CVPR 2025 arXiv:2405.18840

#3032

Parameter-efficient Fine-tuning in Hyperspherical Space for Open-vocabulary Semantic Segmentation

Zelin Peng, Zhengqin Xu, Zhilin Zeng et al.

CVPR 2025 arXiv:2503.18595

#3033

Adaptive Unimodal Regulation for Balanced Multimodal Information Acquisition

Chengxiang Huang, Yake Wei, Zequn Yang et al.

CVPR 2025 arXiv:2412.09593

#3034

Neural LightRig: Unlocking Accurate Object Normal and Material Estimation with Multi-Light Diffusion

Zexin He, Tengfei Wang, Xin Huang et al.

CVPR 2024 arXiv:2404.01686

#3035

JRDB-PanoTrack: An Open-world Panoptic Segmentation and Tracking Robotic Dataset in Crowded Human Environments

Duy Tho Le, Chenhui Gou, Stavya Datta et al.

CVPR 2025 highlight arXiv:2410.23780

#3036

Driving by the Rules: A Benchmark for Integrating Traffic Sign Regulations into Vectorized HD Map

Xinyuan Chang, Maixuan Xue, Xinran Liu et al.

CVPR 2024 highlight arXiv:2405.03662

#3037

Diffeomorphic Template Registration for Atmospheric Turbulence Mitigation

Dong Lao, Congli Wang, Alex Wong et al.

CVPR 2024 arXiv:2403.03662

#3038

Harnessing Meta-Learning for Improving Full-Frame Video Stabilization

Muhammad Kashif Ali, Eun Woo Im, Dongjin Kim et al.

CVPR 2024 arXiv:2404.01828

#3039

Defense without Forgetting: Continual Adversarial Defense with Anisotropic & Isotropic Pseudo Replay

Yuhang Zhou, Zhongyun Hua

CVPR 2025 arXiv:2503.16822

#3040

RigGS: Rigging of 3D Gaussians for Modeling Articulated Objects in Videos

Yuxin Yao, Zhi Deng, Junhui Hou

CVPR 2025 arXiv:2503.20172

#3041

Guiding Human-Object Interactions with Rich Geometry and Relations

Mengqing Xue, Yifei Liu, Ling Guo et al.

CVPR 2025 arXiv:2501.04336

#3042

Building a Mind Palace: Structuring Environment-Grounded Semantic Graphs for Effective Long Video Analysis with LLMs

Zeyi Huang, Yuyang Ji, Xiaofang Wang et al.

CVPR 2025 arXiv:2503.23670

#3043

Learning Bijective Surface Parameterization for Inferring Signed Distance Functions from Sparse Point Clouds with Grid Deformation

Takeshi Noda, Chao Chen, Junsheng Zhou et al.

CVPR 2024 arXiv:2405.18706

#3044

FocSAM: Delving Deeply into Focused Objects in Segmenting Anything

You Huang, Zongyu Lan, Liujuan Cao et al.

CVPR 2024 arXiv:2403.00939

#3045

G3DR: Generative 3D Reconstruction in ImageNet

Pradyumna Reddy, Ismail Elezi, Jiankang Deng

CVPR 2025 highlight arXiv:2410.10604

#3046

Multi-modal Vision Pre-training for Medical Image Analysis

Shaohao Rui, Lingzhi Chen, Zhenyu Tang et al.

CVPR 2024 arXiv:2405.11483

#3047

MICap: A Unified Model for Identity-Aware Movie Descriptions

Haran Raajesh, Naveen Reddy Desanur, Zeeshan Khan et al.

CVPR 2024 highlight arXiv:2401.01823

#3048

Detours for Navigating Instructional Videos

Kumar Ashutosh, Zihui Xue, Tushar Nagarajan et al.

CVPR 2024 highlight arXiv:2312.04529

#3049

Diffusion Reflectance Map: Single-Image Stochastic Inverse Rendering of Illumination and Reflectance

Yuto Enyo, Ko Nishino

CVPR 2025 arXiv:2503.21459

#3050

RoadSocial: A Diverse VideoQA Dataset and Benchmark for Road Event Understanding from Social Video Narratives

Chirag Parikh, Deepti Rawat, Rakshitha R. T. et al.

CVPR 2025 arXiv:2412.15200

#3051

DI-PCG: Diffusion-based Efficient Inverse Procedural Content Generation for High-quality 3D Asset Creation

Wang Zhao, Yan-Pei Cao, Jiale Xu et al.

CVPR 2024 arXiv:2403.01231

#3052

Benchmarking Segmentation Models with Mask-Preserved Attribute Editing

Zijin Yin, Kongming Liang, Bing Li et al.

CVPR 2024 arXiv:2404.01123

#3053

CLIPtone: Unsupervised Learning for Text-based Image Tone Adjustment

Hyeongmin Lee, Kyoungkook Kang, Jungseul Ok et al.

CVPR 2024 arXiv:2404.09993

#3054

No More Ambiguity in 360° Room Layout via Bi-Layout Estimation

Yu-Ju Tsai, Jin-Cheng Jhang, Jingjing Zheng et al.

CVPR 2024 arXiv:2404.00385

#3055

Constrained Layout Generation with Factor Graphs

Mohammed Haroon Dupty, Yanfei Dong, Sicong Leng et al.

CVPR 2025 arXiv:2503.21747

#3056

CTRL-O: Language-Controllable Object-Centric Visual Representation Learning

Aniket Rajiv Didolkar, Andrii Zadaianchuk, Rabiul Awal et al.

CVPR 2024 arXiv:2406.18540

#3057

Fully Exploiting Every Real Sample: SuperPixel Sample Gradient Model Stealing

Yunlong Zhao, Xiaoheng Deng, Yijing Liu et al.

CVPR 2025 arXiv:2505.06166

#3058

DiffLocks: Generating 3D Hair from a Single Image using Diffusion Models

Radu Alexandru Rosu, Keyu Wu, Yao Feng et al.

CVPR 2024 arXiv:2305.17368

#3059

Instance-based Max-margin for Practical Few-shot Recognition

Minghao Fu, Ke Zhu

CVPR 2025 arXiv:2504.20026

#3060

LIRM: Large Inverse Rendering Model for Progressive Reconstruction of Shape, Materials and View-dependent Radiance Fields

Zhengqin Li, Dilin Wang, Ka Chen et al.

CVPR 2024 arXiv:2312.04552

#3061

Generating Illustrated Instructions

Sachit Menon, Ishan Misra, Rohit Girdhar

CVPR 2024 arXiv:2404.07504

#3062

Mitigating Object Dependencies: Improving Point Cloud Self-Supervised Learning through Object Exchange

Yanhao Wu, Tong Zhang, Wei Ke et al.

CVPR 2025 arXiv:2412.01798

#3063

SEAL: Semantic Attention Learning for Long Video Representation

Lan Wang, Yujia Chen, Wen-Sheng Chu et al.

CVPR 2025 arXiv:2411.11909

#3064

SymDPO: Boosting In-Context Learning of Large Multimodal Models with Symbol Demonstration Direct Preference Optimization

Hongrui Jia, Chaoya Jiang, Haiyang Xu et al.

CVPR 2024 arXiv:2405.19833

#3065

KITRO: Refining Human Mesh by 2D Clues and Kinematic-tree Rotation

Fengyuan Yang, Kerui Gu, Angela Yao

CVPR 2024 arXiv:2403.06102

#3066

Coherent Temporal Synthesis for Incremental Action Segmentation

Guodong Ding, Hans Golong, Angela Yao

#3067

Scene Map-based Prompt Tuning for Navigation Instruction Generation

Sheng Fan, Rui Liu, Wenguan Wang et al.

CVPR 2025 arXiv:2502.11925

#3068

GRAPHGPT-O: Synergistic Multimodal Comprehension and Generation on Graphs

Yi Fang, Bowen Jin, Jiacheng Shen et al.

CVPR 2025 arXiv:2503.20824

#3069

Exploiting Temporal State Space Sharing for Video Semantic Segmentation

Hesham Syed, Yun Liu, Guolei Sun et al.

CVPR 2025 arXiv:2312.04540

#3070

Sim-to-Real Causal Transfer: A Metric Learning Approach to Causally-Aware Interaction Representations

Ahmad Rahimi, Po-Chien Luan, Yuejiang Liu et al.

CVPR 2025 arXiv:2503.19783

#3071

Fine-Grained Erasure in Text-to-Image Diffusion-based Foundation Models

Kartik Thakral, Tamar Glaser, Tal Hassner et al.

#3072

Robust Multimodal Survival Prediction with Conditional Latent Differentiation Variational AutoEncoder

Junjie Zhou, Jiao Tang, Yingli Zuo et al.

CVPR 2024 arXiv:2402.17065

#3073

Taming the Tail in Class-Conditional GANs: Knowledge Sharing via Unconditional Training at Lower Resolutions

Saeed Khorram, Mingqi Jiang, Mohamad Shahbazi et al.

CVPR 2025 arXiv:2506.21976

#3074

SceneDiffuser++: City-Scale Traffic Simulation via a Generative World Model

Shuhan Tan, John Wheatley Lambert, Hong Jeon et al.

CVPR 2024 arXiv:2404.01351

#3075

AETTA: Label-Free Accuracy Estimation for Test-Time Adaptation

Taeckyung Lee, Sorn Chottananurak, Taesik Gong et al.

CVPR 2025 arXiv:2509.22412

#3076

FreqDebias: Towards Generalizable Deepfake Detection via Consistency-Driven Frequency Debiasing

Hossein Kashiani, Niloufar Alipour Talemi, Fatemeh Afghah

CVPR 2025 arXiv:2503.10065

#3077

Do We Always Need the Simplicity Bias? Looking for Optimal Inductive Biases in the Wild

Damien Teney, Liangze Jiang, Florin Gogianu et al.

CVPR 2024 arXiv:2405.12200

#3078

Multi-View Attentive Contextualization for Multi-View 3D Object Detection

Xianpeng Liu, Ce Zheng, Ming Qian et al.

CVPR 2025 highlight arXiv:2502.20162

#3079

Gradient-Guided Annealing for Domain Generalization

Aristotelis Ballas, Christos Diou

CVPR 2025 arXiv:2503.17782

#3080

GOAL: Global-local Object Alignment Learning

Hyungyu Choi, Young Kyun Jang, Chanho Eom

CVPR 2025 arXiv:2503.12866

#3081

SCAP: Transductive Test-Time Adaptation via Supportive Clique-based Attribute Prompting

Chenyu Zhang, Kunlun Xu, Zichen Liu et al.

CVPR 2025 arXiv:2504.02764

#3082

Scene Splatter: Momentum 3D Scene Generation from Single Image with Video Diffusion Model

Shengjun Zhang, Jinzhao Li, Xin Fei et al.

CVPR 2025 highlight arXiv:2503.20308

#3083

Perceptually Accurate 3D Talking Head Generation: New Definitions, Speech-Mesh Representation, and Evaluation Metrics

Lee Chae-Yeon, Oh Hyun-Bin, Han EunGi et al.

CVPR 2024 arXiv:2403.18186

#3084

Don't Look into the Dark: Latent Codes for Pluralistic Image Inpainting

Haiwei Chen, Yajie Zhao

CVPR 2025 arXiv:2503.19777

#3085

LPOSS: Label Propagation Over Patches and Pixels for Open-vocabulary Semantic Segmentation

Vladan Stojnić, Yannis Kalantidis, Jiri Matas et al.

CVPR 2025 arXiv:2504.10000

#3086

Do We Really Need Curated Malicious Data for Safety Alignment in Multi-modal Large Language Models?

Yanbo Wang, Jiyang Guan, Jian Liang et al.

CVPR 2025 arXiv:2502.20678

#3087

STPro: Spatial and Temporal Progressive Learning for Weakly Supervised Spatio-Temporal Grounding

Aaryan Garg, Akash Kumar, Yogesh S. Rawat

CVPR 2025 arXiv:2412.04470

#3088

Turbo3D: Ultra-fast Text-to-3D Generation

Hanzhe Hu, Tianwei Yin, Fujun Luan et al.

CVPR 2025 arXiv:2503.15842

#3089

FedAWA: Adaptive Optimization of Aggregation Weights in Federated Learning Using Client Vectors

Changlong Shi, He Zhao, Bingjie Zhang et al.

CVPR 2025 arXiv:2503.01725

#3090

HarmonySet: A Comprehensive Dataset for Understanding Video-Music Semantic Alignment and Temporal Synchronization

Zitang Zhou, Ke Mei, Yu Lu et al.

CVPR 2024 arXiv:2312.03420

#3091

Artist-Friendly Relightable and Animatable Neural Heads

Yingyan Xu, Prashanth Chandran, Sebastian Weiss et al.

CVPR 2025 highlight arXiv:2505.04657

#3092

EvEnhancer: Empowering Effectiveness, Efficiency and Generalizability for Continuous Space-Time Video Super-Resolution with Events

Shuoyan Wei, Feng Li, Shengeng Tang et al.

CVPR 2025 arXiv:2504.00999

#3093

MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization

Siyuan Li, Luyuan Zhang, Zedong Wang et al.

CVPR 2025 arXiv:2505.04270

#3094

Object-Shot Enhanced Grounding Network for Egocentric Video

Yisen Feng, Haoyu Zhang, Meng Liu et al.

CVPR 2024 arXiv:2406.01843

#3095

L-MAGIC: Language Model Assisted Generation of Images with Coherence

Zhipeng Cai, Matthias Mueller, Reiner Birkl et al.

CVPR 2025 arXiv:2507.06928

#3096

Adaptive Part Learning for Fine-Grained Generalized Category Discovery: A Plug-and-Play Enhancement

Qiyuan Dai, Hanzhuo Huang, Yu Wu et al.

CVPR 2025 arXiv:2411.17176

#3097

ChatGen: Automatic Text-to-Image Generation From FreeStyle Chatting

Chengyou Jia, Changliang Xia, Zhuohang Dang et al.

CVPR 2024 arXiv:2403.04198

#3098

CN-RMA: Combined Network with Ray Marching Aggregation for 3D Indoor Object Detection from Multi-view Images

Guanlin Shen, Jingwei Huang, Zhihua Hu et al.

CVPR 2025 arXiv:2412.06774

#3099

Visual Lexicon: Rich Image Features in Language Space

XuDong Wang, Xingyi Zhou, Alireza Fathi et al.

CVPR 2024 arXiv:2311.04246

#3100

ADFactory: An Effective Framework for Generalizing Optical Flow with NeRF

Han Ling, Quansen Sun, Yinghui Sun et al.

CVPR 2025 arXiv:2405.16555

#3101

Building Vision Models upon Heat Conduction

Zhaozhi Wang, Yue Liu, Yunjie Tian et al.

CVPR 2024 arXiv:2311.17094

#3102

In Search of a Data Transformation That Accelerates Neural Field Training

Junwon Seo, Sangyoon Lee, Kwang In Kim et al.

CVPR 2025 arXiv:2412.16153

#3103

MotiF: Making Text Count in Image Animation with Motion Focal Loss

Shijie Wang, Samaneh Azadi, Rohit Girdhar et al.

CVPR 2025 arXiv:2503.23331

#3104

HiPART: Hierarchical Pose AutoRegressive Transformer for Occluded 3D Human Pose Estimation

Hongwei Zheng, Han Li, Wenrui Dai et al.

CVPR 2025 highlight arXiv:2503.07591

#3105

Filter Images First, Generate Instructions Later: Pre-Instruction Data Selection for Visual Instruction Tuning

Bardia Safaei, Faizan Siddiqui, Jiacong Xu et al.

CVPR 2025 highlight arXiv:2502.20256

#3106

Do Computer Vision Foundation Models Learn the Low-level Characteristics of the Human Visual System?

Yancheng Cai, Fei Yin, Dounia Hammou et al.

CVPR 2024 arXiv:2404.02889

#3107

Steganographic Passport: An Owner and User Verifiable Credential for Deep Model IP Protection Without Retraining

Qi Cui, Ruohan Meng, Chaohui Xu et al.

CVPR 2025 arXiv:2503.16134

#3108

Binarized Mamba-Transformer for Lightweight Quad Bayer HybridEVS Demosaicing

Shiyang Zhou, Haijin Zeng, Yunfan Lu et al.

CVPR 2025 arXiv:2503.01463

#3109

MI-DETR: An Object Detection Model with Multi-time Inquiries Mechanism

Zhixiong Nan, Xianghong Li, Tao Xiang et al.

CVPR 2025 arXiv:2411.10818

#3110

FlipSketch: Flipping Static Drawings to Text-Guided Sketch Animations

Hmrishav Bandyopadhyay, Yi-Zhe Song

CVPR 2025 arXiv:2412.09680

#3111

PBR-NeRF: Inverse Rendering with Physics-Based Neural Fields

Sean Wu, Shamik Basu, Tim Broedermann et al.

CVPR 2024 arXiv:2403.02041

#3112

A Generative Approach for Wikipedia-Scale Visual Entity Recognition

Mathilde Caron, Ahmet Iscen, Alireza Fathi et al.

CVPR 2025 arXiv:2504.00996

#3113

TurboFill: Adapting Few-step Text-to-image Model for Fast Image Inpainting

Liangbin Xie, Daniil Pakhomov, Zhonghao Wang et al.

CVPR 2025 arXiv:2503.20998

#3114

CoMapGS: Covisibility Map-based Gaussian Splatting for Sparse Novel View Synthesis

Youngkyoon Jang, Eduardo Pérez-Pellitero

CVPR 2025 arXiv:2503.06621

#3115

Dynamic Updates for Language Adaptation in Visual-Language Tracking

Xiaohai Li, Bineng Zhong, Qihua Liang et al.

#3116

Querying as Prompt: Parameter-Efficient Learning for Multimodal Language Model

Tian Liang, Jing Huang, Ming Kong et al.

CVPR 2024 arXiv:2312.08338

#3117

Global Latent Neural Rendering

Thomas Tanay, Matteo Maggioni

CVPR 2025 highlight arXiv:2501.05446

#3118

Relative Pose Estimation through Affine Corrections of Monocular Depth Priors

Yifan Yu, Shaohui Liu, Rémi Pautrat et al.

CVPR 2025 highlight arXiv:2503.18420

#3119

Panorama Generation From NFoV Image Done Right

Dian Zheng, Cheng Zhang, Xiao-Ming Wu et al.

CVPR 2024 arXiv:2406.06730

#3120

TRINS: Towards Multimodal Language Models that Can Read

Ruiyi Zhang, Yanzhe Zhang, Jian Chen et al.

CVPR 2025 arXiv:2505.05587

#3121

Steepest Descent Density Control for Compact 3D Gaussian Splatting

Peihao Wang, Yuehao Wang, Dilin Wang et al.

#3122

Towards High-fidelity Artistic Image Vectorization via Texture-Encapsulated Shape Parameterization

Ye Chen, Bingbing Ni, Jinfan Liu et al.

CVPR 2025 arXiv:2410.11619

#3123

MultiVENT 2.0: A Massive Multilingual Benchmark for Event-Centric Video Retrieval

Reno Kriz, Kate Sanders, David Etter et al.

CVPR 2024 highlight arXiv:2403.15891

#3124

Human Motion Prediction Under Unexpected Perturbation

Jiangbei Yue, Baiyi Li, Julien Pettré et al.

CVPR 2024 arXiv:2404.00676

#3125

OmniLocalRF: Omnidirectional Local Radiance Fields from Dynamic Videos

Dongyoung Choi, Hyeonjoong Jang, Min H. Kim

CVPR 2025 arXiv:2412.09545

#3126

SimAvatar: Simulation-Ready Avatars with Layered Hair and Clothing

Xueting Li, Ye Yuan, Shalini De Mello et al.

CVPR 2025 arXiv:2503.18513

#3127

LookCloser: Frequency-aware Radiance Field for Tiny-Detail Scene

Xiaoyu Zhang, Weihong Pan, Chong Bao et al.

CVPR 2025 arXiv:2411.16932

#3128

Seq2Time: Sequential Knowledge Transfer for Video LLM Temporal Grounding

Andong Deng, Zhongpai Gao, Anwesa Choudhuri et al.

CVPR 2024 arXiv:2403.12236

#3129

Improving Generalization via Meta-Learning on Hard Samples

Nishant Jain, Arun Suggala, Pradeep Shenoy

CVPR 2025 arXiv:2504.07894

#3130

DiverseFlow: Sample-Efficient Diverse Mode Coverage in Flows

Mashrur M. Morshed, Vishnu Naresh Boddeti

CVPR 2025 arXiv:2504.18032

#3131

Enhancing Privacy-Utility Trade-offs to Mitigate Memorization in Diffusion Models

Chen Chen, Daochang Liu, Mubarak Shah et al.

CVPR 2025 arXiv:2408.16266

#3132

Inversion Circle Interpolation: Diffusion-based Image Augmentation for Data-scarce Classification

Yanghao Wang, Long Chen

CVPR 2024 arXiv:2507.14559

#3133

LEAD: Exploring Logit Space Evolution for Model Selection

Zixuan Hu, Xiaotong Li, Shixiang Tang et al.

#3134

Unsupervised Deep Unrolling Networks for Phase Unwrapping

Zhile Chen, Yuhui Quan, Hui Ji

CVPR 2024 arXiv:2311.17938

#3135

Active Open-Vocabulary Recognition: Let Intelligent Moving Mitigate CLIP Limitations

Lei Fan, Jianxiong Zhou, Xiaoying Xing et al.

CVPR 2025 arXiv:2412.05538

#3136

Not Just Text: Uncovering Vision Modality Typographic Threats in Image Generation Models

Hao Cheng, Erjia Xiao, Jiayan Yang et al.

CVPR 2025 arXiv:2503.18434

#3137

A Simple yet Effective Layout Token in Large Language Models for Document Understanding

Zhaoqing Zhu, Chuwei Luo, Zirui Shao et al.

CVPR 2025 arXiv:2502.04293

#3138

GCE-Pose: Global Context Enhancement for Category-level Object Pose Estimation

Weihang Li, Hongli Xu, Junwen Huang et al.

CVPR 2025 arXiv:2504.04191

#3139

GROVE: A Generalized Reward for Learning Open-Vocabulary Physical Skill

Jieming Cui, Tengyu Liu, Ziyu Meng et al.

CVPR 2025 arXiv:2412.01814

#3140

COSMOS: Cross-Modality Self-Distillation for Vision Language Pre-training

Sanghwan Kim, Rui Xiao, Iuliana Georgescu et al.

CVPR 2025 arXiv:2503.24129

#3141

It’s a (Blind) Match! Towards Vision-Language Correspondence without Parallel Data

Dominik Schnaus, Nikita Araslanov, Daniel Cremers

CVPR 2025 arXiv:2405.04533

#3142

ChatHuman: Chatting about 3D Humans with Tools

Jing Lin, Yao Feng, Weiyang Liu et al.

CVPR 2024 arXiv:2402.07739

#3143

Task-Conditioned Adaptation of Visual Features in Multi-Task Policy Learning

Pierre Marza, Laetitia Matignon, Olivier Simonin et al.

CVPR 2024 arXiv:2404.01591

#3144

Language Model Guided Interpretable Video Action Reasoning

Ning Wang, Guangming Zhu, Hongsheng Li et al.

CVPR 2025 arXiv:2505.07209

#3145

Discovering Fine-Grained Visual-Concept Relations by Disentangled Optimal Transport Concept Bottleneck Models

Yan Xie, Zequn Zeng, Hao Zhang et al.

CVPR 2025 arXiv:2405.03689

#3146

Pose Priors from Language Models

Sanjay Subramanian, Evonne Ng, Lea Müller et al.

CVPR 2024 arXiv:2405.20729

#3147

Extreme Point Supervised Instance Segmentation

Hyeonjun Lee, Sehyun Hwang, Suha Kwak

CVPR 2025 arXiv:2411.01492

#3148

EEE-Bench: A Comprehensive Multimodal Electrical And Electronics Engineering Benchmark

Ming Li, Jike Zhong, Tianle Chen et al.

CVPR 2025 highlight arXiv:2412.06234

#3149

Generative Densification: Learning to Densify Gaussians for High-Fidelity Generalizable 3D Reconstruction

Seungtae Nam, Xiangyu Sun, Gyeongjin Kang et al.

CVPR 2024 highlight arXiv:2403.04303

#3150

LORS: Low-rank Residual Structure for Parameter-Efficient Network Stacking

Jialin Li, Qiang Nie, Weifu Fu et al.

CVPR 2024 arXiv:2311.13612

#3151

Descriptor and Word Soups: Overcoming the Parameter Efficiency Accuracy Tradeoff for Out-of-Distribution Few-shot Learning

Christopher Liao, Theodoros Tsiligkaridis, Brian Kulis

CVPR 2025 arXiv:2505.13437

#3152

FinePhys: Fine-grained Human Action Generation by Explicitly Incorporating Physical Laws for Effective Skeletal Guidance

Dian Shao, Mingfei Shi, Shengda Xu et al.

CVPR 2024 arXiv:2312.02480

#3153

Differentiable Point-based Inverse Rendering

Hoon-Gyu Chung, Seokjun Choi, Seung-Hwan Baek

CVPR 2025 arXiv:2502.20256

#3154

Do Computer Vision Foundation Models Learn the Low-level Characteristics of the Human Visual System?

Yancheng Cai, Fei Yin, Dounia Hammou et al.

CVPR 2025 arXiv:2506.18335

#3155

Rethinking Decoder Design: Improving Biomarker Segmentation Using Depth-to-Space Restoration and Residual Linear Attention

Saad Wazir, Daeyoung Kim

CVPR 2025 arXiv:2504.02264

#3156

MMTL-UniAD: A Unified Framework for Multimodal and Multi-Task Learning in Assistive Driving Perception

Wenzhuo Liu, Wenshuo Wang, Yicheng Qiao et al.

CVPR 2025 arXiv:2504.05303

#3157

InteractVLM: 3D Interaction Reasoning from 2D Foundational Models

Sai Kumar Dwivedi, Dimitrije Antić, Shashank Tripathi et al.

CVPR 2024 arXiv:2402.18862

#3158

Towards Backward-Compatible Continual Learning of Image Compression

Zhihao Duan, Ming Lu, Justin Yang et al.

CVPR 2024 arXiv:2404.00301

#3159

Monocular Identity-Conditioned Facial Reflectance Reconstruction

Xingyu Ren, Jiankang Deng, Yuhao Cheng et al.

CVPR 2025 arXiv:2504.14967

#3160

3D Gaussian Head Avatars with Expressive Dynamic Appearances by Compact Tensorial Representations

Yating Wang, Xuan Wang, Ran Yi et al.

CVPR 2025 arXiv:2412.01822

#3161

VLsI: Verbalized Layers-to-Interactions from Large to Small Vision Language Models

Byung-Kwan Lee, Ryo Hachiuma, Yu-Chiang Frank Wang et al.

CVPR 2025 arXiv:2503.18725

#3162

FG^2: Fine-Grained Cross-View Localization by Fine-Grained Feature Matching

Zimin Xia, Alex Alahi

CVPR 2024 arXiv:2304.05440

#3163

PixelRNN: In-pixel Recurrent Neural Networks for End-to-end–optimized Perception with Neural Sensors

Haley So, Laurie Bose, Piotr Dudek et al.

CVPR 2025 arXiv:2504.12959

#3164

Rethinking Temporal Fusion with a Unified Gradient Descent View for 3D Semantic Occupancy Prediction

Dubing Chen, Huan Zheng, Jin Fang et al.

CVPR 2025 arXiv:2412.00719

#3165

Synergizing Motion and Appearance: Multi-Scale Compensatory Codebooks for Talking Head Video Generation

Shuling Zhao, Fa-Ting Hong, Xiaoshui Huang et al.

CVPR 2025 arXiv:2407.13772

#3166

GroupMamba: Efficient Group-Based Visual State Space Model

Abdelrahman Shaker, Syed Talal Wasim, Salman Khan et al.

CVPR 2024 arXiv:2404.03183

#3167

BodyMAP - Jointly Predicting Body Mesh and 3D Applied Pressure Map for People in Bed

Abhishek Tandon, Anujraaj Goyal, Henry M. Clever et al.

CVPR 2024 arXiv:2403.11162

#3168

CGI-DM: Digital Copyright Authentication for Diffusion Models via Contrasting Gradient Inversion

Xiaoyu Wu, Yang Hua, Chumeng Liang et al.

CVPR 2024 arXiv:2403.02561

#3169

Semantic Human Mesh Reconstruction with Textures

Xiaoyu Zhan, Jianxin Yang, Yuanqi Li et al.

CVPR 2025 highlight arXiv:2503.04919

#3170

FirePlace: Geometric Refinements of LLM Common Sense Reasoning for 3D Object Placement

Ian Huang, Yanan Bao, Karen Truong et al.

CVPR 2025 arXiv:2503.23282

#3171

AnyCam: Learning to Recover Camera Poses and Intrinsics from Casual Videos

Felix Wimbauer, Weirong Chen, Dominik Muhle et al.

CVPR 2025 arXiv:2503.21781

#3172

VideoMage: Multi-Subject and Motion Customization of Text-to-Video Diffusion Models

Chi-Pin Huang, Yen-Siang Wu, Hung-Kai Chung et al.

CVPR 2025 arXiv:2504.00420

#3173

Think Small, Act Big: Primitive Prompt Learning for Lifelong Robot Manipulation

Yuanqi Yao, Siao Liu, Haoming Song et al.

CVPR 2025 highlight arXiv:2504.12284

#3174

How Do I Do That? Synthesizing 3D Hand Motion and Contacts for Everyday Interactions

Aditya Prakash, Benjamin E Lundell, Dmitry Andreychuk et al.

CVPR 2024 arXiv:2308.15692

#3175

Intriguing Properties of Diffusion Models: An Empirical Study of the Natural Attack Capability in Text-to-Image Generative Models

Takami Sato, Justin Yue, Nanze Chen et al.

CVPR 2024 arXiv:2403.08262

#3176

BiTT: Bi-directional Texture Reconstruction of Interacting Two Hands from a Single Image

Minje Kim, Tae-Kyun Kim

CVPR 2025 highlight arXiv:2411.14628

#3177

HotSpot: Signed Distance Function Optimization with an Asymptotically Sufficient Condition

Zimo Wang, Cheng Wang, Taiki Yoshino et al.

CVPR 2024 arXiv:2406.11129

#3178

Neural Lineage

Runpeng Yu, Xinchao Wang

#3179

M3amba: Memory Mamba is All You Need for Whole Slide Image Classification

Tingting Zheng, Kui Jiang, Yi Xiao et al.

CVPR 2024 arXiv:2411.15673

#3180

Semantic Shield: Defending Vision-Language Models Against Backdooring and Poisoning via Fine-grained Knowledge Alignment

Alvi Md Ishmam, Chris Thomas

CVPR 2025 arXiv:2501.10283

#3181

GauSTAR: Gaussian Surface Tracking and Reconstruction

Chengwei Zheng, Lixin Xue, Juan Jose Zarate et al.

CVPR 2025 highlight arXiv:2503.04459

#3182

Question-Aware Gaussian Experts for Audio-Visual Question Answering

Hongyeob Kim, Inyoung Jung, Dayoon Suh et al.

#3183

Implicit Motion Function

Yue Gao, Jiahao Li, Lei Chu et al.

CVPR 2025 arXiv:2502.20249

#3184

Enhancing 3D Gaze Estimation in the Wild using Weak Supervision with Gaze Following Labels

Pierre Vuillecard, Jean-Marc Odobez

#3185

When Visual Grounding Meets Gigapixel-level Large-scale Scenes: Benchmark and Approach

Tao Ma, Bing Bai, Haozhe Lin et al.

#3186

AVF-MAE++: Scaling Affective Video Facial Masked Autoencoders via Efficient Audio-Visual Self-Supervised Learning

Xuecheng Wu, Heli Sun, Yifan Wang et al.

CVPR 2024 arXiv:2403.11380

#3187

Boosting Order-Preserving and Transferability for Neural Architecture Search: a Joint Architecture Refined Search and Fine-tuning Approach

Beichen Zhang, Xiaoxing Wang, Xiaohan Qin et al.

CVPR 2025 arXiv:2412.14456

#3188

LEDiff: Latent Exposure Diffusion for HDR Generation

Chao Wang, Zhihao Xia, Thomas Leimkuehler et al.

#3189

Exploring Historical Information for RGBE Visual Tracking with Mamba

Chuanyu Sun, Jiqing Zhang, Yang Wang et al.

CVPR 2024 arXiv:2404.03518

#3190

SDPose: Tokenized Pose Estimation via Circulation-Guide Self-Distillation

Chen Sichen, Yingyi Zhang, Siming Huang et al.

CVPR 2025 arXiv:2411.10411

#3191

Repurposing Stable Diffusion Attention for Training-Free Unsupervised Interactive Segmentation

Markus Karmann, Onay Urfalioglu

CVPR 2025 arXiv:2503.17731

#3192

Co-op: Correspondence-based Novel Object Pose Estimation

Sungphill Moon, Hyeontae Son, Dongcheol Hur et al.

CVPR 2024 arXiv:2403.09230

#3193

Improving Distant 3D Object Detection Using 2D Box Supervision

Zetong Yang, Zhiding Yu, Christopher Choy et al.

CVPR 2024 arXiv:2312.17686

#3194

Multiscale Vision Transformers Meet Bipartite Matching for Efficient Single-stage Action Localization

Ioanna Ntinou, Enrique Sanchez, Georgios Tzimiropoulos

CVPR 2025 arXiv:2502.16638

#3195

Automatic Joint Structured Pruning and Quantization for Efficient Neural Network Training and Compression

Xiaoyi Qu, David Aponte, Colby Banbury et al.

CVPR 2025 arXiv:2507.06973

#3196

Free on the Fly: Enhancing Flexibility in Test-Time Adaptation with Online EM

Qiyuan Dai, Sibei Yang

CVPR 2025 arXiv:2503.16709

#3197

QuartDepth: Post-Training Quantization for Real-Time Depth Estimation on the Edge

Xuan Shen, Weize Ma, Jing Liu et al.

CVPR 2025 arXiv:2410.16290

#3198

A Unified Model for Compressed Sensing MRI Across Undersampling Patterns

Armeet Singh Jatyani, Jiayun Wang, Aditi Chandrashekar et al.

CVPR 2025 highlight arXiv:2404.03632

#3199

Reference-Based 3D-Aware Image Editing with Triplanes

Bahri Batuhan Bilecen, Yiğit Yalın, Ning Yu et al.

CVPR 2025 arXiv:2412.13652

#3200

RelationField: Relate Anything in Radiance Fields

Sebastian Koch, Johanna Wald, Mirco Colosi et al.