Most Cited CVPR "multimodal video analysis" Papers

5,589 papers found • Page 16 of 28

#3001

VIRES: Video Instance Repainting via Sketch and Text Guided Generation

Shuchen Weng, Haojie Zheng, Peixuan Zhang et al.

CVPR 2025arXiv:2411.16199
1
citations
#3002

Non-Rigid Structure-from-Motion: Temporally-Smooth Procrustean Alignment and Spatially-Variant Deformation Modeling

Jiawei Shi, Hui Deng, Yuchao Dai

CVPR 2024arXiv:2405.04309
1
citations
#3003

GeoDepth: From Point-to-Depth to Plane-to-Depth Modeling for Self-Supervised Monocular Depth Estimation

Haifeng Wu, Shuhang Gu, Lixin Duan et al.

CVPR 2025
1
citations
#3004

Video-Panda: Parameter-efficient Alignment for Encoder-free Video-Language Models

Jinhui Yi, Syed Talal Wasim, Yanan Luo et al.

CVPR 2025arXiv:2412.18609
1
citations
#3005

WISNet: Pseudo Label Generation on Unbalanced and Patch Annotated Waste Images

Shifan Zhang, Hongzi Zhu, Yinan He et al.

CVPR 2025
1
citations
#3006

ODHSR: Online Dense 3D Reconstruction of Humans and Scenes from Monocular Videos

Zetong Zhang, Manuel Kaufmann, Lixin Xue et al.

CVPR 2025arXiv:2504.13167
1
citations
#3007

Link-based Contrastive Learning for One-Shot Unsupervised Domain Adaptation

Yue Zhang, Mingyue Bin, Yuyang Zhang et al.

CVPR 2025
1
citations
#3008

Nested Diffusion Models Using Hierarchical Latent Priors

Xiao Zhang, Ruoxi Jiang, Rebecca Willett et al.

CVPR 2025arXiv:2412.05984
1
citations
#3009

End-to-End HOI Reconstruction Transformer with Graph-based Encoding

Zhenrong Wang, Qi Zheng, Sihan Ma et al.

CVPR 2025highlightarXiv:2503.06012
1
citations
#3010

SCSA: A Plug-and-Play Semantic Continuous-Sparse Attention for Arbitrary Semantic Style Transfer

Chunnan Shang, Zhizhong Wang, Hongwei Wang et al.

CVPR 2025highlightarXiv:2503.04119
1
citations
#3011

High-quality Point Cloud Oriented Normal Estimation via Hybrid Angular and Euclidean Distance Encoding

Yuanqi Li, Jingcheng Huang, Hongshen Wang et al.

CVPR 2025
1
citations
#3012

DirectTriGS: Triplane-based Gaussian Splatting Field Representation for 3D Generation

Xiaoliang Ju, Hongsheng Li

CVPR 2025arXiv:2503.06900
1
citations
#3013

Proximal Algorithm Unrolling: Flexible and Efficient Reconstruction Networks for Single-Pixel Imaging

Ping Wang, Lishun Wang, Gang Qu et al.

CVPR 2025arXiv:2505.23180
#3014

Physics-guided Shape-from-Template: Monocular Video Perception through Neural Surrogate Models

David Stotko, Nils Wandel, Reinhard Klein

CVPR 2024arXiv:2311.12796
#3015

From Head to Tail: Efficient Black-box Model Inversion Attack via Long-tailed Learning

Ziang Li, Hongguang Zhang, Juan Wang et al.

CVPR 2025arXiv:2503.16266
#3016

SimAvatar: Simulation-Ready Avatars with Layered Hair and Clothing

Xueting Li, Ye Yuan, Shalini De Mello et al.

CVPR 2025arXiv:2412.09545
#3017

SerialGen: Personalized Image Generation by First Standardization Then Personalization

Cong Xie, Han Zou, Ruiqi Yu et al.

CVPR 2025arXiv:2412.01485
#3018

GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation

Tong Wu, Guandao Yang, Zhibing Li et al.

CVPR 2024arXiv:2401.04092
#3019

On Train-Test Class Overlap and Detection for Image Retrieval

Chull Hwan Song, Jooyoung Yoon, Taebaek Hwang et al.

CVPR 2024arXiv:2404.01524
#3020

Orthogonal Adaptation for Modular Customization of Diffusion Models

Ryan Po, Guandao Yang, Kfir Aberman et al.

CVPR 2024highlightarXiv:2312.02432
#3021

Permutation Equivariance of Transformers and Its Applications

Hengyuan Xu, Liyao Xiang, Hangyu Ye et al.

CVPR 2024arXiv:2304.07735
#3022

RSAR: Restricted State Angle Resolver and Rotated SAR Benchmark

Xin Zhang, Xue Yang, Yuxuan Li et al.

CVPR 2025arXiv:2501.04440
#3023

Few-shot Implicit Function Generation via Equivariance

Suizhi Huang, Xingyi Yang, Hongtao Lu et al.

CVPR 2025highlightarXiv:2501.01601
#3024

ViTamin: Designing Scalable Vision Models in the Vision-Language Era

Jieneng Chen, Qihang Yu, Xiaohui Shen et al.

CVPR 2024arXiv:2404.02132
#3025

MaIR: A Locality- and Continuity-Preserving Mamba for Image Restoration

Boyun Li, Haiyu Zhao, Wenxin Wang et al.

CVPR 2025arXiv:2412.20066
#3026

HomoFormer: Homogenized Transformer for Image Shadow Removal

Jie Xiao, Xueyang Fu, Yurui Zhu et al.

CVPR 2024
#3027

ViKIENet: Towards Efficient 3D Object Detection with Virtual Key Instance Enhanced Network

Zhuochen Yu, Bijie Qiu, Andy W. H. Khong

CVPR 2025
#3028

PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models

Yiming Zhang, Zhening Xing, Yanhong Zeng et al.

CVPR 2024arXiv:2312.13964
#3029

CoCoGaussian: Leveraging Circle of Confusion for Gaussian Splatting from Defocused Images

Jungho Lee, Suhwan Cho, Taeoh Kim et al.

CVPR 2025arXiv:2412.16028
#3030

DemoCaricature: Democratising Caricature Generation with a Rough Sketch

Dar-Yen Chen, Ayan Kumar Bhunia, Subhadeep Koley et al.

CVPR 2024arXiv:2312.04364
#3031

End-to-End Spatio-Temporal Action Localisation with Video Transformers

Alexey Gritsenko, Xuehan Xiong, Josip Djolonga et al.

CVPR 2024arXiv:2304.12160
#3032

TRINS: Towards Multimodal Language Models that Can Read

Ruiyi Zhang, Yanzhe Zhang, Jian Chen et al.

CVPR 2024arXiv:2406.06730
#3033

EFHQ: Multi-purpose ExtremePose-Face-HQ dataset

Trung Dao, Duc H Vu, Cuong Pham et al.

CVPR 2024arXiv:2312.17205
#3034

Logarithmic Lenses: Exploring Log RGB Data for Image Classification

Bruce Maxwell, Sumegha Singhania, Avnish Patel et al.

CVPR 2024
#3035

Unlocking Pre-trained Image Backbones for Semantic Image Synthesis

Tariq Berrada, Jakob Verbeek, camille couprie et al.

CVPR 2024arXiv:2312.13314
#3036

TokenCompose: Text-to-Image Diffusion with Token-level Supervision

Zirui Wang, Zhizhou Sha, Zheng Ding et al.

CVPR 2024arXiv:2312.03626
#3037

ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning

David Junhao Zhang, Roni Paiss, Shiran Zada et al.

CVPR 2025arXiv:2411.05003
#3038

Perception Tokens Enhance Visual Reasoning in Multimodal Language Models

Mahtab Bigverdi, Zelun Luo, Cheng-Yu Hsieh et al.

CVPR 2025arXiv:2412.03548
#3039

Language-driven Object Fusion into Neural Radiance Fields with Pose-Conditioned Dataset Updates

Ka Chun SHUM, Jaeyeon Kim, Binh-Son Hua et al.

CVPR 2024arXiv:2309.11281
#3040

Infer from What You Have Seen Before: Temporally-dependent Classifier for Semi-supervised Video Segmentation

Jiafan Zhuang, Zilei Wang, Yixin Zhang et al.

CVPR 2024
#3041

Seeing the World through Your Eyes

Hadi Alzayer, Kevin Zhang, Brandon Y. Feng et al.

CVPR 2024arXiv:2306.09348
#3042

Augmenting Perceptual Super-Resolution via Image Quality Predictors

Fengjia Zhang, Samrudhdhi Rangrej, Tristan T Aumentado-Armstrong et al.

CVPR 2025arXiv:2504.18524
#3043

Beyond Single-Modal Boundary: Cross-Modal Anomaly Detection through Visual Prototype and Harmonization

Kai Mao, Ping Wei, Yiyang Lian et al.

CVPR 2025
#3044

Learning Vision from Models Rivals Learning Vision from Data

Yonglong Tian, Lijie Fan, Kaifeng Chen et al.

CVPR 2024arXiv:2312.17742
#3045

Text Augmented Correlation Transformer For Few-shot Classification & Segmentation

Srinivasa Rao Nandam, Sara Atito, Zhenhua Feng et al.

CVPR 2025
#3046

RegionGPT: Towards Region Understanding Vision Language Model

Qiushan Guo, Shalini De Mello, Danny Yin et al.

CVPR 2024arXiv:2403.02330
#3047

Unlocking the Potential of Pre-trained Vision Transformers for Few-Shot Semantic Segmentation through Relationship Descriptors

Ziqin Zhou, Hai-Ming Xu, Yangyang Shu et al.

CVPR 2024
#3048

Holo-Relighting: Controllable Volumetric Portrait Relighting from a Single Image

Yiqun Mei, Yu Zeng, He Zhang et al.

CVPR 2024arXiv:2403.09632
#3049

Relightful Harmonization: Lighting-aware Portrait Background Replacement

Mengwei Ren, Wei Xiong, Jae Shin Yoon et al.

CVPR 2024arXiv:2312.06886
#3050

Taming the Tail in Class-Conditional GANs: Knowledge Sharing via Unconditional Training at Lower Resolutions

Saeed Khorram, Mingqi Jiang, Mohamad Shahbazi et al.

CVPR 2024arXiv:2402.17065
#3051

Matching 2D Images in 3D: Metric Relative Pose from Metric Correspondences

Axel Barroso-Laguna, Sowmya Munukutla, Victor Adrian Prisacariu et al.

CVPR 2024arXiv:2404.06337
#3052

MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI

Xiang Yue, Yuansheng Ni, Kai Zhang et al.

CVPR 2024arXiv:2311.16502
#3053

MoManipVLA: Transferring Vision-language-action Models for General Mobile Manipulation

Zhenyu Wu, Yuheng Zhou, Xiuwei Xu et al.

CVPR 2025arXiv:2503.13446
#3054

MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation

Yanhui Wang, Jianmin Bao, Wenming Weng et al.

CVPR 2024highlightarXiv:2311.18829
#3055

MorpheuS: Neural Dynamic 360° Surface Reconstruction from Monocular RGB-D Video

Hengyi Wang, Jingwen Wang, Lourdes Agapito

CVPR 2024arXiv:2312.00778
#3056

Capturing Closely Interacted Two-Person Motions with Reaction Priors

Qi Fang, Yinghui Fan, Yanjun Li et al.

CVPR 2024
#3057

MambaVO: Deep Visual Odometry Based on Sequential Matching Refinement and Training Smoothing

Shuo Wang, Wanting Li, Yongcai Wang et al.

CVPR 2025arXiv:2412.20082
#3058

TAGA: Self-supervised Learning for Template-free Animatable Gaussian Articulated Model

Zhichao Zhai, Guikun Chen, Wenguan Wang et al.

CVPR 2025
#3059

LightOctree: Lightweight 3D Spatially-Coherent Indoor Lighting Estimation

Xuecan Wang, Shibang Xiao, Xiaohui Liang

CVPR 2024arXiv:2404.03925
#3060

Rethinking Few-Shot Adaptation of Vision-Language Models in Two Stages

Matteo Farina, Massimiliano Mancini, Giovanni Iacca et al.

CVPR 2025arXiv:2503.11609
#3061

Document Haystacks: Vision-Language Reasoning Over Piles of 1000+ Documents

Jun Chen, Dannong Xu, Junjie Fei et al.

CVPR 2025arXiv:2411.16740
#3062

Bi-level Learning of Task-Specific Decoders for Joint Registration and One-Shot Medical Image Segmentation

Xin Fan, Xiaolin Wang, Jiaxin Gao et al.

CVPR 2024
#3063

All-Day Multi-Camera Multi-Target Tracking

Huijie Fan, Yu Qiao, Yihao Zhen et al.

CVPR 2025
#3064

FairCLIP: Harnessing Fairness in Vision-Language Learning

Yan Luo, MIN SHI, Muhammad Osama Khan et al.

CVPR 2024arXiv:2403.19949
#3065

Navigate Beyond Shortcuts: Debiased Learning Through the Lens of Neural Collapse

Yining Wang, Junjie Sun, Chenyue Wang et al.

CVPR 2024highlightarXiv:2405.05587
#3066

DiVa-360: The Dynamic Visual Dataset for Immersive Neural Fields

Cheng-You Lu, Peisen Zhou, Angela Xing et al.

CVPR 2024highlightarXiv:2307.16897
#3067

Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding of Sound and Language

Mark Hamilton, Andrew Zisserman, John Hershey et al.

CVPR 2024
#3068

Learning Visual Prompt for Gait Recognition

Kang Ma, Ying Fu, Chunshui Cao et al.

CVPR 2024
#3069

Task-aware Cross-modal Feature Refinement Transformer with Large Language Models for Visual Grounding

Wenbo Chen, Zhen Xu, Ruotao Xu et al.

CVPR 2025
#3070

Segment Any Motion in Videos

Nan Huang, Wenzhao Zheng, Chenfeng Xu et al.

CVPR 2025arXiv:2503.22268
#3071

Visual Prompting for One-shot Controllable Video Editing without Inversion

Zhengbo Zhang, Yuxi Zhou, DUO PENG et al.

CVPR 2025arXiv:2504.14335
#3072

BIMBA: Selective-Scan Compression for Long-Range Video Question Answering

Md Mohaiminul Islam, Tushar Nagarajan, Huiyu Wang et al.

CVPR 2025arXiv:2503.09590
#3073

ArGue: Attribute-Guided Prompt Tuning for Vision-Language Models

Xinyu Tian, Shu Zou, Zhaoyuan Yang et al.

CVPR 2024arXiv:2311.16494
#3074

PolarRec: Improving Radio Interferometric Data Reconstruction Using Polar Coordinates

Ruoqi Wang, Zhuoyang Chen, Jiayi Zhu et al.

CVPR 2024
#3075

LDP: Language-driven Dual-Pixel Image Defocus Deblurring Network

Hao Yang, Liyuan Pan, Yan Yang et al.

CVPR 2024arXiv:2307.09815
#3076

Tiled Diffusion

Or Madar, Ohad Fried

CVPR 2025arXiv:2412.15185
#3077

StyleCineGAN: Landscape Cinemagraph Generation using a Pre-trained StyleGAN

Jongwoo Choi, Kwanggyoon Seo, Amirsaman Ashtari et al.

CVPR 2024arXiv:2403.14186
#3078

MS-MANO: Enabling Hand Pose Tracking with Biomechanical Constraints

Pengfei Xie, Wenqiang Xu, Tutian Tang et al.

CVPR 2024arXiv:2404.10227
#3079

CGI-DM: Digital Copyright Authentication for Diffusion Models via Contrasting Gradient Inversion

Xiaoyu Wu, Yang Hua, Chumeng Liang et al.

CVPR 2024arXiv:2403.11162
#3080

Temporally Consistent Unbalanced Optimal Transport for Unsupervised Action Segmentation

Ming Xu, Stephen Gould

CVPR 2024arXiv:2404.01518
#3081

ICON: Incremental CONfidence for Joint Pose and Radiance Field Optimization

Weiyao Wang, Pierre Gleize, Hao Tang et al.

CVPR 2024arXiv:2401.08937
#3082

Learning Large-Factor EM Image Super-Resolution with Generative Priors

Jiateng Shou, Zeyu Xiao, Shiyu Deng et al.

CVPR 2024
#3083

Beyond Sight: Towards Cognitive Alignment in LVLM via Enriched Visual Knowledge

Yaqi Zhao, Yuanyang Yin, Lin Li et al.

CVPR 2025arXiv:2411.16824
#3084

PI3D: Efficient Text-to-3D Generation with Pseudo-Image Diffusion

Ying-Tian Liu, Yuan-Chen Guo, Guan Luo et al.

CVPR 2024arXiv:2312.09069
#3085

Learning for Transductive Threshold Calibration in Open-World Recognition

Qin ZHANG, DONGSHENG An, Tianjun Xiao et al.

CVPR 2024arXiv:2305.12039
#3086

Hazy Low-Quality Satellite Video Restoration Via Learning Optimal Joint Degradation Patterns and Continuous-Scale Super-Resolution Reconstruction

Ning Ni, Libao Zhang

CVPR 2025
#3087

ADD: Attribution-Driven Data Augmentation Framework for Boosting Image Super-Resolution

Zeyu Mi, Yu-Bin Yang

CVPR 2025
#3088

SASep: Saliency-Aware Structured Separation of Geometry and Feature for Open Set Learning on Point Clouds

Jinfeng Xu, Xianzhi Li, Yuan Tang et al.

CVPR 2025arXiv:2506.13224
#3089

Separate and Conquer: Decoupling Co-occurrence via Decomposition and Representation for Weakly Supervised Semantic Segmentation

Zhiwei Yang, Kexue Fu, Minghong Duan et al.

CVPR 2024arXiv:2402.18467
#3090

Telling Left from Right: Identifying Geometry-Aware Semantic Correspondence

Junyi Zhang, Charles Herrmann, Junhwa Hur et al.

CVPR 2024arXiv:2311.17034
#3091

Amodal Ground Truth and Completion in the Wild

Guanqi Zhan, Chuanxia Zheng, Weidi Xie et al.

CVPR 2024arXiv:2312.17247
#3092

MiKASA: Multi-Key-Anchor & Scene-Aware Transformer for 3D Visual Grounding

Chun-Peng Chang, Shaoxiang Wang, Alain Pagani et al.

CVPR 2024arXiv:2403.03077
#3093

Seeing Unseen: Discover Novel Biomedical Concepts via Geometry-Constrained Probabilistic Modeling

Jianan Fan, Dongnan Liu, Hang Chang et al.

CVPR 2024arXiv:2403.01053
#3094

Real-Time Exposure Correction via Collaborative Transformations and Adaptive Sampling

Ziwen Li, Feng Zhang, Meng Cao et al.

CVPR 2024
#3095

Gaussian Splatting SLAM

Hidenobu Matsuki, Riku Murai, Paul Kelly et al.

CVPR 2024highlightarXiv:2312.06741
#3096

PromptCoT: Align Prompt Distribution via Adapted Chain-of-Thought

Junyi Yao, Yijiang Liu, Zhen Dong et al.

CVPR 2024
#3097

NeLF-Pro: Neural Light Field Probes for Multi-Scale Novel View Synthesis

Zinuo You, Andreas Geiger, Anpei Chen

CVPR 2024arXiv:2312.13328
#3098

SIR-DIFF: Sparse Image Sets Restoration with Multi-View Diffusion Model

Yucheng Mao, Boyang Wang, Nilesh Kulkarni et al.

CVPR 2025arXiv:2503.14463
#3099

Modeling Dense Multimodal Interactions Between Biological Pathways and Histology for Survival Prediction

Guillaume Jaume, Anurag Vaidya, Richard J. Chen et al.

CVPR 2024arXiv:2304.06819
#3100

Practical Measurements of Translucent Materials with Inter-Pixel Translucency Prior

Zhenyu Chen, Jie Guo, Shuichang Lai et al.

CVPR 2024
#3101

All-Optical Nonlinear Diffractive Deep Network for Ultrafast Image Denoising

Xiaoling Zhou, Zhemg Lee, Wei Ye et al.

CVPR 2025highlight
#3102

DPSeg: Dual-Prompt Cost Volume Learning for Open-Vocabulary Semantic Segmentation

Ziyu Zhao, Xiaoguang Li, Lingjia Shi et al.

CVPR 2025arXiv:2505.11676
#3103

RTracker: Recoverable Tracking via PN Tree Structured Memory

Yuqing Huang, Xin Li, Zikun Zhou et al.

CVPR 2024arXiv:2403.19242
#3104

View-Category Interactive Sharing Transformer for Incomplete Multi-View Multi-Label Learning

Shilong Ou, Zhe Xue, Yawen Li et al.

CVPR 2024highlight
#3105

Dual Memory Networks: A Versatile Adaptation Approach for Vision-Language Models

Yabin Zhang, Wenjie Zhu, Hui Tang et al.

CVPR 2024arXiv:2403.17589
#3106

DejaVid: Encoder-Agnostic Learned Temporal Matching for Video Classification

Darryl Ho, Samuel Madden

CVPR 2025arXiv:2506.12585
#3107

Hierarchical Knowledge Prompt Tuning for Multi-task Test-Time Adaptation

Qiang Zhang, Mengsheng Zhao, Jiawei Liu et al.

CVPR 2025
#3108

CoSDH: Communication-Efficient Collaborative Perception via Supply-Demand Awareness and Intermediate-Late Hybridization

Junhao Xu, Yanan Zhang, Zhi Cai et al.

CVPR 2025arXiv:2503.03430
#3109

A Versatile Framework for Continual Test-Time Domain Adaptation: Balancing Discriminability and Generalizability

Xu Yang, Xuan chen, Moqi Li et al.

CVPR 2024
#3110

Efficient Solution of Point-Line Absolute Pose

Petr Hruby, Timothy Duff, Marc Pollefeys

CVPR 2024highlightarXiv:2404.16552
#3111

SPIN: Simultaneous Perception Interaction and Navigation

Shagun Uppal, Ananye Agarwal, Haoyu Xiong et al.

CVPR 2024arXiv:2405.07991
#3112

CAMixerSR: Only Details Need More "Attention"

Yan Wang, Yi Liu, Shijie Zhao et al.

CVPR 2024
#3113

A Focused Human Body Model for Accurate Anthropometric Measurements Extraction

Shuhang Chen, Xianliang Huang, Zhizhou Zhong et al.

CVPR 2025
#3114

FISBe: A Real-World Benchmark Dataset for Instance Segmentation of Long-Range Thin Filamentous Structures

Lisa Mais, Peter Hirsch, Claire Managan et al.

CVPR 2024arXiv:2404.00130
#3115

POPDG: Popular 3D Dance Generation with PopDanceSet

Zhenye Luo, Min Ren, Xuecai Hu et al.

CVPR 2024arXiv:2405.03178
#3116

RankMatch: Exploring the Better Consistency Regularization for Semi-supervised Semantic Segmentation

Huayu Mai, Rui Sun, Tianzhu Zhang et al.

CVPR 2024
#3117

CoDe: An Explicit Content Decoupling Framework for Image Restoration

Enxuan Gu, Hongwei Ge, Yong Guo

CVPR 2024
#3118

Masked Spatial Propagation Network for Sparsity-Adaptive Depth Refinement

Jinyoung Jun, Jae-Han Lee, Chang-Su Kim

CVPR 2024arXiv:2404.19294
#3119

D^4: Dataset Distillation via Disentangled Diffusion Model

Duo Su, Junjie Hou, Weizhi Gao et al.

CVPR 2024
#3120

Bootstrap Your Own Views: Masked Ego-Exo Modeling for Fine-grained View-invariant Video Representations

Jungin Park, Jiyoung Lee, Kwanghoon Sohn

CVPR 2025arXiv:2503.19706
#3121

An Empirical Study of the Generalization Ability of Lidar 3D Object Detectors to Unseen Domains

George Eskandar

CVPR 2024arXiv:2402.17562
#3122

MarkovGen: Structured Prediction for Efficient Text-to-Image Generation

Sadeep Jayasumana, Daniel Glasner, Srikumar Ramalingam et al.

CVPR 2024arXiv:2308.10997
#3123

Be More Specific: Evaluating Object-centric Realism in Synthetic Images

Anqi Liang, Ciprian Adrian Corneanu, Qianli Feng et al.

CVPR 2025
#3124

Rethinking the Representation in Federated Unsupervised Learning with Non-IID Data

Xinting Liao, Weiming Liu, Chaochao Chen et al.

CVPR 2024arXiv:2403.16398
#3125

GPVK-VL: Geometry-Preserving Virtual Keyframes for Visual Localization under Large Viewpoint Changes

Yunxuan Li, Lei Fan, Xiaoying Xing et al.

CVPR 2025
#3126

ViLa-MIL: Dual-scale Vision-Language Multiple Instance Learning for Whole Slide Image Classification

Jiangbo Shi, Chen Li, Tieliang Gong et al.

CVPR 2024arXiv:2502.08391
#3127

Open3DIS: Open-Vocabulary 3D Instance Segmentation with 2D Mask Guidance

Phuc Nguyen, Tuan Duc Ngo, Evangelos Kalogerakis et al.

CVPR 2024arXiv:2312.10671
#3128

Learning to Localize Objects Improves Spatial Reasoning in Visual-LLMs

Kanchana Ranasinghe, Satya Narayan Shukla, Omid Poursaeed et al.

CVPR 2024arXiv:2404.07449
#3129

CaDeT: a Causal Disentanglement Approach for Robust Trajectory Prediction in Autonomous Driving

Mozhgan Pourkeshavarz, Junrui Zhang, Amir Rasouli

CVPR 2024
#3130

Continual Forgetting for Pre-trained Vision Models

Hongbo Zhao, Bolin Ni, Junsong Fan et al.

CVPR 2024arXiv:2403.11530
#3131

Layered Motion Fusion: Lifting Motion Segmentation to 3D in Egocentric Videos

Vadim Tschernezki, Diane Larlus, Andrea Vedaldi et al.

CVPR 2025arXiv:2506.05546
#3132

Boosting Neural Representations for Videos with a Conditional Decoder

XINJIE ZHANG, Ren Yang, Dailan He et al.

CVPR 2024highlightarXiv:2402.18152
#3133

Unsupervised Feature Learning with Emergent Data-Driven Prototypicality

Yunhui Guo, Youren Zhang, Yubei Chen et al.

CVPR 2024arXiv:2307.01421
#3134

Text-Guided 3D Face Synthesis - From Generation to Editing

Yunjie Wu, Yapeng Meng, Zhipeng Hu et al.

CVPR 2024arXiv:2312.00375
#3135

Align Your Gaussians: Text-to-4D with Dynamic 3D Gaussians and Composed Diffusion Models

Huan Ling, Seung Wook Kim, Antonio Torralba et al.

CVPR 2024highlightarXiv:2312.13763
#3136

IReNe: Instant Recoloring of Neural Radiance Fields

Alessio Mazzucchelli, Adrian Garcia-Garcia, Elena Garces et al.

CVPR 2024arXiv:2405.19876
#3137

MASH-VLM: Mitigating Action-Scene Hallucination in Video-LLMs through Disentangled Spatial-Temporal Representations

Kyungho Bae, Jinhyung Kim, Sihaeng Lee et al.

CVPR 2025highlightarXiv:2503.15871
#3138

Adapting Pre-trained 3D Models for Point Cloud Video Understanding via Cross-frame Spatio-temporal Perception

Baixuan Lv, Yaohua Zha, Tao Dai et al.

CVPR 2025
#3139

Constrained Layout Generation with Factor Graphs

Mohammed Haroon Dupty, Yanfei Dong, Sicong Leng et al.

CVPR 2024arXiv:2404.00385
#3140

Deterministic Certification of Graph Neural Networks against Graph Poisoning Attacks with Arbitrary Perturbations

Jiate Li, Meng Pang, Yun Dong et al.

CVPR 2025arXiv:2503.18503
#3141

URHand: Universal Relightable Hands

Zhaoxi Chen, Gyeongsik Moon, Kaiwen Guo et al.

CVPR 2024arXiv:2401.05334
#3142

Cross-Modal Distillation for 2D/3D Multi-Object Discovery from 2D Motion

Saad Lahlali, Sandra Kara, Hejer AMMAR et al.

CVPR 2025arXiv:2503.15022
#3143

Rethinking Reconstruction and Denoising in the Dark: New Perspective, General Architecture and Beyond

Long Ma, Tengyu Ma, Ziye Li et al.

CVPR 2025
#3144

Code-as-Monitor: Constraint-aware Visual Programming for Reactive and Proactive Robotic Failure Detection

Enshen Zhou, Qi Su, Cheng Chi et al.

CVPR 2025arXiv:2412.04455
#3145

Neural Implicit Morphing of Face Images

Guilherme Schardong, Tiago Novello, Hallison Paz et al.

CVPR 2024arXiv:2308.13888
#3146

Retaining Knowledge and Enhancing Long-Text Representations in CLIP through Dual-Teacher Distillation

Yuheng Feng, Changsong Wen, Zelin Peng et al.

CVPR 2025
#3147

Distilling CLIP with Dual Guidance for Learning Discriminative Human Body Shape Representation

Feng Liu, Minchul Kim, Zhiyuan Ren et al.

CVPR 2024
#3148

Snapshot Lidar: Fourier Embedding of Amplitude and Phase for Single-Image Depth Reconstruction

Sarah Friday, Yunzi Shi, Yaswanth Kumar Cherivirala et al.

CVPR 2024
#3149

CARZero: Cross-Attention Alignment for Radiology Zero-Shot Classification

Haoran Lai, Qingsong Yao, Zihang Jiang et al.

CVPR 2024arXiv:2402.17417
#3150

MedM2G: Unifying Medical Multi-Modal Generation via Cross-Guided Diffusion with Visual Invariant

Chenlu Zhan, Gaoang Wang, Yu LIN et al.

CVPR 2024arXiv:2403.04290
#3151

GLID: Pre-training a Generalist Encoder-Decoder Vision Model

Jihao Liu, Jinliang Zheng, Yu Liu et al.

CVPR 2024arXiv:2404.07603
#3152

Your Image is My Video: Reshaping the Receptive Field via Image-To-Video Differentiable AutoAugmentation and Fusion

Sofia Casarin, Cynthia Ugwu, Sergio Escalera et al.

CVPR 2024arXiv:2403.15194
#3153

Kernel Adaptive Convolution for Scene Text Detection via Distance Map Prediction

Jinzhi Zheng, Heng Fan, Libo Zhang

CVPR 2024
#3154

GeoAvatar: Geometrically-Consistent Multi-Person Avatar Reconstruction from Sparse Multi-View Videos

Soohyun Lee, SeoYeon Kim, HeeKyung Lee et al.

CVPR 2025
#3155

Universal Domain Adaptation for Semantic Segmentation

Seun-An Choe, Keon Hee Park, Jinwoo Choi et al.

CVPR 2025arXiv:2505.22458
#3156

NeRFCodec: Neural Feature Compression Meets Neural Radiance Fields for Memory-Efficient Scene Representation

Sicheng Li, Hao Li, Yiyi Liao et al.

CVPR 2024arXiv:2404.02185
#3157

Contrastive Denoising Score for Text-guided Latent Diffusion Image Editing

Hyelin Nam, Gihyun Kwon, Geon Yeong Park et al.

CVPR 2024arXiv:2311.18608
#3158

Breaking the Memory Barrier of Contrastive Loss via Tile-Based Strategy

Zesen Cheng, Hang Zhang, Kehan Li et al.

CVPR 2025highlight
#3159

GeoMM: On Geodesic Perspective for Multi-modal Learning

Shibin Mei, Hang Wang, Bingbing Ni

CVPR 2025arXiv:2505.11216
#3160

PlatoNeRF: 3D Reconstruction in Plato's Cave via Single-View Two-Bounce Lidar

Tzofi Klinghoffer, Xiaoyu Xiang, Siddharth Somasundaram et al.

CVPR 2024arXiv:2312.14239
#3161

DiffLoc: Diffusion Model for Outdoor LiDAR Localization

Wen Li, Yuyang Yang, Shangshu Yu et al.

CVPR 2024
#3162

Repurposing Stable Diffusion Attention for Training-Free Unsupervised Interactive Segmentation

Markus Karmann, Onay Urfalioglu

CVPR 2025arXiv:2411.10411
#3163

Text-Guided Variational Image Generation for Industrial Anomaly Detection and Segmentation

Mingyu Lee, Jongwon Choi

CVPR 2024arXiv:2403.06247
#3164

Tackling the Singularities at the Endpoints of Time Intervals in Diffusion Models

Pengze Zhang, Hubery Yin, Chen Li et al.

CVPR 2024highlightarXiv:2403.08381
#3165

Stabilizing and Accelerating Autofocus with Expert Trajectory Regularized Deep Reinforcement Learning

Shouhang Zhu, Chenglin Li, Yuankun Jiang et al.

CVPR 2025
#3166

RADIOv2.5: Improved Baselines for Agglomerative Vision Foundation Models

Greg Heinrich, Mike Ranzinger, Danny Yin et al.

CVPR 2025arXiv:2412.07679
#3167

Font-Agent: Enhancing Font Understanding with Large Language Models

Yingxin Lai, Cuijie Xu, Haitian Shi et al.

CVPR 2025
#3168

Troika: Multi-Path Cross-Modal Traction for Compositional Zero-Shot Learning

Siteng Huang, Biao Gong, Yutong Feng et al.

CVPR 2024arXiv:2303.15230
#3169

Soften to Defend: Towards Adversarial Robustness via Self-Guided Label Refinement

Daiwei Yu, Zhuorong Li, Lina Wei et al.

CVPR 2024arXiv:2403.09101
#3170

LoCoNet: Long-Short Context Network for Active Speaker Detection

Xizi Wang, Feng Cheng, Gedas Bertasius

CVPR 2024arXiv:2301.08237
#3171

WinSyn: : A High Resolution Testbed for Synthetic Data

Tom Kelly, John Femiani, Peter Wonka

CVPR 2024arXiv:2310.08471
#3172

Multi-Modal Contrastive Masked Autoencoders: A Two-Stage Progressive Pre-training Approach for RGBD Datasets

Muhammad Abdullah Jamal, Omid Mohareri

CVPR 2025
#3173

Retrieval-Augmented Layout Transformer for Content-Aware Layout Generation

Daichi Horita, Naoto Inoue, Kotaro Kikuchi et al.

CVPR 2024arXiv:2311.13602
#3174

Wired Perspectives: Multi-View Wire Art Embraces Generative AI

Zhiyu Qu, LAN YANG, Honggang Zhang et al.

CVPR 2024arXiv:2311.15421
#3175

Small Scale Data-Free Knowledge Distillation

He Liu, Yikai Wang, Huaping Liu et al.

CVPR 2024arXiv:2406.07876
#3176

Transfer CLIP for Generalizable Image Denoising

Jun Cheng, Dong Liang, Shan Tan

CVPR 2024arXiv:2403.15132
#3177

Validating Privacy-Preserving Face Recognition under a Minimum Assumption

Hui Zhang, Xingbo Dong, YenLungLai et al.

CVPR 2024
#3178

STINR: Deciphering Spatial Transcriptomics via Implicit Neural Representation

Yisi Luo, Xile Zhao, Kai Ye et al.

CVPR 2025
#3179

CLiC: Concept Learning in Context

Mehdi Safaee, Aryan Mikaeili, Or Patashnik et al.

CVPR 2024highlightarXiv:2311.17083
#3180

IDGuard: Robust General Identity-centric POI Proactive Defense Against Face Editing Abuse

Yunshu Dai, Jianwei Fei, Fangjun Huang

CVPR 2024
#3181

3D-SLNR: A Super Lightweight Neural Representation for Large-scale 3D Mapping

Chenhui Shi, Fulin Tang, Ning An et al.

CVPR 2025
#3182

R-SCoRe: Revisiting Scene Coordinate Regression for Robust Large-Scale Visual Localization

Xudong Jiang, Fangjinhua Wang, Silvano Galliani et al.

CVPR 2025arXiv:2501.01421
#3183

Feature Re-Embedding: Towards Foundation Model-Level Performance in Computational Pathology

Wenhao Tang, Fengtao ZHOU, Sheng Huang et al.

CVPR 2024arXiv:2402.17228
#3184

A Unified Image-Dense Annotation Generation Model for Underwater Scenes

Hongkai Lin, Dingkang Liang, Zhenghao Qi et al.

CVPR 2025arXiv:2503.21771
#3185

SpatialTracker: Tracking Any 2D Pixels in 3D Space

Yuxi Xiao, Qianqian Wang, Shangzhan Zhang et al.

CVPR 2024highlightarXiv:2404.04319
#3186

TRIP: Temporal Residual Learning with Image Noise Prior for Image-to-Video Diffusion Models

Zhongwei Zhang, Fuchen Long, Yingwei Pan et al.

CVPR 2024arXiv:2403.17005
#3187

Perceptual Assessment and Optimization of HDR Image Rendering

Peibei Cao, Rafal Mantiuk, Kede Ma

CVPR 2024arXiv:2310.12877
#3188

Pose-Transformed Equivariant Network for 3D Point Trajectory Prediction

Ruixuan Yu, Jian Sun

CVPR 2024
#3189

FactCheXcker: Mitigating Measurement Hallucinations in Chest X-ray Report Generation Models

Alice Heiman, Xiaoman Zhang, Emma Chen et al.

CVPR 2025arXiv:2411.18672
#3190

Pursuing Temporal-Consistent Video Virtual Try-On via Dynamic Pose Interaction

Dong Li, Wenqi Zhong, Wei Yu et al.

CVPR 2025arXiv:2505.16980
#3191

Temporally Consistent Object-Centric Learning by Contrasting Slots

Anna Manasyan, Maximilian Seitzer, Filip Radovic et al.

CVPR 2025arXiv:2412.14295
#3192

Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications

Yuwen Xiong, Zhiqi Li, Yuntao Chen et al.

CVPR 2024highlightarXiv:2401.06197
#3193

SET: Spectral Enhancement for Tiny Object Detection

Huixin Sun, Runqi Wang, Yanjing Li et al.

CVPR 2025
#3194

Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass

Jianing "Jed" Yang, Alexander Sax, Kevin Liang et al.

CVPR 2025arXiv:2501.13928
#3195

Illumination Spectrum Estimation for Multispectral Images via Surface Reflectance Modeling and Spatial-Spectral Feature Generation

Hyejin Oh, Woo-Shik Kim, Sangyoon Lee et al.

CVPR 2025
#3196

Multimodal Representation Learning by Alternating Unimodal Adaptation

Xiaohui Zhang, Jaehong Yoon, Mohit Bansal et al.

CVPR 2024arXiv:2311.10707
#3197

Compositional Video Understanding with Spatiotemporal Structure-based Transformers

Hoyeoung Yun, Jinwoo Ahn, Minseo Kim et al.

CVPR 2024
#3198

EchoWorld: Learning Motion-Aware World Models for Echocardiography Probe Guidance

Yang Yue, Yulin Wang, Haojun Jiang et al.

CVPR 2025arXiv:2504.13065
#3199

Make-It-Vivid: Dressing Your Animatable Biped Cartoon Characters from Text

Junshu Tang, Yanhong Zeng, Ke Fan et al.

CVPR 2024arXiv:2403.16897
#3200

CamFreeDiff: Camera-free Image to Panorama Generation with Diffusion Model

Xiaoding Yuan, Shitao Tang, Kejie Li et al.

CVPR 2025arXiv:2407.07174