Most Cited CVPR "discrete token space" Papers

5,589 papers found • Page 18 of 28

#3401

LiSA: LiDAR Localization with Semantic Awareness

Bochun Yang, Zijun Li, Wen Li et al.

CVPR 2024 • highlight

#3402

Rethinking Prior Information Generation with CLIP for Few-Shot Segmentation

Jin Wang, Bingfeng Zhang, Jian Pang et al.

CVPR 2024 • arXiv:2405.08458

#3403

VLog: Video-Language Models by Generative Retrieval of Narration Vocabulary

Kevin Qinghong Lin, Mike Zheng Shou

CVPR 2025 • arXiv:2503.09402

#3404

Adaptive Slot Attention: Object Discovery with Dynamic Slot Number

Ke Fan, Zechen Bai, Tianjun Xiao et al.

CVPR 2024 • arXiv:2406.09196

#3405

Learning Coupled Dictionaries from Unpaired Data for Image Super-Resolution

Longguang Wang, Juncheng Li, Yingqian Wang et al.

CVPR 2024

#3406

C3: High-Performance and Low-Complexity Neural Compression from a Single Image or Video

Hyunjik Kim, Matthias Bauer, Lucas Theis et al.

CVPR 2024 • arXiv:2312.02753

#3407

AttriHuman-3D: Editable 3D Human Avatar Generation with Attribute Decomposition and Indexing

Fan Yang, Tianyi Chen, Xiaosheng He et al.

CVPR 2024 • arXiv:2312.02209

#3408

iToF-flow-based High Frame Rate Depth Imaging

Yu Meng, Zhou Xue, Xu Chang et al.

CVPR 2024

#3409

Rethinking Human Motion Prediction with Symplectic Integral

Haipeng Chen, Kedi Lyu, Zhenguang Liu et al.

CVPR 2024

#3410

Detector-Free Structure from Motion

Xingyi He, Jiaming Sun, Yifan Wang et al.

CVPR 2024 • arXiv:2306.15669

#3411

Holodeck: Language Guided Generation of 3D Embodied AI Environments

Yue Yang, Fan-Yun Sun, Luca Weihs et al.

CVPR 2024 • arXiv:2312.09067

#3412

DiVAS: Video and Audio Synchronization with Dynamic Frame Rates

Clara Maria Fernandez Labrador, Mertcan Akcay, Eitan Abecassis et al.

CVPR 2024

#3413

Redefining <Creative> in Dictionary: Towards an Enhanced Semantic Understanding of Creative Generation

Fu Feng, Yucheng Xie, Xu Yang et al.

CVPR 2025 • arXiv:2410.24160

#3414

Variance-Based Membership Inference Attacks Against Large-Scale Image Captioning Models

Daniel Samira, Edan Habler, Yuval Elovici et al.

CVPR 2025

#3415

Generalized Zero-Shot Classification via Semantics-Free Inter-Class Feature Generation

Libiao Chen, Dong Nie, Junjun Pan et al.

CVPR 2025

#3416

Camera Resection from Known Line Pencils and a Radially Distorted Scanline

Juan Carlos Dibene Simental, Enrique Dunn

CVPR 2025

#3417

Benchmarking Audio Visual Segmentation for Long-Untrimmed Videos

Chen Liu, Peike Li, Qingtao Yu et al.

CVPR 2024

#3418

Inter-X: Towards Versatile Human-Human Interaction Analysis

Liang Xu, Xintao Lv, Yichao Yan et al.

CVPR 2024 • arXiv:2312.16051

#3419

Embodied Multi-Modal Agent trained by an LLM from a Parallel TextWorld

Yijun Yang, Tianyi Zhou, Kanxue Li et al.

CVPR 2024 • arXiv:2311.16714

#3420

One-Shot Open Affordance Learning with Foundation Models

Gen Li, Deqing Sun, Laura Sevilla-Lara et al.

CVPR 2024 • arXiv:2311.17776

#3421

SpikingResformer: Bridging ResNet and Vision Transformer in Spiking Neural Networks

Xinyu Shi, Zecheng Hao, Zhaofei Yu

CVPR 2024 • arXiv:2403.14302

#3422

Tactile-Augmented Radiance Fields

Yiming Dou, Fengyu Yang, Yi Liu et al.

CVPR 2024 • arXiv:2405.04534

#3423

Mean-Shift Feature Transformer

Takumi Kobayashi

CVPR 2024

#3424

Consistent Prompting for Rehearsal-Free Continual Learning

Zhanxin Gao, Jun Cen, Xiaobin Chang

CVPR 2024 • arXiv:2403.08568

#3425

KVQ: Kwai Video Quality Assessment for Short-form Videos

Yiting Lu, Xin Li, Yajing Pei et al.

CVPR 2024 • arXiv:2402.07220

#3426

SKDream: Controllable Multi-view and 3D Generation with Arbitrary Skeletons

Yuanyou Xu, Zongxin Yang, Yi Yang

CVPR 2025 • highlight

#3427

Purified and Unified Steganographic Network

GuoBiao Li, Sheng Li, Zicong Luo et al.

CVPR 2024 • arXiv:2402.17210

#3428

PartDistill: 3D Shape Part Segmentation by Vision-Language Model Distillation

Ardian Umam, Cheng-Kun Yang, Min-Hung Chen et al.

CVPR 2024 • arXiv:2312.04016

#3429

Selective Hourglass Mapping for Universal Image Restoration Based on Diffusion Model

Dian Zheng, Xiao-Ming Wu, Shuzhou Yang et al.

CVPR 2024 • arXiv:2403.11157

#3430

Closest Neighbors are Harmful for Lightweight Masked Auto-encoders

Jian Meng, Ahmed Hasssan, Li Yang et al.

CVPR 2025

#3431

Fast Adaptation for Human Pose Estimation via Meta-Optimization

Shengxiang Hu, Huaijiang Sun, Bin Li et al.

CVPR 2024

#3432

Anomaly Heterogeneity Learning for Open-set Supervised Anomaly Detection

Jiawen Zhu, Choubo Ding, Yu Tian et al.

CVPR 2024 • arXiv:2310.12790

#3433

L4D-Track: Language-to-4D Modeling Towards 6-DoF Tracking and Shape Reconstruction in 3D Point Cloud Stream

Jingtao Sun, Yaonan Wang, Mingtao Feng et al.

CVPR 2024

#3434

MAPSeg: Unified Unsupervised Domain Adaptation for Heterogeneous Medical Image Segmentation Based on 3D Masked Autoencoding and Pseudo-Labeling

Xuzhe Zhang, Yuhao Wu, Elsa Angelini et al.

CVPR 2024 • arXiv:2303.09373

#3435

Stacking Brick by Brick: Aligned Feature Isolation for Incremental Face Forgery Detection

Jikang Cheng, Zhiyuan Yan, Ying Zhang et al.

CVPR 2025 • arXiv:2411.11396

#3436

IBD-SLAM: Learning Image-Based Depth Fusion for Generalizable SLAM

Minghao Yin, Shangzhe Wu, Kai Han

CVPR 2024

#3437

RDD: Robust Feature Detector and Descriptor using Deformable Transformer

Gonglin Chen, Tianwen Fu, Haiwei Chen et al.

CVPR 2025 • arXiv:2505.08013

#3438

Segment Any Event Streams via Weighted Adaptation of Pivotal Tokens

Zhiwen Chen, Zhiyu Zhu, Yifan Zhang et al.

CVPR 2024

#3439

FlashGS: Efficient 3D Gaussian Splatting for Large-scale and High-resolution Rendering

Guofeng Feng, Siyan Chen, Rong Fu et al.

CVPR 2025 • arXiv:2408.07967

#3440

Boosting Image Quality Assessment through Efficient Transformer Adaptation with Local Feature Enhancement

Kangmin Xu, Liang Liao, Jing Xiao et al.

CVPR 2024

#3441

Learning without Exact Guidance: Updating Large-scale High-resolution Land Cover Maps from Low-resolution Historical Labels

Zhuohong Li, Wei He, Jiepan Li et al.

CVPR 2024 • highlight • arXiv:2403.02746

#3442

Multi-Modal Hallucination Control by Visual Information Grounding

Alessandro Favero, Luca Zancato, Matthew Trager et al.

CVPR 2024 • arXiv:2403.14003

#3443

Gradient Inversion Attacks on Parameter-Efficient Fine-Tuning

Hasin Us Sami, Swapneel Sen, Amit K. Roy-Chowdhury et al.

CVPR 2025 • arXiv:2506.04453

#3444

Exploring Orthogonality in Open World Object Detection

Zhicheng Sun, Jinghan Li, Yadong Mu

CVPR 2024

#3445

DynVideo-E: Harnessing Dynamic NeRF for Large-Scale Motion- and View-Change Human-Centric Video Editing

Jia-Wei Liu, Yan-Pei Cao, Jay Zhangjie Wu et al.

CVPR 2024 • arXiv:2310.10624

#3446

SD-DiT: Unleashing the Power of Self-supervised Discrimination in Diffusion Transformer

Rui Zhu, Yingwei Pan, Yehao Li et al.

CVPR 2024 • arXiv:2403.17004

#3447

Traffic Scene Parsing through the TSP6K Dataset

Peng-Tao Jiang, Yuqi Yang, Yang Cao et al.

CVPR 2024 • arXiv:2303.02835

#3448

TransPixeler: Advancing Text-to-Video Generation with Transparency

Luozhou Wang, Yijun Li, ZhiFei Chen et al.

CVPR 2025 • arXiv:2501.03006

#3449

Hybrid Reciprocal Transformer with Triplet Feature Alignment for Scene Graph Generation

Jiawei Fu, ZHANG Tiantian, Kai Chen et al.

CVPR 2025

#3450

KPConvX: Modernizing Kernel Point Convolution with Kernel Attention

Hugues Thomas, Yao-Hung Hubert Tsai, Timothy Barfoot et al.

CVPR 2024 • arXiv:2405.13194

#3451

Latency Correction for Event-guided Deblurring and Frame Interpolation

Yixin Yang, Jinxiu Liang, Bohan Yu et al.

CVPR 2024

#3452

DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models

Muyang Li, Tianle Cai, Jiaxin Cao et al.

CVPR 2024 • highlight • arXiv:2402.19481

#3453

MoReVQA: Exploring Modular Reasoning Models for Video Question Answering

Juhong Min, Shyamal Buch, Arsha Nagrani et al.

CVPR 2024 • arXiv:2404.06511

#3454

DIMAT: Decentralized Iterative Merging-And-Training for Deep Learning Models

Nastaran Saadati, Minh Pham, Nasla Saleem et al.

CVPR 2024 • arXiv:2404.08079

#3455

NTO3D: Neural Target Object 3D Reconstruction with Segment Anything

Xiaobao Wei, Renrui Zhang, Jiarui Wu et al.

CVPR 2024 • arXiv:2309.12790

#3456

Text-Driven Image Editing via Learnable Regions

Yuanze Lin, Yi-Wen Chen, Yi-Hsuan Tsai et al.

CVPR 2024 • arXiv:2311.16432

#3457

ZERO-IG: Zero-Shot Illumination-Guided Joint Denoising and Adaptive Enhancement for Low-Light Images

Yiqi Shi, Duo Liu, Liguo Zhang et al.

CVPR 2024

#3458

Self-Supervised Representation Learning from Arbitrary Scenarios

Zhaowen Li, Yousong Zhu, Zhiyang Chen et al.

CVPR 2024

#3459

Factored-NeuS: Reconstructing Surfaces, Illumination, and Materials of Possibly Glossy Objects

Yue Fan, Ningjing Fan, Ivan Skorokhodov et al.

CVPR 2025 • arXiv:2305.17929

#3460

Learning Person-Specific Animatable Face Models from In-the-Wild Images via a Shared Base Model

Yuxiang Mao, Zhenfeng Fan, Zhijie Zhang et al.

CVPR 2025

#3461

Rethinking Multi-domain Generalization with A General Learning Objective

Zhaorui Tan, Xi Yang, Kaizhu Huang

CVPR 2024 • arXiv:2402.18853

#3462

Adversarial Distillation Based on Slack Matching and Attribution Region Alignment

Shenglin Yin, Zhen Xiao, Mingxuan Song et al.

CVPR 2024

#3463

SPARS3R: Semantic Prior Alignment and Regularization for Sparse 3D Reconstruction

Yutao Tang, Yuxiang Guo, Deming Li et al.

CVPR 2025 • arXiv:2411.12592

#3464

Let's Chorus: Partner-aware Hybrid Song-Driven 3D Head Animation

Xiumei Xie, Zikai Huang, Wenhao Xu et al.

CVPR 2025

#3465

Towards More General Video-based Deepfake Detection through Facial Component Guided Adaptation for Foundation Model

Yue-Hua Han, Tai-Ming Huang, Kailung Hua et al.

CVPR 2025 • arXiv:2404.05583

#3466

The Neglected Tails in Vision-Language Models

Shubham Parashar, Tian Liu, Zhiqiu Lin et al.

CVPR 2024 • arXiv:2401.12425

#3467

Dense-SfM: Structure from Motion with Dense Consistent Matching

JongMin Lee, Sungjoo Yoo

CVPR 2025 • arXiv:2501.14277

#3468

Multi-View Attentive Contextualization for Multi-View 3D Object Detection

Xianpeng Liu, Ce Zheng, Ming Qian et al.

CVPR 2024 • arXiv:2405.12200

#3469

SODA: Bottleneck Diffusion Models for Representation Learning

Drew Hudson, Daniel Zoran, Mateusz Malinowski et al.

CVPR 2024 • arXiv:2311.17901

#3470

LATTE-MV: Learning to Anticipate Table Tennis Hits from Monocular Videos

Daniel Etaat, Dvij Rajesh Kalaria, Nima Rahmanian et al.

CVPR 2025 • arXiv:2503.20936

#3471

AHIVE: Anatomy-aware Hierarchical Vision Encoding for Interactive Radiology Report Retrieval

Sixing Yan, William K. Cheung, Ivor Tsang et al.

CVPR 2024

#3472

Effortless Active Labeling for Long-Term Test-Time Adaptation

Guowei Wang, Changxing Ding

CVPR 2025 • arXiv:2503.14564

#3473

SPU-PMD: Self-Supervised Point Cloud Upsampling via Progressive Mesh Deformation

Yanzhe Liu, Rong Chen, Yushi Li et al.

CVPR 2024

#3474

Enhancing the Power of OOD Detection via Sample-Aware Model Selection

Feng Xue, Zi He, Yuan Zhang et al.

CVPR 2024

#3475

Self-Supervised Class-Agnostic Motion Prediction with Spatial and Temporal Consistency Regularizations

Kewei Wang, Yizheng Wu, Jun Cen et al.

CVPR 2024 • arXiv:2403.13261

#3476

Can Machines Understand Composition? Dataset and Benchmark for Photographic Image Composition Embedding and Understanding

Zhaoran Zhao, Peng Lu, Anran Zhang et al.

CVPR 2025 • highlight

#3477

MMA: Multi-Modal Adapter for Vision-Language Models

Lingxiao Yang, Ru-Yuan Zhang, Yanchen Wang et al.

CVPR 2024

#3478

Linguistic-Aware Patch Slimming Framework for Fine-grained Cross-Modal Alignment

Zheren Fu, Lei Zhang, Hou Xia et al.

CVPR 2024

#3479

Grounding and Enhancing Grid-based Models for Neural Fields

Zelin Zhao, Fenglei Fan, Wenlong Liao et al.

CVPR 2024 • arXiv:2403.20002

#3480

A Category Agnostic Model for Visual Rearrangment

Yuyi Liu, Xinhang Song, Weijie Li et al.

CVPR 2024

#3481

AnyCam: Learning to Recover Camera Poses and Intrinsics from Casual Videos

Felix Wimbauer, Weirong Chen, Dominik Muhle et al.

CVPR 2025 • arXiv:2503.23282

#3482

Towards More Unified In-context Visual Understanding

Dianmo Sheng, Dongdong Chen, Zhentao Tan et al.

CVPR 2024 • arXiv:2312.02520

#3483

Towards Progressive Multi-Frequency Representation for Image Warping

Jun Xiao, Zihang Lyu, Cong Zhang et al.

CVPR 2024

#3484

Fourier-basis Functions to Bridge Augmentation Gap: Rethinking Frequency Augmentation in Image Classification

Mei Vaish, Shunxin Wang, Nicola Strisciuglio

CVPR 2024 • arXiv:2403.01944

#3485

SDBF: Steep-Decision-Boundary Fingerprinting for Hard-Label Tampering Detection of DNN Models

Xiaofan Bai, Shixin Li, Xiaojing Ma et al.

CVPR 2025

#3486

VastGaussian: Vast 3D Gaussians for Large Scene Reconstruction

Jiaqi Lin, Zhihao Li, Xiao Tang et al.

CVPR 2024 • arXiv:2402.17427

#3487

What Do You See in Vehicle? Comprehensive Vision Solution for In-Vehicle Gaze Estimation

Yihua Cheng, Yaning Zhu, Zongji Wang et al.

CVPR 2024 • arXiv:2403.15664

#3488

Molecular Data Programming: Towards Molecule Pseudo-labeling with Systematic Weak Supervision

Xin Juan, Kaixiong Zhou, Ninghao Liu et al.

CVPR 2024

#3489

OTE: Exploring Accurate Scene Text Recognition Using One Token

Jianjun Xu, Yuxin Wang, Hongtao Xie et al.

CVPR 2024

#3490

RoGSplat: Learning Robust Generalizable Human Gaussian Splatting from Sparse Multi-View Images

Junjin Xiao, Qing Zhang, Yongwei Nie et al.

CVPR 2025 • arXiv:2503.14198

#3491

TTA-EVF: Test-Time Adaptation for Event-based Video Frame Interpolation via Reliable Pixel and Sample Estimation

Hoonhee Cho, Taewoo Kim, Yuhwan Jeong et al.

CVPR 2024

#3492

HUGS: Human Gaussian Splats

Muhammed Kocabas, Jen-Hao Rick Chang, James Gabriel et al.

CVPR 2024 • arXiv:2311.17910

#3493

IRIS: Inverse Rendering of Indoor Scenes from Low Dynamic Range Images

Chih-Hao Lin, Jia-Bin Huang, Zhengqin Li et al.

CVPR 2025 • arXiv:2401.12977

#3494

Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models

Yushi Hu, Otilia Stretcu, Chun-Ta Lu et al.

CVPR 2024 • arXiv:2312.03052

#3495

Gromov–Wasserstein Problem with Cyclic Symmetry

Shoichiro Takeda, Yasunori Akagi

CVPR 2025

#3496

Mocap Everyone Everywhere: Lightweight Motion Capture With Smartwatches and a Head-Mounted Camera

Jiye Lee, Hanbyul Joo

CVPR 2024 • arXiv:2401.00847

#3497

CASP: Consistency-aware Audio-induced Saliency Prediction Model for Omnidirectional Video

Zhaolin Wan, Han Qin, Zhiyang Li et al.

CVPR 2025

#3498

DVMNet: Computing Relative Pose for Unseen Objects Beyond Hypotheses

Chen Zhao, Tong Zhang, Zheng Dang et al.

CVPR 2024

#3499

A Universal Scale-Adaptive Deformable Transformer for Image Restoration across Diverse Artifacts

Xuyi He, Yuhui Quan, Ruotao Xu et al.

CVPR 2025

#3500

AVID: Any-Length Video Inpainting with Diffusion Model

Zhixing Zhang, Bichen Wu, Xiaoyan Wang et al.

CVPR 2024 • arXiv:2312.03816

#3501

Neural Inverse Rendering from Propagating Light

Anagh Malik, Benjamin Attal, Andrew Xie et al.

CVPR 2025 • arXiv:2506.05347

#3502

Hyper-MD: Mesh Denoising with Customized Parameters Aware of Noise Intensity and Geometric Characteristics

Xingtao Wang, Hongliang Wei, Xiaopeng Fan et al.

CVPR 2024

#3503

TULIP: Multi-camera 3D Precision Assessment of Parkinson’s Disease

Kyungdo Kim, Sihan Lyu, Sneha Mantri et al.

CVPR 2024

#3504

ViCaS: A Dataset for Combining Holistic and Pixel-level Video Understanding using Captions with Grounded Segmentation

Ali Athar, Xueqing Deng, Liang-Chieh Chen

CVPR 2025 • arXiv:2412.09754

#3505

A4A: Adapter for Adapter Transfer via All-for-All Mapping for Cross-Architecture Models

Keyu Tu, Mengqi Huang, Zhuowei Chen et al.

CVPR 2025

#3506

KTPFormer: Kinematics and Trajectory Prior Knowledge-Enhanced Transformer for 3D Human Pose Estimation

Jihua Peng, Yanghong Zhou, Tracy P Y Mok

CVPR 2024 • arXiv:2404.00658

#3507

MFP: Making Full Use of Probability Maps for Interactive Image Segmentation

Chaewon Lee, Seon-Ho Lee, Chang-Su Kim

CVPR 2024 • arXiv:2404.18448

#3508

Towards Precise Embodied Dialogue Localization via Causality Guided Diffusion

Haoyu Wang, Le Wang, Sanping Zhou et al.

CVPR 2025

#3509

Cross-Domain Few-Shot Segmentation via Iterative Support-Query Correspondence Mining

Jiahao Nie, Yun Xing, Gongjie Zhang et al.

CVPR 2024 • arXiv:2401.08407

#3510

Strong Transferable Adversarial Attacks via Ensembled Asymptotically Normal Distribution Learning

Zhengwei Fang, Rui Wang, Tao Huang et al.

CVPR 2024 • highlight • arXiv:2209.11964

#3511

An Empirical Study of Scaling Law for Scene Text Recognition

Miao Rang, Zhenni Bi, Chuanjian Liu et al.

CVPR 2024

#3512

Selective-Stereo: Adaptive Frequency Information Selection for Stereo Matching

Xianqi Wang, Gangwei Xu, Hao Jia et al.

CVPR 2024 • highlight • arXiv:2403.00486

#3513

When StyleGAN Meets Stable Diffusion: a W+ Adapter for Personalized Image Generation

Xiaoming Li, Xinyu Hou, Chen Change Loy

CVPR 2024

#3514

Differentiable Neural Surface Refinement for Modeling Transparent Objects

Weijian Deng, Dylan Campbell, Chunyi Sun et al.

CVPR 2024

#3515

Disentangling Safe and Unsafe Image Corruptions via Anisotropy and Locality

Ramchandran Muthukumar, Ambar Pal, Jeremias Sulam et al.

CVPR 2025

#3516

Low-power Continuous Remote Behavioral Localization with Event Cameras

Friedhelm Hamann, Suman Ghosh, Ignacio Juarez Martinez et al.

CVPR 2024 • arXiv:2312.03799

#3517

Attention-Propagation Network for Egocentric Heatmap to 3D Pose Lifting

Taeho Kang, Youngki Lee

CVPR 2024 • highlight • arXiv:2402.18330

#3518

AM-RADIO: Agglomerative Vision Foundation Model Reduce All Domains Into One

Mike Ranzinger, Greg Heinrich, Jan Kautz et al.

CVPR 2024 • arXiv:2312.06709

#3519

Towards Co-Evaluation of Cameras HDR and Algorithms for Industrial-Grade 6DoF Pose Estimation

Agastya Kalra, Guy Stoppi, Dmitrii Marin et al.

CVPR 2024

#3520

Tune-An-Ellipse: CLIP Has Potential to Find What You Want

Jinheng Xie, Songhe Deng, Bing Li et al.

CVPR 2024 • highlight

#3521

Efficient Data Driven Mixture-of-Expert Extraction from Trained Networks

Uranik Berisha, Jens Mehnert, Alexandru Paul Condurache

CVPR 2025 • arXiv:2505.15414

#3522

Doppelgängers and Adversarial Vulnerability

George Kamberov

CVPR 2025 • highlight • arXiv:2410.13193

#3523

BA-SAM: Scalable Bias-Mode Attention Mask for Segment Anything Model

Song Yiran, Qianyu Zhou, Xiangtai Li et al.

CVPR 2024 • arXiv:2401.02317

#3524

MEAT: Multiview Diffusion Model for Human Generation on Megapixels with Mesh Attention

Yuhan Wang, Fangzhou Hong, Shuai Yang et al.

CVPR 2025 • arXiv:2503.08664

#3525

Matrix-Free Shared Intrinsics Bundle Adjustment

Daniel Safari

CVPR 2025

#3526

Seeing More with Less: Human-like Representations in Vision Models

Andrey Gizdov, Shimon Ullman, Daniel Harari

CVPR 2025 • highlight

#3527

EarthLoc: Astronaut Photography Localization by Indexing Earth from Space

Gabriele Berton, Alex Stoken, Barbara Caputo et al.

CVPR 2024 • arXiv:2403.06758

#3528

SeedVR: Seeding Infinity in Diffusion Transformer Towards Generic Video Restoration

Jianyi Wang, Zhijie Lin, Meng Wei et al.

CVPR 2025 • highlight • arXiv:2501.01320

#3529

PairDETR: Joint Detection and Association of Human Bodies and Faces

Ammar Ali, Georgii Gaikov, Denis Rybalchenko et al.

CVPR 2024

#3530

Close Imitation of Expert Retouching for Black-and-White Photography

Seunghyun Shin, Jisu Shin, Jihwan Bae et al.

CVPR 2024

#3531

Chain of Semantics Programming in 3D Gaussian Splatting Representation for 3D Vision Grounding

Jiaxin Shi, Mingyue Xiang, Hao Sun et al.

CVPR 2025

#3532

OmniLocalRF: Omnidirectional Local Radiance Fields from Dynamic Videos

Dongyoung Choi, Hyeonjoong Jang, Min H. Kim

CVPR 2024 • arXiv:2404.00676

#3533

Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model

Kai Yang, Jian Tao, Jiafei Lyu et al.

CVPR 2024 • arXiv:2311.13231

#3534

Reconstructing Hands in 3D with Transformers

Georgios Pavlakos, Dandan Shan, Ilija Radosavovic et al.

CVPR 2024 • arXiv:2312.05251

#3535

XFeat: Accelerated Features for Lightweight Image Matching

Guilherme Potje, Felipe Cadar, André Araujo et al.

CVPR 2024 • arXiv:2404.19174

#3536

Systematic Comparison of Semi-supervised and Self-supervised Learning for Medical Image Classification

Zhe Huang, Ruijie Jiang, Shuchin Aeron et al.

CVPR 2024 • arXiv:2307.08919

#3537

GoodSAM: Bridging Domain and Capacity Gaps via Segment Anything Model for Distortion-aware Panoramic Semantic Segmentation

Weiming Zhang, Yexin Liu, Xu Zheng et al.

CVPR 2024 • arXiv:2403.16370

#3538

AniGS: Animatable Gaussian Avatar from a Single Image with Inconsistent Gaussian Reconstruction

Lingteng Qiu, Shenhao Zhu, Qi Zuo et al.

CVPR 2025 • arXiv:2412.02684

#3539

VRP-SAM: SAM with Visual Reference Prompt

Yanpeng Sun, Jiahui Chen, Shan Zhang et al.

CVPR 2024 • arXiv:2402.17726

#3540

Fuzzy Multimodal Learning for Trusted Cross-modal Retrieval

Siyuan Duan, Yuan Sun, Dezhong Peng et al.

CVPR 2025

#3541

NVILA: Efficient Frontier Visual Language Models

Zhijian Liu, Ligeng Zhu, Baifeng Shi et al.

CVPR 2025 • arXiv:2412.04468

#3542

DiffuScene: Denoising Diffusion Models for Generative Indoor Scene Synthesis

Jiapeng Tang, Yinyu Nie, Lev Markhasin et al.

CVPR 2024 • arXiv:2303.14207

#3543

FloVD: Optical Flow Meets Video Diffusion Model for Enhanced Camera-Controlled Video Synthesis

Wonjoon Jin, Qi Dai, Chong Luo et al.

CVPR 2025 • arXiv:2502.08244

#3544

Looking Similar Sounding Different: Leveraging Counterfactual Cross-Modal Pairs for Audiovisual Representation Learning

Nikhil Singh, Chih-Wei Wu, Iroro Orife et al.

CVPR 2024 • arXiv:2304.05600

#3545

YOLO-World: Real-Time Open-Vocabulary Object Detection

Tianheng Cheng, Lin Song, Yixiao Ge et al.

CVPR 2024 • arXiv:2401.17270

#3546

Bézier Everywhere All at Once: Learning Drivable Lanes as Bézier Graphs

Hugh Blayney, Hanlin Tian, Hamish Scott et al.

CVPR 2024

#3547

UniHOPE: A Unified Approach for Hand-Only and Hand-Object Pose Estimation

Yinqiao Wang, Hao Xu, Pheng-Ann Heng et al.

CVPR 2025 • arXiv:2503.13303

#3548

Taming Self-Training for Open-Vocabulary Object Detection

Shiyu Zhao, Samuel Schulter, Long Zhao et al.

CVPR 2024 • arXiv:2308.06412

#3549

Ink Dot-Oriented Differentiable Optimization for Neural Image Halftoning

Hao Jiang, Bingfeng Zhou, Yadong Mu

CVPR 2024

#3550

GeoChat: Grounded Large Vision-Language Model for Remote Sensing

Kartik Kuckreja, Muhammad Sohail Danish, Muzammal Naseer et al.

CVPR 2024 • arXiv:2311.15826

#3551

FACT: Frame-Action Cross-Attention Temporal Modeling for Efficient Action Segmentation

Zijia Lu, Ehsan Elhamifar

CVPR 2024

#3552

Cross-Modal Interactive Perception Network with Mamba for Lung Tumor Segmentation in PET-CT Images

Jie Mei, Chenyu Lin, Yu Qiu et al.

CVPR 2025 • arXiv:2503.17261

#3553

No Thing, Nothing: Highlighting Safety-Critical Classes for Robust LiDAR Semantic Segmentation in Adverse Weather

Junsung Park, HwiJeong Lee, Inha Kang et al.

CVPR 2025 • arXiv:2503.15910

#3554

Learning Partonomic 3D Reconstruction from Image Collections

Xiaoqian Ruan, Pei Yu, Dian Jia et al.

CVPR 2025

#3555

GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians

Liangxiao Hu, Hongwen Zhang, Yuxiang Zhang et al.

CVPR 2024 • arXiv:2312.02134

#3556

LOGICZSL: Exploring Logic-induced Representation for Compositional Zero-shot Learning

Peng Wu, Xiankai Lu, Hao Hu et al.

CVPR 2025

#3557

ShapeMatcher: Self-Supervised Joint Shape Canonicalization Segmentation Retrieval and Deformation

Yan Di, Chenyangguang Zhang, Chaowei Wang et al.

CVPR 2024

#3558

Know Your Neighbors: Improving Single-View Reconstruction via Spatial Vision-Language Reasoning

Rui Li, Tobias Fischer, Mattia Segu et al.

CVPR 2024 • arXiv:2404.03658

#3559

SVDTree: Semantic Voxel Diffusion for Single Image Tree Reconstruction

Yuan Li, Zhihao Liu, Bedrich Benes et al.

CVPR 2024

#3560

3D Student Splatting and Scooping

Jialin Zhu, Jiangbei Yue, Feixiang He et al.

CVPR 2025 • arXiv:2503.10148

#3561

Error Detection in Egocentric Procedural Task Videos

Shih-Po Lee, Zijia Lu, Zekun Zhang et al.

CVPR 2024

#3562

Patch2Self2: Self-supervised Denoising on Coresets via Matrix Sketching

Shreyas Fadnavis, Agniva Chowdhury, Joshua Batson et al.

CVPR 2024

#3563

FreeCustom: Tuning-Free Customized Image Generation for Multi-Concept Composition

Ganggui Ding, Canyu Zhao, Wen Wang et al.

CVPR 2024 • arXiv:2405.13870

#3564

HIMap: HybrId Representation Learning for End-to-end Vectorized HD Map Construction

Yi ZHOU, Hui Zhang, Jiaqian Yu et al.

CVPR 2024 • arXiv:2403.08639

#3565

LEDiff: Latent Exposure Diffusion for HDR Generation

Chao Wang, Zhihao Xia, Thomas Leimkuehler et al.

CVPR 2025 • arXiv:2412.14456

#3566

Generative Unlearning for Any Identity

Juwon Seo, Sung-Hoon Lee, Tae-Young Lee et al.

CVPR 2024 • arXiv:2405.09879

#3567

SATA: Spatial Autocorrelation Token Analysis for Enhancing the Robustness of Vision Transformers

Nikaan Nikzad, Yi Liao, Yongsheng Gao et al.

CVPR 2025 • arXiv:2409.19850

#3568

Mitigating Object Dependencies: Improving Point Cloud Self-Supervised Learning through Object Exchange

Yanhao Wu, Tong Zhang, Wei Ke et al.

CVPR 2024 • arXiv:2404.07504

#3569

Multi-scale Dynamic and Hierarchical Relationship Modeling for Facial Action Units Recognition

Zihan Wang, Siyang Song, Cheng Luo et al.

CVPR 2024 • arXiv:2404.06443

#3570

Imagine Before Go: Self-Supervised Generative Map for Object Goal Navigation

Sixian Zhang, Xinyao Yu, Xinhang Song et al.

CVPR 2024

#3571

PanSplat: 4K Panorama Synthesis with Feed-Forward Gaussian Splatting

Cheng Zhang, Haofei Xu, Qianyi Wu et al.

CVPR 2025 • arXiv:2412.12096

#3572

SVDinsTN: A Tensor Network Paradigm for Efficient Structure Search from Regularized Modeling Perspective

Yu-Bang Zheng, Xile Zhao, Junhua Zeng et al.

CVPR 2024 • highlight • arXiv:2305.14912

#3573

Exploring the Potential of Large Foundation Models for Open-Vocabulary HOI Detection

Ting Lei, Shaofeng Yin, Yang Liu

CVPR 2024 • arXiv:2404.06194

#3574

Modality-Agnostic Structural Image Representation Learning for Deformable Multi-Modality Medical Image Registration

Tony C. W. MOK, Zi Li, Yunhao Bai et al.

CVPR 2024 • highlight • arXiv:2402.18933

#3575

PoseIRM: Enhance 3D Human Pose Estimation on Unseen Camera Settings via Invariant Risk Minimization

Yanlu Cai, Weizhong Zhang, Yuan Wu et al.

CVPR 2024

#3576

On the Estimation of Image-matching Uncertainty in Visual Place Recognition

Mubariz Zaffar, Liangliang Nan, Julian F. P. Kooij

CVPR 2024 • highlight • arXiv:2404.00546

#3577

Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs

Shiyu Xuan, Qingpei Guo, Ming Yang et al.

CVPR 2024 • arXiv:2310.00582

#3578

LoS: Local Structure-Guided Stereo Matching

Kunhong Li, Longguang Wang, Ye Zhang et al.

CVPR 2024

#3579

GS-2DGS: Geometrically Supervised 2DGS for Reflective Object Reconstruction

Jinguang Tong, Xuesong Li, Fahira Afzal Maken et al.

CVPR 2025 • arXiv:2506.13110

#3580

RadSimReal: Bridging the Gap Between Synthetic and Real Data in Radar Object Detection With Simulation

Oded Bialer, Yuval Haitman

CVPR 2024 • arXiv:2404.18150

#3581

OCAI: Improving Optical Flow Estimation by Occlusion and Consistency Aware Interpolation

Jisoo Jeong, Hong Cai, Risheek Garrepalli et al.

CVPR 2024 • arXiv:2403.18092

#3582

FlowerFormer: Empowering Neural Architecture Encoding using a Flow-aware Graph Transformer

Dongyeong Hwang, Hyunju Kim, Sunwoo Kim et al.

CVPR 2024 • arXiv:2403.12821

#3583

Mip-Splatting: Alias-free 3D Gaussian Splatting

Zehao Yu, Anpei Chen, Binbin Huang et al.

CVPR 2024 • arXiv:2311.16493

#3584

Text2QR: Harmonizing Aesthetic Customization and Scanning Robustness for Text-Guided QR Code Generation

Guangyang Wu, Xiaohong Liu, Jun Jia et al.

CVPR 2024 • arXiv:2403.06452

#3585

HotSpot: Signed Distance Function Optimization with an Asymptotically Sufficient Condition

Zimo Wang, Cheng Wang, Taiki Yoshino et al.

CVPR 2025 • highlight • arXiv:2411.14628

#3586

SeriesBench: A Benchmark for Narrative-Driven Drama Series Understanding

Chenkai Zhang, Yiming Lei, Zeming Liu et al.

CVPR 2025 • arXiv:2504.21435

#3587

ProMark: Proactive Diffusion Watermarking for Causal Attribution

Vishal Asnani, John Collomosse, Tu Bui et al.

CVPR 2024 • arXiv:2403.09914

#3588

MMM: Generative Masked Motion Model

Ekkasit Pinyoanuntapong, Pu Wang, Minwoo Lee et al.

CVPR 2024 • highlight • arXiv:2312.03596

#3589

Toward Generalist Anomaly Detection via In-context Residual Learning with Few-shot Sample Prompts

Jiawen Zhu, Guansong Pang

CVPR 2024 • arXiv:2403.06495

#3590

DiffForensics: Leveraging Diffusion Prior to Image Forgery Detection and Localization

Zeqin Yu, Jiangqun Ni, Yuzhen Lin et al.

CVPR 2024

#3591

VideoGrounding-DINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding

Syed Talal Wasim, Muzammal Naseer, Salman Khan et al.

CVPR 2024

#3592

Neural Directional Encoding for Efficient and Accurate View-Dependent Appearance Modeling

Liwen Wu, Sai Bi, Zexiang Xu et al.

CVPR 2024 • highlight • arXiv:2405.14847

#3593

Uncertainty Meets Diversity: A Comprehensive Active Learning Framework for Indoor 3D Object Detection

Jiangyi Wang, Na Zhao

CVPR 2025 • arXiv:2503.16125

#3594

SAFDNet: A Simple and Effective Network for Fully Sparse 3D Object Detection

Gang Zhang, Chen Junnan, Guohuan Gao et al.

CVPR 2024 • arXiv:2403.05817

#3595

Sheared Backpropagation for Fine-tuning Foundation Models

Zhiyuan Yu, Li Shen, Liang Ding et al.

CVPR 2024

#3596

On the Content Bias in Fréchet Video Distance

Songwei Ge, Aniruddha Mahapatra, Gaurav Parmar et al.

CVPR 2024 • arXiv:2404.12391

#3597

Text-Driven Fashion Image Editing with Compositional Concept Learning and Counterfactual Abduction

Shanshan Huang, Haoxuan Li, Chunyuan Zheng et al.

CVPR 2025

#3598

Multiview Aerial Visual RECognition (MAVREC): Can Multi-view Improve Aerial Visual Perception?

Aritra Dutta, Srijan Das, Jacob Nielsen et al.

CVPR 2024 • arXiv:2312.04548

#3599

Number it: Temporal Grounding Videos like Flipping Manga

Yongliang Wu, Xinting Hu, Yuyang Sun et al.

CVPR 2025 • arXiv:2411.10332

#3600

Autoregressive Sequential Pretraining for Visual Tracking

Shiyi Liang, Yifan Bai, Yihong Gong et al.

CVPR 2025
