Most Cited CVPR "video recognition" Papers

5,589 papers found • Page 18 of 28

#3401

Ungeneralizable Examples

Jingwen Ye, Xinchao Wang

CVPR 2024posterarXiv:2404.14016
#3402

Language-only Training of Zero-shot Composed Image Retrieval

Geonmo Gu, Sanghyuk Chun, Wonjae Kim et al.

CVPR 2024poster
#3403

Causal-CoG: A Causal-Effect Look at Context Generation for Boosting Multi-modal Language Models

Shitian Zhao, Zhuowan Li, YadongLu et al.

CVPR 2024highlightarXiv:2312.06685
#3404

Rapid Motor Adaptation for Robotic Manipulator Arms

Yichao Liang, Kevin Ellis, João F. Henriques

CVPR 2024posterarXiv:2312.04670
#3405

DPC: Dual-Prompt Collaboration for Tuning Vision-Language Models

Haoyang Li, Liang Wang, Chao Wang et al.

CVPR 2025posterarXiv:2503.13443
#3406

Instruct-Imagen: Image Generation with Multi-modal Instruction

Hexiang Hu, Kelvin C.K. Chan, Yu-Chuan Su et al.

CVPR 2024posterarXiv:2401.01952
#3407

Diffeomorphic Template Registration for Atmospheric Turbulence Mitigation

Dong Lao, Congli Wang, Alex Wong et al.

CVPR 2024highlightarXiv:2405.03662
#3408

Adapting to Length Shift: FlexiLength Network for Trajectory Prediction

Yi Xu, Yun Fu

CVPR 2024posterarXiv:2404.00742
#3409

FedCS: Coreset Selection for Federated Learning

Chenhe Hao, Weiying Xie, Daixun Li et al.

CVPR 2025poster
#3410

CausalPC: Improving the Robustness of Point Cloud Classification by Causal Effect Identification

Yuanmin Huang, Mi Zhang, Daizong Ding et al.

CVPR 2024poster
#3411

GraphI2P: Image-to-Point Cloud Registration with Exploring Pattern of Correspondence via Graph Learning

Lin Bie, Shouan Pan, Siqi Li et al.

CVPR 2025poster
#3412

LiSA: LiDAR Localization with Semantic Awareness

Bochun Yang, Zijun Li, Wen Li et al.

CVPR 2024highlight
#3413

Rethinking Prior Information Generation with CLIP for Few-Shot Segmentation

Jin Wang, Bingfeng Zhang, Jian Pang et al.

CVPR 2024posterarXiv:2405.08458
#3414

Adaptive Slot Attention: Object Discovery with Dynamic Slot Number

Ke Fan, Zechen Bai, Tianjun Xiao et al.

CVPR 2024posterarXiv:2406.09196
#3415

Learning Coupled Dictionaries from Unpaired Data for Image Super-Resolution

Longguang Wang, Juncheng Li, Yingqian Wang et al.

CVPR 2024poster
#3416

C3: High-Performance and Low-Complexity Neural Compression from a Single Image or Video

Hyunjik Kim, Matthias Bauer, Lucas Theis et al.

CVPR 2024posterarXiv:2312.02753
#3417

AttriHuman-3D: Editable 3D Human Avatar Generation with Attribute Decomposition and Indexing

Fan Yang, Tianyi Chen, XIAOSHENG HE et al.

CVPR 2024posterarXiv:2312.02209
#3418

iToF-flow-based High Frame Rate Depth Imaging

Yu Meng, Zhou Xue, Xu Chang et al.

CVPR 2024poster
#3419

Rethinking Human Motion Prediction with Symplectic Integral

Haipeng Chen, Kedi L yu, Zhenguang Liu et al.

CVPR 2024poster
#3420

Detector-Free Structure from Motion

Xingyi He, Jiaming Sun, Yifan Wang et al.

CVPR 2024posterarXiv:2306.15669
#3421

Holodeck: Language Guided Generation of 3D Embodied AI Environments

Yue Yang, Fan-Yun Sun, Luca Weihs et al.

CVPR 2024posterarXiv:2312.09067
#3422

DiVAS: Video and Audio Synchronization with Dynamic Frame Rates

Clara Maria Fernandez Labrador, Mertcan Akcay, Eitan Abecassis et al.

CVPR 2024poster
#3423

Benchmarking Audio Visual Segmentation for Long-Untrimmed Videos

Chen Liu, Peike Li, Qingtao Yu et al.

CVPR 2024poster
#3424

Inter-X: Towards Versatile Human-Human Interaction Analysis

Liang Xu, Xintao Lv, Yichao Yan et al.

CVPR 2024posterarXiv:2312.16051
#3425

Embodied Multi-Modal Agent trained by an LLM from a Parallel TextWorld

Yijun Yang, Tianyi Zhou, kanxue Li et al.

CVPR 2024posterarXiv:2311.16714
#3426

One-Shot Open Affordance Learning with Foundation Models

Gen Li, Deqing Sun, Laura Sevilla-Lara et al.

CVPR 2024posterarXiv:2311.17776
#3427

SpikingResformer: Bridging ResNet and Vision Transformer in Spiking Neural Networks

Xinyu Shi, Zecheng Hao, Zhaofei Yu

CVPR 2024posterarXiv:2403.14302
#3428

FlexUOD: The Answer to Real-world Unsupervised Image Outlier Detection

Zhonghang Liu, Kun Zhou, Changshuo Wang et al.

CVPR 2025poster
#3429

Tactile-Augmented Radiance Fields

Yiming Dou, Fengyu Yang, Yi Liu et al.

CVPR 2024posterarXiv:2405.04534
#3430

Mean-Shift Feature Transformer

Takumi Kobayashi

CVPR 2024poster
#3431

Consistent Prompting for Rehearsal-Free Continual Learning

Zhanxin Gao, Jun Cen, Xiaobin Chang

CVPR 2024posterarXiv:2403.08568
#3432

KVQ: Kwai Video Quality Assessment for Short-form Videos

Yiting Lu, Xin Li, Yajing Pei et al.

CVPR 2024posterarXiv:2402.07220
#3433

Purified and Unified Steganographic Network

GuoBiao Li, Sheng Li, Zicong Luo et al.

CVPR 2024posterarXiv:2402.17210
#3434

Samba: A Unified Mamba-based Framework for General Salient Object Detection

Jiahao He, Keren Fu, Xiaohong Liu et al.

CVPR 2025highlight
#3435

BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature

Alejandro Lozano, Min Woo Sun, James Burgess et al.

CVPR 2025posterarXiv:2501.07171
#3436

Collaborative Tree Search for Enhancing Embodied Multi-Agent Collaboration

Lizheng Zu, Lin Lin, Song Fu et al.

CVPR 2025poster
#3437

Selective Hourglass Mapping for Universal Image Restoration Based on Diffusion Model

Dian Zheng, Xiao-Ming Wu, Shuzhou Yang et al.

CVPR 2024posterarXiv:2403.11157
#3438

FIRE: Robust Detection of Diffusion-Generated Images via Frequency-Guided Reconstruction Error

Beilin Chu, Xuan Xu, Xin Wang et al.

CVPR 2025posterarXiv:2412.07140
#3439

Fast Adaptation for Human Pose Estimation via Meta-Optimization

Shengxiang Hu, Huaijiang Sun, Bin Li et al.

CVPR 2024poster
#3440

Anomaly Heterogeneity Learning for Open-set Supervised Anomaly Detection

Jiawen Zhu, Choubo Ding, Yu Tian et al.

CVPR 2024posterarXiv:2310.12790
#3441

L4D-Track: Language-to-4D Modeling Towards 6-DoF Tracking and Shape Reconstruction in 3D Point Cloud Stream

Jingtao Sun, Yaonan Wang, Mingtao Feng et al.

CVPR 2024poster
#3442

MAPSeg: Unified Unsupervised Domain Adaptation for Heterogeneous Medical Image Segmentation Based on 3D Masked Autoencoding and Pseudo-Labeling

Xuzhe Zhang, Yuhao Wu, Elsa Angelini et al.

CVPR 2024posterarXiv:2303.09373
#3443

CompGS: Unleashing 2D Compositionality for Compositional Text-to-3D via Dynamically Optimizing 3D Gaussians

Chongjian GE, Chenfeng Xu, Yuanfeng Ji et al.

CVPR 2025posterarXiv:2410.20723
#3444

IBD-SLAM: Learning Image-Based Depth Fusion for Generalizable SLAM

Minghao Yin, Shangzhe Wu, Kai Han

CVPR 2024poster
#3445

Dual Exposure Stereo for Extended Dynamic Range 3D Imaging

Juhyung Choi, Jinneyong Kim, Seokjun Choi et al.

CVPR 2025posterarXiv:2412.02351
#3446

Segment Any Event Streams via Weighted Adaptation of Pivotal Tokens

Zhiwen Chen, Zhiyu Zhu, Yifan Zhang et al.

CVPR 2024poster
#3447

Boosting Image Quality Assessment through Efficient Transformer Adaptation with Local Feature Enhancement

Kangmin Xu, Liang Liao, Jing Xiao et al.

CVPR 2024poster
#3448

Learning without Exact Guidance: Updating Large-scale High-resolution Land Cover Maps from Low-resolution Historical Labels

Zhuohong Li, Wei He, Jiepan Li et al.

CVPR 2024highlightarXiv:2403.02746
#3449

Multi-Modal Hallucination Control by Visual Information Grounding

Alessandro Favero, Luca Zancato, Matthew Trager et al.

CVPR 2024posterarXiv:2403.14003
#3450

Improved Monocular Depth Prediction Using Distance Transform Over Pre-semantic Contours with Self-supervised Neural Networks

Marwane Hariat, Antoine Manzanera, David Filliat

CVPR 2025poster
#3451

ERUPT: Efficient Rendering with Unposed Patch Transformer

Maxim Shugaev, Vincent Chen, Maxim Karrenbach et al.

CVPR 2025posterarXiv:2503.24374
#3452

Exploring Orthogonality in Open World Object Detection

Zhicheng Sun, Jinghan Li, Yadong Mu

CVPR 2024poster
#3453

DynVideo-E: Harnessing Dynamic NeRF for Large-Scale Motion- and View-Change Human-Centric Video Editing

Jia-Wei Liu, Yan-Pei Cao, Jay Zhangjie Wu et al.

CVPR 2024posterarXiv:2310.10624
#3454

SD-DiT: Unleashing the Power of Self-supervised Discrimination in Diffusion Transformer

Rui Zhu, Yingwei Pan, Yehao Li et al.

CVPR 2024posterarXiv:2403.17004
#3455

Traffic Scene Parsing through the TSP6K Dataset

Peng-Tao Jiang, Yuqi Yang, Yang Cao et al.

CVPR 2024posterarXiv:2303.02835
#3456

VLog: Video-Language Models by Generative Retrieval of Narration Vocabulary

Kevin Qinghong Lin, Mike Zheng Shou

CVPR 2025posterarXiv:2503.09402
#3457

KPConvX: Modernizing Kernel Point Convolution with Kernel Attention

Hugues Thomas, Yao-Hung Hubert Tsai, Timothy Barfoot et al.

CVPR 2024posterarXiv:2405.13194
#3458

Latency Correction for Event-guided Deblurring and Frame Interpolation

Yixin Yang, Jinxiu Liang, Bohan Yu et al.

CVPR 2024poster
#3459

DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models

Muyang Li, Tianle Cai, Jiaxin Cao et al.

CVPR 2024highlightarXiv:2402.19481
#3460

MoReVQA: Exploring Modular Reasoning Models for Video Question Answering

Juhong Min, Shyamal Buch, Arsha Nagrani et al.

CVPR 2024posterarXiv:2404.06511
#3461

DIMAT: Decentralized Iterative Merging-And-Training for Deep Learning Models

Nastaran Saadati, Minh Pham, Nasla Saleem et al.

CVPR 2024posterarXiv:2404.08079
#3462

NTO3D: Neural Target Object 3D Reconstruction with Segment Anything

Xiaobao Wei, Renrui Zhang, Jiarui Wu et al.

CVPR 2024posterarXiv:2309.12790
#3463

Redefining <Creative> in Dictionary: Towards an Enhanced Semantic Understanding of Creative Generation

Fu Feng, Yucheng Xie, Xu Yang et al.

CVPR 2025posterarXiv:2410.24160
#3464

Text-Driven Image Editing via Learnable Regions

Yuanze Lin, Yi-Wen Chen, Yi-Hsuan Tsai et al.

CVPR 2024posterarXiv:2311.16432
#3465

ZERO-IG: Zero-Shot Illumination-Guided Joint Denoising and Adaptive Enhancement for Low-Light Images

Yiqi Shi, Duo Liu, Liguo Zhang et al.

CVPR 2024poster
#3466

Self-Supervised Representation Learning from Arbitrary Scenarios

Zhaowen Li, Yousong Zhu, Zhiyang Chen et al.

CVPR 2024poster
#3467

Variance-Based Membership Inference Attacks Against Large-Scale Image Captioning Models

Daniel Samira, Edan Habler, Yuval Elovici et al.

CVPR 2025poster
#3468

Generalized Zero-Shot Classification via Semantics-Free Inter-Class Feature Generation

Libiao Chen, Dong Nie, Junjun Pan et al.

CVPR 2025poster
#3469

Rethinking Multi-domain Generalization with A General Learning Objective

Zhaorui Tan, Xi Yang, Kaizhu Huang

CVPR 2024posterarXiv:2402.18853
#3470

Camera Resection from Known Line Pencils and a Radially Distorted Scanline

Juan Carlos Dibene Simental, Enrique Dunn

CVPR 2025poster
#3471

Adversarial Distillation Based on Slack Matching and Attribution Region Alignment

Shenglin Yin, Zhen Xiao, Mingxuan Song et al.

CVPR 2024poster
#3472

SKDream: Controllable Multi-view and 3D Generation with Arbitrary Skeletons

Yuanyou Xu, Zongxin Yang, Yi Yang

CVPR 2025highlight
#3473

The Neglected Tails in Vision-Language Models

Shubham Parashar, Tian Liu, Zhiqiu Lin et al.

CVPR 2024posterarXiv:2401.12425
#3474

Closest Neighbors are Harmful for Lightweight Masked Auto-encoders

Jian Meng, Ahmed Hasssan, Li Yang et al.

CVPR 2025poster
#3475

Multi-View Attentive Contextualization for Multi-View 3D Object Detection

Xianpeng Liu, Ce Zheng, Ming Qian et al.

CVPR 2024posterarXiv:2405.12200
#3476

SODA: Bottleneck Diffusion Models for Representation Learning

Drew Hudson, Daniel Zoran, Mateusz Malinowski et al.

CVPR 2024posterarXiv:2311.17901
#3477

Stacking Brick by Brick: Aligned Feature Isolation for Incremental Face Forgery Detection

Jikang Cheng, Zhiyuan Yan, Ying Zhang et al.

CVPR 2025posterarXiv:2411.11396
#3478

AHIVE: Anatomy-aware Hierarchical Vision Encoding for Interactive Radiology Report Retrieval

Sixing Yan, William K. Cheung, Ivor Tsang et al.

CVPR 2024poster
#3479

SPU-PMD: Self-Supervised Point Cloud Upsampling via Progressive Mesh Deformation

Yanzhe Liu, Rong Chen, Yushi Li et al.

CVPR 2024poster
#3480

Enhancing the Power of OOD Detection via Sample-Aware Model Selection

Feng Xue, Zi He, Yuan Zhang et al.

CVPR 2024poster
#3481

Self-Supervised Class-Agnostic Motion Prediction with Spatial and Temporal Consistency Regularizations

Kewei Wang, Yizheng Wu, Jun Cen et al.

CVPR 2024posterarXiv:2403.13261
#3482

RDD: Robust Feature Detector and Descriptor using Deformable Transformer

Gonglin Chen, Tianwen Fu, Haiwei Chen et al.

CVPR 2025posterarXiv:2505.08013
#3483

MMA: Multi-Modal Adapter for Vision-Language Models

Lingxiao Yang, Ru-Yuan Zhang, Yanchen Wang et al.

CVPR 2024poster
#3484

Grounding and Enhancing Grid-based Models for Neural Fields

Zelin Zhao, FENGLEI FAN, Wenlong Liao et al.

CVPR 2024posterarXiv:2403.20002
#3485

FlashGS: Efficient 3D Gaussian Splatting for Large-scale and High-resolution Rendering

Guofeng Feng, Siyan Chen, Rong Fu et al.

CVPR 2025posterarXiv:2408.07967
#3486

A Category Agnostic Model for Visual Rearrangment

Yuyi Liu, Xinhang Song, Weijie Li et al.

CVPR 2024poster
#3487

Towards More Unified In-context Visual Understanding

Dianmo Sheng, Dongdong Chen, Zhentao Tan et al.

CVPR 2024posterarXiv:2312.02520
#3488

Towards Progressive Multi-Frequency Representation for Image Warping

Jun Xiao, Zihang Lyu, Cong Zhang et al.

CVPR 2024poster
#3489

Fourier-basis Functions to Bridge Augmentation Gap: Rethinking Frequency Augmentation in Image Classification

Mei Vaish, Shunxin Wang, Nicola Strisciuglio

CVPR 2024posterarXiv:2403.01944
#3490

Gradient Inversion Attacks on Parameter-Efficient Fine-Tuning

Hasin Us Sami, Swapneel Sen, Amit K. Roy-Chowdhury et al.

CVPR 2025posterarXiv:2506.04453
#3491

VastGaussian: Vast 3D Gaussians for Large Scene Reconstruction

Jiaqi Lin, Zhihao Li, Xiao Tang et al.

CVPR 2024posterarXiv:2402.17427
#3492

What Do You See in Vehicle? Comprehensive Vision Solution for In-Vehicle Gaze Estimation

Yihua Cheng, Yaning Zhu, Zongji Wang et al.

CVPR 2024posterarXiv:2403.15664
#3493

Molecular Data Programming: Towards Molecule Pseudo-labeling with Systematic Weak Supervision

Xin Juan, Kaixiong Zhou, Ninghao Liu et al.

CVPR 2024poster
#3494

OTE: Exploring Accurate Scene Text Recognition Using One Token

Jianjun Xu, Yuxin Wang, Hongtao Xie et al.

CVPR 2024poster
#3495

TTA-EVF: Test-Time Adaptation for Event-based Video Frame Interpolation via Reliable Pixel and Sample Estimation

Hoonhee Cho, Taewoo Kim, Yuhwan Jeong et al.

CVPR 2024poster
#3496

HUGS: Human Gaussian Splats

Muhammed Kocabas, Jen-Hao Rick Chang, James Gabriel et al.

CVPR 2024posterarXiv:2311.17910
#3497

TransPixeler: Advancing Text-to-Video Generation with Transparency

Luozhou Wang, Yijun Li, ZhiFei Chen et al.

CVPR 2025posterarXiv:2501.03006
#3498

Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models

Yushi Hu, Otilia Stretcu, Chun-Ta Lu et al.

CVPR 2024posterarXiv:2312.03052
#3499

Hybrid Reciprocal Transformer with Triplet Feature Alignment for Scene Graph Generation

Jiawei Fu, ZHANG Tiantian, Kai Chen et al.

CVPR 2025poster
#3500

Mocap Everyone Everywhere: Lightweight Motion Capture With Smartwatches and a Head-Mounted Camera

Jiye Lee, Hanbyul Joo

CVPR 2024posterarXiv:2401.00847
#3501

Factored-NeuS: Reconstructing Surfaces, Illumination, and Materials of Possibly Glossy Objects

Yue Fan, Ningjing Fan, Ivan Skorokhodov et al.

CVPR 2025posterarXiv:2305.17929
#3502

DVMNet: Computing Relative Pose for Unseen Objects Beyond Hypotheses

Chen Zhao, Tong Zhang, Zheng Dang et al.

CVPR 2024poster
#3503

Learning Person-Specific Animatable Face Models from In-the-Wild Images via a Shared Base Model

Yuxiang Mao, Zhenfeng Fan, Zhijie Zhang et al.

CVPR 2025poster
#3504

AVID: Any-Length Video Inpainting with Diffusion Model

Zhixing Zhang, Bichen Wu, Xiaoyan Wang et al.

CVPR 2024posterarXiv:2312.03816
#3505

Hyper-MD: Mesh Denoising with Customized Parameters Aware of Noise Intensity and Geometric Characteristics

Xingtao Wang, Hongliang Wei, Xiaopeng Fan et al.

CVPR 2024poster
#3506

SPARS3R: Semantic Prior Alignment and Regularization for Sparse 3D Reconstruction

Yutao Tang, Yuxiang Guo, Deming Li et al.

CVPR 2025posterarXiv:2411.12592
#3507

Let's Chorus: Partner-aware Hybrid Song-Driven 3D Head Animation

Xiumei Xie, Zikai Huang, Wenhao Xu et al.

CVPR 2025poster
#3508

KTPFormer: Kinematics and Trajectory Prior Knowledge-Enhanced Transformer for 3D Human Pose Estimation

Jihua Peng, Yanghong Zhou, Tracy P Y Mok

CVPR 2024posterarXiv:2404.00658
#3509

Dense-SfM: Structure from Motion with Dense Consistent Matching

JongMin Lee, Sungjoo Yoo

CVPR 2025posterarXiv:2501.14277
#3510

MFP: Making Full Use of Probability Maps for Interactive Image Segmentation

Chaewon Lee, Seon-Ho Lee, Chang-Su Kim

CVPR 2024posterarXiv:2404.18448
#3511

LATTE-MV: Learning to Anticipate Table Tennis Hits from Monocular Videos

Daniel Etaat, Dvij Rajesh Kalaria, Nima Rahmanian et al.

CVPR 2025posterarXiv:2503.20936
#3512

Cross-Domain Few-Shot Segmentation via Iterative Support-Query Correspondence Mining

Jiahao Nie, Yun Xing, Gongjie Zhang et al.

CVPR 2024posterarXiv:2401.08407
#3513

Strong Transferable Adversarial Attacks via Ensembled Asymptotically Normal Distribution Learning

Zhengwei Fang, Rui Wang, Tao Huang et al.

CVPR 2024highlightarXiv:2209.11964
#3514

An Empirical Study of Scaling Law for Scene Text Recognition

Miao Rang, Zhenni Bi, Chuanjian Liu et al.

CVPR 2024poster
#3515

Selective-Stereo: Adaptive Frequency Information Selection for Stereo Matching

Xianqi Wang, Gangwei Xu, Hao Jia et al.

CVPR 2024highlightarXiv:2403.00486
#3516

When StyleGAN Meets Stable Diffusion: a W+ Adapter for Personalized Image Generation

Xiaoming Li, Xinyu Hou, Chen Change Loy

CVPR 2024poster
#3517

Effortless Active Labeling for Long-Term Test-Time Adaptation

Guowei Wang, Changxing Ding

CVPR 2025posterarXiv:2503.14564
#3518

Differentiable Neural Surface Refinement for Modeling Transparent Objects

Weijian Deng, Dylan Campbell, Chunyi Sun et al.

CVPR 2024poster
#3519

Can Machines Understand Composition? Dataset and Benchmark for Photographic Image Composition Embedding and Understanding

Zhaoran Zhao, Peng Lu, Anran Zhang et al.

CVPR 2025highlight
#3520

Low-power Continuous Remote Behavioral Localization with Event Cameras

Friedhelm Hamann, Suman Ghosh, Ignacio Juarez Martinez et al.

CVPR 2024posterarXiv:2312.03799
#3521

AnyCam: Learning to Recover Camera Poses and Intrinsics from Casual Videos

Felix Wimbauer, Weirong Chen, Dominik Muhle et al.

CVPR 2025posterarXiv:2503.23282
#3522

Attention-Propagation Network for Egocentric Heatmap to 3D Pose Lifting

Taeho Kang, Youngki Lee

CVPR 2024highlightarXiv:2402.18330
#3523

AM-RADIO: Agglomerative Vision Foundation Model Reduce All Domains Into One

Mike Ranzinger, Greg Heinrich, Jan Kautz et al.

CVPR 2024posterarXiv:2312.06709
#3524

Towards Co-Evaluation of Cameras HDR and Algorithms for Industrial-Grade 6DoF Pose Estimation

Agastya Kalra, Guy Stoppi, Dmitrii Marin et al.

CVPR 2024poster
#3525

Tune-An-Ellipse: CLIP Has Potential to Find What You Want

Jinheng Xie, Songhe Deng, Bing Li et al.

CVPR 2024highlight
#3526

SDBF: Steep-Decision-Boundary Fingerprinting for Hard-Label Tampering Detection of DNN Models

Xiaofan Bai, Shixin Li, Xiaojing Ma et al.

CVPR 2025poster
#3527

RoGSplat: Learning Robust Generalizable Human Gaussian Splatting from Sparse Multi-View Images

Junjin Xiao, Qing Zhang, Yongwei Nie et al.

CVPR 2025posterarXiv:2503.14198
#3528

IRIS: Inverse Rendering of Indoor Scenes from Low Dynamic Range Images

Chih-Hao Lin, Jia-Bin Huang, Zhengqin Li et al.

CVPR 2025posterarXiv:2401.12977
#3529

Gromov–Wasserstein Problem with Cyclic Symmetry

Shoichiro Takeda, Yasunori Akagi

CVPR 2025poster
#3530

BA-SAM: Scalable Bias-Mode Attention Mask for Segment Anything Model

song yiran, Qianyu Zhou, Xiangtai Li et al.

CVPR 2024posterarXiv:2401.02317
#3531

CASP: Consistency-aware Audio-induced Saliency Prediction Model for Omnidirectional Video

Zhaolin Wan, Han Qin, Zhiyang Li et al.

CVPR 2025poster
#3532

A Universal Scale-Adaptive Deformable Transformer for Image Restoration across Diverse Artifacts

Xuyi He, Yuhui Quan, Ruotao Xu et al.

CVPR 2025poster
#3533

Neural Inverse Rendering from Propagating Light

Anagh Malik, Benjamin Attal, Andrew Xie et al.

CVPR 2025posterarXiv:2506.05347
#3534

ViCaS: A Dataset for Combining Holistic and Pixel-level Video Understanding using Captions with Grounded Segmentation

Ali Athar, Xueqing Deng, Liang-Chieh Chen

CVPR 2025posterarXiv:2412.09754
#3535

EarthLoc: Astronaut Photography Localization by Indexing Earth from Space

Gabriele Berton, Alex Stoken, Barbara Caputo et al.

CVPR 2024posterarXiv:2403.06758
#3536

A4A: Adapter for Adapter Transfer via All-for-All Mapping for Cross-Architecture Models

Keyu Tu, Mengqi Huang, Zhuowei Chen et al.

CVPR 2025poster
#3537

PairDETR : Joint Detection and Association of Human Bodies and Faces

Ammar Ali, Georgii Gaikov, Denis Rybalchenko et al.

CVPR 2024poster
#3538

Close Imitation of Expert Retouching for Black-and-White Photography

Seunghyun Shin, Jisu Shin, Jihwan Bae et al.

CVPR 2024poster
#3539

OmniLocalRF: Omnidirectional Local Radiance Fields from Dynamic Videos

Dongyoung Choi, Hyeonjoong Jang, Min H. Kim

CVPR 2024posterarXiv:2404.00676
#3540

Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model

Kai Yang, Jian Tao, Jiafei Lyu et al.

CVPR 2024posterarXiv:2311.13231
#3541

Towards Precise Embodied Dialogue Localization via Causality Guided Diffusion

Haoyu Wang, Le Wang, Sanping Zhou et al.

CVPR 2025poster
#3542

Reconstructing Hands in 3D with Transformers

Georgios Pavlakos, Dandan Shan, Ilija Radosavovic et al.

CVPR 2024posterarXiv:2312.05251
#3543

XFeat: Accelerated Features for Lightweight Image Matching

Guilherme Potje, Felipe Cadar, André Araujo et al.

CVPR 2024posterarXiv:2404.19174
#3544

Systematic Comparison of Semi-supervised and Self-supervised Learning for Medical Image Classification

Zhe Huang, Ruijie Jiang, Shuchin Aeron et al.

CVPR 2024posterarXiv:2307.08919
#3545

GoodSAM: Bridging Domain and Capacity Gaps via Segment Anything Model for Distortion-aware Panoramic Semantic Segmentation

WEIMING ZHANG, Yexin Liu, Xu Zheng et al.

CVPR 2024posterarXiv:2403.16370
#3546

VRP-SAM: SAM with Visual Reference Prompt

Yanpeng Sun, Jiahui Chen, Shan Zhang et al.

CVPR 2024posterarXiv:2402.17726
#3547

Disentangling Safe and Unsafe Image Corruptions via Anisotropy and Locality

Ramchandran Muthukumar, Ambar Pal, Jeremias Sulam et al.

CVPR 2025poster
#3548

DiffuScene: Denoising Diffusion Models for Generative Indoor Scene Synthesis

Jiapeng Tang, Yinyu Nie, Lev Markhasin et al.

CVPR 2024posterarXiv:2303.14207
#3549

Looking Similar Sounding Different: Leveraging Counterfactual Cross-Modal Pairs for Audiovisual Representation Learning

Nikhil Singh, Chih-Wei Wu, Iroro Orife et al.

CVPR 2024posterarXiv:2304.05600
#3550

Efficient Data Driven Mixture-of-Expert Extraction from Trained Networks

Uranik Berisha, Jens Mehnert, Alexandru Paul Condurache

CVPR 2025posterarXiv:2505.15414
#3551

Doppelgängers and Adversarial Vulnerability

George Kamberov

CVPR 2025highlightarXiv:2410.13193
#3552

YOLO-World: Real-Time Open-Vocabulary Object Detection

Tianheng Cheng, Lin Song, Yixiao Ge et al.

CVPR 2024posterarXiv:2401.17270
#3553

Bézier Everywhere All at Once: Learning Drivable Lanes as Bézier Graphs

Hugh Blayney, Hanlin Tian, Hamish Scott et al.

CVPR 2024poster
#3554

MEAT: Multiview Diffusion Model for Human Generation on Megapixels with Mesh Attention

Yuhan Wang, Fangzhou Hong, Shuai Yang et al.

CVPR 2025posterarXiv:2503.08664
#3555

Taming Self-Training for Open-Vocabulary Object Detection

Shiyu Zhao, Samuel Schulter, Long Zhao et al.

CVPR 2024posterarXiv:2308.06412
#3556

Ink Dot-Oriented Differentiable Optimization for Neural Image Halftoning

Hao Jiang, Bingfeng Zhou, Yadong Mu

CVPR 2024poster
#3557

GeoChat: Grounded Large Vision-Language Model for Remote Sensing

Kartik Kuckreja, Muhammad Sohail Danish, Muzammal Naseer et al.

CVPR 2024posterarXiv:2311.15826
#3558

FACT: Frame-Action Cross-Attention Temporal Modeling for Efficient Action Segmentation

Zijia Lu, Ehsan Elhamifar

CVPR 2024poster
#3559

Matrix-Free Shared Intrinsics Bundle Adjustment

Daniel Safari

CVPR 2025poster
#3560

Seeing More with Less: Human-like Representations in Vision Models

Andrey Gizdov, Shimon Ullman, Daniel Harari

CVPR 2025highlight
#3561

SeedVR: Seeding Infinity in Diffusion Transformer Towards Generic Video Restoration

Jianyi Wang, Zhijie Lin, Meng Wei et al.

CVPR 2025highlightarXiv:2501.01320
#3562

GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians

Liangxiao Hu, Hongwen Zhang, Yuxiang Zhang et al.

CVPR 2024posterarXiv:2312.02134
#3563

Chain of Semantics Programming in 3D Gaussian Splatting Representation for 3D Vision Grounding

Jiaxin Shi, Mingyue Xiang, Hao Sun et al.

CVPR 2025poster
#3564

ShapeMatcher: Self-Supervised Joint Shape Canonicalization Segmentation Retrieval and Deformation

Yan Di, Chenyangguang Zhang, Chaowei Wang et al.

CVPR 2024poster
#3565

Know Your Neighbors: Improving Single-View Reconstruction via Spatial Vision-Language Reasoning

Rui Li, Tobias Fischer, Mattia Segu et al.

CVPR 2024posterarXiv:2404.03658
#3566

SVDTree: Semantic Voxel Diffusion for Single Image Tree Reconstruction

Yuan Li, Zhihao Liu, Bedrich Benes et al.

CVPR 2024poster
#3567

AniGS: Animatable Gaussian Avatar from a Single Image with Inconsistent Gaussian Reconstruction

Lingteng Qiu, Shenhao Zhu, Qi Zuo et al.

CVPR 2025posterarXiv:2412.02684
#3568

Fuzzy Multimodal Learning for Trusted Cross-modal Retrieval

Siyuan Duan, Yuan Sun, Dezhong Peng et al.

CVPR 2025poster
#3569

Patch2Self2: Self-supervised Denoising on Coresets via Matrix Sketching

Shreyas Fadnavis, Agniva Chowdhury, Joshua Batson et al.

CVPR 2024poster
#3570

FreeCustom: Tuning-Free Customized Image Generation for Multi-Concept Composition

Ganggui Ding, Canyu Zhao, Wen Wang et al.

CVPR 2024posterarXiv:2405.13870
#3571

SATA: Spatial Autocorrelation Token Analysis for Enhancing the Robustness of Vision Transformers

Nikaan Nikzad, YI LIAO, Yongsheng Gao et al.

CVPR 2025posterarXiv:2409.19850
#3572

NVILA: Efficient Frontier Visual Language Models

Zhijian Liu, Ligeng Zhu, Baifeng Shi et al.

CVPR 2025posterarXiv:2412.04468
#3573

Generative Unlearning for Any Identity

Juwon Seo, Sung-Hoon Lee, Tae-Young Lee et al.

CVPR 2024posterarXiv:2405.09879
#3574

Traceable Federated Continual Learning

Qiang Wang, Bingyan Liu, Yawen Li

CVPR 2024poster
#3575

Mitigating Object Dependencies: Improving Point Cloud Self-Supervised Learning through Object Exchange

Yanhao Wu, Tong Zhang, Wei Ke et al.

CVPR 2024posterarXiv:2404.07504
#3576

FloVD: Optical Flow Meets Video Diffusion Model for Enhanced Camera-Controlled Video Synthesis

Wonjoon Jin, Qi Dai, Chong Luo et al.

CVPR 2025posterarXiv:2502.08244
#3577

Multi-scale Dynamic and Hierarchical Relationship Modeling for Facial Action Units Recognition

Zihan Wang, Siyang Song, Cheng Luo et al.

CVPR 2024posterarXiv:2404.06443
#3578

Imagine Before Go: Self-Supervised Generative Map for Object Goal Navigation

Sixian Zhang, Xinyao Yu, Xinhang Song et al.

CVPR 2024poster
#3579

SVDinsTN: A Tensor Network Paradigm for Efficient Structure Search from Regularized Modeling Perspective

Yu-Bang Zheng, Xile Zhao, Junhua Zeng et al.

CVPR 2024highlightarXiv:2305.14912
#3580

Exploring the Potential of Large Foundation Models for Open-Vocabulary HOI Detection

Ting Lei, Shaofeng Yin, Yang Liu

CVPR 2024posterarXiv:2404.06194
#3581

Modality-Agnostic Structural Image Representation Learning for Deformable Multi-Modality Medical Image Registration

Tony C. W. MOK, Zi Li, Yunhao Bai et al.

CVPR 2024highlightarXiv:2402.18933
#3582

PoseIRM: Enhance 3D Human Pose Estimation on Unseen Camera Settings via Invariant Risk Minimization

Yanlu Cai, Weizhong Zhang, Yuan Wu et al.

CVPR 2024poster
#3583

On the Estimation of Image-matching Uncertainty in Visual Place Recognition

Mubariz Zaffar, Liangliang Nan, Julian F. P. Kooij

CVPR 2024highlightarXiv:2404.00546
#3584

Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs

shiyu xuan, Qingpei Guo, Ming Yang et al.

CVPR 2024posterarXiv:2310.00582
#3585

LoS: Local Structure-Guided Stereo Matching

Kunhong Li, Longguang Wang, Ye Zhang et al.

CVPR 2024poster
#3586

UniHOPE: A Unified Approach for Hand-Only and Hand-Object Pose Estimation

Yinqiao Wang, Hao Xu, Pheng-Ann Heng et al.

CVPR 2025posterarXiv:2503.13303
#3587

RadSimReal: Bridging the Gap Between Synthetic and Real Data in Radar Object Detection With Simulation

Oded Bialer, Yuval Haitman

CVPR 2024posterarXiv:2404.18150
#3588

OCAI: Improving Optical Flow Estimation by Occlusion and Consistency Aware Interpolation

Jisoo Jeong, Hong Cai, Risheek Garrepalli et al.

CVPR 2024posterarXiv:2403.18092
#3589

FlowerFormer: Empowering Neural Architecture Encoding using a Flow-aware Graph Transformer

Dongyeong Hwang, Hyunju Kim, Sunwoo Kim et al.

CVPR 2024posterarXiv:2403.12821
#3590

Mip-Splatting: Alias-free 3D Gaussian Splatting

Zehao Yu, Anpei Chen, Binbin Huang et al.

CVPR 2024posterarXiv:2311.16493
#3591

Text2QR: Harmonizing Aesthetic Customization and Scanning Robustness for Text-Guided QR Code Generation

Guangyang Wu, Xiaohong Liu, Jun Jia et al.

CVPR 2024posterarXiv:2403.06452
#3592

Cross-Modal Interactive Perception Network with Mamba for Lung Tumor Segmentation in PET-CT Images

Jie Mei, Chenyu Lin, Yu Qiu et al.

CVPR 2025posterarXiv:2503.17261
#3593

ProMark: Proactive Diffusion Watermarking for Causal Attribution

Vishal Asnani, John Collomosse, Tu Bui et al.

CVPR 2024posterarXiv:2403.09914
#3594

MMM: Generative Masked Motion Model

Ekkasit Pinyoanuntapong, Pu Wang, Minwoo Lee et al.

CVPR 2024highlightarXiv:2312.03596
#3595

Toward Generalist Anomaly Detection via In-context Residual Learning with Few-shot Sample Prompts

Jiawen Zhu, Guansong Pang

CVPR 2024posterarXiv:2403.06495
#3596

DiffForensics: Leveraging Diffusion Prior to Image Forgery Detection and Localization

Zeqin Yu, Jiangqun Ni, Yuzhen Lin et al.

CVPR 2024poster
#3597

VideoGrounding-DINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding

Syed Talal Wasim, Muzammal Naseer, Salman Khan et al.

CVPR 2024poster
#3598

No Thing, Nothing: Highlighting Safety-Critical Classes for Robust LiDAR Semantic Segmentation in Adverse Weather

Junsung Park, HwiJeong Lee, Inha Kang et al.

CVPR 2025posterarXiv:2503.15910
#3599

Learning Partonomic 3D Reconstruction from Image Collections

Xiaoqian Ruan, Pei Yu, Dian Jia et al.

CVPR 2025poster
#3600

Neural Directional Encoding for Efficient and Accurate View-Dependent Appearance Modeling

Liwen Wu, Sai Bi, Zexiang Xu et al.

CVPR 2024highlightarXiv:2405.14847