Most Cited 2024 "video recognition" Papers

12,324 papers found • Page 53 of 62

Filters:Most Cited 2024 video recognition Clear all

Conference

AAAI 2025 (3,028)COLM 2025 (418)CVPR 2025 (2,873)ICCV 2025 (2,701)ICLR 2025 (3,827)ICML 2025 (3,340)ISMAR 2025 (229)NEURIPS 2025 (5,858)AAAI 2024 (2,289)CVPR 2024 (2,716)ECCV 2024 (2,387)ICLR 2024 (2,297)ICML 2024 (2,635)

Paper Type

poster (24,624)paper (8,558)oral (1,594)spotlight (1,421)highlight (975)

#10401

CausalPC: Improving the Robustness of Point Cloud Classification by Causal Effect Identification

Yuanmin Huang, Mi Zhang, Daizong Ding et al.

CVPR 2024

#10402

LiSA: LiDAR Localization with Semantic Awareness

Bochun Yang, Zijun Li, Wen Li et al.

CVPR 2024highlight

#10403

Rethinking Prior Information Generation with CLIP for Few-Shot Segmentation

Jin Wang, Bingfeng Zhang, Jian Pang et al.

CVPR 2024arXiv:2405.08458

#10404

Adaptive Slot Attention: Object Discovery with Dynamic Slot Number

Ke Fan, Zechen Bai, Tianjun Xiao et al.

CVPR 2024arXiv:2406.09196

#10405

Learning Coupled Dictionaries from Unpaired Data for Image Super-Resolution

Longguang Wang, Juncheng Li, Yingqian Wang et al.

CVPR 2024

#10406

C3: High-Performance and Low-Complexity Neural Compression from a Single Image or Video

Hyunjik Kim, Matthias Bauer, Lucas Theis et al.

CVPR 2024arXiv:2312.02753

#10407

AttriHuman-3D: Editable 3D Human Avatar Generation with Attribute Decomposition and Indexing

Fan Yang, Tianyi Chen, XIAOSHENG HE et al.

CVPR 2024arXiv:2312.02209

#10408

iToF-flow-based High Frame Rate Depth Imaging

Yu Meng, Zhou Xue, Xu Chang et al.

CVPR 2024

#10409

Rethinking Human Motion Prediction with Symplectic Integral

Haipeng Chen, Kedi L yu, Zhenguang Liu et al.

CVPR 2024

#10410

Detector-Free Structure from Motion

Xingyi He, Jiaming Sun, Yifan Wang et al.

CVPR 2024arXiv:2306.15669

#10411

Holodeck: Language Guided Generation of 3D Embodied AI Environments

Yue Yang, Fan-Yun Sun, Luca Weihs et al.

CVPR 2024arXiv:2312.09067

#10412

DiVAS: Video and Audio Synchronization with Dynamic Frame Rates

Clara Maria Fernandez Labrador, Mertcan Akcay, Eitan Abecassis et al.

CVPR 2024

#10413

Benchmarking Audio Visual Segmentation for Long-Untrimmed Videos

Chen Liu, Peike Li, Qingtao Yu et al.

CVPR 2024

#10414

Inter-X: Towards Versatile Human-Human Interaction Analysis

Liang Xu, Xintao Lv, Yichao Yan et al.

CVPR 2024arXiv:2312.16051

#10415

Embodied Multi-Modal Agent trained by an LLM from a Parallel TextWorld

Yijun Yang, Tianyi Zhou, kanxue Li et al.

CVPR 2024arXiv:2311.16714

#10416

One-Shot Open Affordance Learning with Foundation Models

Gen Li, Deqing Sun, Laura Sevilla-Lara et al.

CVPR 2024arXiv:2311.17776

#10417

SpikingResformer: Bridging ResNet and Vision Transformer in Spiking Neural Networks

Xinyu Shi, Zecheng Hao, Zhaofei Yu

CVPR 2024arXiv:2403.14302

#10418

Tactile-Augmented Radiance Fields

Yiming Dou, Fengyu Yang, Yi Liu et al.

CVPR 2024arXiv:2405.04534

#10419

Mean-Shift Feature Transformer

Takumi Kobayashi

CVPR 2024

#10420

Consistent Prompting for Rehearsal-Free Continual Learning

Zhanxin Gao, Jun Cen, Xiaobin Chang

CVPR 2024arXiv:2403.08568

#10421

KVQ: Kwai Video Quality Assessment for Short-form Videos

Yiting Lu, Xin Li, Yajing Pei et al.

CVPR 2024arXiv:2402.07220

#10422

Purified and Unified Steganographic Network

GuoBiao Li, Sheng Li, Zicong Luo et al.

CVPR 2024arXiv:2402.17210

#10423

PartDistill: 3D Shape Part Segmentation by Vision-Language Model Distillation

Ardian Umam, Cheng-Kun Yang, Min-Hung Chen et al.

CVPR 2024arXiv:2312.04016

#10424

Selective Hourglass Mapping for Universal Image Restoration Based on Diffusion Model

Dian Zheng, Xiao-Ming Wu, Shuzhou Yang et al.

CVPR 2024arXiv:2403.11157

#10425

Fast Adaptation for Human Pose Estimation via Meta-Optimization

Shengxiang Hu, Huaijiang Sun, Bin Li et al.

CVPR 2024

#10426

Anomaly Heterogeneity Learning for Open-set Supervised Anomaly Detection

Jiawen Zhu, Choubo Ding, Yu Tian et al.

CVPR 2024arXiv:2310.12790

#10427

L4D-Track: Language-to-4D Modeling Towards 6-DoF Tracking and Shape Reconstruction in 3D Point Cloud Stream

Jingtao Sun, Yaonan Wang, Mingtao Feng et al.

CVPR 2024

#10428

MAPSeg: Unified Unsupervised Domain Adaptation for Heterogeneous Medical Image Segmentation Based on 3D Masked Autoencoding and Pseudo-Labeling

Xuzhe Zhang, Yuhao Wu, Elsa Angelini et al.

CVPR 2024arXiv:2303.09373

#10429

IBD-SLAM: Learning Image-Based Depth Fusion for Generalizable SLAM

Minghao Yin, Shangzhe Wu, Kai Han

CVPR 2024

#10430

Segment Any Event Streams via Weighted Adaptation of Pivotal Tokens

Zhiwen Chen, Zhiyu Zhu, Yifan Zhang et al.

CVPR 2024

#10431

Boosting Image Quality Assessment through Efficient Transformer Adaptation with Local Feature Enhancement

Kangmin Xu, Liang Liao, Jing Xiao et al.

CVPR 2024

#10432

Learning without Exact Guidance: Updating Large-scale High-resolution Land Cover Maps from Low-resolution Historical Labels

Zhuohong Li, Wei He, Jiepan Li et al.

CVPR 2024highlightarXiv:2403.02746

#10433

Multi-Modal Hallucination Control by Visual Information Grounding

Alessandro Favero, Luca Zancato, Matthew Trager et al.

CVPR 2024arXiv:2403.14003

#10434

Exploring Orthogonality in Open World Object Detection

Zhicheng Sun, Jinghan Li, Yadong Mu

CVPR 2024

#10435

DynVideo-E: Harnessing Dynamic NeRF for Large-Scale Motion- and View-Change Human-Centric Video Editing

Jia-Wei Liu, Yan-Pei Cao, Jay Zhangjie Wu et al.

CVPR 2024arXiv:2310.10624

#10436

SD-DiT: Unleashing the Power of Self-supervised Discrimination in Diffusion Transformer

Rui Zhu, Yingwei Pan, Yehao Li et al.

CVPR 2024arXiv:2403.17004

#10437

Traffic Scene Parsing through the TSP6K Dataset

Peng-Tao Jiang, Yuqi Yang, Yang Cao et al.

CVPR 2024arXiv:2303.02835

#10438

KPConvX: Modernizing Kernel Point Convolution with Kernel Attention

Hugues Thomas, Yao-Hung Hubert Tsai, Timothy Barfoot et al.

CVPR 2024arXiv:2405.13194

#10439

Latency Correction for Event-guided Deblurring and Frame Interpolation

Yixin Yang, Jinxiu Liang, Bohan Yu et al.

CVPR 2024

#10440

DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models

Muyang Li, Tianle Cai, Jiaxin Cao et al.

CVPR 2024highlightarXiv:2402.19481

#10441

MoReVQA: Exploring Modular Reasoning Models for Video Question Answering

Juhong Min, Shyamal Buch, Arsha Nagrani et al.

CVPR 2024arXiv:2404.06511

#10442

DIMAT: Decentralized Iterative Merging-And-Training for Deep Learning Models

Nastaran Saadati, Minh Pham, Nasla Saleem et al.

CVPR 2024arXiv:2404.08079

#10443

NTO3D: Neural Target Object 3D Reconstruction with Segment Anything

Xiaobao Wei, Renrui Zhang, Jiarui Wu et al.

CVPR 2024arXiv:2309.12790

#10444

Text-Driven Image Editing via Learnable Regions

Yuanze Lin, Yi-Wen Chen, Yi-Hsuan Tsai et al.

CVPR 2024arXiv:2311.16432

#10445

ZERO-IG: Zero-Shot Illumination-Guided Joint Denoising and Adaptive Enhancement for Low-Light Images

Yiqi Shi, Duo Liu, Liguo Zhang et al.

CVPR 2024

#10446

Self-Supervised Representation Learning from Arbitrary Scenarios

Zhaowen Li, Yousong Zhu, Zhiyang Chen et al.

CVPR 2024

#10447

Rethinking Multi-domain Generalization with A General Learning Objective

Zhaorui Tan, Xi Yang, Kaizhu Huang

CVPR 2024arXiv:2402.18853

#10448

Adversarial Distillation Based on Slack Matching and Attribution Region Alignment

Shenglin Yin, Zhen Xiao, Mingxuan Song et al.

CVPR 2024

#10449

The Neglected Tails in Vision-Language Models

Shubham Parashar, Tian Liu, Zhiqiu Lin et al.

CVPR 2024arXiv:2401.12425

#10450

Multi-View Attentive Contextualization for Multi-View 3D Object Detection

Xianpeng Liu, Ce Zheng, Ming Qian et al.

CVPR 2024arXiv:2405.12200

#10451

SODA: Bottleneck Diffusion Models for Representation Learning

Drew Hudson, Daniel Zoran, Mateusz Malinowski et al.

CVPR 2024arXiv:2311.17901

#10452

AHIVE: Anatomy-aware Hierarchical Vision Encoding for Interactive Radiology Report Retrieval

Sixing Yan, William K. Cheung, Ivor Tsang et al.

CVPR 2024

#10453

SPU-PMD: Self-Supervised Point Cloud Upsampling via Progressive Mesh Deformation

Yanzhe Liu, Rong Chen, Yushi Li et al.

CVPR 2024

#10454

Enhancing the Power of OOD Detection via Sample-Aware Model Selection

Feng Xue, Zi He, Yuan Zhang et al.

CVPR 2024

#10455

Self-Supervised Class-Agnostic Motion Prediction with Spatial and Temporal Consistency Regularizations

Kewei Wang, Yizheng Wu, Jun Cen et al.

CVPR 2024arXiv:2403.13261

#10456

MMA: Multi-Modal Adapter for Vision-Language Models

Lingxiao Yang, Ru-Yuan Zhang, Yanchen Wang et al.

CVPR 2024

#10457

Linguistic-Aware Patch Slimming Framework for Fine-grained Cross-Modal Alignment

Zheren Fu, Lei Zhang, Hou Xia et al.

CVPR 2024

#10458

Grounding and Enhancing Grid-based Models for Neural Fields

Zelin Zhao, FENGLEI FAN, Wenlong Liao et al.

CVPR 2024arXiv:2403.20002

#10459

A Category Agnostic Model for Visual Rearrangment

Yuyi Liu, Xinhang Song, Weijie Li et al.

CVPR 2024

#10460

Towards More Unified In-context Visual Understanding

Dianmo Sheng, Dongdong Chen, Zhentao Tan et al.

CVPR 2024arXiv:2312.02520

#10461

Towards Progressive Multi-Frequency Representation for Image Warping

Jun Xiao, Zihang Lyu, Cong Zhang et al.

CVPR 2024

#10462

Fourier-basis Functions to Bridge Augmentation Gap: Rethinking Frequency Augmentation in Image Classification

Mei Vaish, Shunxin Wang, Nicola Strisciuglio

CVPR 2024arXiv:2403.01944

#10463

VastGaussian: Vast 3D Gaussians for Large Scene Reconstruction

Jiaqi Lin, Zhihao Li, Xiao Tang et al.

CVPR 2024arXiv:2402.17427

#10464

What Do You See in Vehicle? Comprehensive Vision Solution for In-Vehicle Gaze Estimation

Yihua Cheng, Yaning Zhu, Zongji Wang et al.

CVPR 2024arXiv:2403.15664

#10465

Molecular Data Programming: Towards Molecule Pseudo-labeling with Systematic Weak Supervision

Xin Juan, Kaixiong Zhou, Ninghao Liu et al.

CVPR 2024

#10466

OTE: Exploring Accurate Scene Text Recognition Using One Token

Jianjun Xu, Yuxin Wang, Hongtao Xie et al.

CVPR 2024

#10467

TTA-EVF: Test-Time Adaptation for Event-based Video Frame Interpolation via Reliable Pixel and Sample Estimation

Hoonhee Cho, Taewoo Kim, Yuhwan Jeong et al.

CVPR 2024

#10468

HUGS: Human Gaussian Splats

Muhammed Kocabas, Jen-Hao Rick Chang, James Gabriel et al.

CVPR 2024arXiv:2311.17910

#10469

Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models

Yushi Hu, Otilia Stretcu, Chun-Ta Lu et al.

CVPR 2024arXiv:2312.03052

#10470

Mocap Everyone Everywhere: Lightweight Motion Capture With Smartwatches and a Head-Mounted Camera

Jiye Lee, Hanbyul Joo

CVPR 2024arXiv:2401.00847

#10471

DVMNet: Computing Relative Pose for Unseen Objects Beyond Hypotheses

Chen Zhao, Tong Zhang, Zheng Dang et al.

CVPR 2024

#10472

AVID: Any-Length Video Inpainting with Diffusion Model

Zhixing Zhang, Bichen Wu, Xiaoyan Wang et al.

CVPR 2024arXiv:2312.03816

#10473

Hyper-MD: Mesh Denoising with Customized Parameters Aware of Noise Intensity and Geometric Characteristics

Xingtao Wang, Hongliang Wei, Xiaopeng Fan et al.

CVPR 2024

#10474

TULIP: Multi-camera 3D Precision Assessment of Parkinson’s Disease

Kyungdo Kim, Sihan Lyu, Sneha Mantri et al.

CVPR 2024

#10475

KTPFormer: Kinematics and Trajectory Prior Knowledge-Enhanced Transformer for 3D Human Pose Estimation

Jihua Peng, Yanghong Zhou, Tracy P Y Mok

CVPR 2024arXiv:2404.00658

#10476

MFP: Making Full Use of Probability Maps for Interactive Image Segmentation

Chaewon Lee, Seon-Ho Lee, Chang-Su Kim

CVPR 2024arXiv:2404.18448

#10477

Cross-Domain Few-Shot Segmentation via Iterative Support-Query Correspondence Mining

Jiahao Nie, Yun Xing, Gongjie Zhang et al.

CVPR 2024arXiv:2401.08407

#10478

Strong Transferable Adversarial Attacks via Ensembled Asymptotically Normal Distribution Learning

Zhengwei Fang, Rui Wang, Tao Huang et al.

CVPR 2024highlightarXiv:2209.11964

#10479

An Empirical Study of Scaling Law for Scene Text Recognition

Miao Rang, Zhenni Bi, Chuanjian Liu et al.

CVPR 2024

#10480

Selective-Stereo: Adaptive Frequency Information Selection for Stereo Matching

Xianqi Wang, Gangwei Xu, Hao Jia et al.

CVPR 2024highlightarXiv:2403.00486

#10481

When StyleGAN Meets Stable Diffusion: a W+ Adapter for Personalized Image Generation

Xiaoming Li, Xinyu Hou, Chen Change Loy

CVPR 2024

#10482

Differentiable Neural Surface Refinement for Modeling Transparent Objects

Weijian Deng, Dylan Campbell, Chunyi Sun et al.

CVPR 2024

#10483

Low-power Continuous Remote Behavioral Localization with Event Cameras

Friedhelm Hamann, Suman Ghosh, Ignacio Juarez Martinez et al.

CVPR 2024arXiv:2312.03799

#10484

Attention-Propagation Network for Egocentric Heatmap to 3D Pose Lifting

Taeho Kang, Youngki Lee

CVPR 2024highlightarXiv:2402.18330

#10485

AM-RADIO: Agglomerative Vision Foundation Model Reduce All Domains Into One

Mike Ranzinger, Greg Heinrich, Jan Kautz et al.

CVPR 2024arXiv:2312.06709

#10486

Towards Co-Evaluation of Cameras HDR and Algorithms for Industrial-Grade 6DoF Pose Estimation

Agastya Kalra, Guy Stoppi, Dmitrii Marin et al.

CVPR 2024

#10487

Tune-An-Ellipse: CLIP Has Potential to Find What You Want

Jinheng Xie, Songhe Deng, Bing Li et al.

CVPR 2024highlight

#10488

BA-SAM: Scalable Bias-Mode Attention Mask for Segment Anything Model

song yiran, Qianyu Zhou, Xiangtai Li et al.

CVPR 2024arXiv:2401.02317

#10489

EarthLoc: Astronaut Photography Localization by Indexing Earth from Space

Gabriele Berton, Alex Stoken, Barbara Caputo et al.

CVPR 2024arXiv:2403.06758

#10490

PairDETR : Joint Detection and Association of Human Bodies and Faces

Ammar Ali, Georgii Gaikov, Denis Rybalchenko et al.

CVPR 2024

#10491

Close Imitation of Expert Retouching for Black-and-White Photography

Seunghyun Shin, Jisu Shin, Jihwan Bae et al.

CVPR 2024

#10492

OmniLocalRF: Omnidirectional Local Radiance Fields from Dynamic Videos

Dongyoung Choi, Hyeonjoong Jang, Min H. Kim

CVPR 2024arXiv:2404.00676

#10493

Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model

Kai Yang, Jian Tao, Jiafei Lyu et al.

CVPR 2024arXiv:2311.13231

#10494

Reconstructing Hands in 3D with Transformers

Georgios Pavlakos, Dandan Shan, Ilija Radosavovic et al.

CVPR 2024arXiv:2312.05251

#10495

XFeat: Accelerated Features for Lightweight Image Matching

Guilherme Potje, Felipe Cadar, André Araujo et al.

CVPR 2024arXiv:2404.19174

#10496

Systematic Comparison of Semi-supervised and Self-supervised Learning for Medical Image Classification

Zhe Huang, Ruijie Jiang, Shuchin Aeron et al.

CVPR 2024arXiv:2307.08919

#10497

GoodSAM: Bridging Domain and Capacity Gaps via Segment Anything Model for Distortion-aware Panoramic Semantic Segmentation

WEIMING ZHANG, Yexin Liu, Xu Zheng et al.

CVPR 2024arXiv:2403.16370

#10498

VRP-SAM: SAM with Visual Reference Prompt

Yanpeng Sun, Jiahui Chen, Shan Zhang et al.

CVPR 2024arXiv:2402.17726

#10499

DiffuScene: Denoising Diffusion Models for Generative Indoor Scene Synthesis

Jiapeng Tang, Yinyu Nie, Lev Markhasin et al.

CVPR 2024arXiv:2303.14207

#10500

Looking Similar Sounding Different: Leveraging Counterfactual Cross-Modal Pairs for Audiovisual Representation Learning

Nikhil Singh, Chih-Wei Wu, Iroro Orife et al.

CVPR 2024arXiv:2304.05600

#10501

YOLO-World: Real-Time Open-Vocabulary Object Detection

Tianheng Cheng, Lin Song, Yixiao Ge et al.

CVPR 2024arXiv:2401.17270

#10502

Bézier Everywhere All at Once: Learning Drivable Lanes as Bézier Graphs

Hugh Blayney, Hanlin Tian, Hamish Scott et al.

CVPR 2024

#10503

Taming Self-Training for Open-Vocabulary Object Detection

Shiyu Zhao, Samuel Schulter, Long Zhao et al.

CVPR 2024arXiv:2308.06412

#10504

Ink Dot-Oriented Differentiable Optimization for Neural Image Halftoning

Hao Jiang, Bingfeng Zhou, Yadong Mu

CVPR 2024

#10505

GeoChat: Grounded Large Vision-Language Model for Remote Sensing

Kartik Kuckreja, Muhammad Sohail Danish, Muzammal Naseer et al.

CVPR 2024arXiv:2311.15826

#10506

FACT: Frame-Action Cross-Attention Temporal Modeling for Efficient Action Segmentation

Zijia Lu, Ehsan Elhamifar

CVPR 2024

#10507

GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians

Liangxiao Hu, Hongwen Zhang, Yuxiang Zhang et al.

CVPR 2024arXiv:2312.02134

#10508

ShapeMatcher: Self-Supervised Joint Shape Canonicalization Segmentation Retrieval and Deformation

Yan Di, Chenyangguang Zhang, Chaowei Wang et al.

CVPR 2024

#10509

Know Your Neighbors: Improving Single-View Reconstruction via Spatial Vision-Language Reasoning

Rui Li, Tobias Fischer, Mattia Segu et al.

CVPR 2024arXiv:2404.03658

#10510

SVDTree: Semantic Voxel Diffusion for Single Image Tree Reconstruction

Yuan Li, Zhihao Liu, Bedrich Benes et al.

CVPR 2024

#10511

Error Detection in Egocentric Procedural Task Videos

Shih-Po Lee, Zijia Lu, Zekun Zhang et al.

CVPR 2024

#10512

Patch2Self2: Self-supervised Denoising on Coresets via Matrix Sketching

Shreyas Fadnavis, Agniva Chowdhury, Joshua Batson et al.

CVPR 2024

#10513

FreeCustom: Tuning-Free Customized Image Generation for Multi-Concept Composition

Ganggui Ding, Canyu Zhao, Wen Wang et al.

CVPR 2024arXiv:2405.13870

#10514

Generative Unlearning for Any Identity

Juwon Seo, Sung-Hoon Lee, Tae-Young Lee et al.

CVPR 2024arXiv:2405.09879

#10515

Mitigating Object Dependencies: Improving Point Cloud Self-Supervised Learning through Object Exchange

Yanhao Wu, Tong Zhang, Wei Ke et al.

CVPR 2024arXiv:2404.07504

#10516

Multi-scale Dynamic and Hierarchical Relationship Modeling for Facial Action Units Recognition

Zihan Wang, Siyang Song, Cheng Luo et al.

CVPR 2024arXiv:2404.06443

#10517

Imagine Before Go: Self-Supervised Generative Map for Object Goal Navigation

Sixian Zhang, Xinyao Yu, Xinhang Song et al.

CVPR 2024

#10518

SVDinsTN: A Tensor Network Paradigm for Efficient Structure Search from Regularized Modeling Perspective

Yu-Bang Zheng, Xile Zhao, Junhua Zeng et al.

CVPR 2024highlightarXiv:2305.14912

#10519

Exploring the Potential of Large Foundation Models for Open-Vocabulary HOI Detection

Ting Lei, Shaofeng Yin, Yang Liu

CVPR 2024arXiv:2404.06194

#10520

Modality-Agnostic Structural Image Representation Learning for Deformable Multi-Modality Medical Image Registration

Tony C. W. MOK, Zi Li, Yunhao Bai et al.

CVPR 2024highlightarXiv:2402.18933

#10521

PoseIRM: Enhance 3D Human Pose Estimation on Unseen Camera Settings via Invariant Risk Minimization

Yanlu Cai, Weizhong Zhang, Yuan Wu et al.

CVPR 2024

#10522

On the Estimation of Image-matching Uncertainty in Visual Place Recognition

Mubariz Zaffar, Liangliang Nan, Julian F. P. Kooij

CVPR 2024highlightarXiv:2404.00546

#10523

Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs

shiyu xuan, Qingpei Guo, Ming Yang et al.

CVPR 2024arXiv:2310.00582

#10524

LoS: Local Structure-Guided Stereo Matching

Kunhong Li, Longguang Wang, Ye Zhang et al.

CVPR 2024

#10525

RadSimReal: Bridging the Gap Between Synthetic and Real Data in Radar Object Detection With Simulation

Oded Bialer, Yuval Haitman

CVPR 2024arXiv:2404.18150

#10526

OCAI: Improving Optical Flow Estimation by Occlusion and Consistency Aware Interpolation

Jisoo Jeong, Hong Cai, Risheek Garrepalli et al.

CVPR 2024arXiv:2403.18092

#10527

FlowerFormer: Empowering Neural Architecture Encoding using a Flow-aware Graph Transformer

Dongyeong Hwang, Hyunju Kim, Sunwoo Kim et al.

CVPR 2024arXiv:2403.12821

#10528

Mip-Splatting: Alias-free 3D Gaussian Splatting

Zehao Yu, Anpei Chen, Binbin Huang et al.

CVPR 2024arXiv:2311.16493

#10529

Text2QR: Harmonizing Aesthetic Customization and Scanning Robustness for Text-Guided QR Code Generation

Guangyang Wu, Xiaohong Liu, Jun Jia et al.

CVPR 2024arXiv:2403.06452

#10530

ProMark: Proactive Diffusion Watermarking for Causal Attribution

Vishal Asnani, John Collomosse, Tu Bui et al.

CVPR 2024arXiv:2403.09914

#10531

MMM: Generative Masked Motion Model

Ekkasit Pinyoanuntapong, Pu Wang, Minwoo Lee et al.

CVPR 2024highlightarXiv:2312.03596

#10532

Toward Generalist Anomaly Detection via In-context Residual Learning with Few-shot Sample Prompts

Jiawen Zhu, Guansong Pang

CVPR 2024arXiv:2403.06495

#10533

DiffForensics: Leveraging Diffusion Prior to Image Forgery Detection and Localization

Zeqin Yu, Jiangqun Ni, Yuzhen Lin et al.

CVPR 2024

#10534

VideoGrounding-DINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding

Syed Talal Wasim, Muzammal Naseer, Salman Khan et al.

CVPR 2024

#10535

Neural Directional Encoding for Efficient and Accurate View-Dependent Appearance Modeling

Liwen Wu, Sai Bi, Zexiang Xu et al.

CVPR 2024highlightarXiv:2405.14847

#10536

SAFDNet: A Simple and Effective Network for Fully Sparse 3D Object Detection

Gang Zhang, Chen Junnan, Guohuan Gao et al.

CVPR 2024arXiv:2403.05817

#10537

Sheared Backpropagation for Fine-tuning Foundation Models

Zhiyuan Yu, Li Shen, Liang Ding et al.

CVPR 2024

#10538

On the Content Bias in Fréchet Video Distance

Songwei Ge, Aniruddha Mahapatra, Gaurav Parmar et al.

CVPR 2024arXiv:2404.12391

#10539

Multiview Aerial Visual RECognition (MAVREC): Can Multi-view Improve Aerial Visual Perception?

Aritra Dutta, Srijan Das, Jacob Nielsen et al.

CVPR 2024arXiv:2312.04548

#10540

VINECS: Video-based Neural Character Skinning

Zhouyingcheng Liao, Vladislav Golyanik, Marc Habermann et al.

CVPR 2024arXiv:2307.00842

#10541

Plug and Play Active Learning for Object Detection

Chenhongyi Yang, Lichao Huang, Elliot Crowley

CVPR 2024arXiv:2211.11612

#10542

Plug-and-Play Diffusion Distillation

Yi-Ting Hsiao, Siavash Khodadadeh, Kevin Duarte et al.

CVPR 2024arXiv:2406.01954

#10543

CLIB-FIQA: Face Image Quality Assessment with Confidence Calibration

Fu-Zhao Ou, Chongyi Li, Shiqi Wang et al.

CVPR 2024

#10544

Polos: Multimodal Metric Learning from Human Feedback for Image Captioning

Yuiga Wada, Kanta Kaneda, Daichi Saito et al.

CVPR 2024highlightarXiv:2402.18091

#10545

XScale-NVS: Cross-Scale Novel View Synthesis with Hash Featurized Manifold

Guangyu Wang, Jinzhi Zhang, Fan Wang et al.

CVPR 2024arXiv:2403.19517

#10546

Differentiable Micro-Mesh Construction

Yishun Dou, Zhong Zheng, Qiaoqiao Jin et al.

CVPR 2024

#10547

HOIDiffusion: Generating Realistic 3D Hand-Object Interaction Data

Mengqi Zhang, Yang Fu, Zheng Ding et al.

CVPR 2024arXiv:2403.12011

#10548

CPGA: Coding Priors-Guided Aggregation Network for Compressed Video Quality Enhancement

Qiang Zhu, Jinhua Hao, Yukang Ding et al.

CVPR 2024arXiv:2403.10362

#10549

ProxyCap: Real-time Monocular Full-body Capture in World Space via Human-Centric Proxy-to-Motion Learning

Yuxiang Zhang, Hongwen Zhang, Liangxiao Hu et al.

CVPR 2024arXiv:2307.01200

#10550

Learning from Synthetic Human Group Activities

Che-Jui Chang, Danrui Li, Deep Patel et al.

CVPR 2024arXiv:2306.16772

#10551

Can’t Make an Omelette Without Breaking Some Eggs: Plausible Action Anticipation Using Large Video-Language Models

Himangi Mittal, Nakul Agarwal, Shao-Yuan Lo et al.

CVPR 2024

#10552

Unsupervised 3D Structure Inference from Category-Specific Image Collections

Weikang Wang, Dongliang Cao, Florian Bernard

CVPR 2024

#10553

Video2Game: Real-time Interactive Realistic and Browser-Compatible Environment from a Single Video

Hongchi Xia, Chih-Hao Lin, Wei-Chiu Ma et al.

CVPR 2024

#10554

Identifying Important Group of Pixels using Interactions

Kosuke Sumiyasu, Kazuhiko Kawamoto, Hiroshi Kera

CVPR 2024arXiv:2401.03785

#10555

Multi-Modal Proxy Learning Towards Personalized Visual Multiple Clustering

Jiawei Yao, Qi Qian, Juhua Hu

CVPR 2024arXiv:2404.15655

#10556

Adaptive Bidirectional Displacement for Semi-Supervised Medical Image Segmentation

Hanyang Chi, Jian Pang, Bingfeng Zhang et al.

CVPR 2024arXiv:2405.00378

#10557

DreamAvatar: Text-and-Shape Guided 3D Human Avatar Generation via Diffusion Models

Yukang Cao, Yan-Pei Cao, Kai Han et al.

CVPR 2024arXiv:2304.00916

#10558

Genuine Knowledge from Practice: Diffusion Test-Time Adaptation for Video Adverse Weather Removal

Yijun Yang, Hongtao Wu, Angelica I. Aviles-Rivero et al.

CVPR 2024arXiv:2403.07684

#10559

Are Conventional SNNs Really Efficient? A Perspective from Network Quantization

Guobin Shen, Dongcheng Zhao, Tenglong Li et al.

CVPR 2024highlight

#10560

RILA: Reflective and Imaginative Language Agent for Zero-Shot Semantic Audio-Visual Navigation

Zeyuan Yang, LIU JIAGENG, Peihao Chen et al.

CVPR 2024

#10561

Sharingan: A Transformer Architecture for Multi-Person Gaze Following

Samy Tafasca, Anshul Gupta, Jean-marc Odobez

CVPR 2024

#10562

OA-CNNs: Omni-Adaptive Sparse CNNs for 3D Semantic Segmentation

Bohao Peng, Xiaoyang Wu, Li Jiang et al.

CVPR 2024arXiv:2403.14418

#10563

Dynamic Support Information Mining for Category-Agnostic Pose Estimation

Pengfei Ren, Yuanyuan Gao, Haifeng Sun et al.

CVPR 2024

#10564

Pre-trained Model Guided Fine-Tuning for Zero-Shot Adversarial Robustness

Sibo Wang, Jie Zhang, Zheng Yuan et al.

CVPR 2024arXiv:2401.04350

#10565

MART: Masked Affective RepresenTation Learning via Masked Temporal Distribution Distillation

Zhicheng Zhang, Pancheng Zhao, Eunil Park et al.

CVPR 2024

#10566

CosalPure: Learning Concept from Group Images for Robust Co-Saliency Detection

Jiayi Zhu, Qing Guo, Felix Juefei Xu et al.

CVPR 2024arXiv:2403.18554

#10567

Neural Clustering based Visual Representation Learning

Guikun Chen, Xia Li, Yi Yang et al.

CVPR 2024arXiv:2403.17409

#10568

ViVid-1-to-3: Novel View Synthesis with Video Diffusion Models

Jeong-gi Kwak, Erqun Dong, Yuhe Jin et al.

CVPR 2024highlightarXiv:2312.01305

#10569

CrossMAE: Cross-Modality Masked Autoencoders for Region-Aware Audio-Visual Pre-Training

Yuxin Guo, Siyang Sun, Shuailei Ma et al.

CVPR 2024

#10570

CapHuman: Capture Your Moments in Parallel Universes

Chao Liang, Fan Ma, Linchao Zhu et al.

CVPR 2024arXiv:2402.00627

#10571

Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection

Yicheng Xiao, Zhuoyan Luo, Yong Liu et al.

CVPR 2024arXiv:2311.16464

#10572

ChAda-ViT : Channel Adaptive Attention for Joint Representation Learning of Heterogeneous Microscopy Images

Nicolas Bourriez, Ihab Bendidi, Cohen Ethan et al.

CVPR 2024arXiv:2311.15264

#10573

Self-Discovering Interpretable Diffusion Latent Directions for Responsible Text-to-Image Generation

Hang Li, Chengzhi Shen, Philip H.S. Torr et al.

CVPR 2024arXiv:2311.17216

#10574

VS: Reconstructing Clothed 3D Human from Single Image via Vertex Shift

Leyuan Liu, Yuhan Li, Yunqi Gao et al.

CVPR 2024

#10575

Towards Automatic Power Battery Detection: New Challenge Benchmark Dataset and Baseline

Xiaoqi Zhao, Youwei Pang, Zhenyu Chen et al.

CVPR 2024arXiv:2312.02528

#10576

Point Transformer V3: Simpler Faster Stronger

Xiaoyang Wu, Li Jiang, Peng-Shuai Wang et al.

CVPR 2024

#10577

Improving Distant 3D Object Detection Using 2D Box Supervision

Zetong Yang, Zhiding Yu, Christopher Choy et al.

CVPR 2024arXiv:2403.09230

#10578

Infrared Small Target Detection with Scale and Location Sensitivity

Qiankun Liu, Rui Liu, Bolun Zheng et al.

CVPR 2024arXiv:2403.19366

#10579

Wonder3D: Single Image to 3D using Cross-Domain Diffusion

Xiaoxiao Long, Yuan-Chen Guo, Cheng Lin et al.

CVPR 2024highlightarXiv:2310.15008

#10580

Honeybee: Locality-enhanced Projector for Multimodal LLM

Junbum Cha, Woo-Young Kang, Jonghwan Mun et al.

CVPR 2024highlightarXiv:2312.06742

#10581

ExtraNeRF: Visibility-Aware View Extrapolation of Neural Radiance Fields with Diffusion Models

Meng-Li Shih, Wei-Chiu Ma, Lorenzo Boyice et al.

CVPR 2024arXiv:2406.06133

#10582

Mining Supervision for Dynamic Regions in Self-Supervised Monocular Depth Estimation

Hoang Chuong Nguyen, Tianyu Wang, Jose M. Alvarez et al.

CVPR 2024arXiv:2404.14908

#10583

SleepVST: Sleep Staging from Near-Infrared Video Signals using Pre-Trained Transformers

Jonathan F. Carter, Joao Jorge, Oliver Gibson et al.

CVPR 2024highlightarXiv:2404.03831

#10584

Enhancing 3D Fidelity of Text-to-3D using Cross-View Correspondences

Seungwook Kim, Kejie Li, Xueqing Deng et al.

CVPR 2024arXiv:2404.10603

#10585

Learning to Rematch Mismatched Pairs for Robust Cross-Modal Retrieval

Haochen Han, Qinghua Zheng, Guang Dai et al.

CVPR 2024arXiv:2403.05105

#10586

EVS-assisted Joint Deblurring Rolling-Shutter Correction and Video Frame Interpolation through Sensor Inverse Modeling

Rui Jiang, Fangwen Tu, Yixuan Long et al.

CVPR 2024

#10587

Open-World Semantic Segmentation Including Class Similarity

Matteo Sodano, Federico Magistri, Lucas Nunes et al.

CVPR 2024arXiv:2403.07532

#10588

Empowering Resampling Operation for Ultra-High-Definition Image Enhancement with Model-Aware Guidance

Yu, Jie Huang, Li et al.

CVPR 2024

#10589

READ: Retrieval-Enhanced Asymmetric Diffusion for Motion Planning

Takeru Oba, Matthew Walter, Norimichi Ukita

CVPR 2024

#10590

From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models

Rongjie Li, Songyang Zhang, Dahua Lin et al.

CVPR 2024arXiv:2404.00906

#10591

MeshPose: Unifying DensePose and 3D Body Mesh Reconstruction

Eric-Tuan Le, Antonios Kakolyris, Petros Koutras et al.

CVPR 2024

#10592

Bayesian Differentiable Physics for Cloth Digitalization

Deshan Gong, Ningtao Mao, He Wang

CVPR 2024arXiv:2402.17664

#10593

MemSAM: Taming Segment Anything Model for Echocardiography Video Segmentation

Xiaolong Deng, Huisi Wu, Runhao Zeng et al.

CVPR 2024

#10594

SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing

Zeyinzi Jiang, Chaojie Mao, Yulin Pan et al.

CVPR 2024highlightarXiv:2312.11392

#10595

OAKINK2: A Dataset of Bimanual Hands-Object Manipulation in Complex Task Completion

Xinyu Zhan, Lixin Yang, Yifei Zhao et al.

CVPR 2024arXiv:2403.19417

#10596

Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision Language Audio and Action

Jiasen Lu, Christopher Clark, Sangho Lee et al.

CVPR 2024highlightarXiv:2312.17172

#10597

PTQ4SAM: Post-Training Quantization for Segment Anything

Chengtao Lv, Hong Chen, Jinyang Guo et al.

CVPR 2024arXiv:2405.03144

#10598

Training Diffusion Models Towards Diverse Image Generation with Reinforcement Learning

Zichen Miao, Jiang Wang, Ze Wang et al.

CVPR 2024

#10599

Visual Point Cloud Forecasting enables Scalable Autonomous Driving

Zetong Yang, Li Chen, Yanan Sun et al.

CVPR 2024highlightarXiv:2312.17655

#10600

Light the Night: A Multi-Condition Diffusion Framework for Unpaired Low-Light Enhancement in Autonomous Driving

JINLONG LI, Baolu Li, Zhengzhong Tu et al.

CVPR 2024arXiv:2404.04804

← Previous

1...51 52 53 54 55...62