Most Cited 2024 "deterministic strategies" Papers

12,324 papers found

#11001

NeRFiller: Completing Scenes via Generative 3D Inpainting

Ethan Weber, Aleksander Holynski, Varun Jampani et al.

CVPR 2024 • arXiv:2312.04560
#11002

PeVL: Pose-Enhanced Vision-Language Model for Fine-Grained Human Action Recognition

Haosong Zhang, Mei Leong, Liyuan Li et al.

CVPR 2024
#11003

MPOD123: One Image to 3D Content Generation Using Mask-enhanced Progressive Outline-to-Detail Optimization

Jimin Xu, Tianbao Wang, Tao Jin et al.

CVPR 2024
#11004

Look-Up Table Compression for Efficient Image Restoration

Yinglong Li, Jiacheng Li, Zhiwei Xiong

CVPR 2024 • highlight
#11005

Hourglass Tokenizer for Efficient Transformer-Based 3D Human Pose Estimation

Wenhao Li, Mengyuan Liu, Hong Liu et al.

CVPR 2024 • highlight • arXiv:2311.12028
#11006

RepAn: Enhanced Annealing through Re-parameterization

Xiang Fei, Xiawu Zheng, Yan Wang et al.

CVPR 2024
#11007

PAPR in Motion: Seamless Point-level 3D Scene Interpolation

Shichong Peng, Yanshu Zhang, Ke Li

CVPR 2024 • highlight • arXiv:2406.05533
#11008

Towards Modern Image Manipulation Localization: A Large-Scale Dataset and Novel Methods

Chenfan Qu, Yiwu Zhong, Chongyu Liu et al.

CVPR 2024
#11009

Dense Vision Transformer Compression with Few Samples

Hanxiao Zhang, Yifan Zhou, Guo-Hua Wang

CVPR 2024 • arXiv:2403.18708
#11010

IntrinsicAvatar: Physically Based Inverse Rendering of Dynamic Humans from Monocular Videos via Explicit Ray Tracing

Shaofei Wang, Bozidar Antic, Andreas Geiger et al.

CVPR 2024 • arXiv:2312.05210
#11011

Exploring Pose-Aware Human-Object Interaction via Hybrid Learning

Eastman Z Y Wu, Yali Li, Yuan Wang et al.

CVPR 2024
#11012

All in One Framework for Multimodal Re-identification in the Wild

He Li, Mang Ye, Ming Zhang et al.

CVPR 2024 • arXiv:2405.04741
#11013

Bilateral Adaptation for Human-Object Interaction Detection with Occlusion-Robustness

Guangzhi Wang, Yangyang Guo, Ziwei Xu et al.

CVPR 2024
#11014

TCP: Textual-based Class-aware Prompt tuning for Visual-Language Model

Hantao Yao, Rui Zhang, Changsheng Xu

CVPR 2024
#11015

RMT: Retentive Networks Meet Vision Transformers

Qihang Fan, Huaibo Huang, Mingrui Chen et al.

CVPR 2024 • arXiv:2309.11523
#11016

FedSOL: Stabilized Orthogonal Learning with Proximal Restrictions in Federated Learning

Gihun Lee, Minchan Jeong, SangMook Kim et al.

CVPR 2024 • arXiv:2308.12532
#11017

Low-Rank Approximation for Sparse Attention in Multi-Modal LLMs

Lin Song, Yukang Chen, Shuai Yang et al.

CVPR 2024
#11018

LAENeRF: Local Appearance Editing for Neural Radiance Fields

Lukas Radl, Michael Steiner, Andreas Kurz et al.

CVPR 2024 • arXiv:2312.09913
#11019

Don't Look into the Dark: Latent Codes for Pluralistic Image Inpainting

Haiwei Chen, Yajie Zhao

CVPR 2024 • arXiv:2403.18186
#11020

Improved Visual Grounding through Self-Consistent Explanations

Ruozhen He, Paola Cascante-Bonilla, Ziyan Yang et al.

CVPR 2024 • arXiv:2312.04554
#11021

AZ-NAS: Assembling Zero-Cost Proxies for Network Architecture Search

Junghyup Lee, Bumsub Ham

CVPR 2024 • arXiv:2403.19232
#11022

On the Faithfulness of Vision Transformer Explanations

Junyi Wu, Weitai Kang, Hao Tang et al.

CVPR 2024 • arXiv:2404.01415
#11023

CHAIN: Enhancing Generalization in Data-Efficient GANs via lipsCHitz continuity constrAIned Normalization

Yao Ni, Piotr Koniusz

CVPR 2024 • arXiv:2404.00521
#11024

OneFormer3D: One Transformer for Unified Point Cloud Segmentation

Maksim Kolodiazhnyi, Anna Vorontsova, Anton Konushin et al.

CVPR 2024 • arXiv:2311.14405
#11025

One-2-3-45++: Fast Single Image to 3D Objects with Consistent Multi-View Generation and 3D Diffusion

Minghua Liu, Ruoxi Shi, Linghao Chen et al.

CVPR 2024 • arXiv:2311.07885
#11026

C2KD: Bridging the Modality Gap for Cross-Modal Knowledge Distillation

Fushuo Huo, Wenchao Xu, Jingcai Guo et al.

CVPR 2024 • highlight
#11027

StrokeFaceNeRF: Stroke-based Facial Appearance Editing in Neural Radiance Field

Xiao-juan Li, Dingxi Zhang, Shu-Yu Chen et al.

CVPR 2024
#11028

Neural Modes: Self-supervised Learning of Nonlinear Modal Subspaces

Jiahong Wang, Yinwei Du, Stelian Coros et al.

CVPR 2024 • arXiv:2404.17620
#11029

WHAM: Reconstructing World-grounded Humans with Accurate 3D Motion

Soyong Shin, Juyong Kim, Eni Halilaj et al.

CVPR 2024 • arXiv:2312.07531
#11030

CLOAF: CoLlisiOn-Aware Human Flow

Andrey Davydov, Martin Engilberge, Mathieu Salzmann et al.

CVPR 2024 • arXiv:2403.09050
#11031

FedUV: Uniformity and Variance for Heterogeneous Federated Learning

Ha Min Son, Moon-Hyun Kim, Tai-Myoung Chung et al.

CVPR 2024 • arXiv:2402.18372
#11032

Learning Occupancy for Monocular 3D Object Detection

Liang Peng, Junkai Xu, Haoran Cheng et al.

CVPR 2024 • arXiv:2305.15694
#11033

Bring Event into RGB and LiDAR: Hierarchical Visual-Motion Fusion for Scene Flow

Hanyu Zhou, Yi Chang, Zhiwei Shi

CVPR 2024 • arXiv:2403.07432
#11034

Language-driven Grasp Detection

An Dinh Vuong, Minh Nhat Vu, Baoru Huang et al.

CVPR 2024 • arXiv:2406.09489
#11035

Each Test Image Deserves A Specific Prompt: Continual Test-Time Adaptation for 2D Medical Image Segmentation

Ziyang Chen, Yongsheng Pan, Yiwen Ye et al.

CVPR 2024 • arXiv:2311.18363
#11036

Abductive Ego-View Accident Video Understanding for Safe Driving Perception

Jianwu Fang, Lei-lei Li, Junfei Zhou et al.

CVPR 2024 • highlight • arXiv:2403.00436
#11037

Prompting Vision Foundation Models for Pathology Image Analysis

Chong Yin, Siqi Liu, Kaiyang Zhou et al.

CVPR 2024
#11038

Unmixing Before Fusion: A Generalized Paradigm for Multi-Source-based Hyperspectral Image Synthesis

Yang Yu, Erting Pan, Xinya Wang et al.

CVPR 2024
#11039

Navigating Beyond Dropout: An Intriguing Solution towards Generalizable Image Super Resolution

Hongjun Wang, Jiyuan Chen, Yinqiang Zheng et al.

CVPR 2024 • arXiv:2402.18929
#11040

Auto-Train-Once: Controller Network Guided Automatic Network Pruning from Scratch

Xidong Wu, Shangqian Gao, Zeyu Zhang et al.

CVPR 2024 • arXiv:2403.14729
#11041

SCINeRF: Neural Radiance Fields from a Snapshot Compressive Image

Yunhao Li, Xiaodong Wang, Ping Wang et al.

CVPR 2024 • highlight • arXiv:2403.20018
#11042

Learning to Control Camera Exposure via Reinforcement Learning

Kyunghyun Lee, Ukcheol Shin, Byeong-Uk Lee

CVPR 2024 • arXiv:2404.01636
#11043

Regressor-Segmenter Mutual Prompt Learning for Crowd Counting

Mingyue Guo, Li Yuan, Zhaoyi Yan et al.

CVPR 2024 • arXiv:2312.01711
#11044

Vector Graphics Generation via Mutually Impulsed Dual-domain Diffusion

Zhongyin Zhao, Ye Chen, Zhangli Hu et al.

CVPR 2024
#11045

Spectral Meets Spatial: Harmonising 3D Shape Matching and Interpolation

Dongliang Cao, Marvin Eisenberger, Nafie El Amrani et al.

CVPR 2024 • arXiv:2402.18920
#11046

FocSAM: Delving Deeply into Focused Objects in Segmenting Anything

You Huang, Zongyu Lan, Liujuan Cao et al.

CVPR 2024 • arXiv:2405.18706
#11047

Sculpt3D: Multi-View Consistent Text-to-3D Generation with Sparse 3D Prior

Chen Cheng, Xiaofeng Yang, Fan Yang et al.

CVPR 2024 • arXiv:2403.09140
#11048

Learning to Transform Dynamically for Better Adversarial Transferability

Rongyi Zhu, Zeliang Zhang, Susan Liang et al.

CVPR 2024 • arXiv:2405.14077
#11049

Spin-UP: Spin Light for Natural Light Uncalibrated Photometric Stereo

Zongrui Li, Zhan Lu, Haojie Yan et al.

CVPR 2024 • arXiv:2404.01612
#11050

SEAS: ShapE-Aligned Supervision for Person Re-Identification

Haidong Zhu, Pranav Budhwant, Zhaoheng Zheng et al.

CVPR 2024
#11051

Learning to Select Views for Efficient Multi-View Understanding

Yunzhong Hou, Stephen Gould, Liang Zheng

CVPR 2024
#11052

LidaRF: Delving into Lidar for Neural Radiance Field on Street Scenes

Shanlin Sun, Bingbing Zhuang, Ziyu Jiang et al.

CVPR 2024 • highlight • arXiv:2405.00900
#11053

Task2Box: Box Embeddings for Modeling Asymmetric Task Relationships

Rangel Daroya, Aaron Sun, Subhransu Maji

CVPR 2024 • highlight • arXiv:2403.17173
#11054

UniGS: Unified Representation for Image Generation and Segmentation

Lu Qi, Lehan Yang, Weidong Guo et al.

CVPR 2024 • arXiv:2312.01985
#11055

ExMap: Leveraging Explainability Heatmaps for Unsupervised Group Robustness to Spurious Correlations

Rwiddhi Chakraborty, Adrian de Sena Sletten, Michael C. Kampffmeyer

CVPR 2024 • arXiv:2403.13870
#11056

DUDF: Differentiable Unsigned Distance Fields with Hyperbolic Scaling

Miguel Fainstein, Viviana Siless, Emmanuel Iarussi

CVPR 2024 • arXiv:2402.08876
#11057

UV-IDM: Identity-Conditioned Latent Diffusion Model for Face UV-Texture Generation

Hong Li, Yutang Feng, Song Xue et al.

CVPR 2024
#11058

PBWR: Parametric-Building-Wireframe Reconstruction from Aerial LiDAR Point Clouds

Shangfeng Huang, Ruisheng Wang, Bo Guo et al.

CVPR 2024
#11059

GenH2R: Learning Generalizable Human-to-Robot Handover via Scalable Simulation Demonstration and Imitation

Zifan Wang, Junyu Chen, Ziqing Chen et al.

CVPR 2024
#11060

Content-Style Decoupling for Unsupervised Makeup Transfer without Generating Pseudo Ground Truth

Zhaoyang Sun, Shengwu Xiong, Yaxiong Chen et al.

CVPR 2024 • arXiv:2405.17240
#11061

UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs

Yanwu Xu, Yang Zhao, Zhisheng Xiao et al.

CVPR 2024 • highlight • arXiv:2311.09257
#11062

SelfPose3d: Self-Supervised Multi-Person Multi-View 3d Pose Estimation

Keqi Chen, Vinkle Srivastav, Nicolas Padoy

CVPR 2024 • arXiv:2404.02041
#11063

Context-Aware Integration of Language and Visual References for Natural Language Tracking

Yanyan Shao, Shuting He, Qi Ye et al.

CVPR 2024 • arXiv:2403.19975
#11064

Open-Vocabulary Segmentation with Semantic-Assisted Calibration

Yong Liu, Sule Bai, Guanbin Li et al.

CVPR 2024 • arXiv:2312.04089
#11065

Training Generative Image Super-Resolution Models by Wavelet-Domain Losses Enables Better Control of Artifacts

Cansu Korkmaz, Ahmet Murat Tekalp, Zafer Dogan

CVPR 2024 • arXiv:2402.19215
#11066

RCooper: A Real-world Large-scale Dataset for Roadside Cooperative Perception

Ruiyang Hao, Siqi Fan, Yingru Dai et al.

CVPR 2024 • arXiv:2403.10145
#11067

Task-Customized Mixture of Adapters for General Image Fusion

Pengfei Zhu, Yang Sun, Bing Cao et al.

CVPR 2024 • arXiv:2403.12494
#11068

PointBeV: A Sparse Approach for BeV Predictions

Loick Chambon, Éloi Zablocki, Mickaël Chen et al.

CVPR 2024 • arXiv:2312.00703
#11069

Learning Spatial Adaptation and Temporal Coherence in Diffusion Models for Video Super-Resolution

Zhikai Chen, Fuchen Long, Zhaofan Qiu et al.

CVPR 2024 • arXiv:2403.17000
#11070

Ensemble Diversity Facilitates Adversarial Transferability

Bowen Tang, Zheng Wang, Yi Bin et al.

CVPR 2024
#11071

CFAT: Unleashing Triangular Windows for Image Super-resolution

Abhisek Ray, Gaurav Kumar, Maheshkumar Kolekar

CVPR 2024 • highlight
#11072

Convolutional Prompting meets Language Models for Continual Learning

Anurag Roy, Riddhiman Moulick, Vinay Verma et al.

CVPR 2024 • arXiv:2403.20317
#11073

Mitigating Noisy Correspondence by Geometrical Structure Consistency Learning

Zihua Zhao, Mengxi Chen, Tianjie Dai et al.

CVPR 2024 • arXiv:2405.16996
#11074

Contextual Augmented Global Contrast for Multimodal Intent Recognition

Kaili Sun, Zhiwen Xie, Mang Ye et al.

CVPR 2024
#11075

Intraoperative 2D/3D Image Registration via Differentiable X-ray Rendering

Vivek Gopalakrishnan, Neel Dey, Polina Golland

CVPR 2024 • arXiv:2312.06358
#11076

Relaxed Contrastive Learning for Federated Learning

Seonguk Seo, Jinkyu Kim, Geeho Kim et al.

CVPR 2024 • arXiv:2401.04928
#11077

EVCap: Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension

Jiaxuan Li, Duc Minh Vo, Akihiro Sugimoto et al.

CVPR 2024 • arXiv:2311.15879
#11078

LLM4SGG: Large Language Models for Weakly Supervised Scene Graph Generation

Kibum Kim, Kanghoon Yoon, Jaehyeong Jeon et al.

CVPR 2024 • arXiv:2310.10404
#11079

Geometrically-driven Aggregation for Zero-shot 3D Point Cloud Understanding

Guofeng Mei, Luigi Riz, Yiming Wang et al.

CVPR 2024 • highlight • arXiv:2312.02244
#11080

Beyond Textual Constraints: Learning Novel Diffusion Conditions with Fewer Examples

Yuyang Yu, Bangzhen Liu, Chenxi Zheng et al.

CVPR 2024
#11081

Task-aligned Part-aware Panoptic Segmentation through Joint Object-Part Representations

Daan de Geus, Gijs Dubbelman

CVPR 2024 • arXiv:2406.10114
#11082

FreeKD: Knowledge Distillation via Semantic Frequency Prompt

Yuan Zhang, Tao Huang, Jiaming Liu et al.

CVPR 2024 • arXiv:2311.12079
#11083

Enhanced Motion-Text Alignment for Image-to-Video Transfer Learning

Wei Zhang, Chaoqun Wan, Tongliang Liu et al.

CVPR 2024
#11084

Targeted Representation Alignment for Open-World Semi-Supervised Learning

Ruixuan Xiao, Lei Feng, Kai Tang et al.

CVPR 2024
#11085

SNIDA: Unlocking Few-Shot Object Detection with Non-linear Semantic Decoupling Augmentation

Yanjie Wang, Xu Zou, Luxin Yan et al.

CVPR 2024
#11086

Adaptive Multi-Modal Cross-Entropy Loss for Stereo Matching

Peng Xu, Zhiyu Xiang, Chengyu Qiao et al.

CVPR 2024 • arXiv:2306.15612
#11087

Driving-Video Dehazing with Non-Aligned Regularization for Safety Assistance

Junkai Fan, Jiangwei Weng, Kun Wang et al.

CVPR 2024 • arXiv:2405.09996
#11088

Exploring Region-Word Alignment in Built-in Detector for Open-Vocabulary Object Detection

Heng Zhang, Qiuyu Zhao, Linyu Zheng et al.

CVPR 2024
#11089

L0-Sampler: An L0 Model Guided Volume Sampling for NeRF

Liangchen Li, Juyong Zhang

CVPR 2024
#11090

Diffusion 3D Features (Diff3F): Decorating Untextured Shapes with Distilled Semantic Features

Niladri Shekhar Dutt, Sanjeev Muralikrishnan, Niloy J. Mitra

CVPR 2024 • arXiv:2311.17024
#11091

Unsupervised Occupancy Learning from Sparse Point Cloud

Amine Ouasfi, Adnane Boukhayma

CVPR 2024 • highlight • arXiv:2404.02759
#11092

GLOW: Global Layout Aware Attacks on Object Detection

Jun Bao, Buyu Liu, Kui Ren et al.

CVPR 2024 • arXiv:2302.14166
#11093

Context-based and Diversity-driven Specificity in Compositional Zero-Shot Learning

Yun Li, Zhe Liu, Hang Chen et al.

CVPR 2024 • arXiv:2402.17251
#11094

Neural Underwater Scene Representation

Yunkai Tang, Chengxuan Zhu, Renjie Wan et al.

CVPR 2024
#11095

Scaled Decoupled Distillation

Shicai Wei, Chunbo Luo, Yang Luo

CVPR 2024
#11096

VISTA-LLAMA: Reducing Hallucination in Video Language Models via Equal Distance to Visual Tokens

Fan Ma, Xiaojie Jin, Heng Wang et al.

CVPR 2024 • arXiv:2312.08870
#11097

Hierarchical Intra-modal Correlation Learning for Label-free 3D Semantic Segmentation

Xin Kang, Lei Chu, Jiahao Li et al.

CVPR 2024
#11098

PARA-Drive: Parallelized Architecture for Real-time Autonomous Driving

Xinshuo Weng, Boris Ivanovic, Yan Wang et al.

CVPR 2024
#11099

Towards Generalizable Tumor Synthesis

Qi Chen, Xiaoxi Chen, Haorui Song et al.

CVPR 2024 • arXiv:2402.19470
#11100

Adaptive Hyper-graph Aggregation for Modality-Agnostic Federated Learning

Fan Qi, Shuai Li

CVPR 2024
#11101

Bi-SSC: Geometric-Semantic Bidirectional Fusion for Camera-based 3D Semantic Scene Completion

Yujie Xue, Ruihui Li, Fan Wu et al.

CVPR 2024
#11102

Efficient and Effective Weakly-Supervised Action Segmentation via Action-Transition-Aware Boundary Alignment

Angchi Xu, Wei-Shi Zheng

CVPR 2024 • arXiv:2403.19225
#11103

Depth-Aware Concealed Crop Detection in Dense Agricultural Scenes

Liqiong Wang, Jinyu Yang, Yanfu Zhang et al.

CVPR 2024
#11104

FC-GNN: Recovering Reliable and Accurate Correspondences from Interferences

Haobo Xu, Jun Zhou, Hua Yang et al.

CVPR 2024
#11105

MoMask: Generative Masked Modeling of 3D Human Motions

Chuan Guo, Yuxuan Mu, Muhammad Gohar Javed et al.

CVPR 2024 • arXiv:2312.00063
#11106

CapsFusion: Rethinking Image-Text Data at Scale

Qiying Yu, Quan Sun, Xiaosong Zhang et al.

CVPR 2024 • arXiv:2310.20550
#11107

UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition

Xiaohan Ding, Yiyuan Zhang, Yixiao Ge et al.

CVPR 2024 • arXiv:2311.15599
#11108

A General and Efficient Training for Transformer via Token Expansion

Wenxuan Huang, Yunhang Shen, Jiao Xie et al.

CVPR 2024 • arXiv:2404.00672
#11109

BigGait: Learning Gait Representation You Want by Large Vision Models

Dingqiang Ye, Chao Fan, Jingzhe Ma et al.

CVPR 2024 • arXiv:2402.19122
#11110

Event-based Visible and Infrared Fusion via Multi-task Collaboration

Mengyue Geng, Lin Zhu, Lizhi Wang et al.

CVPR 2024
#11111

Breathing Life Into Sketches Using Text-to-Video Priors

Rinon Gal, Yael Vinker, Yuval Alaluf et al.

CVPR 2024 • highlight • arXiv:2311.13608
#11112

Gaussian Shell Maps for Efficient 3D Human Generation

Rameen Abdal, Wang Yifan, Zifan Shi et al.

CVPR 2024 • arXiv:2311.17857
#11113

Byzantine-robust Decentralized Federated Learning via Dual-domain Clustering and Trust Bootstrapping

Peng Sun, Xinyang Liu, Zhibo Wang et al.

CVPR 2024
#11114

MotionEditor: Editing Video Motion via Content-Aware Diffusion

Shuyuan Tu, Qi Dai, Zhi-Qi Cheng et al.

CVPR 2024 • arXiv:2311.18830
#11115

State Space Models for Event Cameras

Nikola Zubic, Mathias Gehrig, Davide Scaramuzza

CVPR 2024
#11116

DiffInDScene: Diffusion-based High-Quality 3D Indoor Scene Generation

Xiaoliang Ju, Zhaoyang Huang, Yijin Li et al.

CVPR 2024 • arXiv:2306.00519
#11117

Towards Calibrated Multi-label Deep Neural Networks

Jiacheng Cheng, Nuno Vasconcelos

CVPR 2024
#11118

TIM: A Time Interval Machine for Audio-Visual Action Recognition

Jacob Chalk, Jaesung Huh, Evangelos Kazakos et al.

CVPR 2024 • arXiv:2404.05559
#11119

Test-Time Linear Out-of-Distribution Detection

Ke Fan, Tong Liu, Xingyu Qiu et al.

CVPR 2024
#11120

Exploiting Style Latent Flows for Generalizing Deepfake Video Detection

Jongwook Choi, Taehoon Kim, Yonghyun Jeong et al.

CVPR 2024 • arXiv:2403.06592
#11121

LeGO: Leveraging a Surface Deformation Network for Animatable Stylized Face Generation with One Example

Soyeon Yoon, Kwan Yun, Kwanggyoon Seo et al.

CVPR 2024 • highlight • arXiv:2403.15227
#11122

Leveraging Predicate and Triplet Learning for Scene Graph Generation

Jiankai Li, Yunhong Wang, Xiefan Guo et al.

CVPR 2024 • arXiv:2406.02038
#11123

Unsupervised Semantic Segmentation Through Depth-Guided Feature Correlation and Sampling

Leon Sick, Dominik Engel, Pedro Hermosilla et al.

CVPR 2024 • arXiv:2309.12378
#11124

HHMR: Holistic Hand Mesh Recovery by Enhancing the Multimodal Controllability of Graph Diffusion Models

Mengcheng Li, Hongwen Zhang, Yuxiang Zhang et al.

CVPR 2024 • highlight • arXiv:2406.01334
#11125

Enhancing Visual Continual Learning with Language-Guided Supervision

Bolin Ni, Hongbo Zhao, Chenghao Zhang et al.

CVPR 2024 • arXiv:2403.16124
#11126

PhyScene: Physically Interactable 3D Scene Synthesis for Embodied AI

Yandan Yang, Baoxiong Jia, Peiyuan Zhi et al.

CVPR 2024 • highlight • arXiv:2404.09465
#11127

Uncertainty-Aware Source-Free Adaptive Image Super-Resolution with Wavelet Augmentation Transformer

Yuang Ai, Xiaoqiang Zhou, Huaibo Huang et al.

CVPR 2024 • arXiv:2303.17783
#11128

Generalizing 6-DoF Grasp Detection via Domain Prior Knowledge

Haoxiang Ma, Modi Shi, Boyang Gao et al.

CVPR 2024 • arXiv:2404.01727
#11129

Making Vision Transformers Truly Shift-Equivariant

Renan A. Rojas-Gomez, Teck-Yian Lim, Minh Do et al.

CVPR 2024 • arXiv:2305.16316
#11130

DreamVideo: Composing Your Dream Videos with Customized Subject and Motion

Yujie Wei, Shiwei Zhang, Zhiwu Qing et al.

CVPR 2024 • arXiv:2312.04433
#11131

RankED: Addressing Imbalance and Uncertainty in Edge Detection Using Ranking-based Losses

Bedrettin Cetinkaya, Sinan Kalkan, Emre Akbas

CVPR 2024 • arXiv:2403.01795
#11132

Fine-Grained Bipartite Concept Factorization for Clustering

Chong Peng, Pengfei Zhang, Yongyong Chen et al.

CVPR 2024
#11133

Generalized Event Cameras

Varun Sundar, Matthew Dutson, Andrei Ardelean et al.

CVPR 2024 • arXiv:2407.02683
#11134

Multimodal Prompt Perceiver: Empower Adaptiveness, Generalizability and Fidelity for All-in-One Image Restoration

Yuang Ai, Huaibo Huang, Xiaoqiang Zhou et al.

CVPR 2024 • arXiv:2312.02918
#11135

BEVSpread: Spread Voxel Pooling for Bird’s-Eye-View Representation in Vision-based Roadside 3D Object Detection

Wenjie Wang, Yehao Lu, Guangcong Zheng et al.

CVPR 2024 • arXiv:2406.08785
#11136

Dual Pose-invariant Embeddings: Learning Category and Object-specific Discriminative Representations for Recognition and Retrieval

Rohan Sarkar, Avinash Kak

CVPR 2024 • arXiv:2403.00272
#11137

Multi-modal In-Context Learning Makes an Ego-evolving Scene Text Recognizer

Zhen Zhao, Jingqun Tang, Chunhui Lin et al.

CVPR 2024 • arXiv:2311.13120
#11138

NIVeL: Neural Implicit Vector Layers for Text-to-Vector Generation

Vikas Thamizharasan, Difan Liu, Matthew Fisher et al.

CVPR 2024 • arXiv:2405.15217
#11139

Hyperbolic Anomaly Detection

Huimin Li, Zhentao Chen, Yunhao Xu et al.

CVPR 2024
#11140

Selective Nonlinearities Removal from Digital Signals

Krzysztof Maliszewski, Magdalena Urbanska, Varvara Vetrova et al.

CVPR 2024 • arXiv:2403.09731
#11141

Backdoor Defense via Test-Time Detecting and Repairing

Jiyang Guan, Jian Liang, Ran He

CVPR 2024
#11142

Towards a Perceptual Evaluation Framework for Lighting Estimation

Justine Giroux, Mohammad Reza Karimi Dastjerdi, Yannick Hold-Geoffroy et al.

CVPR 2024 • arXiv:2312.04334
#11143

DanceCamera3D: 3D Camera Movement Synthesis with Music and Dance

Zixuan Wang, Jia Jia, Shikun Sun et al.

CVPR 2024 • arXiv:2403.13667
#11144

HoloVIC: Large-scale Dataset and Benchmark for Multi-Sensor Holographic Intersection and Vehicle-Infrastructure Cooperative

Cong Ma, Qiao Lei, Chengkai Zhu et al.

CVPR 2024 • arXiv:2403.02640
#11145

What Sketch Explainability Really Means for Downstream Tasks?

Hmrishav Bandyopadhyay, Pinaki Nath Chowdhury, Ayan Kumar Bhunia et al.

CVPR 2024 • arXiv:2403.09480
#11146

Leveraging Frame Affinity for sRGB-to-RAW Video De-rendering

Chen Zhang, Wencheng Han, Yang Zhou et al.

CVPR 2024
#11147

Learning CNN on ViT: A Hybrid Model to Explicitly Class-specific Boundaries for Domain Adaptation

Ba Hung Ngo, Nhat-Tuong Do-Tran, Tuan-Ngoc Nguyen et al.

CVPR 2024 • arXiv:2403.18360
#11148

GoMVS: Geometrically Consistent Cost Aggregation for Multi-View Stereo

Jiang Wu, Rui Li, Haofei Xu et al.

CVPR 2024 • arXiv:2404.07992
#11149

From Correspondences to Pose: Non-minimal Certifiably Optimal Relative Pose without Disambiguation

Javier Tirado-Garín, Javier Civera

CVPR 2024 • highlight • arXiv:2312.05995
#11150

CommonCanvas: Open Diffusion Models Trained on Creative-Commons Images

Aaron Gokaslan, A. Feder Cooper, Jasmine Collins et al.

CVPR 2024
#11151

Choose What You Need: Disentangled Representation Learning for Scene Text Recognition, Removal and Editing

Boqiang Zhang, Hongtao Xie, Zuan Gao et al.

CVPR 2024 • arXiv:2405.04377
#11152

Memory-based Adapters for Online 3D Scene Perception

Xiuwei Xu, Chong Xia, Ziwei Wang et al.

CVPR 2024 • arXiv:2403.06974
#11153

Cross-spectral Gated-RGB Stereo Depth Estimation

Samuel Brucker, Stefanie Walz, Mario Bijelic et al.

CVPR 2024 • highlight • arXiv:2405.12759
#11154

EASE-DETR: Easing the Competition among Object Queries

Yulu Gao, Yifan Sun, Xudong Ding et al.

CVPR 2024
#11155

GSNeRF: Generalizable Semantic Neural Radiance Fields with Enhanced 3D Scene Understanding

Zi-Ting Chou, Sheng-Yu Huang, I-Jieh Liu et al.

CVPR 2024 • arXiv:2403.03608
#11156

CAT-DM: Controllable Accelerated Virtual Try-on with Diffusion Model

Jianhao Zeng, Dan Song, Weizhi Nie et al.

CVPR 2024 • arXiv:2311.18405
#11157

Readout Guidance: Learning Control from Diffusion Features

Grace Luo, Trevor Darrell, Oliver Wang et al.

CVPR 2024 • highlight • arXiv:2312.02150
#11158

Action Detection via an Image Diffusion Process

Lin Geng Foo, Tianjiao Li, Hossein Rahmani et al.

CVPR 2024 • arXiv:2404.01051
#11159

Transcriptomics-guided Slide Representation Learning in Computational Pathology

Guillaume Jaume, Lukas Oldenburg, Anurag Vaidya et al.

CVPR 2024 • arXiv:2405.11618
#11160

SurroundSDF: Implicit 3D Scene Understanding Based on Signed Distance Field

Lizhe Liu, Bohua Wang, Hongwei Xie et al.

CVPR 2024 • highlight • arXiv:2403.14366
#11161

MLIP: Enhancing Medical Visual Representation with Divergence Encoder and Knowledge-guided Contrastive Learning

Zhe Li, Laurence Yang, Bocheng Ren et al.

CVPR 2024 • arXiv:2402.02045
#11162

Active Open-Vocabulary Recognition: Let Intelligent Moving Mitigate CLIP Limitations

Lei Fan, Jianxiong Zhou, Xiaoying Xing et al.

CVPR 2024 • arXiv:2311.17938
#11163

DyBluRF: Dynamic Neural Radiance Fields from Blurry Monocular Video

Huiqiang Sun, Xingyi Li, Liao Shen et al.

CVPR 2024 • arXiv:2403.10103
#11164

SAOR: Single-View Articulated Object Reconstruction

Mehmet Aygun, Oisin Mac Aodha

CVPR 2024
#11165

GenHowTo: Learning to Generate Actions and State Transformations from Instructional Videos

Tomas Soucek, Dima Damen, Michael Wray et al.

CVPR 2024
#11166

Density-Adaptive Model Based on Motif Matrix for Multi-Agent Trajectory Prediction

Di Wen, Haoran Xu, Zhaocheng He et al.

CVPR 2024
#11167

Towards Accurate Post-training Quantization for Diffusion Models

Changyuan Wang, Ziwei Wang, Xiuwei Xu et al.

CVPR 2024 • highlight • arXiv:2305.18723
#11168

MoST: Multi-Modality Scene Tokenization for Motion Prediction

Norman Mu, Jingwei Ji, Zhenpei Yang et al.

CVPR 2024 • arXiv:2404.19531
#11169

Gear-NeRF: Free-Viewpoint Rendering and Tracking with Motion-aware Spatio-Temporal Sampling

Xinhang Liu, Yu-Wing Tai, Chi-Keung Tang et al.

CVPR 2024 • highlight • arXiv:2406.03723
#11170

MultiDiff: Consistent Novel View Synthesis from a Single Image

Norman Müller, Katja Schwarz, Barbara Roessle et al.

CVPR 2024 • arXiv:2406.18524
#11171

Multi-Scale Video Anomaly Detection by Multi-Grained Spatio-Temporal Representation Learning

Menghao Zhang, Jingyu Wang, Qi Qi et al.

CVPR 2024 • highlight
#11172

Uncertainty-aware Action Decoupling Transformer for Action Anticipation

Hongji Guo, Nakul Agarwal, Shao-Yuan Lo et al.

CVPR 2024 • highlight
#11173

PTT: Point-Trajectory Transformer for Efficient Temporal 3D Object Detection

Kuan-Chih Huang, Weijie Lyu, Ming-Hsuan Yang et al.

CVPR 2024 • arXiv:2312.08371
#11174

VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models

Haoxin Chen, Yong Zhang, Xiaodong Cun et al.

CVPR 2024 • arXiv:2401.09047
#11175

TextNeRF: A Novel Scene-Text Image Synthesis Method based on Neural Radiance Fields

Jialei Cui, Jianwei Du, Wenzhuo Liu et al.

CVPR 2024
#11176

An Asymmetric Augmented Self-Supervised Learning Method for Unsupervised Fine-Grained Image Hashing

Feiran Hu, Chenlin Zhang, Jiangliang Guo et al.

CVPR 2024
#11177

MimicDiffusion: Purifying Adversarial Perturbation via Mimicking Clean Diffusion Model

Kaiyu Song, Hanjiang Lai, Yan Pan et al.

CVPR 2024 • arXiv:2312.04802
#11178

Action Scene Graphs for Long-Form Understanding of Egocentric Videos

Ivan Rodin, Antonino Furnari, Kyle Min et al.

CVPR 2024 • arXiv:2312.03391
#11179

DiffusionTrack: Point Set Diffusion Model for Visual Object Tracking

Fei Xie, Zhongdao Wang, Chao Ma

CVPR 2024
#11180

EmoGen: Emotional Image Content Generation with Text-to-Image Diffusion Models

Jingyuan Yang, Jiawei Feng, Hui Huang

CVPR 2024 • arXiv:2401.04608
#11181

SpiderMatch: 3D Shape Matching with Global Optimality and Geometric Consistency

Paul Roetzer, Florian Bernard

CVPR 2024
#11182

Realigning Confidence with Temporal Saliency Information for Point-Level Weakly-Supervised Temporal Action Localization

Ziying Xia, Jian Cheng, Siyu Liu et al.

CVPR 2024
#11183

3D Facial Expressions through Analysis-by-Neural-Synthesis

George Retsinas, Panagiotis Filntisis, Radek Danecek et al.

CVPR 2024 • arXiv:2404.04104
#11184

Segment and Caption Anything

Xiaoke Huang, Jianfeng Wang, Yansong Tang et al.

CVPR 2024 • arXiv:2312.00869
#11185

Brush2Prompt: Contextual Prompt Generator for Object Inpainting

Mang Tik Chiu, Yuqian Zhou, Lingzhi Zhang et al.

CVPR 2024
#11186

G^3-LQ: Marrying Hyperbolic Alignment with Explicit Semantic-Geometric Modeling for 3D Visual Grounding

Yuan Wang, Yali Li, Shengjin Wang

CVPR 2024
#11187

Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images

Chaoqin Huang, Aofan Jiang, Jinghao Feng et al.

CVPR 2024 • highlight • arXiv:2403.12570
#11188

NightCC: Nighttime Color Constancy via Adaptive Channel Masking

Shuwei Li, Robby T. Tan

CVPR 2024
#11189

Sparse Views Near Light: A Practical Paradigm for Uncalibrated Point-light Photometric Stereo

Mohammed Brahimi, Bjoern Haefner, Zhenzhang Ye et al.

CVPR 2024 • arXiv:2404.00098
#11190

Total Selfie: Generating Full-Body Selfies

Bowei Chen, Brian Curless, Ira Kemelmacher-Shlizerman et al.

CVPR 2024 • highlight • arXiv:2308.14740
#11191

LayoutFormer: Hierarchical Text Detection Towards Scene Text Understanding

Min Liang, Jia-Wei Ma, Xiaobin Zhu et al.

CVPR 2024
#11192

On the Diversity and Realism of Distilled Dataset: An Efficient Dataset Distillation Paradigm

Peng Sun, Bei Shi, Daiwei Yu et al.

CVPR 2024 • arXiv:2312.03526
#11193

HumanGaussian: Text-Driven 3D Human Generation with Gaussian Splatting

Xian Liu, Xiaohang Zhan, Jiaxiang Tang et al.

CVPR 2024 • highlight • arXiv:2311.17061
#11194

Depth Prompting for Sensor-Agnostic Depth Estimation

Jin-Hwi Park, Chanhwi Jeong, Junoh Lee et al.

CVPR 2024 • arXiv:2405.11867
#11195

Modality-Collaborative Test-Time Adaptation for Action Recognition

Baochen Xiong, Xiaoshan Yang, Yaguang Song et al.

CVPR 2024
#11196

DRESS: Instructing Large Vision-Language Models to Align and Interact with Humans via Natural Language Feedback

Yangyi Chen, Karan Sikka, Michael Cogswell et al.

CVPR 2024 • arXiv:2311.10081
#11197

Rethinking Inductive Biases for Surface Normal Estimation

Gwangbin Bae, Andrew J. Davison

CVPR 2024 • arXiv:2403.00712
#11198

Visual Layout Composer: Image-Vector Dual Diffusion Model for Design Layout Generation

Mohammad Amin Shabani, Zhaowen Wang, Difan Liu et al.

CVPR 2024
#11199

Pose-Guided Self-Training with Two-Stage Clustering for Unsupervised Landmark Discovery

Siddharth Tourani, Ahmed Alwheibi, Arif Mahmood et al.

CVPR 2024 • highlight • arXiv:2403.16194
#11200

OVMR: Open-Vocabulary Recognition with Multi-Modal References

Zehong Ma, Shiliang Zhang, Longhui Wei et al.

CVPR 2024 • arXiv:2406.04675