Most Cited 2024 "gui agents" Papers

#11001

Hierarchical Intra-modal Correlation Learning for Label-free 3D Semantic Segmentation

Xin Kang, Lei Chu, Jiahao Li et al.

CVPR 2024poster
#11002

PARA-Drive: Parallelized Architecture for Real-time Autonomous Driving

Xinshuo Weng, Boris Ivanovic, Yan Wang et al.

CVPR 2024poster
#11003

Towards Generalizable Tumor Synthesis

Qi Chen, Xiaoxi Chen, Haorui Song et al.

CVPR 2024posterarXiv:2402.19470
#11004

Adaptive Hyper-graph Aggregation for Modality-Agnostic Federated Learning

Fan Qi, Shuai Li

CVPR 2024poster
#11005

Bi-SSC: Geometric-Semantic Bidirectional Fusion for Camera-based 3D Semantic Scene Completion

Yujie Xue, Ruihui Li, Fan Wu et al.

CVPR 2024poster
#11006

Efficient and Effective Weakly-Supervised Action Segmentation via Action-Transition-Aware Boundary Alignment

Angchi Xu, Wei-Shi Zheng

CVPR 2024posterarXiv:2403.19225
#11007

Depth-Aware Concealed Crop Detection in Dense Agricultural Scenes

Liqiong Wang, Jinyu Yang, Yanfu Zhang et al.

CVPR 2024poster
#11008

FC-GNN: Recovering Reliable and Accurate Correspondences from Interferences

Haobo Xu, Jun Zhou, Hua Yang et al.

CVPR 2024poster
#11009

MoMask: Generative Masked Modeling of 3D Human Motions

Chuan Guo, Yuxuan Mu, Muhammad Gohar Javed et al.

CVPR 2024posterarXiv:2312.00063
#11010

CapsFusion: Rethinking Image-Text Data at Scale

Qiying Yu, Quan Sun, Xiaosong Zhang et al.

CVPR 2024posterarXiv:2310.20550
#11011

A General and Efficient Training for Transformer via Token Expansion

Wenxuan Huang, Yunhang Shen, Jiao Xie et al.

CVPR 2024posterarXiv:2404.00672
#11012

BigGait: Learning Gait Representation You Want by Large Vision Models

Dingqiang Ye, Chao Fan, Jingzhe Ma et al.

CVPR 2024posterarXiv:2402.19122
#11013

Event-based Visible and Infrared Fusion via Multi-task Collaboration

Mengyue Geng, Lin Zhu, Lizhi Wang et al.

CVPR 2024poster
#11014

Breathing Life Into Sketches Using Text-to-Video Priors

Rinon Gal, Yael Vinker, Yuval Alaluf et al.

CVPR 2024highlightarXiv:2311.13608
#11015

Gaussian Shell Maps for Efficient 3D Human Generation

Rameen Abdal, Wang Yifan, Zifan Shi et al.

CVPR 2024posterarXiv:2311.17857
#11016

Byzantine-robust Decentralized Federated Learning via Dual-domain Clustering and Trust Bootstrapping

Peng Sun, Xinyang Liu, Zhibo Wang et al.

CVPR 2024poster
#11017

MotionEditor: Editing Video Motion via Content-Aware Diffusion

Shuyuan Tu, Qi Dai, Zhi-Qi Cheng et al.

CVPR 2024posterarXiv:2311.18830
#11018

State Space Models for Event Cameras

Nikola Zubic, Mathias Gehrig, Davide Scaramuzza

CVPR 2024poster
#11019

DiffInDScene: Diffusion-based High-Quality 3D Indoor Scene Generation

Xiaoliang Ju, Zhaoyang Huang, Yijin Li et al.

CVPR 2024posterarXiv:2306.00519
#11020

Towards Calibrated Multi-label Deep Neural Networks

Jiacheng Cheng, Nuno Vasconcelos

CVPR 2024poster
#11021

TIM: A Time Interval Machine for Audio-Visual Action Recognition

Jacob Chalk, Jaesung Huh, Evangelos Kazakos et al.

CVPR 2024posterarXiv:2404.05559
#11022

Test-Time Linear Out-of-Distribution Detection

Ke Fan, Tong Liu, Xingyu Qiu et al.

CVPR 2024poster
#11023

Exploiting Style Latent Flows for Generalizing Deepfake Video Detection

Jongwook Choi, Taehoon Kim, Yonghyun Jeong et al.

CVPR 2024posterarXiv:2403.06592
#11024

LeGO: Leveraging a Surface Deformation Network for Animatable Stylized Face Generation with One Example

Soyeon Yoon, Kwan Yun, Kwanggyoon Seo et al.

CVPR 2024highlightarXiv:2403.15227
#11025

Leveraging Predicate and Triplet Learning for Scene Graph Generation

Jiankai Li, Yunhong Wang, Xiefan Guo et al.

CVPR 2024posterarXiv:2406.02038
#11026

Unsupervised Semantic Segmentation Through Depth-Guided Feature Correlation and Sampling

Leon Sick, Dominik Engel, Pedro Hermosilla et al.

CVPR 2024posterarXiv:2309.12378
#11027

HHMR: Holistic Hand Mesh Recovery by Enhancing the Multimodal Controllability of Graph Diffusion Models

Mengcheng Li, Hongwen Zhang, Yuxiang Zhang et al.

CVPR 2024highlightarXiv:2406.01334
#11028

Enhancing Visual Continual Learning with Language-Guided Supervision

Bolin Ni, Hongbo Zhao, Chenghao Zhang et al.

CVPR 2024posterarXiv:2403.16124
#11029

PhyScene: Physically Interactable 3D Scene Synthesis for Embodied AI

Yandan Yang, Baoxiong Jia, Peiyuan Zhi et al.

CVPR 2024highlightarXiv:2404.09465
#11030

Uncertainty-Aware Source-Free Adaptive Image Super-Resolution with Wavelet Augmentation Transformer

Yuang Ai, Xiaoqiang Zhou, Huaibo Huang et al.

CVPR 2024posterarXiv:2303.17783
#11031

Generalizing 6-DoF Grasp Detection via Domain Prior Knowledge

Haoxiang Ma, Modi Shi, Boyang Gao et al.

CVPR 2024posterarXiv:2404.01727
#11032

Making Vision Transformers Truly Shift-Equivariant

Renan A. Rojas-Gomez, Teck-Yian Lim, Minh Do et al.

CVPR 2024posterarXiv:2305.16316
#11033

DreamVideo: Composing Your Dream Videos with Customized Subject and Motion

Yujie Wei, Shiwei Zhang, Zhiwu Qing et al.

CVPR 2024posterarXiv:2312.04433
#11034

RankED: Addressing Imbalance and Uncertainty in Edge Detection Using Ranking-based Losses

Bedrettin Cetinkaya, Sinan Kalkan, Emre Akbas

CVPR 2024posterarXiv:2403.01795
#11035

Fine-Grained Bipartite Concept Factorization for Clustering

Chong Peng, Pengfei Zhang, Yongyong Chen et al.

CVPR 2024poster
#11036

Generalized Event Cameras

Varun Sundar, Matthew Dutson, Andrei Ardelean et al.

CVPR 2024posterarXiv:2407.02683
#11037

Multimodal Prompt Perceiver: Empower Adaptiveness, Generalizability and Fidelity for All-in-One Image Restoration

Yuang Ai, Huaibo Huang, Xiaoqiang Zhou et al.

CVPR 2024posterarXiv:2312.02918
#11038

BEVSpread: Spread Voxel Pooling for Bird’s-Eye-View Representation in Vision-based Roadside 3D Object Detection

Wenjie Wang, Yehao Lu, Guangcong Zheng et al.

CVPR 2024posterarXiv:2406.08785
#11039

Dual Pose-invariant Embeddings: Learning Category and Object-specific Discriminative Representations for Recognition and Retrieval

Rohan Sarkar, Avinash Kak

CVPR 2024posterarXiv:2403.00272
#11040

Multi-modal In-Context Learning Makes an Ego-evolving Scene Text Recognizer

Zhen Zhao, Jingqun Tang, Chunhui Lin et al.

CVPR 2024posterarXiv:2311.13120
#11041

NIVeL: Neural Implicit Vector Layers for Text-to-Vector Generation

Vikas Thamizharasan, Difan Liu, Matthew Fisher et al.

CVPR 2024posterarXiv:2405.15217
#11042

Hyperbolic Anomaly Detection

Huimin Li, Zhentao Chen, Yunhao Xu et al.

CVPR 2024poster
#11043

Selective Nonlinearities Removal from Digital Signals

Krzysztof Maliszewski, Magdalena Urbanska, Varvara Vetrova et al.

CVPR 2024posterarXiv:2403.09731
#11044

Backdoor Defense via Test-Time Detecting and Repairing

Jiyang Guan, Jian Liang, Ran He

CVPR 2024poster
#11045

Towards a Perceptual Evaluation Framework for Lighting Estimation

Justine Giroux, Mohammad Reza Karimi Dastjerdi, Yannick Hold-Geoffroy et al.

CVPR 2024posterarXiv:2312.04334
#11046

DanceCamera3D: 3D Camera Movement Synthesis with Music and Dance

Zixuan Wang, Jia Jia, Shikun Sun et al.

CVPR 2024posterarXiv:2403.13667
#11047

HoloVIC: Large-scale Dataset and Benchmark for Multi-Sensor Holographic Intersection and Vehicle-Infrastructure Cooperative

Cong Ma, Qiao Lei, Chengkai Zhu et al.

CVPR 2024posterarXiv:2403.02640
#11048

What Sketch Explainability Really Means for Downstream Tasks?

Hmrishav Bandyopadhyay, Pinaki Nath Chowdhury, Ayan Kumar Bhunia et al.

CVPR 2024posterarXiv:2403.09480
#11049

Leveraging Frame Affinity for sRGB-to-RAW Video De-rendering

Chen Zhang, Wencheng Han, Yang Zhou et al.

CVPR 2024poster
#11050

Learning CNN on ViT: A Hybrid Model to Explicitly Class-specific Boundaries for Domain Adaptation

Ba Hung Ngo, Nhat-Tuong Do-Tran, Tuan-Ngoc Nguyen et al.

CVPR 2024posterarXiv:2403.18360
#11051

GoMVS: Geometrically Consistent Cost Aggregation for Multi-View Stereo

Jiang Wu, Rui Li, Haofei Xu et al.

CVPR 2024posterarXiv:2404.07992
#11052

From Correspondences to Pose: Non-minimal Certifiably Optimal Relative Pose without Disambiguation

Javier Tirado-Garín, Javier Civera

CVPR 2024highlightarXiv:2312.05995
#11053

CommonCanvas: Open Diffusion Models Trained on Creative-Commons Images

Aaron Gokaslan, A. Feder Cooper, Jasmine Collins et al.

CVPR 2024poster
#11054

Choose What You Need: Disentangled Representation Learning for Scene Text Recognition, Removal and Editing

Boqiang Zhang, Hongtao Xie, Zuan Gao et al.

CVPR 2024posterarXiv:2405.04377
#11055

Memory-based Adapters for Online 3D Scene Perception

Xiuwei Xu, Chong Xia, Ziwei Wang et al.

CVPR 2024posterarXiv:2403.06974
#11056

Cross-spectral Gated-RGB Stereo Depth Estimation

Samuel Brucker, Stefanie Walz, Mario Bijelic et al.

CVPR 2024highlightarXiv:2405.12759
#11057

EASE-DETR: Easing the Competition among Object Queries

Yulu Gao, Yifan Sun, Xudong Ding et al.

CVPR 2024poster
#11058

GSNeRF: Generalizable Semantic Neural Radiance Fields with Enhanced 3D Scene Understanding

Zi-Ting Chou, Sheng-Yu Huang, I-Jieh Liu et al.

CVPR 2024posterarXiv:2403.03608
#11059

CAT-DM: Controllable Accelerated Virtual Try-on with Diffusion Model

Jianhao Zeng, Dan Song, Weizhi Nie et al.

CVPR 2024posterarXiv:2311.18405
#11060

Readout Guidance: Learning Control from Diffusion Features

Grace Luo, Trevor Darrell, Oliver Wang et al.

CVPR 2024highlightarXiv:2312.02150
#11061

Action Detection via an Image Diffusion Process

Lin Geng Foo, Tianjiao Li, Hossein Rahmani et al.

CVPR 2024posterarXiv:2404.01051
#11062

Transcriptomics-guided Slide Representation Learning in Computational Pathology

Guillaume Jaume, Lukas Oldenburg, Anurag Vaidya et al.

CVPR 2024posterarXiv:2405.11618
#11063

SurroundSDF: Implicit 3D Scene Understanding Based on Signed Distance Field

Lizhe Liu, Bohua Wang, Hongwei Xie et al.

CVPR 2024highlightarXiv:2403.14366
#11064

MLIP: Enhancing Medical Visual Representation with Divergence Encoder and Knowledge-guided Contrastive Learning

Zhe Li, Laurence Yang, Bocheng Ren et al.

CVPR 2024posterarXiv:2402.02045
#11065

Active Open-Vocabulary Recognition: Let Intelligent Moving Mitigate CLIP Limitations

Lei Fan, Jianxiong Zhou, Xiaoying Xing et al.

CVPR 2024posterarXiv:2311.17938
#11066

DyBluRF: Dynamic Neural Radiance Fields from Blurry Monocular Video

Huiqiang Sun, Xingyi Li, Liao Shen et al.

CVPR 2024posterarXiv:2403.10103
#11067

SAOR: Single-View Articulated Object Reconstruction

Mehmet Aygun, Oisin Mac Aodha

CVPR 2024poster
#11068

GenHowTo: Learning to Generate Actions and State Transformations from Instructional Videos

Tomas Soucek, Dima Damen, Michael Wray et al.

CVPR 2024poster
#11069

Density-Adaptive Model Based on Motif Matrix for Multi-Agent Trajectory Prediction

Di Wen, Haoran Xu, Zhaocheng He et al.

CVPR 2024poster
#11070

Towards Accurate Post-training Quantization for Diffusion Models

Changyuan Wang, Ziwei Wang, Xiuwei Xu et al.

CVPR 2024highlightarXiv:2305.18723
#11071

MoST: Multi-Modality Scene Tokenization for Motion Prediction

Norman Mu, Jingwei Ji, Zhenpei Yang et al.

CVPR 2024posterarXiv:2404.19531
#11072

Gear-NeRF: Free-Viewpoint Rendering and Tracking with Motion-aware Spatio-Temporal Sampling

Xinhang Liu, Yu-Wing Tai, Chi-Keung Tang et al.

CVPR 2024highlightarXiv:2406.03723
#11073

MultiDiff: Consistent Novel View Synthesis from a Single Image

Norman Müller, Katja Schwarz, Barbara Roessle et al.

CVPR 2024posterarXiv:2406.18524
#11074

Multi-Scale Video Anomaly Detection by Multi-Grained Spatio-Temporal Representation Learning

Menghao Zhang, Jingyu Wang, Qi Qi et al.

CVPR 2024highlight
#11075

Uncertainty-aware Action Decoupling Transformer for Action Anticipation

Hongji Guo, Nakul Agarwal, Shao-Yuan Lo et al.

CVPR 2024highlight
#11076

PTT: Point-Trajectory Transformer for Efficient Temporal 3D Object Detection

Kuan-Chih Huang, Weijie Lyu, Ming-Hsuan Yang et al.

CVPR 2024posterarXiv:2312.08371
#11077

VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models

Haoxin Chen, Yong Zhang, Xiaodong Cun et al.

CVPR 2024posterarXiv:2401.09047
#11078

TextNeRF: A Novel Scene-Text Image Synthesis Method based on Neural Radiance Fields

Jialei Cui, Jianwei Du, Wenzhuo Liu et al.

CVPR 2024poster
#11079

An Asymmetric Augmented Self-Supervised Learning Method for Unsupervised Fine-Grained Image Hashing

Feiran Hu, Chenlin Zhang, Jiangliang Guo et al.

CVPR 2024poster
#11080

MimicDiffusion: Purifying Adversarial Perturbation via Mimicking Clean Diffusion Model

Kaiyu Song, Hanjiang Lai, Yan Pan et al.

CVPR 2024posterarXiv:2312.04802
#11081

Action Scene Graphs for Long-Form Understanding of Egocentric Videos

Ivan Rodin, Antonino Furnari, Kyle Min et al.

CVPR 2024posterarXiv:2312.03391
#11082

DiffusionTrack: Point Set Diffusion Model for Visual Object Tracking

Fei Xie, Zhongdao Wang, Chao Ma

CVPR 2024poster
#11083

EmoGen: Emotional Image Content Generation with Text-to-Image Diffusion Models

Jingyuan Yang, Jiawei Feng, Hui Huang

CVPR 2024posterarXiv:2401.04608
#11084

SpiderMatch: 3D Shape Matching with Global Optimality and Geometric Consistency

Paul Roetzer, Florian Bernard

CVPR 2024poster
#11085

Realigning Confidence with Temporal Saliency Information for Point-Level Weakly-Supervised Temporal Action Localization

Ziying Xia, Jian Cheng, Siyu Liu et al.

CVPR 2024poster
#11086

3D Facial Expressions through Analysis-by-Neural-Synthesis

George Retsinas, Panagiotis Filntisis, Radek Danecek et al.

CVPR 2024posterarXiv:2404.04104
#11087

Segment and Caption Anything

Xiaoke Huang, Jianfeng Wang, Yansong Tang et al.

CVPR 2024posterarXiv:2312.00869
#11088

Brush2Prompt: Contextual Prompt Generator for Object Inpainting

Mang Tik Chiu, Yuqian Zhou, Lingzhi Zhang et al.

CVPR 2024poster
#11089

G³-LQ: Marrying Hyperbolic Alignment with Explicit Semantic-Geometric Modeling for 3D Visual Grounding

Yuan Wang, Yali Li, Shengjin Wang

CVPR 2024poster
#11090

Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images

Chaoqin Huang, Aofan Jiang, Jinghao Feng et al.

CVPR 2024highlightarXiv:2403.12570
#11091

NightCC: Nighttime Color Constancy via Adaptive Channel Masking

Shuwei Li, Robby T. Tan

CVPR 2024poster
#11092

Sparse Views Near Light: A Practical Paradigm for Uncalibrated Point-light Photometric Stereo

Mohammed Brahimi, Bjoern Haefner, Zhenzhang Ye et al.

CVPR 2024posterarXiv:2404.00098
#11093

Total Selfie: Generating Full-Body Selfies

Bowei Chen, Brian Curless, Ira Kemelmacher-Shlizerman et al.

CVPR 2024highlightarXiv:2308.14740
#11094

LayoutFormer: Hierarchical Text Detection Towards Scene Text Understanding

Min Liang, Jia-Wei Ma, Xiaobin Zhu et al.

CVPR 2024poster
#11095

On the Diversity and Realism of Distilled Dataset: An Efficient Dataset Distillation Paradigm

Peng Sun, Bei Shi, Daiwei Yu et al.

CVPR 2024posterarXiv:2312.03526
#11096

HumanGaussian: Text-Driven 3D Human Generation with Gaussian Splatting

Xian Liu, Xiaohang Zhan, Jiaxiang Tang et al.

CVPR 2024highlightarXiv:2311.17061
#11097

Depth Prompting for Sensor-Agnostic Depth Estimation

Jin-Hwi Park, Chanhwi Jeong, Junoh Lee et al.

CVPR 2024posterarXiv:2405.11867
#11098

Modality-Collaborative Test-Time Adaptation for Action Recognition

Baochen Xiong, Xiaoshan Yang, Yaguang Song et al.

CVPR 2024poster
#11099

DRESS: Instructing Large Vision-Language Models to Align and Interact with Humans via Natural Language Feedback

Yangyi Chen, Karan Sikka, Michael Cogswell et al.

CVPR 2024posterarXiv:2311.10081
#11100

Rethinking Inductive Biases for Surface Normal Estimation

Gwangbin Bae, Andrew J. Davison

CVPR 2024posterarXiv:2403.00712
#11101

Visual Layout Composer: Image-Vector Dual Diffusion Model for Design Layout Generation

Mohammad Amin Shabani, Zhaowen Wang, Difan Liu et al.

CVPR 2024poster
#11102

Pose-Guided Self-Training with Two-Stage Clustering for Unsupervised Landmark Discovery

Siddharth Tourani, Ahmed Alwheibi, Arif Mahmood et al.

CVPR 2024highlightarXiv:2403.16194
#11103

OVMR: Open-Vocabulary Recognition with Multi-Modal References

Zehong Ma, Shiliang Zhang, Longhui Wei et al.

CVPR 2024posterarXiv:2406.04675
#11104

AETTA: Label-Free Accuracy Estimation for Test-Time Adaptation

Taeckyung Lee, Sorn Chottananurak, Taesik Gong et al.

CVPR 2024posterarXiv:2404.01351
#11105

A Simple Recipe for Language-guided Domain Generalized Segmentation

Mohammad Fahes, Tuan-Hung Vu, Andrei Bursuc et al.

CVPR 2024posterarXiv:2311.17922
#11106

An Edit Friendly DDPM Noise Space: Inversion and Manipulations

Inbar Huberman-Spiegelglas, Vladimir Kulikov, Tomer Michaeli

CVPR 2024posterarXiv:2304.06140
#11107

AdaShift: Learning Discriminative Self-Gated Neural Feature Activation With an Adaptive Shift Factor

Sudong Cai

CVPR 2024poster
#11108

PredToken: Predicting Unknown Tokens and Beyond with Coarse-to-Fine Iterative Decoding

Xuesong Nie, Haoyuan Jin, Yunfeng Yan et al.

CVPR 2024poster
#11109

Holistic Features are almost Sufficient for Text-to-Video Retrieval

Kaibin Tian, Ruixiang Zhao, Zijie Xin et al.

CVPR 2024poster
#11110

Causal Mode Multiplexer: A Novel Framework for Unbiased Multispectral Pedestrian Detection

Taeheon Kim, Sebin Shin, Youngjoon Yu et al.

CVPR 2024posterarXiv:2403.01300
#11111

Seeing the Unseen: Visual Common Sense for Semantic Placement

Ram Ramrakhya, Aniruddha Kembhavi, Dhruv Batra et al.

CVPR 2024posterarXiv:2401.07770
#11112

Diffuse, Attend, and Segment: Unsupervised Zero-Shot Segmentation using Stable Diffusion

Junjiao Tian, Lavisha Aggarwal, Andrea Colaco et al.

CVPR 2024poster
#11113

GOAT-Bench: A Benchmark for Multi-Modal Lifelong Navigation

Mukul Khanna, Ram Ramrakhya, Gunjan Chhablani et al.

CVPR 2024posterarXiv:2404.06609
#11114

WonderJourney: Going from Anywhere to Everywhere

Hong-Xing Yu, Haoyi Duan, Junhwa Hur et al.

CVPR 2024posterarXiv:2312.03884
#11115

CLIP-Driven Open-Vocabulary 3D Scene Graph Generation via Cross-Modality Contrastive Learning

Lianggangxu Chen, Xuejiao Wang, Jiale Lu et al.

CVPR 2024highlight
#11116

Training-Free Open-Vocabulary Segmentation with Offline Diffusion-Augmented Prototype Generation

Luca Barsellotti, Roberto Amoroso, Marcella Cornia et al.

CVPR 2024posterarXiv:2404.06542
#11117

HRVDA: High-Resolution Visual Document Assistant

Chaohu Liu, Kun Yin, Haoyu Cao et al.

CVPR 2024posterarXiv:2404.06918
#11118

A Dual-Augmentor Framework for Domain Generalization in 3D Human Pose Estimation

Qucheng Peng, Ce Zheng, Chen Chen

CVPR 2024posterarXiv:2403.11310
#11119

Building Bridges across Spatial and Temporal Resolutions: Reference-Based Super-Resolution via Change Priors and Conditional Diffusion Model

Runmin Dong, Shuai Yuan, Bin Luo et al.

CVPR 2024posterarXiv:2403.17460
#11120

Gaussian Shading: Provable Performance-Lossless Image Watermarking for Diffusion Models

Zijin Yang, Kai Zeng, Kejiang Chen et al.

CVPR 2024posterarXiv:2404.04956
#11121

Multimodal Sense-Informed Forecasting of 3D Human Motions

Zhenyu Lou, Qiongjie Cui, Haofan Wang et al.

CVPR 2024poster
#11122

Resolution Limit of Single-Photon LiDAR

Stanley H. Chan, Hashan K Weerasooriya, Weijian Zhang et al.

CVPR 2024posterarXiv:2403.17719
#11123

Correlation-aware Coarse-to-fine MLPs for Deformable Medical Image Registration

Mingyuan Meng, Dagan Feng, Lei Bi et al.

CVPR 2024posterarXiv:2406.00123
#11124

Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

Kristen Grauman, Andrew Westbury, Lorenzo Torresani et al.

CVPR 2024posterarXiv:2311.18259
#11125

CAD-SIGNet: CAD Language Inference from Point Clouds using Layer-wise Sketch Instance Guided Attention

Mohammad Sadil Khan, Elona Dupont, Sk Aziz Ali et al.

CVPR 2024highlightarXiv:2402.17678
#11126

LAA-Net: Localized Artifact Attention Network for Quality-Agnostic and Generalizable Deepfake Detection

Dat Nguyen, Nesryne Mejri, Inder Pal Singh et al.

CVPR 2024posterarXiv:2401.13856
#11127

The Devil is in the Details: StyleFeatureEditor for Detail-Rich StyleGAN Inversion and High Quality Image Editing

Denis Bobkov, Vadim Titov, Aibek Alanov et al.

CVPR 2024posterarXiv:2406.10601
#11128

Adaptive Random Feature Regularization on Fine-tuning Deep Neural Networks

Shin'ya Yamaguchi, Sekitoshi Kanai et al.

CVPR 2024poster
#11129

Self-Training Large Language Models for Improved Visual Program Synthesis With Visual Reinforcement

Zaid Khan, Vijay Kumar BG, Samuel Schulter et al.

CVPR 2024posterarXiv:2404.04627
#11130

Generating Enhanced Negatives for Training Language-Based Object Detectors

Shiyu Zhao, Long Zhao, Vijay Kumar BG et al.

CVPR 2024posterarXiv:2401.00094
#11131

Joint-Task Regularization for Partially Labeled Multi-Task Learning

Kento Nishi, Junsik Kim, Wanhua Li et al.

CVPR 2024posterarXiv:2404.01976
#11132

MRFP: Learning Generalizable Semantic Segmentation from Sim-2-Real with Multi-Resolution Feature Perturbation

Sumanth Udupa, Prajwal Gurunath, Aniruddh Sikdar et al.

CVPR 2024posterarXiv:2311.18331
#11133

Object Recognition as Next Token Prediction

Kaiyu Yue, Bor-Chun Chen, Jonas Geiping et al.

CVPR 2024highlightarXiv:2312.02142
#11134

MuGE: Multiple Granularity Edge Detection

Caixia Zhou, Yaping Huang, Mengyang Pu et al.

CVPR 2024poster
#11135

Local-consistent Transformation Learning for Rotation-invariant Point Cloud Analysis

Yiyang Chen, Lunhao Duan, Shanshan Zhao et al.

CVPR 2024posterarXiv:2403.11113
#11136

Spacetime Gaussian Feature Splatting for Real-Time Dynamic View Synthesis

Zhan Li, Zhang Chen, Zhong Li et al.

CVPR 2024posterarXiv:2312.16812
#11137

LOTUS: Evasive and Resilient Backdoor Attacks through Sub-Partitioning

Siyuan Cheng, Guanhong Tao, Yingqi Liu et al.

CVPR 2024posterarXiv:2403.17188
#11138

The More You See in 2D, the More You Perceive in 3D

Xinyang Han, Zelin Gao, Angjoo Kanazawa et al.

CVPR 2024highlightarXiv:2404.03652
#11139

What, When, and Where? Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions

Brian Chen, Nina Shvetsova, Andrew Rouditchenko et al.

CVPR 2024posterarXiv:2303.16990
#11140

A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions

Jack Urbanek, Florian Bordes, Pietro Astolfi et al.

CVPR 2024posterarXiv:2312.08578
#11141

Correlation-Decoupled Knowledge Distillation for Multimodal Sentiment Analysis with Incomplete Modalities

Mingcheng Li, Dingkang Yang, Xiao Zhao et al.

CVPR 2024posterarXiv:2404.16456
#11142

ES³: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations

Yuanhang Zhang, Shuang Yang, Shiguang Shan et al.

CVPR 2024poster
#11143

Depth-aware Test-Time Training for Zero-shot Video Object Segmentation

Weihuang Liu, Xi Shen, Haolun Li et al.

CVPR 2024posterarXiv:2403.04258
#11144

MSU-4S - The Michigan State University Four Seasons Dataset

Daniel Kent, Mohammed Alyaqoub, Xiaohu Lu et al.

CVPR 2024poster
#11145

An Interactive Navigation Method with Effect-oriented Affordance

Xiaohan Wang, Yuehu Liu, Xinhang Song et al.

CVPR 2024poster
#11146

Rapid 3D Model Generation with Intuitive 3D Input

Tianrun Chen, Chaotao Ding, Shangzhan Zhang et al.

CVPR 2024highlight
#11147

Unsupervised Salient Instance Detection

Xin Tian, Ke Xu, Rynson W.H. Lau

CVPR 2024poster
#11148

Multi-modal Instruction Tuned LLMs with Fine-grained Visual Perception

Junwen He, Yifan Wang, Lijun Wang et al.

CVPR 2024highlightarXiv:2403.02969
#11149

CoDi-2: In-Context Interleaved and Interactive Any-to-Any Generation

Zineng Tang, Ziyi Yang, Mahmoud Khademi et al.

CVPR 2024highlight
#11150

PanoOcc: Unified Occupancy Representation for Camera-based 3D Panoptic Segmentation

Yuqi Wang, Yuntao Chen, Xingyu Liao et al.

CVPR 2024posterarXiv:2306.10013
#11151

AdaBM: On-the-Fly Adaptive Bit Mapping for Image Super-Resolution

Cheeun Hong, Kyoung Mu Lee

CVPR 2024posterarXiv:2404.03296
#11152

MULDE: Multiscale Log-Density Estimation via Denoising Score Matching for Video Anomaly Detection

Jakub Micorek, Horst Possegger, Dominik Narnhofer et al.

CVPR 2024posterarXiv:2403.14497
#11153

Instance-level Expert Knowledge and Aggregate Discriminative Attention for Radiology Report Generation

Shenshen Bu, Taiji Li, Zhiming Dai et al.

CVPR 2024poster
#11154

HyperSDFusion: Bridging Hierarchical Structures in Language and Geometry for Enhanced 3D Text2Shape Generation

Zhiying Leng, Tolga Birdal, Xiaohui Liang et al.

CVPR 2024posterarXiv:2403.00372
#11155

Just Add π! Pose Induced Video Transformers for Understanding Activities of Daily Living

Dominick Reilly, Srijan Das

CVPR 2024poster
#11156

PhysGaussian: Physics-Integrated 3D Gaussians for Generative Dynamics

Tianyi Xie, Zeshun Zong, Yuxing Qiu et al.

CVPR 2024highlightarXiv:2311.12198
#11157

Viewpoint-Aware Visual Grounding in 3D Scenes

Xiangxi Shi, Zhonghua Wu, Stefan Lee

CVPR 2024poster
#11158

Long-Tail Class Incremental Learning via Independent Sub-prototype Construction

Xi Wang, Xu Yang, Jie Yin et al.

CVPR 2024poster
#11159

An Aggregation-Free Federated Learning for Tackling Data Heterogeneity

Yuan Wang, Huazhu Fu, Renuga Kanagavelu et al.

CVPR 2024posterarXiv:2404.18962
#11160

Infrared Adversarial Car Stickers

Xiaopei Zhu, Yuqiu Liu, Zhanhao Hu et al.

CVPR 2024posterarXiv:2405.09924
#11161

XFibrosis: Explicit Vessel-Fiber Modeling for Fibrosis Staging from Liver Pathology Images

Chong Yin, Siqi Liu, Fei Lyu et al.

CVPR 2024poster
#11162

Advancing Saliency Ranking with Human Fixations: Dataset, Models and Benchmarks

Bowen Deng, Siyang Song, Andrew French et al.

CVPR 2024poster
#11163

Implicit Event-RGBD Neural SLAM

Delin Qu, Chi Yan, Dong Wang et al.

CVPR 2024highlightarXiv:2311.11013
#11164

Retraining-Free Model Quantization via One-Shot Weight-Coupling Learning

Chen Tang, Yuan Meng, Jiacheng Jiang et al.

CVPR 2024posterarXiv:2401.01543
#11165

DreamControl: Control-Based Text-to-3D Generation with 3D Self-Prior

Tianyu Huang, Yihan Zeng, Zhilu Zhang et al.

CVPR 2024posterarXiv:2312.06439
#11166

From Coarse to Fine-Grained Open-Set Recognition

Nico Lang, Vésteinn Snæbjarnarson, Elijah Cole et al.

CVPR 2024poster
#11167

Exploiting Inter-sample and Inter-feature Relations in Dataset Distillation

Wenxiao Deng, Wenbin Li, Tianyu Ding et al.

CVPR 2024posterarXiv:2404.00563
#11168

Discriminative Pattern Calibration Mechanism for Source-Free Domain Adaptation

Haifeng Xia, Siyu Xia, Zhengming Ding

CVPR 2024poster
#11169

RAM-Avatar: Real-time Photo-Realistic Avatar from Monocular Videos with Full-body Control

Xiang Deng, Zerong Zheng, Yuxiang Zhang et al.

CVPR 2024poster
#11170

PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding

Zhen Li, Mingdeng Cao, Xintao Wang et al.

CVPR 2024posterarXiv:2312.04461
#11171

Privacy-Preserving Face Recognition Using Trainable Feature Subtraction

Yuxi Mi, Zhizhou Zhong, Yuge Huang et al.

CVPR 2024posterarXiv:2403.12457
#11172

Arbitrary Motion Style Transfer with Multi-condition Motion Latent Diffusion Model

Wenfeng Song, Xingliang Jin, Shuai Li et al.

CVPR 2024poster
#11173

3DSFLabelling: Boosting 3D Scene Flow Estimation by Pseudo Auto-labelling

Chaokang Jiang, Guangming Wang, Jiuming Liu et al.

CVPR 2024posterarXiv:2402.18146
#11174

CPR-Coach: Recognizing Composite Error Actions based on Single-class Training

Shunli Wang, Shuaibing Wang, Dingkang Yang et al.

CVPR 2024posterarXiv:2309.11718
#11175

Restoration by Generation with Constrained Priors

Zheng Ding, Xuaner Zhang, Zhuowen Tu et al.

CVPR 2024highlightarXiv:2312.17161
#11176

Unified Entropy Optimization for Open-Set Test-Time Adaptation

Zhengqing Gao, Xu-Yao Zhang, Cheng-Lin Liu

CVPR 2024posterarXiv:2404.06065
#11177

Poly Kernel Inception Network for Remote Sensing Detection

Xinhao Cai, Qiuxia Lai, Yuwei Wang et al.

CVPR 2024posterarXiv:2403.06258
#11178

Distraction is All You Need: Memory-Efficient Image Immunization against Diffusion-Based Image Editing

Ling Lo, Cheng Yeo, Hong-Han Shuai et al.

CVPR 2024highlight
#11179

Parameter Efficient Fine-tuning via Cross Block Orchestration for Segment Anything Model

Zelin Peng, Zhengqin Xu, Zhilin Zeng et al.

CVPR 2024posterarXiv:2311.17112
#11180

MGMap: Mask-Guided Learning for Online Vectorized HD Map Construction

Xiaolu Liu, Song Wang, Wentong Li et al.

CVPR 2024posterarXiv:2404.00876
#11181

ViT-Lens: Towards Omni-modal Representations

Stan Weixian Lei, Yixiao Ge, Kun Yi et al.

CVPR 2024posterarXiv:2311.16081
#11182

Prompt-Driven Referring Image Segmentation with Instance Contrasting

Chao Shang, Zichen Song, Heqian Qiu et al.

CVPR 2024poster
#11183

CosmicMan: A Text-to-Image Foundation Model for Humans

Shikai Li, Jianglin Fu, Kaiyuan Liu et al.

CVPR 2024highlightarXiv:2404.01294
#11184

MTMMC: A Large-Scale Real-World Multi-Modal Camera Tracking Benchmark

Sanghyun Woo, Kwanyong Park, Inkyu Shin et al.

CVPR 2024posterarXiv:2403.20225
#11185

Align and Aggregate: Compositional Reasoning with Video Alignment and Answer Aggregation for Video Question-Answering

Zhaohe Liao, Jiangtong Li, Li Niu et al.

CVPR 2024posterarXiv:2407.03008
#11186

Overload: Latency Attacks on Object Detection for Edge Devices

Erh-Chung Chen, Pin-Yu Chen, I-Hsin Chung et al.

CVPR 2024posterarXiv:2304.05370
#11187

Neural Exposure Fusion for High-Dynamic Range Object Detection

Emmanuel Onzon, Maximilian Bömer, Fahim Mannan et al.

CVPR 2024poster
#11188

Semantics, Distortion, and Style Matter: Towards Source-free UDA for Panoramic Segmentation

Xu Zheng, Pengyuan Zhou, ATHANASIOS et al.

CVPR 2024poster
#11189

PaSCo: Urban 3D Panoptic Scene Completion with Uncertainty Awareness

Anh-Quan Cao, Angela Dai, Raoul de Charette

CVPR 2024posterarXiv:2312.02158
#11190

Evaluating Transferability in Retrieval Tasks: An Approach Using MMD and Kernel Methods

Mengyu Dai, Amir Hossein Raffiee, Aashish Jain et al.

CVPR 2024poster
#11191

ParameterNet: Parameters Are All You Need for Large-scale Visual Pretraining of Mobile Networks

Kai Han, Yunhe Wang, Jianyuan Guo et al.

CVPR 2024poster
#11192

Confronting Ambiguity in 6D Object Pose Estimation via Score-Based Diffusion on SE(3)

Tsu-Ching Hsiao, Hao-Wei Chen, Hsuan-Kung Yang et al.

CVPR 2024posterarXiv:2305.15873
#11193

Communication-Efficient Collaborative Perception via Information Filling with Codebook

Yue Hu, Juntong Peng, Sifei Liu et al.

CVPR 2024posterarXiv:2405.04966
#11194

QUADify: Extracting Meshes with Pixel-level Details and Materials from Images

Maximilian Frühauf, Hayko Riemenschneider, Markus Gross et al.

CVPR 2024highlight
#11195

Enhancing Post-training Quantization Calibration through Contrastive Learning

Yuzhang Shang, Gaowen Liu, Ramana Kompella et al.

CVPR 2024poster
#11196

LASO: Language-guided Affordance Segmentation on 3D Object

Yicong Li, Na Zhao, Junbin Xiao et al.

CVPR 2024poster
#11197

Dispersed Structured Light for Hyperspectral 3D Imaging

Suhyun Shin, Seokjun Choi, Felix Heide et al.

CVPR 2024posterarXiv:2311.18287
#11198

DualAD: Disentangling the Dynamic and Static World for End-to-End Driving

Simon Doll, Niklas Hanselmann, Lukas Schneider et al.

CVPR 2024posterarXiv:2406.06264
#11199

Focus on Hiders: Exploring Hidden Threats for Enhancing Adversarial Training

Qian Li, Yuxiao Hu, Yinpeng Dong et al.

CVPR 2024posterarXiv:2312.07067
#11200

ColorPCR: Color Point Cloud Registration with Multi-Stage Geometric-Color Fusion

Juncheng Mu, Lin Bie, Shaoyi Du et al.

CVPR 2024poster