Most Cited 2024 "gui agents" Papers

12,324 papers found • Page 55 of 62

#10801

Improving Unsupervised Hierarchical Representation with Reinforcement Learning

Ruyi An, Yewen Li, Xu He et al.

CVPR 2024poster
#10802

Global Latent Neural Rendering

Thomas Tanay, Matteo Maggioni

CVPR 2024posterarXiv:2312.08338
#10803

Data Poisoning based Backdoor Attacks to Contrastive Learning

Jinghuai Zhang, Hongbin Liu, Jinyuan Jia et al.

CVPR 2024posterarXiv:2211.08229
#10804

RoHM: Robust Human Motion Reconstruction via Diffusion

Siwei Zhang, Bharat Lal Bhatnagar, Yuanlu Xu et al.

CVPR 2024posterarXiv:2401.08570
#10805

SportsHHI: A Dataset for Human-Human Interaction Detection in Sports Videos

Tao Wu, Runyu He, Gangshan Wu et al.

CVPR 2024posterarXiv:2404.04565
#10806

Classes Are Not Equal: An Empirical Study on Image Recognition Fairness

Jiequan Cui, Beier Zhu, Xin Wen et al.

CVPR 2024posterarXiv:2402.18133
#10807

ImageNet-D: Benchmarking Neural Network Robustness on Diffusion Synthetic Object

Chenshuang Zhang, Fei Pan, Junmo Kim et al.

CVPR 2024highlightarXiv:2403.18775
#10808

BlockGCN: Redefine Topology Awareness for Skeleton-Based Action Recognition

Yuxuan Zhou, Xudong Yan, Zhi-Qi Cheng et al.

CVPR 2024poster
#10809

Dynamic Inertial Poser (DynaIP): Part-Based Motion Dynamics Learning for Enhanced Human Pose Estimation with Sparse Inertial Sensors

Yu Zhang, Songpengcheng Xia, Lei Chu et al.

CVPR 2024posterarXiv:2312.02196
#10810

Person-in-WiFi 3D: End-to-End Multi-Person 3D Pose Estimation with Wi-Fi

Kangwei Yan, Fei Wang, Bo Qian et al.

CVPR 2024poster
#10811

ERMVP: Communication-Efficient and Collaboration-Robust Multi-Vehicle Perception in Challenging Environments

Jingyu Zhang, Kun Yang, Yilei Wang et al.

CVPR 2024poster
#10812

GRAM: Global Reasoning for Multi-Page VQA

Itshak Blau, Sharon Fogel, Roi Ronen et al.

CVPR 2024posterarXiv:2401.03411
#10813

HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data

Qifan Yu, Juncheng Li, Longhui Wei et al.

CVPR 2024posterarXiv:2311.13614
#10814

Tri-Perspective View Decomposition for Geometry-Aware Depth Completion

Zhiqiang Yan, Yuankai Lin, Kun Wang et al.

CVPR 2024posterarXiv:2403.15008
#10815

MoPE-CLIP: Structured Pruning for Efficient Vision-Language Models with Module-wise Pruning Error Metric

Haokun Lin, Haoli Bai, Zhili Liu et al.

CVPR 2024posterarXiv:2403.07839
#10816

DiffusionRegPose: Enhancing Multi-Person Pose Estimation using a Diffusion-Based End-to-End Regression Approach

Dayi Tan, Hansheng Chen, Wei Tian et al.

CVPR 2024poster
#10817

Tumor Micro-environment Interactions Guided Graph Learning for Survival Analysis of Human Cancers from Whole-slide Pathological Images

WEI SHAO, YangYang Shi, Daoqiang Zhang et al.

CVPR 2024poster
#10818

Perception-Oriented Video Frame Interpolation via Asymmetric Blending

Guangyang Wu, Xin Tao, Changlin Li et al.

CVPR 2024posterarXiv:2404.06692
#10819

Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation

Qi Yang, Xing Nie, Tong Li et al.

CVPR 2024highlightarXiv:2312.06462
#10820

Exact Fusion via Feature Distribution Matching for Few-shot Image Generation

Yingbo Zhou, Yutong Ye, Pengyu Zhang et al.

CVPR 2024poster
#10821

Fooling Polarization-Based Vision using Locally Controllable Polarizing Projection

Zhuoxiao Li, Zhihang Zhong, Shohei Nobuhara et al.

CVPR 2024posterarXiv:2303.17890
#10822

Affine Equivariant Networks Based on Differential Invariants

Yikang Li, Yeqing Qiu, Yuxuan Chen et al.

CVPR 2024poster
#10823

Diffusion-based Blind Text Image Super-Resolution

Yuzhe Zhang, jiawei zhang, Hao Li et al.

CVPR 2024posterarXiv:2312.08886
#10824

Improving Generalized Zero-Shot Learning by Exploring the Diverse Semantics from External Class Names

Yapeng Li, Yong Luo, Zengmao Wang et al.

CVPR 2024poster
#10825

Continual Learning for Motion Prediction Model via Meta-Representation Learning and Optimal Memory Buffer Retention Strategy

Dae Jun Kang, Dongsuk Kum, Sanmin Kim

CVPR 2024poster
#10826

FlowDiffuser: Advancing Optical Flow Estimation with Diffusion Models

Ao Luo, XIN LI, Fan Yang et al.

CVPR 2024highlight
#10827

3DGS-Avatar: Animatable Avatars via Deformable 3D Gaussian Splatting

Zhiyin Qian, Shaofei Wang, Marko Mihajlovic et al.

CVPR 2024posterarXiv:2312.09228
#10828

Drag Your Noise: Interactive Point-based Editing via Diffusion Semantic Propagation

Haofeng Liu, Chenshu Xu, Yifei Yang et al.

CVPR 2024posterarXiv:2404.01050
#10829

AdaRevD: Adaptive Patch Exiting Reversible Decoder Pushes the Limit of Image Deblurring

Xintian Mao, Xiwen Gao, Yan Wang

CVPR 2024posterarXiv:2406.09135
#10830

Puff-Net: Efficient Style Transfer with Pure Content and Style Feature Fusion Network

Sizhe Zheng, Pan Gao, Peng Zhou et al.

CVPR 2024posterarXiv:2405.19775
#10831

SynSP: Synergy of Smoothness and Precision in Pose Sequences Refinement

Tao Wang, Lei Jin, Zheng Wang et al.

CVPR 2024poster
#10832

Building Vision-Language Models on Solid Foundations with Masked Distillation

Sepehr Sameni, Kushal Kafle, Hao Tan et al.

CVPR 2024poster
#10833

MS-DETR: Efficient DETR Training with Mixed Supervision

Chuyang Zhao, Yifan Sun, Wenhao Wang et al.

CVPR 2024posterarXiv:2401.03989
#10834

FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-shot Subject-Driven Generation

Pengchong Qiao, Lei Shang, Chang Liu et al.

CVPR 2024posterarXiv:2403.06775
#10835

Towards High-fidelity Artistic Image Vectorization via Texture-Encapsulated Shape Parameterization

Ye Chen, Bingbing Ni, Jinfan Liu et al.

CVPR 2024poster
#10836

OmniSDF: Scene Reconstruction using Omnidirectional Signed Distance Functions and Adaptive Binoctrees

Hakyeong Kim, Andreas Meuleman, Hyeonjoong Jang et al.

CVPR 2024posterarXiv:2404.00678
#10837

Deformable One-shot Face Stylization via DINO Semantic Guidance

Yang Zhou, Zichong Chen, Hui Huang

CVPR 2024posterarXiv:2403.00459
#10838

Density-Guided Semi-Supervised 3D Semantic Segmentation with Dual-Space Hardness Sampling

Jianan Li, Qiulei Dong

CVPR 2024poster
#10839

Decomposing Disease Descriptions for Enhanced Pathology Detection: A Multi-Aspect Vision-Language Pre-training Framework

Vu Minh Hieu Phan, Yutong Xie, Yuankai Qi et al.

CVPR 2024posterarXiv:2403.07636
#10840

LP++: A Surprisingly Strong Linear Probe for Few-Shot CLIP

Yunshi HUANG, Fereshteh Shakeri, Jose Dolz et al.

CVPR 2024posterarXiv:2404.02285
#10841

1-Lipschitz Layers Compared: Memory Speed and Certifiable Robustness

Bernd Prach, Fabio Brau, Giorgio Buttazzo et al.

CVPR 2024poster
#10842

VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models

Hyeonho Jeong, Geon Yeong Park, Jong Chul Ye

CVPR 2024posterarXiv:2312.00845
#10843

PoNQ: a Neural QEM-based Mesh Representation

Nissim Maruani, Maks Ovsjanikov, Pierre Alliez et al.

CVPR 2024posterarXiv:2403.12870
#10844

M3-UDA: A New Benchmark for Unsupervised Domain Adaptive Fetal Cardiac Structure Detection

Bin Pu, Liwen Wang, Jiewen Yang et al.

CVPR 2024poster
#10845

Expandable Subspace Ensemble for Pre-Trained Model-Based Class-Incremental Learning

Da-Wei Zhou, Hai-Long Sun, Han-Jia Ye et al.

CVPR 2024posterarXiv:2403.12030
#10846

Beyond Image Super-Resolution for Image Recognition with Task-Driven Perceptual Loss

Jaeha Kim, Junghun Oh, Kyoung Mu Lee

CVPR 2024posterarXiv:2404.01692
#10847

Point-VOS: Pointing Up Video Object Segmentation

Sabarinath Mahadevan, Idil Esen Zulfikar, Paul Voigtlaender et al.

CVPR 2024posterarXiv:2402.05917
#10848

3D Face Tracking from 2D Video through Iterative Dense UV to Image Flow

Felix Taubner, Prashant Raina, Mathieu Tuli et al.

CVPR 2024posterarXiv:2404.09819
#10849

HIT: Estimating Internal Human Implicit Tissues from the Body Surface

Marilyn Keller, Vaibhav ARORA, Abdelmouttaleb Dakri et al.

CVPR 2024poster
#10850

Authentic Hand Avatar from a Phone Scan via Universal Hand Model

Gyeongsik Moon, Weipeng Xu, Rohan Joshi et al.

CVPR 2024posterarXiv:2405.07933
#10851

Task-Driven Exploration: Decoupling and Inter-Task Feedback for Joint Moment Retrieval and Highlight Detection

Jin Yang, Ping Wei, Huan Li et al.

CVPR 2024posterarXiv:2404.09263
#10852

Multiway Point Cloud Mosaicking with Diffusion and Global Optimization

Shengze Jin, Iro Armeni, Marc Pollefeys et al.

CVPR 2024posterarXiv:2404.00429
#10853

NeRSP: Neural 3D Reconstruction for Reflective Objects with Sparse Polarized Images

Yufei Han, Heng Guo, Koki Fukai et al.

CVPR 2024posterarXiv:2406.07111
#10854

HDRFlow: Real-Time HDR Video Reconstruction with Large Motions

Gangwei Xu, Yujin Wang, Jinwei Gu et al.

CVPR 2024posterarXiv:2403.03447
#10855

Beyond Average: Individualized Visual Scanpath Prediction

Xianyu Chen, Ming Jiang, Qi Zhao

CVPR 2024posterarXiv:2404.12235
#10856

Beyond Text: Frozen Large Language Models in Visual Signal Comprehension

Lei Zhu, Fangyun Wei, Yanye Lu

CVPR 2024posterarXiv:2403.07874
#10857

LEDITS++: Limitless Image Editing using Text-to-Image Models

Manuel Brack, Felix Friedrich, Katharina Kornmeier et al.

CVPR 2024posterarXiv:2311.16711
#10858

CAM Back Again: Large Kernel CNNs from a Weakly Supervised Object Localization Perspective

Shunsuke Yasuki, Masato Taki

CVPR 2024posterarXiv:2403.06676
#10859

Regularized Parameter Uncertainty for Improving Generalization in Reinforcement Learning

Pehuen Moure, Longbiao Cheng, Joachim Ott et al.

CVPR 2024poster
#10860

Robust Noisy Correspondence Learning with Equivariant Similarity Consistency

Yuchen Yang, Erkun Yang, Likai Wang et al.

CVPR 2024poster
#10861

Situational Awareness Matters in 3D Vision Language Reasoning

Yunze Man, Liang-Yan Gui, Yu-Xiong Wang

CVPR 2024posterarXiv:2406.07544
#10862

Decentralized Directed Collaboration for Personalized Federated Learning

Yingqi Liu, Yifan Shi, Qinglun Li et al.

CVPR 2024posterarXiv:2405.17876
#10863

Task-Driven Wavelets using Constrained Empirical Risk Minimization

Eric Marcus, Ray Sheombarsing, Jan-Jakob Sonke et al.

CVPR 2024poster
#10864

SIFU: Side-view Conditioned Implicit Function for Real-world Usable Clothed Human Reconstruction

Zechuan Zhang, Zongxin Yang, Yi Yang

CVPR 2024highlightarXiv:2312.06704
#10865

OmniVec2 - A Novel Transformer based Network for Large Scale Multimodal and Multitask Learning

Siddharth Srivastava, Gaurav Sharma

CVPR 2024posterarXiv:2507.13364
#10866

Probing Synergistic High-Order Interaction in Infrared and Visible Image Fusion

Naishan Zheng, Man Zhou, Jie Huang et al.

CVPR 2024poster
#10867

Scaling Up Dynamic Human-Scene Interaction Modeling

Nan Jiang, Zhiyuan Zhang, Hongjie Li et al.

CVPR 2024highlightarXiv:2403.08629
#10868

Utility-Fairness Trade-Offs and How to Find Them

Sepehr Dehdashtian, Bashir Sadeghi, Vishnu Naresh Boddeti

CVPR 2024posterarXiv:2404.09454
#10869

Data-Free Quantization via Pseudo-label Filtering

Chunxiao Fan, Ziqi Wang, Dan Guo et al.

CVPR 2024poster
#10870

Fitting Flats to Flats

Gabriel Dogadov, Ugo Finnendahl, Marc Alexa

CVPR 2024poster
#10871

HOIST-Former: Hand-held Objects Identification Segmentation and Tracking in the Wild

Supreeth Narasimhaswamy, Huy Anh Nguyen, Lihan Huang et al.

CVPR 2024poster
#10872

Animating General Image with Large Visual Motion Model

Dengsheng Chen, Xiaoming Wei, Xiaolin Wei

CVPR 2024poster
#10873

MetaCloak: Preventing Unauthorized Subject-driven Text-to-image Diffusion-based Synthesis via Meta-learning

Yixin Liu, Chenrui Fan, Yutong Dai et al.

CVPR 2024posterarXiv:2311.13127
#10874

EventEgo3D: 3D Human Motion Capture from Egocentric Event Streams

Christen Millerdurai, Hiroyasu Akada, Jian Wang et al.

CVPR 2024posterarXiv:2404.08640
#10875

ModaVerse: Efficiently Transforming Modalities with LLMs

Xinyu Wang, Bohan Zhuang, Qi Wu

CVPR 2024posterarXiv:2401.06395
#10876

Improving Generalization via Meta-Learning on Hard Samples

Nishant Jain, Arun Suggala, Pradeep Shenoy

CVPR 2024posterarXiv:2403.12236
#10877

WaveFace: Authentic Face Restoration with Efficient Frequency Recovery

Yunqi Miao, Jiankang Deng, Jungong Han

CVPR 2024posterarXiv:2403.12760
#10878

Hierarchical Histogram Threshold Segmentation – Auto-terminating High-detail Oversegmentation

Thomas Chang, Simon Seibt, Bartosz von Rymon Lipinski

CVPR 2024poster
#10879

CogAgent: A Visual Language Model for GUI Agents

Wenyi Hong, Weihan Wang, Qingsong Lv et al.

CVPR 2024highlightarXiv:2312.08914
#10880

Learning Adaptive Spatial Coherent Correlations for Speech-Preserving Facial Expression Manipulation

Tianshui Chen, Jianman Lin, Zhijing Yang et al.

CVPR 2024highlight
#10881

UFORecon: Generalizable Sparse-View Surface Reconstruction from Arbitrary and Unfavorable Sets

Youngju Na, Woo Jae Kim, Kyu Han et al.

CVPR 2024posterarXiv:2403.05086
#10882

Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data

Lihe Yang, Bingyi Kang, Zilong Huang et al.

CVPR 2024posterarXiv:2401.10891
#10883

EventDance: Unsupervised Source-free Cross-modal Adaptation for Event-based Object Recognition

Xu Zheng, Addison, Lin Wang

CVPR 2024posterarXiv:2403.14082
#10884

Real-World Efficient Blind Motion Deblurring via Blur Pixel Discretization

Insoo Kim, Jae Seok Choi, Geonseok Seo et al.

CVPR 2024posterarXiv:2404.12168
#10885

Multi-criteria Token Fusion with One-step-ahead Attention for Efficient Vision Transformers

Sanghyeok Lee, Joonmyung Choi, Hyunwoo J. Kim

CVPR 2024posterarXiv:2403.10030
#10886

BANF: Band-Limited Neural Fields for Levels of Detail Reconstruction

Ahan Shabanov, Shrisudhan Govindarajan, Cody Reading et al.

CVPR 2024posterarXiv:2404.13024
#10887

HUGS: Holistic Urban 3D Scene Understanding via Gaussian Splatting

Hongyu Zhou, Jiahao Shao, Lu Xu et al.

CVPR 2024posterarXiv:2403.12722
#10888

Human Motion Prediction Under Unexpected Perturbation

Jiangbei Yue, Baiyi Li, Julien Pettré et al.

CVPR 2024highlightarXiv:2403.15891
#10889

LLMs are Good Action Recognizers

Haoxuan Qu, Yujun Cai, Jun Liu

CVPR 2024posterarXiv:2404.00532
#10890

SynFog: A Photo-realistic Synthetic Fog Dataset based on End-to-end Imaging Simulation for Advancing Real-World Defogging in Autonomous Driving

Yiming Xie, Henglu Wei, Zhenyi Liu et al.

CVPR 2024posterarXiv:2403.17094
#10891

NeRFiller: Completing Scenes via Generative 3D Inpainting

Ethan Weber, Aleksander Holynski, Varun Jampani et al.

CVPR 2024posterarXiv:2312.04560
#10892

PeVL: Pose-Enhanced Vision-Language Model for Fine-Grained Human Action Recognition

Haosong Zhang, Mei Leong, Liyuan Li et al.

CVPR 2024poster
#10893

MPOD123: One Image to 3D Content Generation Using Mask-enhanced Progressive Outline-to-Detail Optimization

Jimin Xu, Tianbao Wang, Tao Jin et al.

CVPR 2024poster
#10894

Look-Up Table Compression for Efficient Image Restoration

Yinglong Li, Jiacheng Li, Zhiwei Xiong

CVPR 2024highlight
#10895

Hourglass Tokenizer for Efficient Transformer-Based 3D Human Pose Estimation

Wenhao Li, Mengyuan Liu, Hong Liu et al.

CVPR 2024highlightarXiv:2311.12028
#10896

PAPR in Motion: Seamless Point-level 3D Scene Interpolation

Shichong Peng, Yanshu Zhang, Ke Li

CVPR 2024highlightarXiv:2406.05533
#10897

Towards Modern Image Manipulation Localization: A Large-Scale Dataset and Novel Methods

Chenfan Qu, Yiwu Zhong, Chongyu Liu et al.

CVPR 2024poster
#10898

Dense Vision Transformer Compression with Few Samples

Hanxiao Zhang, Yifan Zhou, Guo-Hua Wang

CVPR 2024posterarXiv:2403.18708
#10899

IntrinsicAvatar: Physically Based Inverse Rendering of Dynamic Humans from Monocular Videos via Explicit Ray Tracing

Shaofei Wang, Bozidar Antic, Andreas Geiger et al.

CVPR 2024posterarXiv:2312.05210
#10900

Exploring Pose-Aware Human-Object Interaction via Hybrid Learning

EASTMAN Z Y WU, Yali Li, Yuan Wang et al.

CVPR 2024poster
#10901

All in One Framework for Multimodal Re-identification in the Wild

He Li, Mang Ye, Ming Zhang et al.

CVPR 2024posterarXiv:2405.04741
#10902

Bilateral Adaptation for Human-Object Interaction Detection with Occlusion-Robustness

Guangzhi Wang, Yangyang Guo, Ziwei Xu et al.

CVPR 2024poster
#10903

TCP:Textual-based Class-aware Prompt tuning for Visual-Language Model

Hantao Yao, Rui Zhang, Changsheng Xu

CVPR 2024poster
#10904

RMT: Retentive Networks Meet Vision Transformers

Qihang Fan, Huaibo Huang, Mingrui Chen et al.

CVPR 2024posterarXiv:2309.11523
#10905

FedSOL: Stabilized Orthogonal Learning with Proximal Restrictions in Federated Learning

Gihun Lee, Minchan Jeong, SangMook Kim et al.

CVPR 2024posterarXiv:2308.12532
#10906

Low-Rank Approximation for Sparse Attention in Multi-Modal LLMs

Lin Song, Yukang Chen, Shuai Yang et al.

CVPR 2024poster
#10907

LAENeRF: Local Appearance Editing for Neural Radiance Fields

Lukas Radl, Michael Steiner, Andreas Kurz et al.

CVPR 2024posterarXiv:2312.09913
#10908

Don't Look into the Dark: Latent Codes for Pluralistic Image Inpainting

Haiwei Chen, Yajie Zhao

CVPR 2024posterarXiv:2403.18186
#10909

Improved Visual Grounding through Self-Consistent Explanations

Ruozhen He, Paola Cascante-Bonilla, Ziyan Yang et al.

CVPR 2024posterarXiv:2312.04554
#10910

On the Faithfulness of Vision Transformer Explanations

Junyi Wu, Weitai Kang, Hao Tang et al.

CVPR 2024posterarXiv:2404.01415
#10911

CHAIN: Enhancing Generalization in Data-Efficient GANs via lipsCHitz continuity constrAIned Normalization

Yao Ni, Piotr Koniusz

CVPR 2024posterarXiv:2404.00521
#10912

OneFormer3D: One Transformer for Unified Point Cloud Segmentation

Maksim Kolodiazhnyi, Anna Vorontsova, Anton Konushin et al.

CVPR 2024posterarXiv:2311.14405
#10913

One-2-3-45++: Fast Single Image to 3D Objects with Consistent Multi-View Generation and 3D Diffusion

Minghua Liu, Ruoxi Shi, Linghao Chen et al.

CVPR 2024posterarXiv:2311.07885
#10914

C2KD: Bridging the Modality Gap for Cross-Modal Knowledge Distillation

Fushuo Huo, Wenchao Xu, Jingcai Guo et al.

CVPR 2024highlight
#10915

StrokeFaceNeRF: Stroke-based Facial Appearance Editing in Neural Radiance Field

Xiao-juan Li, Dingxi Zhang, Shu-Yu Chen et al.

CVPR 2024poster
#10916

Neural Modes: Self-supervised Learning of Nonlinear Modal Subspaces

Jiahong Wang, Yinwei DU, Stelian Coros et al.

CVPR 2024posterarXiv:2404.17620
#10917

WHAM: Reconstructing World-grounded Humans with Accurate 3D Motion

Soyong Shin, Juyong Kim, Eni Halilaj et al.

CVPR 2024posterarXiv:2312.07531
#10918

CLOAF: CoLlisiOn-Aware Human Flow

Andrey Davydov, Martin Engilberge, Mathieu Salzmann et al.

CVPR 2024posterarXiv:2403.09050
#10919

FedUV: Uniformity and Variance for Heterogeneous Federated Learning

Ha Min Son, Moon-Hyun Kim, Tai-Myoung Chung et al.

CVPR 2024posterarXiv:2402.18372
#10920

Learning Occupancy for Monocular 3D Object Detection

Liang Peng, Junkai Xu, Haoran Cheng et al.

CVPR 2024posterarXiv:2305.15694
#10921

Bring Event into RGB and LiDAR: Hierarchical Visual-Motion Fusion for Scene Flow

Hanyu Zhou, Yi Chang, Zhiwei Shi

CVPR 2024posterarXiv:2403.07432
#10922

Language-driven Grasp Detection

An Dinh Vuong, Minh Nhat VU, Baoru Huang et al.

CVPR 2024posterarXiv:2406.09489
#10923

Each Test Image Deserves A Specific Prompt: Continual Test-Time Adaptation for 2D Medical Image Segmentation

Ziyang Chen, Yongsheng Pan, Yiwen Ye et al.

CVPR 2024posterarXiv:2311.18363
#10924

Abductive Ego-View Accident Video Understanding for Safe Driving Perception

Jianwu Fang, Lei-lei Li, Junfei Zhou et al.

CVPR 2024highlightarXiv:2403.00436
#10925

Prompting Vision Foundation Models for Pathology Image Analysis

CHONG YIN, Siqi Liu, Kaiyang Zhou et al.

CVPR 2024poster
#10926

Unmixing Before Fusion: A Generalized Paradigm for Multi-Source-based Hyperspectral Image Synthesis

Yang Yu, Erting Pan, Xinya Wang et al.

CVPR 2024poster
#10927

Navigating Beyond Dropout: An Intriguing Solution towards Generalizable Image Super Resolution

Hongjun Wang, Jiyuan Chen, Yinqiang Zheng et al.

CVPR 2024posterarXiv:2402.18929
#10928

Auto-Train-Once: Controller Network Guided Automatic Network Pruning from Scratch

Xidong Wu, Shangqian Gao, Zeyu Zhang et al.

CVPR 2024posterarXiv:2403.14729
#10929

SCINeRF: Neural Radiance Fields from a Snapshot Compressive Image

Yunhao Li, Xiaodong Wang, Ping Wang et al.

CVPR 2024highlightarXiv:2403.20018
#10930

Learning to Control Camera Exposure via Reinforcement Learning

Kyunghyun Lee, Ukcheol Shin, Byeong-Uk Lee

CVPR 2024posterarXiv:2404.01636
#10931

Regressor-Segmenter Mutual Prompt Learning for Crowd Counting

Mingyue Guo, Li Yuan, Zhaoyi Yan et al.

CVPR 2024posterarXiv:2312.01711
#10932

Vector Graphics Generation via Mutually Impulsed Dual-domain Diffusion

Zhongyin Zhao, Ye Chen, Zhangli Hu et al.

CVPR 2024poster
#10933

Spectral Meets Spatial: Harmonising 3D Shape Matching and Interpolation

Dongliang Cao, Marvin Eisenberger, Nafie El Amrani et al.

CVPR 2024posterarXiv:2402.18920
#10934

FocSAM: Delving Deeply into Focused Objects in Segmenting Anything

You Huang, Zongyu Lan, Liujuan Cao et al.

CVPR 2024posterarXiv:2405.18706
#10935

Sculpt3D: Multi-View Consistent Text-to-3D Generation with Sparse 3D Prior

Chen Cheng, Xiaofeng Yang, Fan Yang et al.

CVPR 2024posterarXiv:2403.09140
#10936

Learning to Transform Dynamically for Better Adversarial Transferability

Rongyi Zhu, Zeliang Zhang, Susan Liang et al.

CVPR 2024posterarXiv:2405.14077
#10937

Spin-UP: Spin Light for Natural Light Uncalibrated Photometric Stereo

Zongrui Li, Zhan Lu, Haojie Yan et al.

CVPR 2024posterarXiv:2404.01612
#10938

SEAS: ShapE-Aligned Supervision for Person Re-Identification

Haidong Zhu, Pranav Budhwant, Zhaoheng Zheng et al.

CVPR 2024poster
#10939

Learning to Select Views for Efficient Multi-View Understanding

Yunzhong Hou, Stephen Gould, Liang Zheng

CVPR 2024poster
#10940

LidaRF: Delving into Lidar for Neural Radiance Field on Street Scenes

Shanlin Sun, Bingbing Zhuang, Ziyu Jiang et al.

CVPR 2024highlightarXiv:2405.00900
#10941

Task2Box: Box Embeddings for Modeling Asymmetric Task Relationships

Rangel Daroya, Aaron Sun, Subhransu Maji

CVPR 2024highlightarXiv:2403.17173
#10942

UniGS: Unified Representation for Image Generation and Segmentation

Lu Qi, Lehan Yang, Weidong Guo et al.

CVPR 2024posterarXiv:2312.01985
#10943

ExMap: Leveraging Explainability Heatmaps for Unsupervised Group Robustness to Spurious Correlations

Rwiddhi Chakraborty, Adrian de Sena Sletten, Michael C. Kampffmeyer

CVPR 2024posterarXiv:2403.13870
#10944

DUDF: Differentiable Unsigned Distance Fields with Hyperbolic Scaling

Miguel Fainstein, Viviana Siless, Emmanuel Iarussi

CVPR 2024posterarXiv:2402.08876
#10945

UV-IDM: Identity-Conditioned Latent Diffusion Model for Face UV-Texture Generation

Hong Li, Yutang Feng, Song Xue et al.

CVPR 2024poster
#10946

PBWR: Parametric-Building-Wireframe Reconstruction from Aerial LiDAR Point Clouds

Shangfeng Huang, Ruisheng Wang, Bo Guo et al.

CVPR 2024poster
#10947

GenH2R: Learning Generalizable Human-to-Robot Handover via Scalable Simulation Demonstration and Imitation

Zifan Wang, Junyu Chen, Ziqing Chen et al.

CVPR 2024poster
#10948

Content-Style Decoupling for Unsupervised Makeup Transfer without Generating Pseudo Ground Truth

Zhaoyang Sun, Shengwu Xiong, Yaxiong Chen et al.

CVPR 2024posterarXiv:2405.17240
#10949

UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs

Yanwu Xu, Yang Zhao, Zhisheng Xiao et al.

CVPR 2024highlightarXiv:2311.09257
#10950

SelfPose3d: Self-Supervised Multi-Person Multi-View 3d Pose Estimation

Keqi Chen, vinkle srivastav, Nicolas Padoy

CVPR 2024posterarXiv:2404.02041
#10951

Context-Aware Integration of Language and Visual References for Natural Language Tracking

Yanyan Shao, Shuting He, Qi Ye et al.

CVPR 2024posterarXiv:2403.19975
#10952

Open-Vocabulary Segmentation with Semantic-Assisted Calibration

Yong Liu, Sule Bai, Guanbin Li et al.

CVPR 2024posterarXiv:2312.04089
#10953

Training Generative Image Super-Resolution Models by Wavelet-Domain Losses Enables Better Control of Artifacts

Cansu Korkmaz, Ahmet Murat Tekalp, Zafer Dogan

CVPR 2024posterarXiv:2402.19215
#10954

RCooper: A Real-world Large-scale Dataset for Roadside Cooperative Perception

Ruiyang Hao, Siqi Fan, Yingru Dai et al.

CVPR 2024posterarXiv:2403.10145
#10955

Task-Customized Mixture of Adapters for General Image Fusion

Pengfei Zhu, Yang Sun, Bing Cao et al.

CVPR 2024posterarXiv:2403.12494
#10956

PointBeV: A Sparse Approach for BeV Predictions

Loick Chambon, Éloi Zablocki, Mickaël Chen et al.

CVPR 2024posterarXiv:2312.00703
#10957

Learning Spatial Adaptation and Temporal Coherence in Diffusion Models for Video Super-Resolution

Zhikai Chen, Fuchen Long, Zhaofan Qiu et al.

CVPR 2024posterarXiv:2403.17000
#10958

Ensemble Diversity Facilitates Adversarial Transferability

Bowen Tang, Zheng Wang, Yi Bin et al.

CVPR 2024poster
#10959

CFAT: Unleashing Triangular Windows for Image Super-resolution

Abhisek Ray, Gaurav Kumar, Maheshkumar Kolekar

CVPR 2024highlight
#10960

Convolutional Prompting meets Language Models for Continual Learning

Anurag Roy, Riddhiman Moulick, Vinay Verma et al.

CVPR 2024posterarXiv:2403.20317
#10961

Mitigating Noisy Correspondence by Geometrical Structure Consistency Learning

Zihua Zhao, Mengxi Chen, Tianjie Dai et al.

CVPR 2024posterarXiv:2405.16996
#10962

Contextual Augmented Global Contrast for Multimodal Intent Recognition

Kaili Sun, Zhiwen Xie, Mang Ye et al.

CVPR 2024poster
#10963

Intraoperative 2D/3D Image Registration via Differentiable X-ray Rendering

Vivek Gopalakrishnan, Neel Dey, Polina Golland

CVPR 2024posterarXiv:2312.06358
#10964

Relaxed Contrastive Learning for Federated Learning

Seonguk Seo, Jinkyu Kim, Geeho Kim et al.

CVPR 2024posterarXiv:2401.04928
#10965

EVCap: Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension

Jiaxuan Li, Duc Minh Vo, Akihiro Sugimoto et al.

CVPR 2024posterarXiv:2311.15879
#10966

LLM4SGG: Large Language Models for Weakly Supervised Scene Graph Generation

Kibum Kim, Kanghoon Yoon, Jaehyeong Jeon et al.

CVPR 2024posterarXiv:2310.10404
#10967

Geometrically-driven Aggregation for Zero-shot 3D Point Cloud Understanding

Guofeng Mei, Luigi Riz, Yiming Wang et al.

CVPR 2024highlightarXiv:2312.02244
#10968

Beyond Textual Constraints: Learning Novel Diffusion Conditions with Fewer Examples

Yuyang Yu, Bangzhen Liu, Chenxi Zheng et al.

CVPR 2024poster
#10969

Task-aligned Part-aware Panoptic Segmentation through Joint Object-Part Representations

Daan de Geus, Gijs Dubbelman

CVPR 2024posterarXiv:2406.10114
#10970

FreeKD: Knowledge Distillation via Semantic Frequency Prompt

Yuan Zhang, Tao Huang, Jiaming Liu et al.

CVPR 2024posterarXiv:2311.12079
#10971

Enhanced Motion-Text Alignment for Image-to-Video Transfer Learning

Wei Zhang, Chaoqun Wan, Tongliang Liu et al.

CVPR 2024poster
#10972

SNIDA: Unlocking Few-Shot Object Detection with Non-linear Semantic Decoupling Augmentation

Yanjie Wang, Xu Zou, Luxin Yan et al.

CVPR 2024poster
#10973

Adaptive Multi-Modal Cross-Entropy Loss for Stereo Matching

Peng Xu, Zhiyu Xiang, Chengyu Qiao et al.

CVPR 2024posterarXiv:2306.15612
#10974

Driving-Video Dehazing with Non-Aligned Regularization for Safety Assistance

Junkai Fan, Jiangwei Weng, Kun Wang et al.

CVPR 2024posterarXiv:2405.09996
#10975

Exploring Region-Word Alignment in Built-in Detector for Open-Vocabulary Object Detection

Heng Zhang, Qiuyu Zhao, Linyu Zheng et al.

CVPR 2024poster
#10976

L0-Sampler: An L0 Model Guided Volume Sampling for NeRF

Liangchen Li, Juyong Zhang

CVPR 2024poster
#10977

Diffusion 3D Features (Diff3F): Decorating Untextured Shapes with Distilled Semantic Features

Niladri Shekhar Dutt, Sanjeev Muralikrishnan, Niloy J. Mitra

CVPR 2024posterarXiv:2311.17024
#10978

Unsupervised Occupancy Learning from Sparse Point Cloud

Amine Ouasfi, Adnane Boukhayma

CVPR 2024highlightarXiv:2404.02759
#10979

GLOW: Global Layout Aware Attacks on Object Detection

Jun Bao, Buyu Liu, Kui Ren et al.

CVPR 2024posterarXiv:2302.14166
#10980

Context-based and Diversity-driven Specificity in Compositional Zero-Shot Learning

Yun Li, Zhe Liu, Hang Chen et al.

CVPR 2024posterarXiv:2402.17251
#10981

Neural Underwater Scene Representation

Yunkai Tang, Chengxuan Zhu, Renjie Wan et al.

CVPR 2024poster
#10982

Scaled Decoupled Distillation

Shicai Wei, Chunbo Luo, Yang Luo

CVPR 2024poster
#10983

VISTA-LLAMA: Reducing Hallucination in Video Language Models via Equal Distance to Visual Tokens

Fan Ma, Xiaojie Jin, Heng Wang et al.

CVPR 2024posterarXiv:2312.08870
#10984

Hierarchical Intra-modal Correlation Learning for Label-free 3D Semantic Segmentation

Xin Kang, Lei Chu, Jiahao Li et al.

CVPR 2024poster
#10985

PARA-Drive: Parallelized Architecture for Real-time Autonomous Driving

Xinshuo Weng, Boris Ivanovic, Yan Wang et al.

CVPR 2024poster
#10986

Towards Generalizable Tumor Synthesis

Qi Chen, Xiaoxi Chen, Haorui Song et al.

CVPR 2024posterarXiv:2402.19470
#10987

Adaptive Hyper-graph Aggregation for Modality-Agnostic Federated Learning

Fan Qi, Shuai Li

CVPR 2024poster
#10988

Bi-SSC: Geometric-Semantic Bidirectional Fusion for Camera-based 3D Semantic Scene Completion

Yujie Xue, Ruihui Li, F anWu et al.

CVPR 2024poster
#10989

Efficient and Effective Weakly-Supervised Action Segmentation via Action-Transition-Aware Boundary Alignment

Angchi Xu, Wei-Shi Zheng

CVPR 2024posterarXiv:2403.19225
#10990

Depth-Aware Concealed Crop Detection in Dense Agricultural Scenes

Liqiong Wang, Jinyu Yang, Yanfu Zhang et al.

CVPR 2024poster
#10991

FC-GNN: Recovering Reliable and Accurate Correspondences from Interferences

Haobo Xu, Jun Zhou, Hua Yang et al.

CVPR 2024poster
#10992

MoMask: Generative Masked Modeling of 3D Human Motions

chuan guo, Yuxuan Mu, Muhammad Gohar Javed et al.

CVPR 2024posterarXiv:2312.00063
#10993

CapsFusion: Rethinking Image-Text Data at Scale

Qiying Yu, Quan Sun, Xiaosong Zhang et al.

CVPR 2024posterarXiv:2310.20550
#10994

A General and Efficient Training for Transformer via Token Expansion

Wenxuan Huang, Yunhang Shen, Jiao Xie et al.

CVPR 2024posterarXiv:2404.00672
#10995

BigGait: Learning Gait Representation You Want by Large Vision Models

Dingqiang Ye, Chao Fan, Jingzhe Ma et al.

CVPR 2024posterarXiv:2402.19122
#10996

Event-based Visible and Infrared Fusion via Multi-task Collaboration

Mengyue Geng, Lin Zhu, Lizhi Wang et al.

CVPR 2024poster
#10997

Breathing Life Into Sketches Using Text-to-Video Priors

Rinon Gal, Yael Vinker, Yuval Alaluf et al.

CVPR 2024highlightarXiv:2311.13608
#10998

Gaussian Shell Maps for Efficient 3D Human Generation

Rameen Abdal, Wang Yifan, Zifan Shi et al.

CVPR 2024posterarXiv:2311.17857
#10999

Byzantine-robust Decentralized Federated Learning via Dual-domain Clustering and Trust Bootstrapping

Peng Sun, Xinyang Liu, Zhibo Wang et al.

CVPR 2024poster
#11000

MotionEditor: Editing Video Motion via Content-Aware Diffusion

Shuyuan Tu, Qi Dai, Zhi-Qi Cheng et al.

CVPR 2024posterarXiv:2311.18830