Most Cited 2024 "gui agents" Papers

12,324 papers found • Page 53 of 62

#10401

Are Conventional SNNs Really Efficient? A Perspective from Network Quantization

Guobin Shen, Dongcheng Zhao, Tenglong Li et al.

CVPR 2024 highlight
#10402

RILA: Reflective and Imaginative Language Agent for Zero-Shot Semantic Audio-Visual Navigation

Zeyuan Yang, LIU JIAGENG, Peihao Chen et al.

CVPR 2024 poster
#10403

Sharingan: A Transformer Architecture for Multi-Person Gaze Following

Samy Tafasca, Anshul Gupta, Jean-Marc Odobez

CVPR 2024 poster
#10404

OA-CNNs: Omni-Adaptive Sparse CNNs for 3D Semantic Segmentation

Bohao Peng, Xiaoyang Wu, Li Jiang et al.

CVPR 2024 poster arXiv:2403.14418
#10405

Dynamic Support Information Mining for Category-Agnostic Pose Estimation

Pengfei Ren, Yuanyuan Gao, Haifeng Sun et al.

CVPR 2024 poster
#10406

Pre-trained Model Guided Fine-Tuning for Zero-Shot Adversarial Robustness

Sibo Wang, Jie Zhang, Zheng Yuan et al.

CVPR 2024 poster arXiv:2401.04350
#10407

MART: Masked Affective RepresenTation Learning via Masked Temporal Distribution Distillation

Zhicheng Zhang, Pancheng Zhao, Eunil Park et al.

CVPR 2024 poster
#10408

CosalPure: Learning Concept from Group Images for Robust Co-Saliency Detection

Jiayi Zhu, Qing Guo, Felix Juefei Xu et al.

CVPR 2024 poster arXiv:2403.18554
#10409

Neural Clustering based Visual Representation Learning

Guikun Chen, Xia Li, Yi Yang et al.

CVPR 2024 poster arXiv:2403.17409
#10410

ViVid-1-to-3: Novel View Synthesis with Video Diffusion Models

Jeong-gi Kwak, Erqun Dong, Yuhe Jin et al.

CVPR 2024 highlight arXiv:2312.01305
#10411

CrossMAE: Cross-Modality Masked Autoencoders for Region-Aware Audio-Visual Pre-Training

Yuxin Guo, Siyang Sun, Shuailei Ma et al.

CVPR 2024 poster
#10412

CapHuman: Capture Your Moments in Parallel Universes

Chao Liang, Fan Ma, Linchao Zhu et al.

CVPR 2024 poster arXiv:2402.00627
#10413

Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection

Yicheng Xiao, Zhuoyan Luo, Yong Liu et al.

CVPR 2024 poster arXiv:2311.16464
#10414

ChAda-ViT: Channel Adaptive Attention for Joint Representation Learning of Heterogeneous Microscopy Images

Nicolas Bourriez, Ihab Bendidi, Cohen Ethan et al.

CVPR 2024 poster arXiv:2311.15264
#10415

Self-Discovering Interpretable Diffusion Latent Directions for Responsible Text-to-Image Generation

Hang Li, Chengzhi Shen, Philip H.S. Torr et al.

CVPR 2024 poster arXiv:2311.17216
#10416

VS: Reconstructing Clothed 3D Human from Single Image via Vertex Shift

Leyuan Liu, Yuhan Li, Yunqi Gao et al.

CVPR 2024 poster
#10417

Towards Automatic Power Battery Detection: New Challenge, Benchmark Dataset and Baseline

Xiaoqi Zhao, Youwei Pang, Zhenyu Chen et al.

CVPR 2024 poster arXiv:2312.02528
#10418

Point Transformer V3: Simpler, Faster, Stronger

Xiaoyang Wu, Li Jiang, Peng-Shuai Wang et al.

CVPR 2024 poster
#10419

Improving Distant 3D Object Detection Using 2D Box Supervision

Zetong Yang, Zhiding Yu, Christopher Choy et al.

CVPR 2024 poster arXiv:2403.09230
#10420

Infrared Small Target Detection with Scale and Location Sensitivity

Qiankun Liu, Rui Liu, Bolun Zheng et al.

CVPR 2024 poster arXiv:2403.19366
#10421

Wonder3D: Single Image to 3D using Cross-Domain Diffusion

Xiaoxiao Long, Yuan-Chen Guo, Cheng Lin et al.

CVPR 2024 highlight arXiv:2310.15008
#10422

Honeybee: Locality-enhanced Projector for Multimodal LLM

Junbum Cha, Woo-Young Kang, Jonghwan Mun et al.

CVPR 2024 highlight arXiv:2312.06742
#10423

Mining Supervision for Dynamic Regions in Self-Supervised Monocular Depth Estimation

Hoang Chuong Nguyen, Tianyu Wang, Jose M. Alvarez et al.

CVPR 2024 poster arXiv:2404.14908
#10424

SleepVST: Sleep Staging from Near-Infrared Video Signals using Pre-Trained Transformers

Jonathan F. Carter, Joao Jorge, Oliver Gibson et al.

CVPR 2024 highlight arXiv:2404.03831
#10425

Enhancing 3D Fidelity of Text-to-3D using Cross-View Correspondences

Seungwook Kim, Kejie Li, Xueqing Deng et al.

CVPR 2024 poster arXiv:2404.10603
#10426

Learning to Rematch Mismatched Pairs for Robust Cross-Modal Retrieval

Haochen Han, Qinghua Zheng, Guang Dai et al.

CVPR 2024 poster arXiv:2403.05105
#10427

EVS-assisted Joint Deblurring, Rolling-Shutter Correction and Video Frame Interpolation through Sensor Inverse Modeling

Rui Jiang, Fangwen Tu, Yixuan Long et al.

CVPR 2024 poster
#10428

Open-World Semantic Segmentation Including Class Similarity

Matteo Sodano, Federico Magistri, Lucas Nunes et al.

CVPR 2024 poster arXiv:2403.07532
#10429

Empowering Resampling Operation for Ultra-High-Definition Image Enhancement with Model-Aware Guidance

Yu, Jie Huang, Li et al.

CVPR 2024 poster
#10430

READ: Retrieval-Enhanced Asymmetric Diffusion for Motion Planning

Takeru Oba, Matthew Walter, Norimichi Ukita

CVPR 2024 poster
#10431

From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models

Rongjie Li, Songyang Zhang, Dahua Lin et al.

CVPR 2024 poster arXiv:2404.00906
#10432

MeshPose: Unifying DensePose and 3D Body Mesh Reconstruction

Eric-Tuan Le, Antonios Kakolyris, Petros Koutras et al.

CVPR 2024 poster
#10433

Bayesian Differentiable Physics for Cloth Digitalization

Deshan Gong, Ningtao Mao, He Wang

CVPR 2024 poster arXiv:2402.17664
#10434

MemSAM: Taming Segment Anything Model for Echocardiography Video Segmentation

Xiaolong Deng, Huisi Wu, Runhao Zeng et al.

CVPR 2024 poster
#10435

SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing

Zeyinzi Jiang, Chaojie Mao, Yulin Pan et al.

CVPR 2024 highlight arXiv:2312.11392
#10436

OAKINK2: A Dataset of Bimanual Hands-Object Manipulation in Complex Task Completion

Xinyu Zhan, Lixin Yang, Yifei Zhao et al.

CVPR 2024 poster arXiv:2403.19417
#10437

Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action

Jiasen Lu, Christopher Clark, Sangho Lee et al.

CVPR 2024 highlight
#10438

PTQ4SAM: Post-Training Quantization for Segment Anything

Chengtao Lv, Hong Chen, Jinyang Guo et al.

CVPR 2024 poster arXiv:2405.03144
#10439

Training Diffusion Models Towards Diverse Image Generation with Reinforcement Learning

Zichen Miao, Jiang Wang, Ze Wang et al.

CVPR 2024 poster
#10440

Visual Point Cloud Forecasting enables Scalable Autonomous Driving

Zetong Yang, Li Chen, Yanan Sun et al.

CVPR 2024 highlight arXiv:2312.17655
#10441

Light the Night: A Multi-Condition Diffusion Framework for Unpaired Low-Light Enhancement in Autonomous Driving

JINLONG LI, Baolu Li, Zhengzhong Tu et al.

CVPR 2024 poster arXiv:2404.04804
#10442

Elite360D: Towards Efficient 360 Depth Estimation via Semantic- and Distance-Aware Bi-Projection Fusion

Hao Ai, Addison, Lin Wang

CVPR 2024 poster arXiv:2403.16376
#10443

Learning Triangular Distribution in Visual World

Ping Chen, Xingpeng Zhang, Chengtao Zhou et al.

CVPR 2024 poster arXiv:2311.18605
#10444

Your Student is Better Than Expected: Adaptive Teacher-Student Collaboration for Text-Conditional Diffusion Models

Nikita Starodubcev, Dmitry Baranchuk, Artem Fedorov et al.

CVPR 2024 poster arXiv:2312.10835
#10445

GLiDR: Topologically Regularized Graph Generative Network for Sparse LiDAR Point Clouds

Prashant Kumar, Kshitij Madhav Bhat, Vedang Bhupesh Shenvi Nadkarni et al.

CVPR 2024 poster arXiv:2312.00068
#10446

HumanNorm: Learning Normal Diffusion Model for High-quality and Realistic 3D Human Generation

Xin Huang, Ruizhi Shao, Qi Zhang et al.

CVPR 2024 poster arXiv:2310.01406
#10447

Unbiased Estimator for Distorted Conics in Camera Calibration

Chaehyeon Song, Jaeho Shin, Myung-Hwan Jeon et al.

CVPR 2024 highlight arXiv:2403.04583
#10448

Siamese Learning with Joint Alignment and Regression for Weakly-Supervised Video Paragraph Grounding

Chaolei Tan, Jianhuang Lai, Wei-Shi Zheng et al.

CVPR 2024 poster arXiv:2403.11463
#10449

Enhancing Quality of Compressed Images by Mitigating Enhancement Bias Towards Compression Domain

Qunliang Xing, Mai Xu, Shengxi Li et al.

CVPR 2024 poster arXiv:2402.17200
#10450

TIGER: Time-Varying Denoising Model for 3D Point Cloud Generation with Diffusion Process

Zhiyuan Ren, Minchul Kim, Feng Liu et al.

CVPR 2024 poster
#10451

HOLD: Category-agnostic 3D Reconstruction of Interacting Hands and Objects from Video

Zicong Fan, Maria Parelli, Maria Kadoglou et al.

CVPR 2024 highlight arXiv:2311.18448
#10452

Learning Continual Compatible Representation for Re-indexing Free Lifelong Person Re-identification

Zhenyu Cui, Jiahuan Zhou, Xun Wang et al.

CVPR 2024 poster
#10453

LoSh: Long-Short Text Joint Prediction Network for Referring Video Object Segmentation

Linfeng Yuan, Miaojing Shi, Zijie Yue et al.

CVPR 2024 poster arXiv:2306.08736
#10454

Quilt-LLaVA: Visual Instruction Tuning by Extracting Localized Narratives from Open-Source Histopathology Videos

Mehmet Saygin Seyfioglu, Wisdom Ikezogwo, Fatemeh Ghezloo et al.

CVPR 2024 poster arXiv:2312.04746
#10455

X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion Model

Lingmin Ran, Xiaodong Cun, Jia-Wei Liu et al.

CVPR 2024 poster arXiv:2312.02238
#10456

AIDE: An Automatic Data Engine for Object Detection in Autonomous Driving

Mingfu Liang, Jong-Chyi Su, Samuel Schulter et al.

CVPR 2024 poster arXiv:2403.17373
#10457

Open-Vocabulary Attention Maps with Token Optimization for Semantic Segmentation in Diffusion Models

Pablo Marcos-Manchón, Roberto Alcover-Couso, Juan SanMiguel et al.

CVPR 2024 poster arXiv:2403.14291
#10458

Attentive Illumination Decomposition Model for Multi-Illuminant White Balancing

Dongyoung Kim, Jinwoo Kim, Junsang Yu et al.

CVPR 2024 poster arXiv:2402.18277
#10459

Visual-Augmented Dynamic Semantic Prototype for Generative Zero-Shot Learning

Wenjin Hou, Shiming Chen, Shuhuang Chen et al.

CVPR 2024 poster arXiv:2404.14808
#10460

A&B BNN: Add&Bit-Operation-Only Hardware-Friendly Binary Neural Network

Ruichen Ma, Guanchao Qiao, Yian Liu et al.

CVPR 2024 poster arXiv:2403.03739
#10461

OpenESS: Event-based Semantic Scene Understanding with Open Vocabularies

Lingdong Kong, Youquan Liu, Lai Xing Ng et al.

CVPR 2024 highlight arXiv:2405.05259
#10462

Z*: Zero-shot Style Transfer via Attention Reweighting

Yingying Deng, Xiangyu He, Fan Tang et al.

CVPR 2024 poster
#10463

G-HOP: Generative Hand-Object Prior for Interaction Reconstruction and Grasp Synthesis

Yufei Ye, Abhinav Gupta, Kris Kitani et al.

CVPR 2024 poster arXiv:2404.12383
#10464

3D Face Reconstruction with the Geometric Guidance of Facial Part Segmentation

Zidu Wang, Xiangyu Zhu, Tianshuo Zhang et al.

CVPR 2024 highlight arXiv:2312.00311
#10465

Spike-guided Motion Deblurring with Unknown Modal Spatiotemporal Alignment

Jiyuan Zhang, Shiyan Chen, Yajing Zheng et al.

CVPR 2024 poster
#10466

Multi-Task Dense Prediction via Mixture of Low-Rank Experts

Yuqi Yang, Peng-Tao Jiang, Qibin Hou et al.

CVPR 2024 poster arXiv:2403.17749
#10467

A Bayesian Approach to OOD Robustness in Image Classification

Prakhar Kaushik, Adam Kortylewski, Alan L. Yuille

CVPR 2024 poster arXiv:2403.07277
#10468

ConCon-Chi: Concept-Context Chimera Benchmark for Personalized Vision-Language Tasks

Andrea Rosasco, Stefano Berti, Giulia Pasquale et al.

CVPR 2024 poster
#10469

Instance-aware Contrastive Learning for Occluded Human Mesh Reconstruction

Mi-Gyeong Gwon, Gi-Mun Um, Won-Sik Cheong et al.

CVPR 2024 poster
#10470

DITTO: Dual and Integrated Latent Topologies for Implicit 3D Reconstruction

Jaehyeok Shim, Kyungdon Joo

CVPR 2024 poster arXiv:2403.05005
#10471

UniMODE: Unified Monocular 3D Object Detection

Zhuoling Li, Xiaogang Xu, Ser-Nam Lim et al.

CVPR 2024 highlight
#10472

Habitat Synthetic Scenes Dataset (HSSD-200): An Analysis of 3D Scene Scale and Realism Tradeoffs for ObjectGoal Navigation

Mukul Khanna, Yongsen Mao, Hanxiao Jiang et al.

CVPR 2024 poster arXiv:2306.11290
#10473

Sculpting Holistic 3D Representation in Contrastive Language-Image-3D Pre-training

Yipeng Gao, Zeyu Wang, Wei-Shi Zheng et al.

CVPR 2024 poster arXiv:2311.01734
#10474

KeyPoint Relative Position Encoding for Face Recognition

Minchul Kim, Feng Liu, Yiyang Su et al.

CVPR 2024 poster arXiv:2403.14852
#10475

QDFormer: Towards Robust Audiovisual Segmentation in Complex Environments with Quantization-based Semantic Decomposition

Xiang Li, Jinglu Wang, Xiaohao Xu et al.

CVPR 2024 poster arXiv:2310.00132
#10476

From a Bird's Eye View to See: Joint Camera and Subject Registration without the Camera Calibration

Zekun Qian, Ruize Han, Wei Feng et al.

CVPR 2024 poster arXiv:2212.09298
#10477

Joint2Human: High-Quality 3D Human Generation via Compact Spherical Embedding of 3D Joints

Muxin Zhang, Qiao Feng, Zhuo Su et al.

CVPR 2024 poster arXiv:2312.08591
#10478

Investigating Compositional Challenges in Vision-Language Models for Visual Grounding

Yunan Zeng, Yan Huang, Jinjin Zhang et al.

CVPR 2024 highlight
#10479

SDPose: Tokenized Pose Estimation via Circulation-Guide Self-Distillation

Chen Sichen, Yingyi Zhang, Siming Huang et al.

CVPR 2024 poster arXiv:2404.03518
#10480

FlowVid: Taming Imperfect Optical Flows for Consistent Video-to-Video Synthesis

Feng Liang, Bichen Wu, Jialiang Wang et al.

CVPR 2024 highlight arXiv:2312.17681
#10481

DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving

Chen Min, Dawei Zhao, Liang Xiao et al.

CVPR 2024 poster arXiv:2405.04390
#10482

Accept the Modality Gap: An Exploration in the Hyperbolic Space

Sameera Ramasinghe, Violetta Shevchenko, Gil Avraham et al.

CVPR 2024 highlight
#10483

MirageRoom: 3D Scene Segmentation with 2D Pre-trained Models by Mirage Projection

Haowen Sun, Yueqi Duan, Juncheng Yan et al.

CVPR 2024 highlight
#10484

CAD: Photorealistic 3D Generation via Adversarial Distillation

Ziyu Wan, Despoina Paschalidou, Ian Huang et al.

CVPR 2024 poster arXiv:2312.06663
#10485

CityDreamer: Compositional Generative Model of Unbounded 3D Cities

Haozhe Xie, Zhaoxi Chen, Fangzhou Hong et al.

CVPR 2024 poster arXiv:2309.00610
#10486

Noisy-Correspondence Learning for Text-to-Image Person Re-identification

Yang Qin, Yingke Chen, Dezhong Peng et al.

CVPR 2024 poster arXiv:2308.09911
#10487

Random Entangled Tokens for Adversarially Robust Vision Transformer

Huihui Gong, Minjing Dong, Siqi Ma et al.

CVPR 2024 poster
#10488

PNeRV: Enhancing Spatial Consistency via Pyramidal Neural Representation for Videos

Qi Zhao, M. Salman Asif, Zhan Ma

CVPR 2024 poster arXiv:2404.08921
#10489

OST: Refining Text Knowledge with Optimal Spatio-Temporal Descriptor for General Video Recognition

Tongjia Chen, Hongshan Yu, Zhengeng Yang et al.

CVPR 2024 poster arXiv:2312.00096
#10490

DYSON: Dynamic Feature Space Self-Organization for Online Task-Free Class Incremental Learning

Yuhang He, YingJie Chen, Yuhan Jin et al.

CVPR 2024 poster
#10491

Harnessing Large Language Models for Training-free Video Anomaly Detection

Luca Zanella, Willi Menapace, Massimiliano Mancini et al.

CVPR 2024 poster arXiv:2404.01014
#10492

Continuous Pose for Monocular Cameras in Neural Implicit Representation

Qi Ma, Danda Paudel, Ajad Chhatkuli et al.

CVPR 2024 poster arXiv:2311.17119
#10493

Learned Trajectory Embedding for Subspace Clustering

Yaroslava Lochman, Christopher Zach, Carl Olsson

CVPR 2024 poster
#10494

BadCLIP: Dual-Embedding Guided Backdoor Attack on Multimodal Contrastive Learning

Siyuan Liang, Mingli Zhu, Aishan Liu et al.

CVPR 2024 highlight arXiv:2311.12075
#10495

Rethinking the Objectives of Vector-Quantized Tokenizers for Image Synthesis

Yuchao Gu, Xintao Wang, Yixiao Ge et al.

CVPR 2024 poster arXiv:2212.03185
#10496

Weakly Supervised Video Individual Counting

Xinyan Liu, Guorong Li, Yuankai Qi et al.

CVPR 2024 poster
#10497

FairRAG: Fair Human Generation via Fair Retrieval Augmentation

Robik Shrestha, Yang Zou, Qiuyu Chen et al.

CVPR 2024 poster arXiv:2403.19964
#10498

MicroDiffusion: Implicit Representation-Guided Diffusion for 3D Reconstruction from Limited 2D Microscopy Projections

Mude Hui, Zihao Wei, Hongru Zhu et al.

CVPR 2024 poster arXiv:2403.10815
#10499

Learning Inclusion Matching for Animation Paint Bucket Colorization

Yuekun Dai, Shangchen Zhou, Blake Li et al.

CVPR 2024 poster arXiv:2403.18342
#10500

LucidDreamer: Towards High-Fidelity Text-to-3D Generation via Interval Score Matching

Yixun Liang, Xin Yang, Jiantao Lin et al.

CVPR 2024 highlight arXiv:2311.11284
#10501

Preserving Fairness Generalization in Deepfake Detection

Li Lin, Xinan He, Yan Ju et al.

CVPR 2024 poster arXiv:2402.17229
#10502

RepViT: Revisiting Mobile CNN From ViT Perspective

Ao Wang, Hui Chen, Zijia Lin et al.

CVPR 2024 poster arXiv:2307.09283
#10503

Improved Implicit Neural Representation with Fourier Reparameterized Training

Kexuan Shi, Xingyu Zhou, Shuhang Gu

CVPR 2024 poster arXiv:2401.07402
#10504

Gradient Alignment for Cross-Domain Face Anti-Spoofing

MINH BINH LE, Simon Woo

CVPR 2024 poster arXiv:2402.18817
#10505

U-VAP: User-specified Visual Appearance Personalization via Decoupled Self Augmentation

You Wu, Kean Liu, Xiaoyue Mi et al.

CVPR 2024 poster arXiv:2403.20231
#10506

Insights from the Use of Previously Unseen Neural Architecture Search Datasets

Rob Geada, David Towers, Matthew Forshaw et al.

CVPR 2024 poster arXiv:2404.02189
#10507

A Pedestrian is Worth One Prompt: Towards Language Guidance Person Re-Identification

Zexian Yang, Dayan Wu, Chenming Wu et al.

CVPR 2024 highlight
#10508

Layout-Agnostic Scene Text Image Synthesis with Diffusion Models

Qilong Zhangli, Jindong Jiang, Di Liu et al.

CVPR 2024 poster arXiv:2406.01062
#10509

CoDi: Conditional Diffusion Distillation for Higher-Fidelity and Faster Image Generation

Kangfu Mei, Mauricio Delbracio, Hossein Talebi et al.

CVPR 2024 poster arXiv:2310.01407
#10510

From SAM to CAMs: Exploring Segment Anything Model for Weakly Supervised Semantic Segmentation

Hyeokjun Kweon, Kuk-Jin Yoon

CVPR 2024 poster
#10511

Vlogger: Make Your Dream A Vlog

Shaobin Zhuang, Kunchang Li, Xinyuan Chen et al.

CVPR 2024 poster arXiv:2401.09414
#10512

EpiDiff: Enhancing Multi-View Synthesis via Localized Epipolar-Constrained Diffusion

Zehuan Huang, Hao Wen, Junting Dong et al.

CVPR 2024 poster arXiv:2312.06725
#10513

IPoD: Implicit Field Learning with Point Diffusion for Generalizable 3D Object Reconstruction from Single RGB-D Images

Yushuang Wu, Luyue Shi, Junhao Cai et al.

CVPR 2024 highlight arXiv:2404.00269
#10514

MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding

Bo He, Hengduo Li, Young Kyun Jang et al.

CVPR 2024 poster arXiv:2404.05726
#10515

SVGDreamer: Text Guided SVG Generation with Diffusion Model

XiMing Xing, Chuang Wang, Haitao Zhou et al.

CVPR 2024 poster arXiv:2312.16476
#10516

Dual Prototype Attention for Unsupervised Video Object Segmentation

Suhwan Cho, Minhyeok Lee, Seunghoon Lee et al.

CVPR 2024 poster arXiv:2211.12036
#10517

R-Cyclic Diffuser: Reductive and Cyclic Latent Diffusion for 3D Clothed Human Digitalization

Kennard Chan, Fayao Liu, Guosheng Lin et al.

CVPR 2024 poster
#10518

Contrastive Mean-Shift Learning for Generalized Category Discovery

Sua Choi, Dahyun Kang, Minsu Cho

CVPR 2024 poster arXiv:2404.09451
#10519

Panacea: Panoramic and Controllable Video Generation for Autonomous Driving

Yuqing Wen, Yucheng Zhao, Yingfei Liu et al.

CVPR 2024 poster arXiv:2408.07605
#10520

Flexible Biometrics Recognition: Bridging the Multimodality Gap through Attention Alignment and Prompt Tuning

Leslie Ching Ow Tiong, Dick Sigmund, Chen-Hui Chan et al.

CVPR 2024 poster
#10521

Towards Variable and Coordinated Holistic Co-Speech Motion Generation

Yifei Liu, Qiong Cao, Yandong Wen et al.

CVPR 2024 poster arXiv:2404.00368
#10522

Class Incremental Learning with Multi-Teacher Distillation

Haitao Wen, Lili Pan, Yu Dai et al.

CVPR 2024 poster
#10523

Parameter Efficient Self-Supervised Geospatial Domain Adaptation

Linus Scheibenreif, Michael Mommert, Damian Borth

CVPR 2024 poster
#10524

ALGM: Adaptive Local-then-Global Token Merging for Efficient Semantic Segmentation with Plain Vision Transformers

Narges Norouzi, Svetlana Orlova, Daan de Geus et al.

CVPR 2024 poster arXiv:2406.09936
#10525

Beyond Seen Primitive Concepts and Attribute-Object Compositional Learning

Nirat Saini, Khoi Pham, Abhinav Shrivastava

CVPR 2024 poster
#10526

Scaling Laws of Synthetic Images for Model Training ... for Now

Lijie Fan, Kaifeng Chen, Dilip Krishnan et al.

CVPR 2024 poster arXiv:2312.04567
#10527

UDiFF: Generating Conditional Unsigned Distance Fields with Optimal Wavelet Diffusion

Junsheng Zhou, Weiqi Zhang, Baorui Ma et al.

CVPR 2024 poster arXiv:2404.06851
#10528

Learning Group Activity Features Through Person Attribute Prediction

Chihiro Nakatani, Hiroaki Kawashima, Norimichi Ukita

CVPR 2024 poster arXiv:2403.02753
#10529

MICap: A Unified Model for Identity-Aware Movie Descriptions

Haran Raajesh, Naveen Reddy Desanur, Zeeshan Khan et al.

CVPR 2024 poster arXiv:2405.11483
#10530

UFineBench: Towards Text-based Person Retrieval with Ultra-fine Granularity

Jialong Zuo, Hanyu Zhou, Ying Nie et al.

CVPR 2024 poster arXiv:2312.03441
#10531

Test-Time Zero-Shot Temporal Action Localization

Benedetta Liberatori, Alessandro Conti, Paolo Rota et al.

CVPR 2024 poster arXiv:2404.05426
#10532

FreeU: Free Lunch in Diffusion U-Net

Chenyang Si, Ziqi Huang, Yuming Jiang et al.

CVPR 2024 poster arXiv:2309.11497
#10533

Towards Text-guided 3D Scene Composition

Qihang Zhang, Chaoyang Wang, Aliaksandr Siarohin et al.

CVPR 2024 poster arXiv:2312.08885
#10534

Instance-aware Exploration-Verification-Exploitation for Instance ImageGoal Navigation

Xiaohan Lei, Min Wang, Wengang Zhou et al.

CVPR 2024 poster arXiv:2402.17587
#10535

AnyScene: Customized Image Synthesis with Composited Foreground

Ruidong Chen, Lanjun Wang, Weizhi Nie et al.

CVPR 2024 poster
#10536

Learning SO(3)-Invariant Semantic Correspondence via Local Shape Transform

Chunghyun Park, Seungwook Kim, Jaesik Park et al.

CVPR 2024 poster arXiv:2404.11156
#10537

Color Shift Estimation-and-Correction for Image Enhancement

Yiyu Li, Ke Xu, Gerhard Hancke et al.

CVPR 2024 poster arXiv:2405.17725
#10538

Endow SAM with Keen Eyes: Temporal-spatial Prompt Learning for Video Camouflaged Object Detection

Wenjun Hui, Zhenfeng Zhu, Shuai Zheng et al.

CVPR 2024 poster
#10539

NICE: Neurogenesis Inspired Contextual Encoding for Replay-free Class Incremental Learning

Mustafa B Gurbuz, Jean Moorman, Constantine Dovrolis

CVPR 2024 poster
#10540

Taming Mode Collapse in Score Distillation for Text-to-3D Generation

Peihao Wang, Dejia Xu, Zhiwen Fan et al.

CVPR 2024 poster arXiv:2401.00909
#10541

FocusMAE: Gallbladder Cancer Detection from Ultrasound Videos with Focused Masked Autoencoders

Soumen Basu, Mayuna Gupta, Chetan Madan et al.

CVPR 2024 poster arXiv:2403.08848
#10542

Noisy One-point Homographies are Surprisingly Good

Yaqing Ding, Jonathan Astermark, Magnus Oskarsson et al.

CVPR 2024 poster
#10543

CSTA: CNN-based Spatiotemporal Attention for Video Summarization

Jaewon Son, Jaehun Park, Kwangsu Kim

CVPR 2024 poster arXiv:2405.11905
#10544

SUGAR: Pre-training 3D Visual Representations for Robotics

Shizhe Chen, Ricardo Garcia Pinel, Ivan Laptev et al.

CVPR 2024 poster arXiv:2404.01491
#10545

SnAG: Scalable and Accurate Video Grounding

Fangzhou Mu, Sicheng Mo, Yin Li

CVPR 2024 poster arXiv:2404.02257
#10546

GLaMM: Pixel Grounding Large Multimodal Model

Hanoona Rasheed, Muhammad Maaz, Sahal Shaji Mullappilly et al.

CVPR 2024 poster arXiv:2311.03356
#10547

ManiFPT: Defining and Analyzing Fingerprints of Generative Models

Hae Jin Song, Mahyar Khayatkhoei, Wael AbdAlmageed

CVPR 2024 poster arXiv:2402.10401
#10548

Self-Calibrating Vicinal Risk Minimisation for Model Calibration

Jiawei Liu, Changkun Ye, Ruikai Cui et al.

CVPR 2024 poster
#10549

Cinematic Behavior Transfer via NeRF-based Differentiable Filming

Xuekun Jiang, Anyi Rao, Jingbo Wang et al.

CVPR 2024 poster arXiv:2311.17754
#10550

Finding Lottery Tickets in Vision Models via Data-driven Spectral Foresight Pruning

Leonardo Iurada, Marco Ciccone, Tatiana Tommasi

CVPR 2024 poster arXiv:2406.01820
#10551

CORE-MPI: Consistency Object Removal with Embedding MultiPlane Image

Donggeun Yoon, Donghyeon Cho

CVPR 2024 poster
#10552

ScoreHypo: Probabilistic Human Mesh Estimation with Hypothesis Scoring

Yuan Xu, Xiaoxuan Ma, Jiajun Su et al.

CVPR 2024 poster
#10553

HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models

Nataniel Ruiz, Yuanzhen Li, Varun Jampani et al.

CVPR 2024 poster arXiv:2307.06949
#10554

Intelligent Grimm - Open-ended Visual Storytelling via Latent Diffusion Models

Chang Liu, Haoning Wu, Yujie Zhong et al.

CVPR 2024 poster arXiv:2306.00973
#10555

UniMix: Towards Domain Adaptive and Generalizable LiDAR Semantic Segmentation in Adverse Weather

Haimei Zhao, Jing Zhang, Zhuo Chen et al.

CVPR 2024 poster arXiv:2404.05145
#10556

LaMPilot: An Open Benchmark Dataset for Autonomous Driving with Language Model Programs

Yunsheng Ma, Can Cui, Xu Cao et al.

CVPR 2024 poster arXiv:2312.04372
#10557

Dynamic Policy-Driven Adaptive Multi-Instance Learning for Whole Slide Image Classification

Tingting Zheng, Kui Jiang, Hongxun Yao

CVPR 2024 highlight arXiv:2403.07939
#10558

BilevelPruning: Unified Dynamic and Static Channel Pruning for Convolutional Neural Networks

Shangqian Gao, Yanfu Zhang, Feihu Huang et al.

CVPR 2024 poster
#10559

Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models

Jiayi Guo, Xingqian Xu, Yifan Pu et al.

CVPR 2024 poster arXiv:2312.04410
#10560

Editable Scene Simulation for Autonomous Driving via Collaborative LLM-Agents

Yuxi Wei, Zi Wang, Yifan Lu et al.

CVPR 2024 highlight arXiv:2402.05746
#10561

Learning Continuous 3D Words for Text-to-Image Generation

Ta-Ying Cheng, Matheus Gadelha, Thibault Groueix et al.

CVPR 2024 poster arXiv:2402.08654
#10562

Content-Adaptive Non-Local Convolution for Remote Sensing Pansharpening

Yule Duan, Xiao Wu, Haoyu Deng et al.

CVPR 2024 poster arXiv:2404.07543
#10563

A Conditional Denoising Diffusion Probabilistic Model for Point Cloud Upsampling

Wentao Qu, Yuantian Shao, Lingwu Meng et al.

CVPR 2024 poster arXiv:2312.02719
#10564

APISR: Anime Production Inspired Real-World Anime Super-Resolution

Boyang Wang, Fengyu Yang, Xihang Yu et al.

CVPR 2024 poster arXiv:2403.01598
#10565

Instruct-ReID: A Multi-purpose Person Re-identification Task with Instructions

Weizhen He, Yiheng Deng, SHIXIANG TANG et al.

CVPR 2024 poster arXiv:2306.07520
#10566

Device-Wise Federated Network Pruning

Shangqian Gao, Junyi Li, Zeyu Zhang et al.

CVPR 2024 poster
#10567

SPOT: Self-Training with Patch-Order Permutation for Object-Centric Learning with Autoregressive Transformers

Ioannis Kakogeorgiou, Spyros Gidaris, Konstantinos Karantzalos et al.

CVPR 2024 highlight arXiv:2312.00648
#10568

MRC-Net: 6-DoF Pose Estimation with MultiScale Residual Correlation

Yuelong Li, Yafei Mao, Raja Bala et al.

CVPR 2024 poster arXiv:2403.08019
#10569

Progress-Aware Online Action Segmentation for Egocentric Procedural Task Videos

Yuhan Shen, Ehsan Elhamifar

CVPR 2024 poster
#10570

MOHO: Learning Single-view Hand-held Object Reconstruction with Multi-view Occlusion-Aware Supervision

Chenyangguang Zhang, Guanlong Jiao, Yan Di et al.

CVPR 2024 poster arXiv:2310.11696
#10571

Faces that Speak: Jointly Synthesising Talking Face and Speech from Text

Youngjoon Jang, Jihoon Kim, Junseok Ahn et al.

CVPR 2024 poster arXiv:2405.10272
#10572

Learning to Segment Referred Objects from Narrated Egocentric Videos

Yuhan Shen, Huiyu Wang, Xitong Yang et al.

CVPR 2024 poster
#10573

EGTR: Extracting Graph from Transformer for Scene Graph Generation

Jinbae Im, JeongYeon Nam, Nokyung Park et al.

CVPR 2024 poster arXiv:2404.02072
#10574

Distributionally Generative Augmentation for Fair Facial Attribute Classification

Fengda Zhang, Qianpei He, Kun Kuang et al.

CVPR 2024 poster arXiv:2403.06606
#10575

PikeLPN: Mitigating Overlooked Inefficiencies of Low-Precision Neural Networks

Marina Neseem, Conor McCullough, Randy Hsin et al.

CVPR 2024 poster arXiv:2404.00103
#10576

THRONE: An Object-based Hallucination Benchmark for the Free-form Generations of Large Vision-Language Models

Prannay Kaul, Zhizhong Li, Hao Yang et al.

CVPR 2024 poster arXiv:2405.05256
#10577

UniVS: Unified and Universal Video Segmentation with Prompts as Queries

Minghan LI, Shuai Li, Xindong Zhang et al.

CVPR 2024 poster arXiv:2402.18115
#10578

Inlier Confidence Calibration for Point Cloud Registration

Yongzhe Yuan, Yue Wu, Xiaolong Fan et al.

CVPR 2024 poster
#10579

CLIP-BEVFormer: Enhancing Multi-View Image-Based BEV Detector with Ground Truth Flow

Chenbin Pan, Burhan Yaman, Senem Velipasalar et al.

CVPR 2024 poster arXiv:2403.08919
#10580

ADFactory: An Effective Framework for Generalizing Optical Flow with NeRF

Han Ling, Quansen Sun, Yinghui Sun et al.

CVPR 2024 poster arXiv:2311.04246
#10581

3D Building Reconstruction from Monocular Remote Sensing Images with Multi-level Supervisions

Weijia Li, Haote Yang, Zhenghao Hu et al.

CVPR 2024 poster arXiv:2404.04823
#10582

In Search of a Data Transformation That Accelerates Neural Field Training

Junwon Seo, Sangyoon Lee, Kwang In Kim et al.

CVPR 2024 poster arXiv:2311.17094
#10583

FastMAC: Stochastic Spectral Sampling of Correspondence Graph

Yifei Zhang, Hao Zhao, Hongyang Li et al.

CVPR 2024 poster arXiv:2403.08770
#10584

PeLK: Parameter-efficient Large Kernel ConvNets with Peripheral Convolution

Honghao Chen, Xiangxiang Chu, Renyongjian et al.

CVPR 2024 poster arXiv:2403.07589
#10585

Towards Generalizing to Unseen Domains with Few Labels

Chamuditha Jayanga Galappaththige, Sanoojan Baliah, Malitha Gunawardhana et al.

CVPR 2024 poster arXiv:2403.11674
#10586

Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts

Jialin Wu, Xia Hu, Yaqing Wang et al.

CVPR 2024 highlight arXiv:2312.00968
#10587

Improving Semantic Correspondence with Viewpoint-Guided Spherical Maps

Octave Mariotti, Oisin Mac Aodha, Hakan Bilen

CVPR 2024 poster arXiv:2312.13216
#10588

Learning Degradation-Independent Representations for Camera ISP Pipelines

Yanhui Guo, Fangzhou Luo, Xiaolin Wu

CVPR 2024 poster arXiv:2307.00761
#10589

A Subspace-Constrained Tyler's Estimator and its Applications to Structure from Motion

Feng Yu, Teng Zhang, Gilad Lerman

CVPR 2024 poster arXiv:2404.11590
#10590

Low-Resource Vision Challenges for Foundation Models

Yunhua Zhang, Hazel Doughty, Cees G. M. Snoek

CVPR 2024 poster arXiv:2401.04716
#10591

Low-Latency Neural Stereo Streaming

Qiqi Hou, Farzad Farhadzadeh, Amir Said et al.

CVPR 2024 poster arXiv:2403.17879
#10592

Your Transferability Barrier is Fragile: Free-Lunch for Transferring the Non-Transferable Learning

Ziming Hong, Li Shen, Tongliang Liu

CVPR 2024 highlight
#10593

ARTrackV2: Prompting Autoregressive Tracker Where to Look and How to Describe

Yifan Bai, Zeyang Zhao, Yihong Gong et al.

CVPR 2024 poster arXiv:2312.17133
#10594

DPHMs: Diffusion Parametric Head Models for Depth-based Tracking

Jiapeng Tang, Angela Dai, Yinyu Nie et al.

CVPR 2024 poster arXiv:2312.01068
#10595

MaxQ: Multi-Axis Query for N:M Sparsity Network

Jingyang Xiang, Siqi Li, Junhao Chen et al.

CVPR 2024 poster
#10596

The Mirrored Influence Hypothesis: Efficient Data Influence Estimation by Harnessing Forward Passes

Myeongseob Ko, Feiyang Kang, Weiyan Shi et al.

CVPR 2024 poster arXiv:2402.08922
#10597

Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models

Haoning Wu, Zicheng Zhang, Erli Zhang et al.

CVPR 2024 poster arXiv:2311.06783
#10598

Efficient Scene Recovery Using Luminous Flux Prior

ZhongYu Li, Lei Zhang

CVPR 2024 poster
#10599

Structured Gradient-based Interpretations via Norm-Regularized Adversarial Training

Shizhan Gong, Qi Dou, Farzan Farnia

CVPR 2024 poster arXiv:2404.04647
#10600

Revisiting Global Translation Estimation with Feature Tracks

Peilin Tao, Hainan Cui, Mengqi Rong et al.

CVPR 2024 poster