Most Cited ECCV "attention" Papers

2,387 papers found

#401

Denoising Vision Transformers

Jiawei Yang, Katie Luo, Jiefeng Li et al.

ECCV 2024 • arXiv:2401.02957 • 31 citations
#402

An Economic Framework for 6-DoF Grasp Detection

Xiao-Ming Wu, Jia-Feng Cai, Jian-Jian Jiang et al.

ECCV 2024 • arXiv:2407.08366 • 31 citations
#403

Gaussian Splatting on the Move: Blur and Rolling Shutter Compensation for Natural Camera Motion

Otto Seiskari, Jerry Ylilammi, Valtteri Kaatrasalo et al.

ECCV 2024 • arXiv:2403.13327 • 31 citations
#404

Griffon: Spelling out All Object Locations at Any Granularity with Large Language Models

Yufei Zhan, Yousong Zhu, Zhiyang Chen et al.

ECCV 2024 • arXiv:2311.14552 • 31 citations
#405

MVPGS: Excavating Multi-view Priors for Gaussian Splatting from Sparse Input Views

Wangze Xu, Huachen Gao, Shihe Shen et al.

ECCV 2024 • arXiv:2409.14316 • 31 citations
#406

Put Myself in Your Shoes: Lifting the Egocentric Perspective from Exocentric Videos

Mi Luo, Zihui Xue, Alex Dimakis et al.

ECCV 2024 • arXiv:2403.06351 • 31 citations
#407

Masked Generative Video-to-Audio Transformers with Enhanced Synchronicity

Santiago Pascual, Chunghsin Yeh, Ioannis Tsiamas et al.

ECCV 2024 • arXiv:2407.10387 • 31 citations
#408

Hierarchical Gaussian Mixture Normalizing Flow Modeling for Unified Anomaly Detection

Xincheng Yao, Ruoqi Li, Zefeng Qian et al.

ECCV 2024 • arXiv:2403.13349 • 31 citations
#409

View Selection for 3D Captioning via Diffusion Ranking

Tiange Luo, Justin Johnson, Honglak Lee

ECCV 2024 • arXiv:2404.07984 • 31 citations
#410

VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks

Xiangxiang Chu, Jianlin Su, Bo Zhang et al.

ECCV 2024 • arXiv:2403.00522 • 30 citations
#411

Agent3D-Zero: An Agent for Zero-shot 3D Understanding

Sha Zhang, Di Huang, Jiajun Deng et al.

ECCV 2024 • arXiv:2403.11835 • 30 citations
#412

N2F2: Hierarchical Scene Understanding with Nested Neural Feature Fields

Yash Bhalgat, Iro Laina, Joao F Henriques et al.

ECCV 2024 • arXiv:2403.10997 • 30 citations
#413

MapDistill: Boosting Efficient Camera-based HD Map Construction via Camera-LiDAR Fusion Model Distillation

Xiaoshuai Hao, Ruikai Li, Hui Zhang et al.

ECCV 2024 • arXiv:2407.11682 • 30 citations
#414

Diffusion-Generated Pseudo-Observations for High-Quality Sparse-View Reconstruction

Xinhang Liu, Jiaben Chen, Shiu-Hong Kao et al.

ECCV 2024 • arXiv:2305.15171 • 30 citations
#415

Meta-Prompting for Automating Zero-shot Visual Recognition with LLMs

Muhammad Jehanzeb Mirza, Leonid Karlinsky, Wei Lin et al.

ECCV 2024 • arXiv:2403.11755 • 30 citations
#416

UMBRAE: Unified Multimodal Brain Decoding

Weihao Xia, Raoul de Charette, Cengiz Oztireli et al.

ECCV 2024 • arXiv:2404.07202 • 30 citations
#417

Be-Your-Outpainter: Mastering Video Outpainting through Input-Specific Adaptation

Fu-Yun Wang, Xiaoshi Wu, Zhaoyang Huang et al.

ECCV 2024 • arXiv:2403.13745 • 30 citations
#418

PosterLlama: Bridging Design Ability of Langauge Model to Content-Aware Layout Generation

Jaejung Seol, Seojun Kim, Jaejun Yoo

ECCV 2024 • arXiv:2404.00995 • 30 citations
#419

EventBind: Learning a Unified Representation to Bind Them All for Event-based Open-world Understanding

Jiazhou Zhou, Xu Zheng, Yuanhuiyi Lyu et al.

ECCV 2024 • arXiv:2308.03135 • 30 citations
#420

CAT-SAM: Conditional Tuning for Few-Shot Adaptation of Segment Anything Model

Aoran Xiao, Weihao Xuan, Heli Qi et al.

ECCV 2024 • arXiv:2402.03631 • 30 citations
#421

Soft Prompt Generation for Domain Generalization

Shuanghao Bai, Yuedi Zhang, Wanqi Zhou et al.

ECCV 2024 • arXiv:2404.19286 • 30 citations
#422

DragAPart: Learning a Part-Level Motion Prior for Articulated Objects

Ruining Li, Chuanxia Zheng, Christian Rupprecht et al.

ECCV 2024 • arXiv:2403.15382 • 30 citations
#423

BAM-DETR: Boundary-Aligned Moment Detection Transformer for Temporal Sentence Grounding in Videos

Pilhyeon Lee, Hyeran Byun

ECCV 2024 • arXiv:2312.00083 • 30 citations
#424

WiMANS: A Benchmark Dataset for WiFi-based Multi-user Activity Sensing

Shuokang Huang, Kaihan Li, Di You et al.

ECCV 2024 • arXiv:2402.09430 • 30 citations
#425

DiffClass: Diffusion-Based Class Incremental Learning

Zichong Meng, Jie Zhang, Changdi Yang et al.

ECCV 2024 • arXiv:2403.05016 • 29 citations
#426

Learning Natural Consistency Representation for Face Forgery Video Detection

Daichi Zhang, Zihao Xiao, Shikun Li et al.

ECCV 2024 • arXiv:2407.10550 • 29 citations
#427

HaloQuest: A Visual Hallucination Dataset for Advancing Multimodal Reasoning

Zhecan Wang, Garrett Bingham, Adams Wei Yu et al.

ECCV 2024 • arXiv:2407.15680 • 29 citations
#428

Embracing Events and Frames with Hierarchical Feature Refinement Network for Object Detection

Hu Cao, Zehua Zhang, Yan Xia et al.

ECCV 2024 • arXiv:2407.12582 • 29 citations
#429

Video Editing via Factorized Diffusion Distillation

Uriel Singer, Amit Zohar, Yuval Kirstain et al.

ECCV 2024 • arXiv:2403.09334 • 29 citations
#430

I-MedSAM: Implicit Medical Image Segmentation with Segment Anything

Xiaobao Wei, Jiajun Cao, Yizhu Jin et al.

ECCV 2024 • arXiv:2311.17081 • 29 citations
#431

Diffusion Models for Monocular Depth Estimation: Overcoming Challenging Conditions

Fabio Tosi, Pierluigi Zama Ramirez, Matteo Poggi

ECCV 2024 • arXiv:2407.16698 • 29 citations
#432

Four Ways to Improve Verbo-visual Fusion for Dense 3D Visual Grounding

Ozan Unal, Christos Sakaridis, Suman Saha et al.

ECCV 2024 • arXiv:2309.04561 • 29 citations
#433

Dataset Distillation by Automatic Training Trajectories

Dai Liu, Jindong Gu, Hu Cao et al.

ECCV 2024 • arXiv:2407.14245 • 29 citations
#434

Boosting the Power of Small Multimodal Reasoning Models to Match Larger Models with Self-Consistency Training

Cheng Tan, Jingxuan Wei, Zhangyang Gao et al.

ECCV 2024 • arXiv:2311.14109 • 29 citations
#435

Nuvo: Neural UV Mapping for Unruly 3D Representations

Pratul Srinivasan, Stephan J Garbin, Dor Verbin et al.

ECCV 2024 • arXiv:2312.05283 • 29 citations
#436

MTMamba: Enhancing Multi-Task Dense Scene Understanding by Mamba-Based Decoders

Baijiong Lin, Weisen Jiang, Pengguang Chen et al.

ECCV 2024 • arXiv:2407.02228 • 29 citations
#437

Personalized Federated Domain-Incremental Learning based on Adaptive Knowledge Matching

Yichen Li, Wenchao Xu, Haozhao Wang et al.

ECCV 2024 • arXiv:2407.05005 • 29 citations
#438

PPAD: Iterative Interactions of Prediction and Planning for End-to-end Autonomous Driving

Zhili Chen, Maosheng Ye, Shuangjie Xu et al.

ECCV 2024 • arXiv:2311.08100 • 29 citations
#439

WHAC: World-grounded Humans and Cameras

Wanqi Yin, Zhongang Cai, Chen Wei et al.

ECCV 2024 • arXiv:2403.12959 • 29 citations
#440

HIMO: A New Benchmark for Full-Body Human Interacting with Multiple Objects

Xintao Lv, Liang Xu, Yichao Yan et al.

ECCV 2024 • arXiv:2407.12371 • 29 citations
#441

Human Hair Reconstruction with Strand-Aligned 3D Gaussians

Egor Zakharov, Vanessa Sklyarova, Michael J. Black et al.

ECCV 2024 • arXiv:2409.14778 • 29 citations
#442

Learning Unified Reference Representation for Unsupervised Multi-class Anomaly Detection

Liren He, Zhengkai Jiang, Jinlong Peng et al.

ECCV 2024 • arXiv:2403.11561 • 28 citations
#443

The Nerfect Match: Exploring NeRF Features for Visual Localization

Qunjie Zhou, Maxim Maximov, Or Litany et al.

ECCV 2024 • arXiv:2403.09577 • 28 citations
#444

T2IShield: Defending Against Backdoors on Text-to-Image Diffusion Models

Zhongqi Wang, Jie Zhang, Shiguang Shan et al.

ECCV 2024 • arXiv:2407.04215 • 28 citations
#445

VEGS: View Extrapolation of Urban Scenes in 3D Gaussian Splatting using Learned Priors

Sungwon Hwang, Min-Jung Kim, Taewoong Kang et al.

ECCV 2024 • arXiv:2407.02945 • 28 citations
#446

Image Compression for Machine and Human Vision With Spatial-Frequency Adaptation

Han Li, Shaohui Li, Shuangrui Ding et al.

ECCV 2024 • arXiv:2407.09853 • 28 citations
#447

4D Contrastive Superflows are Dense 3D Representation Learners

Xiang Xu, Lingdong Kong, Hui Shuai et al.

ECCV 2024 • arXiv:2407.06190 • 28 citations
#448

Trackastra: Transformer-based cell tracking for live-cell microscopy

Benjamin Gallusser, Martin Weigert

ECCV 2024 • arXiv:2405.15700 • 28 citations
#449

Occupancy as Set of Points

Yiang Shi, Tianheng Cheng, Qian Zhang et al.

ECCV 2024 • arXiv:2407.04049 • 28 citations
#450

Cascade-Zero123: One Image to Highly Consistent 3D with Self-Prompted Nearby Views

Yabo Chen, Jiemin Fang, Yuyang Huang et al.

ECCV 2024 • arXiv:2312.04424 • 28 citations
#451

Zero-shot Object Counting with Good Exemplars

Huilin Zhu, Jingling Yuan, Zhengwei Yang et al.

ECCV 2024 • arXiv:2407.04948 • 28 citations
#452

Learning Modality-agnostic Representation for Semantic Segmentation from Any Modalities

Xu Zheng, Yuanhuiyi Lyu, Lin Wang

ECCV 2024 • arXiv:2407.11351 • 28 citations
#453

MeshAvatar: Learning High-quality Triangular Human Avatars from Multi-view Videos

Yushuo Chen, Zerong Zheng, Zhe Li et al.

ECCV 2024 • arXiv:2407.08414 • 28 citations
#454

AccDiffusion: An Accurate Method for Higher-Resolution Image Generation

Zhihang Lin, Mingbao Lin, Meng Zhao et al.

ECCV 2024 • arXiv:2407.10738 • 28 citations
#455

Language-Driven Physics-Based Scene Synthesis and Editing via Feature Splatting

Ri-Zhao Qiu, Ge Yang, Weijia Zeng et al.

ECCV 2024 • 28 citations
#456

SphereHead: Stable 3D Full-head Synthesis with Spherical Tri-plane Representation

Heyuan Li, Ce Chen, Tianhao Shi et al.

ECCV 2024 • arXiv:2404.05680 • 28 citations
#457

DGR-MIL: Exploring Diverse Global Representation in Multiple Instance Learning for Whole Slide Image Classification

Wenhui Zhu, Xiwen Chen, Peijie Qiu et al.

ECCV 2024 • arXiv:2407.03575 • 28 citations
#458

Fast Context-Based Low-Light Image Enhancement via Neural Implicit Representations

Tomáš Chobola, Yu Liu, Hanyi Zhang et al.

ECCV 2024 • arXiv:2407.12511 • 28 citations
#459

Attention Prompting on Image for Large Vision-Language Models

Runpeng Yu, Weihao Yu, Xinchao Wang

ECCV 2024 • arXiv:2409.17143 • 28 citations
#460

Exploring Conditional Multi-Modal Prompts for Zero-shot HOI Detection

Ting Lei, Shaofeng Yin, Yuxin Peng et al.

ECCV 2024 • arXiv:2408.02484 • 28 citations
#461

GeneralAD: Anomaly Detection Across Domains by Attending to Distorted Features

Luc Sträter, Mohammadreza Salehi, Efstratios Gavves et al.

ECCV 2024 • arXiv:2407.12427 • 28 citations
#462

OphNet: A Large-Scale Video Benchmark for Ophthalmic Surgical Workflow Understanding

Ming Hu, Peng Xia, Lin Wang et al.

ECCV 2024 • arXiv:2406.07471 • 28 citations
#463

Masked Angle-Aware Autoencoder for Remote Sensing Images

Zhihao Li, Biao Hou, Siteng Ma et al.

ECCV 2024 • arXiv:2408.01946 • 28 citations
#464

DreamMotion: Space-Time Self-Similar Score Distillation for Zero-Shot Video Editing

Hyeonho Jeong, Jinho Chang, Geon Yeong Park et al.

ECCV 2024 • arXiv:2403.12002 • 28 citations
#465

Do text-free diffusion models learn discriminative visual representations?

Soumik Mukhopadhyay, Matthew Gwilliam, Yosuke Yamaguchi et al.

ECCV 2024 • arXiv:2311.17921 • 27 citations
#466

ScaleDreamer: Scalable Text-to-3D Synthesis with Asynchronous Score Distillation

Zhiyuan Ma, Yuxiang Wei, Yabin Zhang et al.

ECCV 2024 • arXiv:2407.02040 • 27 citations
#467

MART: MultiscAle Relational Transformer Networks for Multi-agent Trajectory Prediction

Seongju Lee, Junseok Lee, Yeonguk Yu et al.

ECCV 2024 • arXiv:2407.21635 • 27 citations
#468

Tuning-Free Image Customization with Image and Text Guidance

Pengzhi Li, Qiang Nie, Ying Chen et al.

ECCV 2024 • arXiv:2403.12658 • 27 citations
#469

Safe-Sim: Safety-Critical Closed-Loop Traffic Simulation with Diffusion-Controllable Adversaries

Wei-Jer Chang, Francesco Pittaluga, Masayoshi Tomizuka et al.

ECCV 2024 • arXiv:2401.00391 • 27 citations
#470

Few-shot Class Incremental Learning with Attention-Aware Self-Adaptive Prompt

Chenxi Liu, Zhenyi Wang, Tianyi Xiong et al.

ECCV 2024 • arXiv:2403.09857 • 27 citations
#471

FairDomain: Achieving Fairness in Cross-Domain Medical Image Segmentation and Classification

Yu Tian, Congcong Wen, Min Shi et al.

ECCV 2024 • arXiv:2407.08813 • 27 citations
#472

StableDrag: Stable Dragging for Point-based Image Editing

Yutao Cui, Xiaotong Zhao, Guozhen Zhang et al.

ECCV 2024 • arXiv:2403.04437 • 27 citations
#473

SAM-COD: SAM-guided Unified Framework for Weakly-Supervised Camouflaged Object Detection

Huafeng Chen, Pengxu Wei, Guangqian Guo et al.

ECCV 2024 • arXiv:2408.10760 • 27 citations
#474

ViewFormer: Exploring Spatiotemporal Modeling for Multi-View 3D Occupancy Perception via View-Guided Transformers

Jinke Li, Xiao He, Chonghua Zhou et al.

ECCV 2024 • arXiv:2405.04299 • 27 citations
#475

Multistain Pretraining for Slide Representation Learning in Pathology

Guillaume Jaume, Anurag J Vaidya, Andrew Zhang et al.

ECCV 2024 • arXiv:2408.02859 • 27 citations
#476

Progressive Pretext Task Learning for Human Trajectory Prediction

Xiaotong Lin, Tianming Liang, Jian-Huang Lai et al.

ECCV 2024 • arXiv:2407.11588 • 26 citations
#477

Benchmarking Object Detectors with COCO: A New Path Forward

Shweta Singh, Aayan Yadav, Jitesh Jain et al.

ECCV 2024 • arXiv:2403.18819 • 26 citations
#478

Dolfin: Diffusion Layout Transformers without Autoencoder

Yilin Wang, Zeyuan Chen, Liangjun Zhong et al.

ECCV 2024 • arXiv:2310.16305 • 26 citations
#479

Make-Your-3D: Fast and Consistent Subject-Driven 3D Content Generation

Fangfu Liu, Hanyang Wang, Weiliang Chen et al.

ECCV 2024 • arXiv:2403.09625 • 26 citations
#480

SWinGS: Sliding Windows for Dynamic 3D Gaussian Splatting

Richard Shaw, Michal Nazarczuk, Jifei Song et al.

ECCV 2024 • arXiv:2312.13308 • 26 citations
#481

Efficient Diffusion Transformer with Step-wise Dynamic Attention Mediators

Yifan Pu, Zhuofan Xia, Jiayi Guo et al.

ECCV 2024 • arXiv:2408.05710 • 26 citations
#482

Towards Natural Language-Guided Drones: GeoText-1652 Benchmark with Spatial Relation Matching

Meng Chu, Zhedong Zheng, Wei Ji et al.

ECCV 2024 • arXiv:2311.12751 • 26 citations
#483

Seeing the Unseen: A Frequency Prompt Guided Transformer for Image Restoration

Shihao Zhou, Jinshan Pan, Jinglei Shi et al.

ECCV 2024 • arXiv:2404.00288 • 26 citations
#484

SignAvatars: A Large-scale 3D Sign Language Holistic Motion Dataset and Benchmark

Zhengdi Yu, Shaoli Huang, Yongkang Cheng et al.

ECCV 2024 • arXiv:2310.20436 • 26 citations
#485

Getting it Right: Improving Spatial Consistency in Text-to-Image Models

Agneet Chatterjee, Gabriela Ben Melech Stan, Estelle Guez Aflalo et al.

ECCV 2024 • arXiv:2404.01197 • 26 citations
#486

Prioritized Semantic Learning for Zero-shot Instance Navigation

Xinyu Sun, Lizhao Liu, Hongyan Zhi et al.

ECCV 2024 • arXiv:2403.11650 • 26 citations
#487

HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning

Fucai Ke, Zhixi Cai, Simindokht Jahangard et al.

ECCV 2024 • arXiv:2403.12884 • 26 citations
#488

PYRA: Parallel Yielding Re-Activation for Training-Inference Efficient Task Adaptation

Yizhe Xiong, Hui Chen, Tianxiang Hao et al.

ECCV 2024 • arXiv:2403.09192 • 26 citations
#489

Enhancing Diffusion Models with Text-Encoder Reinforcement Learning

Chaofeng Chen, Annan Wang, Haoning Wu et al.

ECCV 2024 • arXiv:2311.15657 • 26 citations
#490

LEGO: Learning EGOcentric Action Frame Generation via Visual Instruction Tuning

Bolin Lai, Xiaoliang Dai, Lawrence Chen et al.

ECCV 2024 • arXiv:2312.03849 • 26 citations
#491

Efficient Inference of Vision Instruction-Following Models with Elastic Cache

Zuyan Liu, Benlin Liu, Jiahui Wang et al.

ECCV 2024 • arXiv:2407.18121 • 26 citations
#492

In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation

Dahyun Kang, Minsu Cho

ECCV 2024 • arXiv:2408.04961 • 26 citations
#493

GeoCalib: Learning Single-image Calibration with Geometric Optimization

Alexander Veicht, Paul-Edouard Sarlin, Philipp Lindenberger et al.

ECCV 2024 • arXiv:2409.06704 • 25 citations
#494

MTKD: Multi-Teacher Knowledge Distillation for Image Super-Resolution

Yuxuan Jiang, Chen Feng, Fan Zhang et al.

ECCV 2024 • arXiv:2404.09571 • 25 citations
#495

WSI-VQA: Interpreting Whole Slide Images by Generative Visual Question Answering

Pingyi Chen, Chenglu Zhu, Sunyi Zheng et al.

ECCV 2024 • arXiv:2407.05603 • 25 citations
#496

Enhancing Vectorized Map Perception with Historical Rasterized Maps

Xiaoyu Zhang, Guangwei Liu, Zihao Liu et al.

ECCV 2024 • arXiv:2409.00620 • 25 citations
#497

A Closer Look at GAN Priors: Exploiting Intermediate Features for Enhanced Model Inversion Attacks

Yixiang Qiu, Hao Fang, Hongyao Yu et al.

ECCV 2024 • arXiv:2407.13863 • 25 citations
#498

OpenSight: A Simple Open-Vocabulary Framework for LiDAR-Based Object Detection

Hu Zhang, Jianhua Xu, Tao Tang et al.

ECCV 2024 • arXiv:2312.08876 • 25 citations
#499

LLMGA: Multimodal Large Language Model based Generation Assistant

Bin Xia, Shiyin Wang, Yingfan Tao et al.

ECCV 2024 • arXiv:2311.16500 • 25 citations
#500

Improving Agent Behaviors with RL Fine-tuning for Autonomous Driving

Zhenghao Peng, Wenjie Luo, Yiren Lu et al.

ECCV 2024 • arXiv:2409.18343 • 25 citations
#501

SlimFlow: Training Smaller One-Step Diffusion Models with Rectified Flow

Yuanzhi Zhu, Xingchao Liu, Qiang Liu

ECCV 2024 • arXiv:2407.12718 • 25 citations
#502

Toward Open Vocabulary Aerial Object Detection with CLIP-Activated Student-Teacher Learning

Yan Li, Weiwei Guo, Xue Yang et al.

ECCV 2024 • arXiv:2311.11646 • 25 citations
#503

TrojVLM: Backdoor Attack Against Vision Language Models

Weimin Lyu, Lu Pang, Tengfei Ma et al.

ECCV 2024 • arXiv:2409.19232 • 25 citations
#504

MUSES: The Multi-Sensor Semantic Perception Dataset for Driving under Uncertainty

Tim Broedermann, David Brüggemann, Christos Sakaridis et al.

ECCV 2024 • arXiv:2401.12761 • 25 citations
#505

Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes

Yaoting Wang, Peiwen Sun, Dongzhan Zhou et al.

ECCV 2024 • arXiv:2407.10957 • 25 citations
#506

SparseLIF: High-Performance Sparse LiDAR-Camera Fusion for 3D Object Detection

Hongcheng Zhang, Liu Liang, Pengxin Zeng et al.

ECCV 2024 • arXiv:2403.07284 • 25 citations
#507

Mask2Map: Vectorized HD Map Construction Using Bird's Eye View Segmentation Masks

Sehwan Choi, Jun Won Choi, Jungho Kim et al.

ECCV 2024 • arXiv:2407.13517 • 25 citations
#508

Bad Students Make Great Teachers: Active Learning Accelerates Large-Scale Visual Understanding

Talfan Evans, Shreya Pathak, Hamza Merzic et al.

ECCV 2024 • arXiv:2312.05328 • 25 citations
#509

Self-supervised Feature Adaptation for 3D Industrial Anomaly Detection

Yuanpeng Tu, Boshen Zhang, Liang Liu et al.

ECCV 2024 • arXiv:2401.03145 • 24 citations
#510

Isomorphic Pruning for Vision Models

Gongfan Fang, Xinyin Ma, Michael Bi Mi et al.

ECCV 2024 • arXiv:2407.04616 • 24 citations
#511

Unleashing the Power of Prompt-driven Nucleus Instance Segmentation

Zhongyi Shui, Yunlong Zhang, Kai Yao et al.

ECCV 2024 • arXiv:2311.15939 • 24 citations
#512

Causality-inspired Discriminative Feature Learning in Triple Domains for Gait Recognition

Haijun Xiong, Bin Feng, Xinggang Wang et al.

ECCV 2024 • arXiv:2407.12519 • 24 citations
#513

EgoPoser: Robust Real-Time Egocentric Pose Estimation from Sparse and Intermittent Observations Everywhere

Jiaxi Jiang, Paul Streli, Manuel Meier et al.

ECCV 2024 • arXiv:2308.06493 • 24 citations
#514

Beyond the Contact: Discovering Comprehensive Affordance for 3D Objects from Pre-trained 2D Diffusion Models

Hyeonwoo Kim, Sookwan Han, Patrick Kwon et al.

ECCV 2024 • arXiv:2401.12978 • 24 citations
#515

AdaLog: Post-Training Quantization for Vision Transformers with Adaptive Logarithm Quantizer

Zhuguanyu Wu, Jiaxin Chen, Hanwen Zhong et al.

ECCV 2024 • arXiv:2407.12951 • 24 citations
#516

Cascade Prompt Learning for Visual-Language Model Adaptation

Ge Wu, Xin Zhang, Zheng Li et al.

ECCV 2024 • 24 citations
#517

Score Distillation Sampling with Learned Manifold Corrective

Thiemo Alldieck, Nikos Kolotouros, Cristian Sminchisescu

ECCV 2024 • arXiv:2401.05293 • 24 citations
#518

DyFADet: Dynamic Feature Aggregation for Temporal Action Detection

Le Yang, Ziwei Zheng, Yizeng Han et al.

ECCV 2024 • arXiv:2407.03197 • 24 citations
#519

LISO: Lidar-only Self-Supervised 3D Object Detection

Stefan Baur, Frank Moosmann, Andreas Geiger

ECCV 2024 • arXiv:2403.07071 • 24 citations
#520

VideoMamba: Spatio-Temporal Selective State Space Model

Jinyoung Park, Hee-Seon Kim, Kangwook Ko et al.

ECCV 2024 • arXiv:2407.08476 • 24 citations
#521

SQ-LLaVA: Self-Questioning for Large Vision-Language Assistant

Guohao Sun, Can Qin, Jiaminan Wang et al.

ECCV 2024 • arXiv:2403.11299 • 24 citations
#522

Semantic Residual Prompts for Continual Learning

Martin Menabue, Emanuele Frascaroli, Matteo Boschini et al.

ECCV 2024 • arXiv:2403.06870 • 24 citations
#523

FlexiEdit: Frequency-Aware Latent Refinement for Enhanced Non-Rigid Editing

Gwanhyeong Koo, Sunjae Yoon, Ji Woo Hong et al.

ECCV 2024 • arXiv:2407.17850 • 24 citations
#524

Diffusion Soup: Model Merging for Text-to-Image Diffusion Models

Benjamin J Biggs, Arjun Seshadri, Yang Zou et al.

ECCV 2024 • arXiv:2406.08431 • 24 citations
#525

OccGen: Generative Multi-modal 3D Occupancy Prediction for Autonomous Driving

Guoqing Wang, Zhongdao Wang, Pin Tang et al.

ECCV 2024 • arXiv:2404.15014 • 24 citations
#526

TIP: Tabular-Image Pre-training for Multimodal Classification with Incomplete Data

Siyi Du, Shaoming Zheng, Yinsong Wang et al.

ECCV 2024 • arXiv:2407.07582 • 24 citations
#527

Photorealistic Object Insertion with Diffusion-Guided Inverse Rendering

Ruofan Liang, Zan Gojcic, Merlin Nimier-David et al.

ECCV 2024 • arXiv:2408.09702 • 24 citations
#528

PreciseControl: Enhancing Text-To-Image Diffusion Models with Fine-Grained Attribute Control

Rishubh Parihar, Sachidanand VS, Sabariswaran Mani et al.

ECCV 2024 • arXiv:2408.05083 • 24 citations
#529

Facial Affective Behavior Analysis with Instruction Tuning

Yifan Li, Anh Dao, Wentao Bao et al.

ECCV 2024 • arXiv:2404.05052 • 24 citations
#530

Text-Conditioned Resampler For Long Form Video Understanding

Bruno Korbar, Yongqin Xian, Alessio Tonioni et al.

ECCV 2024 • arXiv:2312.11897 • 24 citations
#531

Fast Diffusion-Based Counterfactuals for Shortcut Removal and Generation

Nina Weng, Paraskevas Pegios, Eike Petersen et al.

ECCV 2024 • arXiv:2312.14223 • 24 citations
#532

G3R: Gradient Guided Generalizable Reconstruction

Yun Chen, Jingkang Wang, Ze Yang et al.

ECCV 2024 • arXiv:2409.19405 • 24 citations
#533

EAS-SNN: End-to-End Adaptive Sampling and Representation for Event-based Detection with Recurrent Spiking Neural Networks

Ziming Wang, Ziling Wang, Huaning Li et al.

ECCV 2024 • arXiv:2403.12574 • 24 citations
#534

RealViformer: Investigating Attention for Real-World Video Super-Resolution

Yuehan Zhang, Angela Yao

ECCV 2024 • arXiv:2407.13987 • 23 citations
#535

F-HOI: Toward Fine-grained Semantic-Aligned 3D Human-Object Interactions

Jie Yang, Xuesong Niu, Nan Jiang et al.

ECCV 2024 • arXiv:2407.12435 • 23 citations
#536

Object-Centric Diffusion for Efficient Video Editing

Kumara Kahatapitiya, Adil Karjauv, Davide Abati et al.

ECCV 2024 • arXiv:2401.05735 • 23 citations
#537

StyleTokenizer: Defining Image Style by a Single Instance for Controlling Diffusion Models

Wen Li, Muyuan Fang, Cheng Zou et al.

ECCV 2024 • arXiv:2409.02543 • 23 citations
#538

Region-Adaptive Transform with Segmentation Prior for Image Compression

Yuxi Liu, Wenhan Yang, Huihui Bai et al.

ECCV 2024 • arXiv:2403.00628 • 23 citations
#539

FocusDiffuser: Perceiving Local Disparities for Camouflaged Object Detection

Jianwei Zhao, Xin Li, Fan Yang et al.

ECCV 2024 • arXiv:2407.13133 • 23 citations
#540

Defect Spectrum: A Granular Look of Large-scale Defect Datasets with Rich Semantics

Shuai Yang, ZhiFei Chen, Pengguang Chen et al.

ECCV 2024 • arXiv:2310.17316 • 23 citations
#541

Teaching Tailored to Talent: Adverse Weather Restoration via Prompt Pool and Depth-Anything Constraint

Sixiang Chen, Tian Ye, Kai Zhang et al.

ECCV 2024 • arXiv:2409.15739 • 23 citations
#542

HiDiffusion: Unlocking Higher-Resolution Creativity and Efficiency in Pretrained Diffusion Models

Shen Zhang, Zhaowei Chen, Zhenyu Zhao et al.

ECCV 2024 • arXiv:2311.17528 • 23 citations
#543

ControlNet-XS: Rethinking the Control of Text-to-Image Diffusion Models as Feedback-Control Systems

Denis Zavadski, Johann-Friedrich Feiden, Carsten Rother

ECCV 2024 • arXiv:2312.06573 • 23 citations
#544

EgoExo-Fitness: Towards Egocentric and Exocentric Full-Body Action Understanding

Yuanming Li, Wei-Jin Huang, An-Lan Wang et al.

ECCV 2024 • arXiv:2406.08877 • 23 citations
#545

Improving Medical Multi-modal Contrastive Learning with Expert Annotations

Yogesh Kumar, Pekka Marttinen

ECCV 2024 • arXiv:2403.10153 • 23 citations
#546

Meerkat: Audio-Visual Large Language Model for Grounding in Space and Time

Sanjoy Chowdhury, Sayan Nag, Subhrajyoti Dasgupta et al.

ECCV 2024 • arXiv:2407.01851 • 23 citations
#547

DataDream: Few-shot Guided Dataset Generation

Jae Myung Kim, Jessica Bader, Stephan Alaniz et al.

ECCV 2024 • arXiv:2407.10910 • 23 citations
#548

Switch Diffusion Transformer: Synergizing Denoising Tasks with Sparse Mixture-of-Experts

Byeongjun Park, Hyojun Go, Jin-Young Kim et al.

ECCV 2024 • arXiv:2403.09176 • 23 citations
#549

milliFlow: Scene Flow Estimation on mmWave Radar Point Cloud for Human Motion Sensing

Fangqiang Ding, Zhen Luo, Peijun Zhao et al.

ECCV 2024 • arXiv:2306.17010 • 23 citations
#550

PALM: Predicting Actions through Language Models

Sanghwan Kim, Daoji Huang, Yongqin Xian et al.

ECCV 2024 • arXiv:2311.17944 • 23 citations
#551

Stepping Stones: A Progressive Training Strategy for Audio-Visual Semantic Segmentation

Juncheng Ma, Peiwen Sun, Yaoting Wang et al.

ECCV 2024 • arXiv:2407.11820 • 23 citations
#552

Revisit Anything: Visual Place Recognition via Image Segment Retrieval

Kartik Garg, Sai Shubodh Puligilla, Shishir N Y Kolathaya et al.

ECCV 2024 • arXiv:2409.18049 • 23 citations
#553

GiT: Towards Generalist Vision Transformer through Universal Language Interface

Haiyang Wang, Hao Tang, Li Jiang et al.

ECCV 2024 • arXiv:2403.09394 • 23 citations
#554

An Incremental Unified Framework for Small Defect Inspection

Jiaqi Tang, Hao Lu, Xiaogang Xu et al.

ECCV 2024 • arXiv:2312.08917 • 22 citations
#555

Online Zero-Shot Classification with CLIP

Qi Qian, Juhua Hu

ECCV 2024 • arXiv:2408.13320 • 22 citations
#556

Factorized Diffusion: Perceptual Illusions by Noise Decomposition

Daniel Geng, Inbum Park, Andrew Owens

ECCV 2024 • arXiv:2404.11615 • 22 citations
#557

PromptFusion: Decoupling Stability and Plasticity for Continual Learning

Haoran Chen, Zuxuan Wu, Xintong Han et al.

ECCV 2024 • arXiv:2303.07223 • 22 citations
#558

ViLA: Efficient Video-Language Alignment for Video Question Answering

Xijun Wang, Junbang Liang, Chun-Kai Wang et al.

ECCV 2024 • arXiv:2312.08367 • 22 citations
#559

PreSight: Enhancing Autonomous Vehicle Perception with City-Scale NeRF Priors

Tianyuan Yuan, Yucheng Mao, Jiawei Yang et al.

ECCV 2024 • arXiv:2403.09079 • 22 citations
#560

IRGen: Generative Modeling for Image Retrieval

Yidan Zhang, Ting Zhang, Dong Chen et al.

ECCV 2024 • arXiv:2303.10126 • 22 citations
#561

DIFFender: Diffusion-Based Adversarial Defense against Patch Attacks

Caixin Kang, Yinpeng Dong, Zhengyi Wang et al.

ECCV 2024 • arXiv:2306.09124 • 22 citations
#562

Robust Calibration of Large Vision-Language Adapters

Balamurali Murugesan, Julio Silva-Rodríguez, Ismail Ben Ayed et al.

ECCV 2024 • arXiv:2407.13588 • 22 citations
#563

SLEDGE: Synthesizing Driving Environments with Generative Models and Rule-Based Traffic

Kashyap Chitta, Daniel Dauner, Andreas Geiger

ECCV 2024 • arXiv:2403.17933 • 22 citations
#564

Collaborative Control for Geometry-Conditioned PBR Image Generation

Shimon Vainer, Mark Boss, Mathias Parger et al.

ECCV 2024 • arXiv:2402.05919 • 22 citations
#565

Reliability in Semantic Segmentation: Can We Use Synthetic Data?

Thibaut Loiseau, Tuan Hung Vu, Mickael Chen et al.

ECCV 2024 • arXiv:2312.09231 • 22 citations
#566

Continuous Memory Representation for Anomaly Detection

Joo Chan Lee, Taejune Kim, Eunbyung Park et al.

ECCV 2024 • arXiv:2402.18293 • 22 citations
#567

Eliminating Feature Ambiguity for Few-Shot Segmentation

Qianxiong Xu, Guosheng Lin, Chen Change Loy et al.

ECCV 2024 • arXiv:2407.09842 • 22 citations
#568

OGNI-DC: Robust Depth Completion with Optimization-Guided Neural Iterations

Yiming Zuo, Jia Deng

ECCV 2024 • arXiv:2406.11711 • 22 citations
#569

RadEdit: stress-testing biomedical vision models via diffusion image editing

Fernando Pérez-García, Sam Bond-Taylor, Pedro Sanchez et al.

ECCV 2024 • arXiv:2312.12865 • 22 citations
#570

VisionTrap: Vision-Augmented Trajectory Prediction Guided by Textual Descriptions

Seokha Moon, Hyun Woo, Hongbeen Park et al.

ECCV 2024 • arXiv:2407.12345 • 22 citations
#571

AddressCLIP: Empowering Vision-Language Models for City-wide Image Address Localization

Shixiong Xu, Chenghao Zhang, Lubin Fan et al.

ECCV 2024 • arXiv:2407.08156 • 22 citations
#572

SEED: A Simple and Effective 3D DETR in Point Clouds

Zhe Liu, Jinghua Hou, Xiaoqing Ye et al.

ECCV 2024 • arXiv:2407.10749 • 22 citations
#573

SelEx: Self-Expertise in Fine-Grained Generalized Category Discovery

Sarah Rastegar, Mohammadreza Salehi, Yuki M Asano et al.

ECCV 2024 • arXiv:2408.14371 • 22 citations
#574

Videoshop: Localized Semantic Video Editing with Noise-Extrapolated Diffusion Inversion

Xiang Fan, Anand Bhattad, Ranjay Krishna

ECCV 2024 • arXiv:2403.14617 • 22 citations
#575

Robust-Wide: Robust Watermarking against Instruction-driven Image Editing

Runyi Hu, Jie Zhang, Ting Xu et al.

ECCV 2024 • arXiv:2402.12688 • 22 citations
#576

3D Hand Pose Estimation in Everyday Egocentric Images

Aditya Prakash, Ruisen Tu, Matthew Chang et al.

ECCV 2024 • arXiv:2312.06583 • 22 citations
#577

NeRF-MAE: Masked AutoEncoders for Self-Supervised 3D Representation Learning for Neural Radiance Fields

Muhammad Zubair Irshad, Sergey Zakharov, Vitor Guizilini et al.

ECCV 2024 • arXiv:2404.01300 • 22 citations
#578

DIM: Dyadic Interaction Modeling for Social Behavior Generation

Minh Tran, Di Chang, Maksim Siniukov et al.

ECCV 2024 • 22 citations
#579

MotionChain: Conversational Motion Controllers via Multimodal Prompts

Biao Jiang, Xin Chen, Chi Zhang et al.

ECCV 2024 • arXiv:2404.01700 • 22 citations
#580

Text2LiDAR: Text-guided LiDAR Point Clouds Generation via Equirectangular Transformer

Yang Wu, Kaihua Zhang, Jianjun Qian et al.

ECCV 2024 • arXiv:2407.19628 • 22 citations
#581

Visible and Clear: Finding Tiny Objects in Difference Map

Bing Cao, Haiyu Yao, Pengfei Zhu et al.

ECCV 2024 • arXiv:2405.11276 • 21 citations
#582

One-Shot Diffusion Mimicker for Handwritten Text Generation

Gang Dai, Yifan Zhang, Quhui Ke et al.

ECCV 2024 • arXiv:2409.04004 • 21 citations
#583

FedRA: A Random Allocation Strategy for Federated Tuning to Unleash the Power of Heterogeneous Clients

Shangchao Su, Bin Li, Xiangyang Xue

ECCV 2024 • arXiv:2311.11227 • 21 citations
#584

Restore Anything with Masks: Leveraging Mask Image Modeling for Blind All-in-One Image Restoration

Chu Jie Qin, Ruiqi Wu, Zikun Liu et al.

ECCV 2024 • arXiv:2409.19403 • 21 citations
#585

Is Retain Set All You Need in Machine Unlearning? Restoring Performance of Unlearned Models with Out-Of-Distribution Images

Jacopo Bonato, Marco Cotogni, Luigi Sabetta

ECCV 2024 • arXiv:2404.12922 • 21 citations
#586

A Simple Baseline for Spoken Language to Sign Language Translation with 3D Avatars

Ronglai Zuo, Fangyun Wei, Zenggui Chen et al.

ECCV 2024 • arXiv:2401.04730 • 21 citations
#587

Match-Stereo-Videos: Bidirectional Alignment for Consistent Dynamic Stereo Matching

Junpeng Jing, Ye Mao, Krystian Mikolajczyk

ECCV 2024 • arXiv:2403.10755 • 21 citations
#588

Textual Query-Driven Mask Transformer for Domain Generalized Segmentation

Byeonghyun Pak, Byeongju Woo, Sunghwan Kim et al.

ECCV 2024 • arXiv:2407.09033 • 21 citations
#589

AMEGO: Active Memory from long EGOcentric videos

Gabriele Goletto, Tushar Nagarajan, Giuseppe Averta et al.

ECCV 2024 • arXiv:2409.10917 • 21 citations
#590

ZeST: Zero-Shot Material Transfer from a Single Image

Ta-Ying Cheng, Prafull Sharma, Andrew Markham et al.

ECCV 2024 • arXiv:2404.06425 • 21 citations
#591

Navigation Instruction Generation with BEV Perception and Large Language Models

Sheng Fan, Rui Liu, Wenguan Wang et al.

ECCV 2024 • arXiv:2407.15087 • 21 citations
#592

WTS: A Pedestrian-Centric Traffic Video Dataset for Fine-grained Spatial-Temporal Understanding

Quan Kong, Yuki Kawana, Rajat Saini et al.

ECCV 2024 • arXiv:2407.15350 • 21 citations
#593

Few-Shot Anomaly-Driven Generation for Anomaly Classification and Segmentation

Guan Gui, Bin-Bin Gao, Jun Liu et al.

ECCV 2024 • arXiv:2505.09263 • 21 citations
#594

Protecting NeRFs' Copyright via Plug-And-Play Watermarking Base Model

Qi Song, Ziyuan Luo, Ka Chun Cheung et al.

ECCV 2024 • arXiv:2407.07735 • 21 citations
#595

Training-free Video Temporal Grounding using Large-scale Pre-trained Models

Minghang Zheng, Xinhao Cai, Qingchao Chen et al.

ECCV 2024 • arXiv:2408.16219 • 21 citations
#596

LayoutFlow: Flow Matching for Layout Generation

Julian Jorge Andrade Guerreiro, Naoto Inoue, Kento Masui et al.

ECCV 2024 • arXiv:2403.18187 • 21 citations
#597

EditShield: Protecting Unauthorized Image Editing by Instruction-guided Diffusion Models

Ruoxi Chen, Haibo Jin, Yixin Liu et al.

ECCV 2024 • arXiv:2311.12066 • 21 citations
#598

CadVLM: Bridging Language and Vision in the Generation of Parametric CAD Sketches

Sifan Wu, Amir Hosein Khasahmadi, Mor Katz et al.

ECCV 2024 • arXiv:2409.17457 • 21 citations
#599

SAGS: Structure-Aware 3D Gaussian Splatting

Evangelos Ververas, Rolandos Alexandros Potamias, Jifei Song et al.

ECCV 2024 • arXiv:2404.19149 • 21 citations
#600

HERGen: Elevating Radiology Report Generation with Longitudinal Data

Fuying Wang, Shenghui Du, Lequan Yu

ECCV 2024 • arXiv:2407.15158 • 21 citations