Most Cited ECCV Highlight "causal revolution" Papers

2,387 papers found • Page 1 of 12

#1

Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection

Shilong Liu, Zhaoyang Zeng, Tianhe Ren et al.

ECCV 2024arXiv:2303.05499
3440
citations
#2

YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information

Chien-Yao Wang, I-Hau Yeh, Hong-Yuan Mark Liao

ECCV 2024arXiv:2402.13616
3033
citations
#3

MMBENCH: Is Your Multi-Modal Model an All-around Player?

Yuan Liu, Haodong Duan, Yuanhan Zhang et al.

ECCV 2024arXiv:2307.06281
1745
citations
#4

ShareGPT4V: Improving Large Multi-Modal Models with Better Captions

Lin Chen, Jinsong Li, Xiaoyi Dong et al.

ECCV 2024arXiv:2311.12793
970
citations
#5

LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation

Jiaxiang Tang, Zhaoxi Chen, Xiaokang Chen et al.

ECCV 2024arXiv:2402.05054
639
citations
#6

Adversarial Diffusion Distillation

Axel Sauer, Dominik Lorenz, Andreas Blattmann et al.

ECCV 2024arXiv:2311.17042
629
citations
#7

MambaIR: A Simple Baseline for Image Restoration with State-Space Model

Hang Guo, Jinmin Li, Tao Dai et al.

ECCV 2024arXiv:2402.15648
560
citations
#8

Grounding Image Matching in 3D with MASt3R

Vincent Leroy, Yohann Cabon, Jerome Revaud

ECCV 2024arXiv:2406.09756
541
citations
#9

LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models

Yanwei Li, Chengyao Wang, Jiaya Jia

ECCV 2024arXiv:2311.17043
499
citations
#10

MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?

Renrui Zhang, Dongzhi Jiang, Yichi Zhang et al.

ECCV 2024arXiv:2403.14624
498
citations
#11

CoTracker: It is Better to Track Together

Nikita Karaev, Ignacio Rocco, Ben Graham et al.

ECCV 2024arXiv:2307.07635
466
citations
#12

SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers

Nanye Ma, Mark Goldstein, Michael Albergo et al.

ECCV 2024arXiv:2401.08740
448
citations
#13

MobileNetV4: Universal Models for the Mobile Ecosystem

Danfeng Qin, Chas Leichner, Manolis Delakis et al.

ECCV 2024arXiv:2404.10518
434
citations
#14

DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors

Jinbo Xing, Menghan Xia, Yong Zhang et al.

ECCV 2024arXiv:2310.12190
424
citations
#15

VideoMamba: State Space Model for Efficient Video Understanding

Kunchang Li, Xinhao Li, Yi Wang et al.

ECCV 2024arXiv:2403.06977
407
citations
#16

DriveLM: Driving with Graph Visual Question Answering

Chonghao Sima, Katrin Renz, Kashyap Chitta et al.

ECCV 2024arXiv:2312.14150
376
citations
#17

MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images

Yuedong Chen, Haofei Xu, Chuanxia Zheng et al.

ECCV 2024arXiv:2403.14627
374
citations
#18

An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models

Liang Chen, Haozhe Zhao, Tianyu Liu et al.

ECCV 2024arXiv:2403.06764
368
citations
#19

Evaluating Text-to-Visual Generation with Image-to-Text Generation

Zhiqiu Lin, Deepak Pathak, Baiqi Li et al.

ECCV 2024arXiv:2404.01291
357
citations
#20

Wavelet Convolutions for Large Receptive Fields

Shahaf Finder, Roy Amoyal, Eran Treister et al.

ECCV 2024arXiv:2407.05848
348
citations
#21

Gaussian Grouping: Segment and Edit Anything in 3D Scenes

Mingqiao Ye, Martin Danelljan, Fisher Yu et al.

ECCV 2024arXiv:2312.00732
344
citations
#22

BLINK: Multimodal Large Language Models Can See but Not Perceive

Xingyu Fu, Yushi Hu, Bangzheng Li et al.

ECCV 2024arXiv:2404.12390
333
citations
#23

SV3D: Novel Multi-view Synthesis and 3D Generation from a Single Image using Latent Video Diffusion

Vikram Voleti, Chun-Han Yao, Mark Boss et al.

ECCV 2024arXiv:2403.12008
323
citations
#24

FSGS: Real-Time Few-shot View Synthesis using Gaussian Splatting

Zehao Zhu, Zhiwen Fan, Yifan Jiang et al.

ECCV 2024arXiv:2312.00451
297
citations
#25

PointLLM: Empowering Large Language Models to Understand Point Clouds

Runsen Xu, Xiaolong Wang, Tai Wang et al.

ECCV 2024arXiv:2308.16911
295
citations
#26

Long-CLIP: Unlocking the Long-Text Capability of CLIP

Beichen Zhang, Pan Zhang, Xiaoyi Dong et al.

ECCV 2024arXiv:2403.15378
287
citations
#27

DiffBIR: Toward Blind Image Restoration with Generative Diffusion Prior

Xinqi Lin, Jingwen He, Ziyan Chen et al.

ECCV 2024arXiv:2308.15070
283
citations
#28

Photorealistic Video Generation with Diffusion Models

Agrim Gupta, Lijun Yu, Kihyuk Sohn et al.

ECCV 2024arXiv:2312.06662
278
citations
#29

Factorizing Text-to-Video Generation by Explicit Image Conditioning

Rohit Girdhar, Mannat Singh, Andrew Brown et al.

ECCV 2024arXiv:2311.10709
266
citations
#30

GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation

Yinghao Xu, Zifan Shi, Wang Yifan et al.

ECCV 2024arXiv:2403.14621
264
citations
#31

MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training

Brandon McKinzie, Zhe Gan, Jean-Philippe Fauconnier et al.

ECCV 2024arXiv:2403.09611
250
citations
#32

GS-LRM: Large Reconstruction Model for 3D Gaussian Splatting

Kai Zhang, Sai Bi, Hao Tan et al.

ECCV 2024arXiv:2404.19702
250
citations
#33

Pixel-Aware Stable Diffusion for Realistic Image Super-Resolution and Personalized Stylization

Tao Yang, Rongyuan Wu, Peiran Ren et al.

ECCV 2024arXiv:2308.14469
249
citations
#34

DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving

Xiaofeng Wang, Zheng Zhu, Guan Huang et al.

ECCV 2024arXiv:2309.09777
239
citations
#35

Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance

Shenhao Zhu, Junming Chen, Zuozhuo Dai et al.

ECCV 2024arXiv:2403.14781
239
citations
#36

InternVideo2: Scaling Foundation Models for Multimodal Video Understanding

Yi Wang, Kunchang Li, Xinhao Li et al.

ECCV 2024arXiv:2403.15377
236
citations
#37

VideoAgent: Long-form Video Understanding with Large Language Model as Agent

Xiaohan Wang, Yuhui Zhang, Orr Zohar et al.

ECCV 2024arXiv:2403.10517
231
citations
#38

Segment and Recognize Anything at Any Granularity

Feng Li, Hao Zhang, Peize Sun et al.

ECCV 2024arXiv:2307.04767
230
citations
#39

GeoWizard: Unleashing the Diffusion Priors for 3D Geometry Estimation from a Single Image

Xiao Fu, Wei Yin, Mu Hu et al.

ECCV 2024arXiv:2403.12013
230
citations
#40

EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions

Linrui Tian, Qi Wang, Bang Zhang et al.

ECCV 2024arXiv:2402.17485
223
citations
#41

PixArt-Sigma: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation

Junsong Chen, Chongjian GE, Enze Xie et al.

ECCV 2024
223
citations
#42

CRM: Single Image to 3D Textured Mesh with Convolutional Reconstruction Model

Zhengyi Wang, Yikai Wang, Yifei Chen et al.

ECCV 2024arXiv:2403.05034
219
citations
#43

Agent Attention: On the Integration of Softmax and Linear Attention

Dongchen Han, Tianzhu Ye, Yizeng Han et al.

ECCV 2024arXiv:2312.08874
212
citations
#44

LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents

Shilong Liu, Hao Cheng, Haotian Liu et al.

ECCV 2024arXiv:2311.05437
200
citations
#45

MM-SafetyBench: A Benchmark for Safety Evaluation of Multimodal Large Language Models

Xin Liu, Yichen Zhu, Jindong Gu et al.

ECCV 2024arXiv:2311.17600
199
citations
#46

ZigMa: A DiT-style Zigzag Mamba Diffusion Model

Tao Hu, Stefan Andreas Baumann, Ming Gui et al.

ECCV 2024arXiv:2403.13802
188
citations
#47

HAC: Hash-grid Assisted Context for 3D Gaussian Splatting Compression

Yihang Chen, Qianyi Wu, Weiyao Lin et al.

ECCV 2024arXiv:2403.14530
188
citations
#48

CityGaussian: Real-time High-quality Large-Scale Scene Rendering with Gaussians

Yang Liu, Chuanchen Luo, Lue Fan et al.

ECCV 2024arXiv:2404.01133
186
citations
#49

EAGLES: Efficient Accelerated 3D Gaussians with Lightweight EncodingS

Sharath Girish, Kamal Gupta, Abhinav Shrivastava

ECCV 2024arXiv:2312.04564
184
citations
#50

Mini-Splatting: Representing Scenes with a Constrained Number of Gaussians

Guangchi Fang, Bing Wang

ECCV 2024arXiv:2403.14166
183
citations
#51

Sapiens: Foundation for Human Vision Models

Rawal Khirodkar, Timur Bagautdinov, Julieta Martinez et al.

ECCV 2024arXiv:2408.12569
179
citations
#52

To Generate or Not? Safety-Driven Unlearned Diffusion Models Are Still Easy To Generate Unsafe Images ... For Now

Yimeng Zhang, jinghan jia, Xin Chen et al.

ECCV 2024arXiv:2310.11868
176
citations
#53

ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs

Viraj Shah, Nataniel Ruiz, Forrester Cole et al.

ECCV 2024arXiv:2311.13600
175
citations
#54

LLaVA-UHD: an LMM Perceiving any Aspect Ratio and High-Resolution Images

Zonghao Guo, Ruyi Xu, Yuan Yao et al.

ECCV 2024arXiv:2403.11703
174
citations
#55

SparseCtrl: Adding Sparse Controls to Text-to-Video Diffusion Models

Yuwei Guo, Ceyuan Yang, Anyi Rao et al.

ECCV 2024arXiv:2311.16933
173
citations
#56

OccWorld: Learning a 3D Occupancy World Model for Autonomous Driving

Wenzhao Zheng, Weiliang Chen, Yuanhui Huang et al.

ECCV 2024arXiv:2311.16038
172
citations
#57

MotionDirector: Motion Customization of Text-to-Video Diffusion Models

Rui Zhao, Yuchao Gu, Jay Zhangjie Wu et al.

ECCV 2024arXiv:2310.08465
167
citations
#58

BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion

Xuan JU, Xian Liu, Xintao Wang et al.

ECCV 2024arXiv:2403.06976
165
citations
#59

Generative End-to-End Autonomous Driving

Wenzhao Zheng, Ruiqi Song, Xianda Guo et al.

ECCV 2024arXiv:2402.11502
162
citations
#60

VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding

Yue Fan, Xiaojian Ma, Rujie Wu et al.

ECCV 2024arXiv:2403.11481
161
citations
#61

Street Gaussians: Modeling Dynamic Urban Scenes with Gaussian Splatting

Yunzhi Yan, Haotong Lin, Chenxu Zhou et al.

ECCV 2024arXiv:2401.01339
157
citations
#62

Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs

Keen You, Haotian Zhang, Eldon Schoop et al.

ECCV 2024arXiv:2404.05719
157
citations
#63

Global Structure-from-Motion Revisited

Linfei Pan, Daniel Barath, Marc Pollefeys et al.

ECCV 2024arXiv:2407.20219
155
citations
#64

A Task is Worth One Word: Learning with Task Prompts for High-Quality Versatile Image Inpainting

Junhao Zhuang, Yanhong Zeng, WENRAN LIU et al.

ECCV 2024arXiv:2312.03594
153
citations
#65

ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback

Ming Li, Taojiannan Yang, Huafeng Kuang et al.

ECCV 2024arXiv:2404.07987
153
citations
#66

Rotary Position Embedding for Vision Transformer

Byeongho Heo, Song Park, Dongyoon Han et al.

ECCV 2024arXiv:2403.13298
153
citations
#67

AdaCLIP: Adapting CLIP with Hybrid Learnable Prompts for Zero-Shot Anomaly Detection

Yunkang Cao, Jiangning Zhang, Luca Frittoli et al.

ECCV 2024arXiv:2407.15795
148
citations
#68

GRiT: A Generative Region-to-text Transformer for Object Understanding

Jialian Wu, Jianfeng Wang, Zhengyuan Yang et al.

ECCV 2024arXiv:2212.00280
147
citations
#69

Compact 3D Scene Representation via Self-Organizing Gaussian Grids

Wieland Morgenstern, Florian Barthel, Anna Hilsmann et al.

ECCV 2024arXiv:2312.13299
143
citations
#70

Physics-Based Interaction with 3D Objects via Video Generation

Tianyuan Zhang, Hong-Xing Yu, Rundi Wu et al.

ECCV 2024arXiv:2404.13026
142
citations
#71

UniIR: Training and Benchmarking Universal Multimodal Information Retrievers

Cong Wei, Yang Chen, Haonan Chen et al.

ECCV 2024arXiv:2311.17136
139
citations
#72

SGS-SLAM: Semantic Gaussian Splatting For Neural Dense SLAM

Mingrui Li, Shuhong Liu, Heng Zhou et al.

ECCV 2024arXiv:2402.03246
136
citations
#73

Paying More Attention to Images: A Training-Free Method for Alleviating Hallucination in LVLMs

Shi Liu, Kecheng Zheng, Wei Chen

ECCV 2024arXiv:2407.21771
134
citations
#74

LHRS-Bot: Empowering Remote Sensing with VGI-Enhanced Large Multimodal Language Model

Dilxat Muhtar, Zhenshi Li, Feng Gu et al.

ECCV 2024arXiv:2402.02544
133
citations
#75

LongVLM: Efficient Long Video Understanding via Large Language Models

Yuetian Weng, Mingfei Han, Haoyu He et al.

ECCV 2024arXiv:2404.03384
131
citations
#76

SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding

Baoxiong Jia, Yixin Chen, Huangyue Yu et al.

ECCV 2024arXiv:2401.09340
131
citations
#77

ST-LLM: Large Language Models Are Effective Temporal Learners

Ruyang Liu, Chen Li, Haoran Tang et al.

ECCV 2024arXiv:2404.00308
129
citations
#78

Dolphins: Multimodal Language Model for Driving

Yingzi Ma, Yulong Cao, Jiachen Sun et al.

ECCV 2024arXiv:2312.00438
128
citations
#79

SCLIP: Rethinking Self-Attention for Dense Vision-Language Inference

Feng Wang, Jieru Mei, Alan Yuille

ECCV 2024arXiv:2312.01597
127
citations
#80

Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models

Rohit Gandikota, Joanna Materzynska, Tingrui Zhou et al.

ECCV 2024arXiv:2311.12092
125
citations
#81

DiffiT: Diffusion Vision Transformers for Image Generation

Ali Hatamizadeh, Jiaming Song, Guilin Liu et al.

ECCV 2024arXiv:2312.02139
122
citations
#82

MotionLCM: Real-time Controllable Motion Generation via Latent Consistency Model

Wenxun Dai, Ling-Hao Chen, Jingbo Wang et al.

ECCV 2024arXiv:2404.19759
121
citations
#83

Drag Anything: Motion Control for Anything using Entity Representation

Weijia Wu, Zhuang Li, Yuchao Gu et al.

ECCV 2024
121
citations
#84

ShapeLLM: Universal 3D Object Understanding for Embodied Interaction

Zekun Qi, Runpei Dong, Shaochen Zhang et al.

ECCV 2024arXiv:2402.17766
120
citations
#85

InstructIR: High-Quality Image Restoration Following Human Instructions

Marcos Conde, Gregor Geigle, Radu Timofte

ECCV 2024arXiv:2401.16468
118
citations
#86

SEA-RAFT: Simple, Efficient, Accurate RAFT for Optical Flow

Yihan Wang, Lahav Lipson, Jia Deng

ECCV 2024arXiv:2405.14793
117
citations
#87

IRSAM: Advancing Segment Anything Model for Infrared Small Target Detection

Mingjin Zhang, Yuchun Wang, Jie Guo et al.

ECCV 2024arXiv:2407.07520
117
citations
#88

Implicit Style-Content Separation using B-LoRA

Yarden Frenkel, Yael Vinker, Ariel Shamir et al.

ECCV 2024arXiv:2403.14572
117
citations
#89

Reason2Drive: Towards Interpretable and Chain-based Reasoning for Autonomous Driving

Ming Nie, Renyuan Peng, Chunwei Wang et al.

ECCV 2024arXiv:2312.03661
115
citations
#90

CompGS: Smaller and Faster Gaussian Splatting with Vector Quantization

K L Navaneet, Kossar Pourahmadi, Soroush Abbasi Koohpayegani et al.

ECCV 2024arXiv:2311.18159
115
citations
#91

DynMF: Neural Motion Factorization for Real-time Dynamic View Synthesis with 3D Gaussian Splatting

Angelos Kratimenos, Jiahui Lei, Kostas Daniilidis

ECCV 2024arXiv:2312.00112
114
citations
#92

Motion Mamba: Efficient and Long Sequence Motion Generation

Zeyu Zhang, Akide Liu, Ian Reid et al.

ECCV 2024arXiv:2403.07487
114
citations
#93

LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models

Hao Zhang, Hongyang Li, Feng Li et al.

ECCV 2024arXiv:2312.02949
114
citations
#94

ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation

Guanxing Lu, Shiyi Zhang, Ziwei Wang et al.

ECCV 2024arXiv:2403.08321
112
citations
#95

OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web

Raghav Kapoor, Yash Parag Butala, Melisa A Russak et al.

ECCV 2024arXiv:2402.17553
112
citations
#96

Relightable 3D Gaussians: Realistic Point Cloud Relighting with BRDF Decomposition and Ray Tracing

Jian Gao, chun gu, Youtian Lin et al.

ECCV 2024arXiv:2311.16043
112
citations
#97

Gaussian in the wild: 3D Gaussian Splatting for Unconstrained Image Collections

Dongbin Zhang, Chuming Wang, Weitao Wang et al.

ECCV 2024arXiv:2403.15704
111
citations
#98

ReNoise: Real Image Inversion Through Iterative Noising

Daniel Garibi, Or Patashnik, Andrey Voynov et al.

ECCV 2024arXiv:2403.14602
108
citations
#99

Eyes Closed, Safety On: Protecting Multimodal LLMs via Image-to-Text Transformation

Yunhao Gou, Kai Chen, Zhili LIU et al.

ECCV 2024arXiv:2403.09572
108
citations
#100

LITA: Language Instructed Temporal-Localization Assistant

De-An Huang, Shijia Liao, Subhashree Radhakrishnan et al.

ECCV 2024arXiv:2403.19046
108
citations
#101

PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation

Shaowei Liu, Zhongzheng Ren, Saurabh Gupta et al.

ECCV 2024arXiv:2409.18964
107
citations
#102

Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models

Chuofan Ma, Yi Jiang, Jiannan Wu et al.

ECCV 2024arXiv:2404.13013
107
citations
#103

Mamba-ND: Selective State Space Modeling for Multi-Dimensional Data

Shufan Li, Aditya Grover, Harkanwar Singh

ECCV 2024arXiv:2402.05892
106
citations
#104

latentSplat: Autoencoding Variational Gaussians for Fast Generalizable 3D Reconstruction

Christopher Wewer, Kevin Raj, Eddy Ilg et al.

ECCV 2024arXiv:2403.16292
106
citations
#105

Model Breadcrumbs: Scaling Multi-Task Model Merging with Sparse Masks

MohammadReza Davari, Eugene Belilovsky

ECCV 2024arXiv:2312.06795
106
citations
#106

TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering

Jingye Chen, Yupan Huang, Tengchao Lv et al.

ECCV 2024arXiv:2311.16465
106
citations
#107

How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs

Haoqin Tu, Chenhang Cui, Zijun Wang et al.

ECCV 2024arXiv:2311.16101
105
citations
#108

AdaShield: Safeguarding Multimodal Large Language Models from Structure-based Attack via Adaptive Shield Prompting

Yu Wang, Xiaogeng Liu, Yu Li et al.

ECCV 2024arXiv:2403.09513
105
citations
#109

Restoring Images in Adverse Weather Conditions via Histogram Transformer

Shangquan Sun, Wenqi Ren, Xinwei Gao et al.

ECCV 2024arXiv:2407.10172
103
citations
#110

LightenDiffusion: Unsupervised Low-Light Image Enhancement with Latent-Retinex Diffusion Models

Hai Jiang, Ao Luo, Xiaohong Liu et al.

ECCV 2024arXiv:2407.08939
102
citations
#111

Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking Multimodal Large Language Models

Yifan Li, hangyu guo, Kun Zhou et al.

ECCV 2024arXiv:2403.09792
101
citations
#112

STAG4D: Spatial-Temporal Anchored Generative 4D Gaussians

Yifei Zeng, Yanqin Jiang, Siyu Zhu et al.

ECCV 2024arXiv:2403.14939
101
citations
#113

DOCCI: Descriptions of Connected and Contrasting Images

Yasumasa Onoe, Sunayana Rane, Zachary E Berger et al.

ECCV 2024arXiv:2404.19753
100
citations
#114

MVDiffHD: A Dense High-resolution Multi-view Diffusion Model for Single or Sparse-view 3D Object Reconstruction

Shitao Tang, Jiacheng Chen, Dilin Wang et al.

ECCV 2024
100
citations
#115

VISA: Reasoning Video Object Segmentation via Large Language Model

Cilin Yan, haochen wang, Shilin Yan et al.

ECCV 2024arXiv:2407.11325
99
citations
#116

GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction

Yuanhui Huang, Wenzhao Zheng, Yunpeng Zhang et al.

ECCV 2024arXiv:2405.17429
99
citations
#117

DiffusionDepth: Diffusion Denoising Approach for Monocular Depth Estimation

Yiqun Duan, Xianda Guo, Zheng Zhu

ECCV 2024arXiv:2303.05021
99
citations
#118

Pixel-GS Density Control with Pixel-aware Gradient for 3D Gaussian Splatting

Zheng Zhang, WENBO HU, Yixing Lao et al.

ECCV 2024arXiv:2403.15530
99
citations
#119

Revising Densification in Gaussian Splatting

Samuel Rota Bulò, Lorenzo Porzi, Peter Kontschieder

ECCV 2024arXiv:2404.06109
98
citations
#120

FoundPose: Unseen Object Pose Estimation with Foundation Features

Evin Pınar Örnek, Yann Labbé, Bugra Tekin et al.

ECCV 2024arXiv:2311.18809
98
citations
#121

Tracking Meets LoRA: Faster Training, Larger Model, Stronger Performance

Liting Lin, Heng Fan, Zhipeng Zhang et al.

ECCV 2024arXiv:2403.05231
97
citations
#122

Receler: Reliable Concept Erasing of Text-to-Image Diffusion Models via Lightweight Erasers

Chi-Pin Huang, Kai-Po Chang, Chung-Ting Tsai et al.

ECCV 2024arXiv:2311.17717
96
citations
#123

GaussCtrl: Multi-View Consistent Text-Driven 3D Gaussian Splatting Editing

Jing Wu, Jiawang Bian, Xinghui Li et al.

ECCV 2024arXiv:2403.08733
95
citations
#124

Towards Open-ended Visual Quality Comparison

Haoning Wu, Hanwei Zhu, Zicheng Zhang et al.

ECCV 2024arXiv:2402.16641
95
citations
#125

Depicting Beyond Scores: Advancing Image Quality Assessment through Multi-modal Language Models

Zhiyuan You, Zheyuan Li, Jinjin Gu et al.

ECCV 2024arXiv:2312.08962
95
citations
#126

Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively

Haobo Yuan, Xiangtai Li, Chong Zhou et al.

ECCV 2024arXiv:2401.02955
91
citations
#127

CoR-GS: Sparse-View 3D Gaussian Splatting via Co-Regularization

Jiawei Zhang, Jiahe Li, Xiaohan Yu et al.

ECCV 2024arXiv:2405.12110
91
citations
#128

The All-Seeing Project V2: Towards General Relation Comprehension of the Open World

Weiyun Wang Weiyun, yiming ren, Haowen Luo et al.

ECCV 2024arXiv:2402.19474
89
citations
#129

Controllable Human-Object Interaction Synthesis

Jiaman Li, Alexander Clegg, Roozbeh Mottaghi et al.

ECCV 2024arXiv:2312.03913
89
citations
#130

Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models

Haoran Wei, Lingyu Kong, Jinyue Chen et al.

ECCV 2024arXiv:2312.06109
89
citations
#131

AutoDIR: Automatic All-in-One Image Restoration with Latent Diffusion

yitong jiang, Zhaoyang Zhang, Tianfan Xue et al.

ECCV 2024arXiv:2310.10123
88
citations
#132

T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy

Qing Jiang, Feng Li, Zhaoyang Zeng et al.

ECCV 2024arXiv:2403.14610
86
citations
#133

Octopus: Embodied Vision-Language Programmer from Environmental Feedback

Jingkang Yang, Yuhao Dong, Shuai Liu et al.

ECCV 2024arXiv:2310.08588
85
citations
#134

Robo-ABC: Affordance Generalization Beyond Categories via Semantic Correspondence for Robot Manipulation

Yuanchen Ju, Kaizhe Hu, Guowei Zhang et al.

ECCV 2024arXiv:2401.07487
84
citations
#135

PSALM: Pixelwise Segmentation with Large Multi-modal Model

Zheng Zhang, YeYao Ma, Enming Zhang et al.

ECCV 2024arXiv:2403.14598
83
citations
#136

OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation

Zhening Huang, Xiaoyang Wu, Xi Chen et al.

ECCV 2024arXiv:2309.00616
83
citations
#137

Arc2Face: A Foundation Model for ID-Consistent Human Faces

Foivos Paraperas Papantoniou, Alexandros Lattas, Stylianos Moschoglou et al.

ECCV 2024arXiv:2403.11641
81
citations
#138

FreeInit: Bridging Initialization Gap in Video Diffusion Models

Tianxing Wu, Chenyang Si, Yuming Jiang et al.

ECCV 2024arXiv:2312.07537
81
citations
#139

Rethinking Tree-Ring Watermarking for Enhanced Multi-Key Identification

Hai Ci, Pei Yang, Yiren Song et al.

ECCV 2024arXiv:2404.14055
81
citations
#140

RGBD GS-ICP SLAM

Seongbo Ha, Jiung Yeon, Hyeonwoo Yu

ECCV 2024arXiv:2403.12550
81
citations
#141

CG-SLAM: Efficient Dense RGB-D SLAM in a Consistent Uncertainty-aware 3D Gaussian Field

Jiarui Hu, Xianhao Chen, Boyin Feng et al.

ECCV 2024arXiv:2403.16095
80
citations
#142

Reliable and Efficient Concept Erasure of Text-to-Image Diffusion Models

Chao Gong, Kai Chen, Zhipeng Wei et al.

ECCV 2024arXiv:2407.12383
79
citations
#143

Deblurring 3D Gaussian Splatting

Byeonghyeon Lee, Howoong Lee, Xiangyu Sun et al.

ECCV 2024arXiv:2401.00834
79
citations
#144

Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization

Renjie Pi, Tianyang Han, Wei Xiong et al.

ECCV 2024arXiv:2403.08730
78
citations
#145

NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models

Gengze Zhou, Yicong Hong, Zun Wang et al.

ECCV 2024arXiv:2407.12366
78
citations
#146

TLControl: Trajectory and Language Control for Human Motion Synthesis

WEILIN WAN, Zhiyang Dou, Taku Komura et al.

ECCV 2024arXiv:2311.17135
77
citations
#147

Large-scale Reinforcement Learning for Diffusion Models

Yinan Zhang, Eric Tzeng, Yilun Du et al.

ECCV 2024arXiv:2401.12244
77
citations
#148

Model Stock: All we need is just a few fine-tuned models

Dong-Hwan Jang, Sangdoo Yun, Dongyoon Han

ECCV 2024arXiv:2403.19522
77
citations
#149

EMDM: Efficient Motion Diffusion Model for Fast, High-Quality Human Motion Generation

Wenyang Zhou, Zhiyang Dou, Zeyu Cao et al.

ECCV 2024
77
citations
#150

Track2Act: Predicting Point Tracks from Internet Videos enables Generalizable Robot Manipulation

Homanga Bharadhwaj, Roozbeh Mottaghi, Abhinav Gupta et al.

ECCV 2024arXiv:2405.01527
77
citations
#151

ClearCLIP: Decomposing CLIP Representations for Dense Vision-Language Inference

Mengcheng Lan, Chaofeng Chen, Yiping Ke et al.

ECCV 2024arXiv:2407.12442
77
citations
#152

Distilling Diffusion Models into Conditional GANs

Minguk Kang, Richard Zhang, Connelly Barnes et al.

ECCV 2024arXiv:2405.05967
77
citations
#153

V2X-Real: a Largs-Scale Dataset for Vehicle-to-Everything Cooperative Perception

Hao Xiang, Xin Xia, Zhaoliang Zheng et al.

ECCV 2024arXiv:2403.16034
77
citations
#154

WoVoGen: World Volume-aware Diffusion for Controllable Multi-camera Driving Scene Generation

Jiachen Lu, Ze Huang, Zeyu Yang et al.

ECCV 2024arXiv:2312.02934
76
citations
#155

A Unified Anomaly Synthesis Strategy with Gradient Ascent for Industrial Anomaly Detection and Localization

Qiyu Chen, Huiyuan Luo, Chengkan Lv et al.

ECCV 2024arXiv:2407.09359
76
citations
#156

MVSGaussian: Fast Generalizable Gaussian Splatting Reconstruction from Multi-View Stereo

Tianqi Liu, Guangcong Wang, Shoukang Hu et al.

ECCV 2024arXiv:2405.12218
76
citations
#157

LN3Diff: Scalable Latent Neural Fields Diffusion for Speedy 3D Generation

Yushi Lan, Fangzhou Hong, Shuai Yang et al.

ECCV 2024arXiv:2403.12019
75
citations
#158

Per-Gaussian Embedding-Based Deformation for Deformable 3D Gaussian Splatting

Jeongmin Bae, Seoha Kim, Youngsik Yun et al.

ECCV 2024arXiv:2404.03613
75
citations
#159

Improving Diffusion Models for Authentic Virtual Try-on in the Wild

Choi Yisol, Sangkyung Kwak, Kyungmin Lee et al.

ECCV 2024arXiv:2403.05139
75
citations
#160

Expressive Whole-Body 3D Gaussian Avatar

Gyeongsik Moon, Takaaki Shiratori, Shunsuke Saito

ECCV 2024arXiv:2407.21686
75
citations
#161

Reconstruction and Simulation of Elastic Objects with Spring-Mass 3D Gaussians

Licheng Zhong, Hong-Xing Yu, Jiajun Wu et al.

ECCV 2024arXiv:2403.09434
75
citations
#162

BAD-Gaussians: Bundle Adjusted Deblur Gaussian Splatting

Lingzhe Zhao, Peng Wang, Peidong Liu

ECCV 2024arXiv:2403.11831
74
citations
#163

Integer-Valued Training and Spike-driven Inference Spiking Neural Network for High-performance and Energy-efficient Object Detection

Xinhao Luo, Man Yao, Yuhong Chou et al.

ECCV 2024arXiv:2407.20708
74
citations
#164

Self-Rectifying Diffusion Sampling with Perturbed-Attention Guidance

Donghoon Ahn, Hyoungwon Cho, Jaewon Min et al.

ECCV 2024arXiv:2403.17377
74
citations
#165

ProxyCLIP: Proxy Attention Improves CLIP for Open-Vocabulary Segmentation

Mengcheng Lan, Chaofeng Chen, Yiping Ke et al.

ECCV 2024arXiv:2408.04883
74
citations
#166

MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model

Muyao Niu, Xiaodong Cun, Xintao Wang et al.

ECCV 2024arXiv:2405.20222
74
citations
#167

DreamScene: 3D Gaussian-based Text-to-3D Scene Generation via Formation Pattern Sampling

Haoran Li, Haolin Shi, Wenli Zhang et al.

ECCV 2024arXiv:2404.03575
73
citations
#168

Generating Human Interaction Motions in Scenes with Text Control

Hongwei Yi, Justus Thies, Michael J. Black et al.

ECCV 2024arXiv:2404.10685
73
citations
#169

GaussianImage: 1000 FPS Image Representation and Compression by 2D Gaussian Splatting

XINJIE ZHANG, Xingtong Ge, Tongda Xu et al.

ECCV 2024arXiv:2403.08551
72
citations
#170

SMFANet: A Lightweight Self-Modulation Feature Aggregation Network for Efficient Image Super-Resolution

mingjun zheng, Long Sun, Jiangxin Dong et al.

ECCV 2024
72
citations
#171

Context-Guided Spatial Feature Reconstruction for Efficient Semantic Segmentation

Zhenliang Ni, Xinghao Chen, Yingjie Zhai et al.

ECCV 2024arXiv:2405.06228
72
citations
#172

OneRestore: A Universal Restoration Framework for Composite Degradation

Yu Guo, Yuan Gao, Yuxu Lu et al.

ECCV 2024arXiv:2407.04621
71
citations
#173

DGE: Direct Gaussian 3D Editing by Consistent Multi-view Editing

Minghao Chen, Iro Laina, Andrea Vedaldi

ECCV 2024arXiv:2404.18929
71
citations
#174

Frequency-Spatial Entanglement Learning for Camouflaged Object Detection

Yanguang Sun, Chunyan Xu, Jian Yang et al.

ECCV 2024arXiv:2409.01686
71
citations
#175

When Do We Not Need Larger Vision Models?

Baifeng Shi, Ziyang Wu, Maolin Mao et al.

ECCV 2024arXiv:2403.13043
71
citations
#176

End-to-End Rate-Distortion Optimized 3D Gaussian Representation

Henan Wang, Hanxin Zhu, Tianyu He et al.

ECCV 2024arXiv:2406.01597
71
citations
#177

CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion

Wendi Zheng, Jiayan Teng, Zhuoyi Yang et al.

ECCV 2024arXiv:2403.05121
70
citations
#178

DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic Gaussian Splatting

Shijie Zhou, Zhiwen Fan, Dejia Xu et al.

ECCV 2024arXiv:2404.06903
70
citations
#179

Generative Camera Dolly: Extreme Monocular Dynamic Novel View Synthesis

Basile Van Hoorick, Rundi Wu, Ege Ozguroglu et al.

ECCV 2024arXiv:2405.14868
69
citations
#180

OMG: Occlusion-friendly Personalized Multi-concept Generation in Diffusion Models

Kong Zhe, Yong Zhang, Tianyu Yang et al.

ECCV 2024arXiv:2403.10983
69
citations
#181

EDTalk: Efficient Disentanglement for Emotional Talking Head Synthesis

Shuai Tan, Bin Ji, Mengxiao Bi et al.

ECCV 2024arXiv:2404.01647
69
citations
#182

MMEarth: Exploring Multi-Modal Pretext Tasks For Geospatial Representation Learning

Vishal Nedungadi, Ankit Kariryaa, Stefan Oehmcke et al.

ECCV 2024arXiv:2405.02771
68
citations
#183

TRAM: Global Trajectory and Motion of 3D Humans from in-the-wild Videos

Yufu Wang, Ziyun Wang, Lingjie Liu et al.

ECCV 2024arXiv:2403.17346
67
citations
#184

Scene Coordinate Reconstruction: Posing of Image Collections via Incremental Learning of a Relocalizer

Eric Brachmann, Jamie Wynn, Shuai Chen et al.

ECCV 2024arXiv:2404.14351
67
citations
#185

GIVT: Generative Infinite-Vocabulary Transformers

Michael Tschannen, Cian Eastwood, Fabian Mentzer

ECCV 2024arXiv:2312.02116
67
citations
#186

Discover-then-Name: Task-Agnostic Concept Bottlenecks via Automated Concept Discovery

Sukrut Rao, Sweta Mahajan, Moritz Böhle et al.

ECCV 2024arXiv:2407.14499
67
citations
#187

Multi-HMR: Multi-Person Whole-Body Human Mesh Recovery in a Single Shot

Fabien Baradel, Thomas Lucas, Matthieu Armando et al.

ECCV 2024arXiv:2402.14654
66
citations
#188

TC4D: Trajectory-Conditioned Text-to-4D Generation

Sherwin Bahmani, Xian Liu, Wang Yifan et al.

ECCV 2024arXiv:2403.17920
66
citations
#189

Attention-Challenging Multiple Instance Learning for Whole Slide Image Classification

Yunlong Zhang, Honglin Li, YUXUAN SUN et al.

ECCV 2024arXiv:2311.07125
66
citations
#190

Language-Image Pre-training with Long Captions

Kecheng Zheng, Yifei Zhang, Wei Wu et al.

ECCV 2024arXiv:2403.17007
65
citations
#191

Segment3D: Learning Fine-Grained Class-Agnostic 3D Segmentation without Manual Labels

Rui Huang, Songyou Peng, Ayca Takmaz et al.

ECCV 2024arXiv:2312.17232
64
citations
#192

Relation DETR: Exploring Explicit Position Relation Prior for Object Detection

Xiuquan Hou, Meiqin Liu, Senlin Zhang et al.

ECCV 2024arXiv:2407.11699
64
citations
#193

SkateFormer: Skeletal-Temporal Transformer for Human Action Recognition

Jeonghyeok Do, Munchurl Kim

ECCV 2024arXiv:2403.09508
63
citations
#194

Unifying 3D Vision-Language Understanding via Promptable Queries

ziyu zhu, Zhuofan Zhang, Xiaojian Ma et al.

ECCV 2024arXiv:2405.11442
63
citations
#195

GraspXL: Generating Grasping Motions for Diverse Objects at Scale

Hui Zhang, Sammy Christen, Zicong Fan et al.

ECCV 2024arXiv:2403.19649
63
citations
#196

Large Motion Model for Unified Multi-Modal Motion Generation

Mingyuan Zhang, Daisheng Jin, Chenyang Gu et al.

ECCV 2024arXiv:2404.01284
63
citations
#197

GS2Mesh: Surface Reconstruction from Gaussian Splatting via Novel Stereo Views

Yaniv Wolf, Amit Bracha, Ron Kimmel

ECCV 2024arXiv:2404.01810
63
citations
#198

DINO-Tracker: Taming DINO for Self-Supervised Point Tracking in a Single Video

Narek Tumanyan, Assaf Singer, Shai Bagon et al.

ECCV 2024arXiv:2403.14548
62
citations
#199

Local All-Pair Correspondence for Point Tracking

Seokju Cho, Jiahui Huang, Jisu Nam et al.

ECCV 2024arXiv:2407.15420
62
citations
#200

CoherentGS: Sparse Novel View Synthesis with Coherent 3D Gaussians

Avinash Paliwal, Wei Ye, Jinhui Xiong et al.

ECCV 2024arXiv:2403.19495
62
citations
PreviousNext