Most Cited ECCV Oral "search graph construction" Papers

2,387 papers found

#1

Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection

Shilong Liu, Zhaoyang Zeng, Tianhe Ren et al.

ECCV 2024posterarXiv:2303.05499
3368
citations
#2

YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information

Chien-Yao Wang, I-Hau Yeh, Hong-Yuan Mark Liao

ECCV 2024posterarXiv:2402.13616
2952
citations
#3

Adversarial Diffusion Distillation

Axel Sauer, Dominik Lorenz, Andreas Blattmann et al.

ECCV 2024posterarXiv:2311.17042
617
citations
#4

LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation

Jiaxiang Tang, Zhaoxi Chen, Xiaokang Chen et al.

ECCV 2024posterarXiv:2402.05054
616
citations
#5

Grounding Image Matching in 3D with MASt3R

Vincent Leroy, Yohann Cabon, Jerome Revaud

ECCV 2024posterarXiv:2406.09756
512
citations
#6

MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?

Renrui Zhang, Dongzhi Jiang, Yichi Zhang et al.

ECCV 2024posterarXiv:2403.14624
473
citations
#7

CoTracker: It is Better to Track Together

Nikita Karaev, Ignacio Rocco, Ben Graham et al.

ECCV 2024posterarXiv:2307.07635
450
citations
#8

SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers

Nanye Ma, Mark Goldstein, Michael Albergo et al.

ECCV 2024posterarXiv:2401.08740
428
citations
#9

MobileNetV4: Universal Models for the Mobile Ecosystem

Danfeng Qin, Chas Leichner, Manolis Delakis et al.

ECCV 2024posterarXiv:2404.10518
407
citations
#10

VideoMamba: State Space Model for Efficient Video Understanding

Kunchang Li, Xinhao Li, Yi Wang et al.

ECCV 2024posterarXiv:2403.06977
401
citations
#11

MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images

Yuedong Chen, Haofei Xu, Chuanxia Zheng et al.

ECCV 2024posterarXiv:2403.14627
356
citations
#12

Evaluating Text-to-Visual Generation with Image-to-Text Generation

Zhiqiu Lin, Deepak Pathak, Baiqi Li et al.

ECCV 2024posterarXiv:2404.01291
347
citations
#13

An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models

Liang Chen, Haozhe Zhao, Tianyu Liu et al.

ECCV 2024posterarXiv:2403.06764
343
citations
#14

SV3D: Novel Multi-view Synthesis and 3D Generation from a Single Image using Latent Video Diffusion

Vikram Voleti, Chun-Han Yao, Mark Boss et al.

ECCV 2024posterarXiv:2403.12008
318
citations
#15

BLINK: Multimodal Large Language Models Can See but Not Perceive

Xingyu Fu, Yushi Hu, Bangzheng Li et al.

ECCV 2024posterarXiv:2404.12390
307
citations
#16

FSGS: Real-Time Few-shot View Synthesis using Gaussian Splatting

Zehao Zhu, Zhiwen Fan, Yifan Jiang et al.

ECCV 2024posterarXiv:2312.00451
293
citations
#17

PointLLM: Empowering Large Language Models to Understand Point Clouds

Runsen Xu, Xiaolong Wang, Tai Wang et al.

ECCV 2024posterarXiv:2308.16911
289
citations
#18

DiffBIR: Toward Blind Image Restoration with Generative Diffusion Prior

Xinqi Lin, Jingwen He, Ziyan Chen et al.

ECCV 2024posterarXiv:2308.15070
279
citations
#19

Photorealistic Video Generation with Diffusion Models

Agrim Gupta, Lijun Yu, Kihyuk Sohn et al.

ECCV 2024posterarXiv:2312.06662
270
citations
#20

GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation

Yinghao Xu, Zifan Shi, Wang Yifan et al.

ECCV 2024posterarXiv:2403.14621
259
citations
#21

GS-LRM: Large Reconstruction Model for 3D Gaussian Splatting

Kai Zhang, Sai Bi, Hao Tan et al.

ECCV 2024posterarXiv:2404.19702
246
citations
#22

MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training

Brandon McKinzie, Zhe Gan, Jean-Philippe Fauconnier et al.

ECCV 2024posterarXiv:2403.09611
246
citations
#23

Pixel-Aware Stable Diffusion for Realistic Image Super-Resolution and Personalized Stylization

Tao Yang, Rongyuan Wu, Peiran Ren et al.

ECCV 2024posterarXiv:2308.14469
242
citations
#24

DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving

Xiaofeng Wang, Zheng Zhu, Guan Huang et al.

ECCV 2024posterarXiv:2309.09777
234
citations
#25

Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance

Shenhao Zhu, Junming Chen, Zuozhuo Dai et al.

ECCV 2024posterarXiv:2403.14781
233
citations
#26

Segment and Recognize Anything at Any Granularity

Feng Li, Hao Zhang, Peize Sun et al.

ECCV 2024posterarXiv:2307.04767
226
citations
#27

PixArt-Sigma: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation

Junsong Chen, Chongjian Ge, Enze Xie et al.

ECCV 2024poster
223
citations
#28

GeoWizard: Unleashing the Diffusion Priors for 3D Geometry Estimation from a Single Image

Xiao Fu, Wei Yin, Mu Hu et al.

ECCV 2024posterarXiv:2403.12013
223
citations
#29

EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions

Linrui Tian, Qi Wang, Bang Zhang et al.

ECCV 2024posterarXiv:2402.17485
218
citations
#30

InternVideo2: Scaling Foundation Models for Multimodal Video Understanding

Yi Wang, Kunchang Li, Xinhao Li et al.

ECCV 2024posterarXiv:2403.15377
214
citations
#31

CRM: Single Image to 3D Textured Mesh with Convolutional Reconstruction Model

Zhengyi Wang, Yikai Wang, Yifei Chen et al.

ECCV 2024posterarXiv:2403.05034
213
citations
#32

Agent Attention: On the Integration of Softmax and Linear Attention

Dongchen Han, Tianzhu Ye, Yizeng Han et al.

ECCV 2024posterarXiv:2312.08874
206
citations
#33

ZigMa: A DiT-style Zigzag Mamba Diffusion Model

Tao Hu, Stefan Andreas Baumann, Ming Gui et al.

ECCV 2024posterarXiv:2403.13802
188
citations
#34

MM-SafetyBench: A Benchmark for Safety Evaluation of Multimodal Large Language Models

Xin Liu, Yichen Zhu, Jindong Gu et al.

ECCV 2024posterarXiv:2311.17600
183
citations
#35

CityGaussian: Real-time High-quality Large-Scale Scene Rendering with Gaussians

Yang Liu, Chuanchen Luo, Lue Fan et al.

ECCV 2024posterarXiv:2404.01133
180
citations
#36

HAC: Hash-grid Assisted Context for 3D Gaussian Splatting Compression

Yihang Chen, Qianyi Wu, Weiyao Lin et al.

ECCV 2024posterarXiv:2403.14530
179
citations
#37

Mini-Splatting: Representing Scenes with a Constrained Number of Gaussians

Guangchi Fang, Bing Wang

ECCV 2024posterarXiv:2403.14166
175
citations
#38

SparseCtrl: Adding Sparse Controls to Text-to-Video Diffusion Models

Yuwei Guo, Ceyuan Yang, Anyi Rao et al.

ECCV 2024posterarXiv:2311.16933
171
citations
#39

Sapiens: Foundation for Human Vision Models

Rawal Khirodkar, Timur Bagautdinov, Julieta Martinez et al.

ECCV 2024posterarXiv:2408.12569
170
citations
#40

LLaVA-UHD: an LMM Perceiving any Aspect Ratio and High-Resolution Images

Zonghao Guo, Ruyi Xu, Yuan Yao et al.

ECCV 2024posterarXiv:2403.11703
170
citations
#41

ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs

Viraj Shah, Nataniel Ruiz, Forrester Cole et al.

ECCV 2024posterarXiv:2311.13600
167
citations
#42

OccWorld: Learning a 3D Occupancy World Model for Autonomous Driving

Wenzhao Zheng, Weiliang Chen, Yuanhui Huang et al.

ECCV 2024posterarXiv:2311.16038
167
citations
#43

BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion

Xuan Ju, Xian Liu, Xintao Wang et al.

ECCV 2024posterarXiv:2403.06976
163
citations
#44

Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs

Keen You, Haotian Zhang, Eldon Schoop et al.

ECCV 2024posterarXiv:2404.05719
154
citations
#45

A Task is Worth One Word: Learning with Task Prompts for High-Quality Versatile Image Inpainting

Junhao Zhuang, Yanhong Zeng, Wenran Liu et al.

ECCV 2024posterarXiv:2312.03594
152
citations
#46

Generative End-to-End Autonomous Driving

Wenzhao Zheng, Ruiqi Song, Xianda Guo et al.

ECCV 2024posterarXiv:2402.11502
150
citations
#47

Street Gaussians: Modeling Dynamic Urban Scenes with Gaussian Splatting

Yunzhi Yan, Haotong Lin, Chenxu Zhou et al.

ECCV 2024posterarXiv:2401.01339
149
citations
#48

ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback

Ming Li, Taojiannan Yang, Huafeng Kuang et al.

ECCV 2024posterarXiv:2404.07987
148
citations
#49

Physics-Based Interaction with 3D Objects via Video Generation

Tianyuan Zhang, Hong-Xing Yu, Rundi Wu et al.

ECCV 2024posterarXiv:2404.13026
137
citations
#50

Rotary Position Embedding for Vision Transformer

Byeongho Heo, Song Park, Dongyoon Han et al.

ECCV 2024posterarXiv:2403.13298
135
citations
#51

SGS-SLAM: Semantic Gaussian Splatting For Neural Dense SLAM

Mingrui Li, Shuhong Liu, Heng Zhou et al.

ECCV 2024posterarXiv:2402.03246
131
citations
#52

LongVLM: Efficient Long Video Understanding via Large Language Models

Yuetian Weng, Mingfei Han, Haoyu He et al.

ECCV 2024posterarXiv:2404.03384
128
citations
#53

LHRS-Bot: Empowering Remote Sensing with VGI-Enhanced Large Multimodal Language Model

Dilxat Muhtar, Zhenshi Li, Feng Gu et al.

ECCV 2024posterarXiv:2402.02544
127
citations
#54

UniIR: Training and Benchmarking Universal Multimodal Information Retrievers

Cong Wei, Yang Chen, Haonan Chen et al.

ECCV 2024posterarXiv:2311.17136
127
citations
#55

Dolphins: Multimodal Language Model for Driving

Yingzi Ma, Yulong Cao, Jiachen Sun et al.

ECCV 2024posterarXiv:2312.00438
126
citations
#56

ST-LLM: Large Language Models Are Effective Temporal Learners

Ruyang Liu, Chen Li, Haoran Tang et al.

ECCV 2024posterarXiv:2404.00308
125
citations
#57

Paying More Attention to Images: A Training-Free Method for Alleviating Hallucination in LVLMs

Shi Liu, Kecheng Zheng, Wei Chen

ECCV 2024posterarXiv:2407.21771
121
citations
#58

Drag Anything: Motion Control for Anything using Entity Representation

Weijia Wu, Zhuang Li, Yuchao Gu et al.

ECCV 2024poster
120
citations
#59

SCLIP: Rethinking Self-Attention for Dense Vision-Language Inference

Feng Wang, Jieru Mei, Alan Yuille

ECCV 2024posterarXiv:2312.01597
120
citations
#60

Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models

Rohit Gandikota, Joanna Materzynska, Tingrui Zhou et al.

ECCV 2024posterarXiv:2311.12092
120
citations
#61

DiffiT: Diffusion Vision Transformers for Image Generation

Ali Hatamizadeh, Jiaming Song, Guilin Liu et al.

ECCV 2024posterarXiv:2312.02139
119
citations
#62

MotionLCM: Real-time Controllable Motion Generation via Latent Consistency Model

Wenxun Dai, Ling-Hao Chen, Jingbo Wang et al.

ECCV 2024posterarXiv:2404.19759
117
citations
#63

InstructIR: High-Quality Image Restoration Following Human Instructions

Marcos Conde, Gregor Geigle, Radu Timofte

ECCV 2024posterarXiv:2401.16468
114
citations
#64

LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models

Hao Zhang, Hongyang Li, Feng Li et al.

ECCV 2024posterarXiv:2312.02949
114
citations
#65

SEA-RAFT: Simple, Efficient, Accurate RAFT for Optical Flow

Yihan Wang, Lahav Lipson, Jia Deng

ECCV 2024posterarXiv:2405.14793
113
citations
#66

Implicit Style-Content Separation using B-LoRA

Yarden Frenkel, Yael Vinker, Ariel Shamir et al.

ECCV 2024posterarXiv:2403.14572
113
citations
#67

DynMF: Neural Motion Factorization for Real-time Dynamic View Synthesis with 3D Gaussian Splatting

Angelos Kratimenos, Jiahui Lei, Kostas Daniilidis

ECCV 2024posterarXiv:2312.00112
113
citations
#68

ShapeLLM: Universal 3D Object Understanding for Embodied Interaction

Zekun Qi, Runpei Dong, Shaochen Zhang et al.

ECCV 2024posterarXiv:2402.17766
113
citations
#69

Reason2Drive: Towards Interpretable and Chain-based Reasoning for Autonomous Driving

Ming Nie, Renyuan Peng, Chunwei Wang et al.

ECCV 2024posterarXiv:2312.03661
112
citations
#70

IRSAM: Advancing Segment Anything Model for Infrared Small Target Detection

Mingjin Zhang, Yuchun Wang, Jie Guo et al.

ECCV 2024posterarXiv:2407.07520
110
citations
#71

Gaussian in the Wild: 3D Gaussian Splatting for Unconstrained Image Collections

Dongbin Zhang, Chuming Wang, Weitao Wang et al.

ECCV 2024posterarXiv:2403.15704
109
citations
#72

Motion Mamba: Efficient and Long Sequence Motion Generation

Zeyu Zhang, Akide Liu, Ian Reid et al.

ECCV 2024posterarXiv:2403.07487
108
citations
#73

Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models

Chuofan Ma, Yi Jiang, Jiannan Wu et al.

ECCV 2024posterarXiv:2404.13013
107
citations
#74

CompGS: Smaller and Faster Gaussian Splatting with Vector Quantization

K L Navaneet, Kossar Pourahmadi, Soroush Abbasi Koohpayegani et al.

ECCV 2024posterarXiv:2311.18159
106
citations
#75

ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation

Guanxing Lu, Shiyi Zhang, Ziwei Wang et al.

ECCV 2024posterarXiv:2403.08321
106
citations
#76

ReNoise: Real Image Inversion Through Iterative Noising

Daniel Garibi, Or Patashnik, Andrey Voynov et al.

ECCV 2024posterarXiv:2403.14602
105
citations
#77

PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation

Shaowei Liu, Zhongzheng Ren, Saurabh Gupta et al.

ECCV 2024posterarXiv:2409.18964
104
citations
#78

TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering

Jingye Chen, Yupan Huang, Tengchao Lv et al.

ECCV 2024posterarXiv:2311.16465
104
citations
#79

How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs

Haoqin Tu, Chenhang Cui, Zijun Wang et al.

ECCV 2024posterarXiv:2311.16101
103
citations
#80

Mamba-ND: Selective State Space Modeling for Multi-Dimensional Data

Shufan Li, Aditya Grover, Harkanwar Singh

ECCV 2024posterarXiv:2402.05892
103
citations
#81

latentSplat: Autoencoding Variational Gaussians for Fast Generalizable 3D Reconstruction

Christopher Wewer, Kevin Raj, Eddy Ilg et al.

ECCV 2024posterarXiv:2403.16292
102
citations
#82

MVDiffHD: A Dense High-resolution Multi-view Diffusion Model for Single or Sparse-view 3D Object Reconstruction

Shitao Tang, Jiacheng Chen, Dilin Wang et al.

ECCV 2024poster
100
citations
#83

DiffusionDepth: Diffusion Denoising Approach for Monocular Depth Estimation

Yiqun Duan, Xianda Guo, Zheng Zhu

ECCV 2024posterarXiv:2303.05021
98
citations
#84

Model Breadcrumbs: Scaling Multi-Task Model Merging with Sparse Masks

MohammadReza Davari, Eugene Belilovsky

ECCV 2024posterarXiv:2312.06795
98
citations
#85

LightenDiffusion: Unsupervised Low-Light Image Enhancement with Latent-Retinex Diffusion Models

Hai Jiang, Ao Luo, Xiaohong Liu et al.

ECCV 2024posterarXiv:2407.08939
98
citations
#86

DOCCI: Descriptions of Connected and Contrasting Images

Yasumasa Onoe, Sunayana Rane, Zachary E Berger et al.

ECCV 2024posterarXiv:2404.19753
98
citations
#87

Pixel-GS: Density Control with Pixel-aware Gradient for 3D Gaussian Splatting

Zheng Zhang, Wenbo Hu, Yixing Lao et al.

ECCV 2024posterarXiv:2403.15530
96
citations
#88

Revising Densification in Gaussian Splatting

Samuel Rota Bulò, Lorenzo Porzi, Peter Kontschieder

ECCV 2024posterarXiv:2404.06109
95
citations
#89

GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction

Yuanhui Huang, Wenzhao Zheng, Yunpeng Zhang et al.

ECCV 2024posterarXiv:2405.17429
95
citations
#90

Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking Multimodal Large Language Models

Yifan Li, Hangyu Guo, Kun Zhou et al.

ECCV 2024posterarXiv:2403.09792
95
citations
#91

VISA: Reasoning Video Object Segmentation via Large Language Model

Cilin Yan, Haochen Wang, Shilin Yan et al.

ECCV 2024posterarXiv:2407.11325
95
citations
#92

Towards Open-ended Visual Quality Comparison

Haoning Wu, Hanwei Zhu, Zicheng Zhang et al.

ECCV 2024posterarXiv:2402.16641
93
citations
#93

STAG4D: Spatial-Temporal Anchored Generative 4D Gaussians

Yifei Zeng, Yanqin Jiang, Siyu Zhu et al.

ECCV 2024posterarXiv:2403.14939
92
citations
#94

Tracking Meets LoRA: Faster Training, Larger Model, Stronger Performance

Liting Lin, Heng Fan, Zhipeng Zhang et al.

ECCV 2024posterarXiv:2403.05231
92
citations
#95

Depicting Beyond Scores: Advancing Image Quality Assessment through Multi-modal Language Models

Zhiyuan You, Zheyuan Li, Jinjin Gu et al.

ECCV 2024posterarXiv:2312.08962
92
citations
#96

CoR-GS: Sparse-View 3D Gaussian Splatting via Co-Regularization

Jiawei Zhang, Jiahe Li, Xiaohan Yu et al.

ECCV 2024posterarXiv:2405.12110
90
citations
#97

Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models

Haoran Wei, Lingyu Kong, Jinyue Chen et al.

ECCV 2024posterarXiv:2312.06109
89
citations
#98

The All-Seeing Project V2: Towards General Relation Comprehension of the Open World

Weiyun Wang, Yiming Ren, Haowen Luo et al.

ECCV 2024posterarXiv:2402.19474
86
citations
#99

AutoDIR: Automatic All-in-One Image Restoration with Latent Diffusion

Yitong Jiang, Zhaoyang Zhang, Tianfan Xue et al.

ECCV 2024posterarXiv:2310.10123
83
citations
#100

T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy

Qing Jiang, Feng Li, Zhaoyang Zeng et al.

ECCV 2024posterarXiv:2403.14610
83
citations
#101

OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation

Zhening Huang, Xiaoyang Wu, Xi Chen et al.

ECCV 2024posterarXiv:2309.00616
82
citations
#102

Robo-ABC: Affordance Generalization Beyond Categories via Semantic Correspondence for Robot Manipulation

Yuanchen Ju, Kaizhe Hu, Guowei Zhang et al.

ECCV 2024posterarXiv:2401.07487
82
citations
#103

PSALM: Pixelwise Segmentation with Large Multi-modal Model

Zheng Zhang, YeYao Ma, Enming Zhang et al.

ECCV 2024posterarXiv:2403.14598
82
citations
#104

Octopus: Embodied Vision-Language Programmer from Environmental Feedback

Jingkang Yang, Yuhao Dong, Shuai Liu et al.

ECCV 2024posterarXiv:2310.08588
81
citations
#105

Arc2Face: A Foundation Model for ID-Consistent Human Faces

Foivos Paraperas Papantoniou, Alexandros Lattas, Stylianos Moschoglou et al.

ECCV 2024posterarXiv:2403.11641
79
citations
#106

CG-SLAM: Efficient Dense RGB-D SLAM in a Consistent Uncertainty-aware 3D Gaussian Field

Jiarui Hu, Xianhao Chen, Boyin Feng et al.

ECCV 2024posterarXiv:2403.16095
78
citations
#107

NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models

Gengze Zhou, Yicong Hong, Zun Wang et al.

ECCV 2024posterarXiv:2407.12366
77
citations
#108

Deblurring 3D Gaussian Splatting

Byeonghyeon Lee, Howoong Lee, Xiangyu Sun et al.

ECCV 2024posterarXiv:2401.00834
77
citations
#109

TLControl: Trajectory and Language Control for Human Motion Synthesis

Weilin Wan, Zhiyang Dou, Taku Komura et al.

ECCV 2024posterarXiv:2311.17135
77
citations
#110

EMDM: Efficient Motion Diffusion Model for Fast, High-Quality Human Motion Generation

Wenyang Zhou, Zhiyang Dou, Zeyu Cao et al.

ECCV 2024poster
77
citations
#111

Distilling Diffusion Models into Conditional GANs

Minguk Kang, Richard Zhang, Connelly Barnes et al.

ECCV 2024posterarXiv:2405.05967
75
citations
#112

LN3Diff: Scalable Latent Neural Fields Diffusion for Speedy 3D Generation

Yushi Lan, Fangzhou Hong, Shuai Yang et al.

ECCV 2024posterarXiv:2403.12019
75
citations
#113

A Unified Anomaly Synthesis Strategy with Gradient Ascent for Industrial Anomaly Detection and Localization

Qiyu Chen, Huiyuan Luo, Chengkan Lv et al.

ECCV 2024posterarXiv:2407.09359
75
citations
#114

Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization

Renjie Pi, Tianyang Han, Wei Xiong et al.

ECCV 2024posterarXiv:2403.08730
75
citations
#115

BAD-Gaussians: Bundle Adjusted Deblur Gaussian Splatting

Lingzhe Zhao, Peng Wang, Peidong Liu

ECCV 2024posterarXiv:2403.11831
74
citations
#116

V2X-Real: a Large-Scale Dataset for Vehicle-to-Everything Cooperative Perception

Hao Xiang, Xin Xia, Zhaoliang Zheng et al.

ECCV 2024posterarXiv:2403.16034
73
citations
#117

Per-Gaussian Embedding-Based Deformation for Deformable 3D Gaussian Splatting

Jeongmin Bae, Seoha Kim, Youngsik Yun et al.

ECCV 2024posterarXiv:2404.03613
73
citations
#118

Improving Diffusion Models for Authentic Virtual Try-on in the Wild

Choi Yisol, Sangkyung Kwak, Kyungmin Lee et al.

ECCV 2024posterarXiv:2403.05139
72
citations
#119

Track2Act: Predicting Point Tracks from Internet Videos enables Generalizable Robot Manipulation

Homanga Bharadhwaj, Roozbeh Mottaghi, Abhinav Gupta et al.

ECCV 2024posterarXiv:2405.01527
72
citations
#120

SMFANet: A Lightweight Self-Modulation Feature Aggregation Network for Efficient Image Super-Resolution

Mingjun Zheng, Long Sun, Jiangxin Dong et al.

ECCV 2024poster
71
citations
#121

Model Stock: All we need is just a few fine-tuned models

Dong-Hwan Jang, Sangdoo Yun, Dongyoon Han

ECCV 2024posterarXiv:2403.19522
71
citations
#122

Context-Guided Spatial Feature Reconstruction for Efficient Semantic Segmentation

Zhenliang Ni, Xinghao Chen, Yingjie Zhai et al.

ECCV 2024posterarXiv:2405.06228
71
citations
#123

When Do We Not Need Larger Vision Models?

Baifeng Shi, Ziyang Wu, Maolin Mao et al.

ECCV 2024posterarXiv:2403.13043
70
citations
#124

RGBD GS-ICP SLAM

Seongbo Ha, Jiung Yeon, Hyeonwoo Yu

ECCV 2024posterarXiv:2403.12550
70
citations
#125

Large-scale Reinforcement Learning for Diffusion Models

Yinan Zhang, Eric Tzeng, Yilun Du et al.

ECCV 2024posterarXiv:2401.12244
69
citations
#126

Frequency-Spatial Entanglement Learning for Camouflaged Object Detection

Yanguang Sun, Chunyan Xu, Jian Yang et al.

ECCV 2024posterarXiv:2409.01686
69
citations
#127

GaussianImage: 1000 FPS Image Representation and Compression by 2D Gaussian Splatting

Xinjie Zhang, Xingtong Ge, Tongda Xu et al.

ECCV 2024posterarXiv:2403.08551
68
citations
#128

ClearCLIP: Decomposing CLIP Representations for Dense Vision-Language Inference

Mengcheng Lan, Chaofeng Chen, Yiping Ke et al.

ECCV 2024posterarXiv:2407.12442
68
citations
#129

DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic Gaussian Splatting

Shijie Zhou, Zhiwen Fan, Dejia Xu et al.

ECCV 2024posterarXiv:2404.06903
68
citations
#130

Scene Coordinate Reconstruction: Posing of Image Collections via Incremental Learning of a Relocalizer

Eric Brachmann, Jamie Wynn, Shuai Chen et al.

ECCV 2024posterarXiv:2404.14351
67
citations
#131

End-to-End Rate-Distortion Optimized 3D Gaussian Representation

Henan Wang, Hanxin Zhu, Tianyu He et al.

ECCV 2024posterarXiv:2406.01597
67
citations
#132

TRAM: Global Trajectory and Motion of 3D Humans from in-the-wild Videos

Yufu Wang, Ziyun Wang, Lingjie Liu et al.

ECCV 2024posterarXiv:2403.17346
66
citations
#133

OneRestore: A Universal Restoration Framework for Composite Degradation

Yu Guo, Yuan Gao, Yuxu Lu et al.

ECCV 2024posterarXiv:2407.04621
66
citations
#134

Attention-Challenging Multiple Instance Learning for Whole Slide Image Classification

Yunlong Zhang, Honglin Li, Yuxuan Sun et al.

ECCV 2024posterarXiv:2311.07125
65
citations
#135

TC4D: Trajectory-Conditioned Text-to-4D Generation

Sherwin Bahmani, Xian Liu, Wang Yifan et al.

ECCV 2024posterarXiv:2403.17920
64
citations
#136

Unifying 3D Vision-Language Understanding via Promptable Queries

Ziyu Zhu, Zhuofan Zhang, Xiaojian Ma et al.

ECCV 2024posterarXiv:2405.11442
64
citations
#137

Multi-HMR: Multi-Person Whole-Body Human Mesh Recovery in a Single Shot

Fabien Baradel, Thomas Lucas, Matthieu Armando et al.

ECCV 2024posterarXiv:2402.14654
63
citations
#138

GIVT: Generative Infinite-Vocabulary Transformers

Michael Tschannen, Cian Eastwood, Fabian Mentzer

ECCV 2024posterarXiv:2312.02116
63
citations
#139

Language-Image Pre-training with Long Captions

Kecheng Zheng, Yifei Zhang, Wei Wu et al.

ECCV 2024posterarXiv:2403.17007
63
citations
#140

GS2Mesh: Surface Reconstruction from Gaussian Splatting via Novel Stereo Views

Yaniv Wolf, Amit Bracha, Ron Kimmel

ECCV 2024posterarXiv:2404.01810
62
citations
#141

Discover-then-Name: Task-Agnostic Concept Bottlenecks via Automated Concept Discovery

Sukrut Rao, Sweta Mahajan, Moritz Böhle et al.

ECCV 2024posterarXiv:2407.14499
62
citations
#142

SkateFormer: Skeletal-Temporal Transformer for Human Action Recognition

Jeonghyeok Do, Munchurl Kim

ECCV 2024posterarXiv:2403.09508
62
citations
#143

Relation DETR: Exploring Explicit Position Relation Prior for Object Detection

Xiuquan Hou, Meiqin Liu, Senlin Zhang et al.

ECCV 2024posterarXiv:2407.11699
61
citations
#144

DINO-Tracker: Taming DINO for Self-Supervised Point Tracking in a Single Video

Narek Tumanyan, Assaf Singer, Shai Bagon et al.

ECCV 2024posterarXiv:2403.14548
61
citations
#145

Large Motion Model for Unified Multi-Modal Motion Generation

Mingyuan Zhang, Daisheng Jin, Chenyang Gu et al.

ECCV 2024posterarXiv:2404.01284
61
citations
#146

GraspXL: Generating Grasping Motions for Diverse Objects at Scale

Hui Zhang, Sammy Christen, Zicong Fan et al.

ECCV 2024posterarXiv:2403.19649
60
citations
#147

TalkingGaussian: Structure-Persistent 3D Talking Head Synthesis via Gaussian Splatting

Jiahe Li, Jiawei Zhang, Xiao Bai et al.

ECCV 2024posterarXiv:2404.15264
59
citations
#148

VeCLIP: Improving CLIP Training via Visual-enriched Captions

Zhengfeng Lai, Haotian Zhang, Bowen Zhang et al.

ECCV 2024posterarXiv:2310.07699
59
citations
#149

Diffusion Models for Open-Vocabulary Segmentation

Laurynas Karazija, Iro Laina, Andrea Vedaldi et al.

ECCV 2024posterarXiv:2306.09316
59
citations
#150

SILC: Improving Vision Language Pretraining with Self-Distillation

Muhammad Ferjad Naeem, Yongqin Xian, Xiaohua Zhai et al.

ECCV 2024posterarXiv:2310.13355
58
citations
#151

ControlLLM: Augment Language Models with Tools by Searching on Graphs

Zhaoyang Liu, Zeqiang Lai, Zhangwei Gao et al.

ECCV 2024posterarXiv:2310.17796
57
citations
#152

SAM4MLLM: Enhance Multi-Modal Large Language Model for Referring Expression Segmentation

Yi-Chia Chen, WeiHua Li, Cheng Sun et al.

ECCV 2024posterarXiv:2409.10542
57
citations
#153

GLAD: Towards Better Reconstruction with Global and Local Adaptive Diffusion Models for Unsupervised Anomaly Detection

Hang Yao, Ming Liu, Zhicun Yin et al.

ECCV 2024posterarXiv:2406.07487
57
citations
#154

ObjectDrop: Bootstrapping Counterfactuals for Photorealistic Object Removal and Insertion

Daniel Winter, Matan Cohen, Shlomi Fruchter et al.

ECCV 2024posterarXiv:2403.18818
56
citations
#155

DQ-DETR: DETR with Dynamic Query for Tiny Object Detection

Yi-Xin Huang, Hou-I Liu, Hong-Han Shuai et al.

ECCV 2024posterarXiv:2404.03507
56
citations
#156

Lane Graph as Path: Continuity-preserving Path-wise Modeling for Online Lane Graph Construction

Bencheng Liao, Shaoyu Chen, Bo Jiang et al.

ECCV 2024posterarXiv:2303.08815
56
citations
#157

Improving 2D Feature Representations by 3D-Aware Fine-Tuning

Yuanwen Yue, Anurag Das, Francis Engelmann et al.

ECCV 2024posterarXiv:2407.20229
55
citations
#158

OmniSat: Self-Supervised Modality Fusion for Earth Observation

Guillaume Astruc, Nicolas Gonthier, Clement Mallet et al.

ECCV 2024posterarXiv:2404.08351
55
citations
#159

VCP-CLIP: A visual context prompting model for zero-shot anomaly segmentation

Zhen Qu, Xian Tao, Mukesh Prasad et al.

ECCV 2024posterarXiv:2407.12276
55
citations
#160

FlashTex: Fast Relightable Mesh Texturing with LightControlNet

Kangle Deng, Timothy Omernick, Alexander B Weiss et al.

ECCV 2024posterarXiv:2402.13251
54
citations
#161

IntrinsicAnything: Learning Diffusion Priors for Inverse Rendering Under Unknown Illumination

Xi Chen, Sida Peng, Dongchen Yang et al.

ECCV 2024posterarXiv:2404.11593
54
citations
#162

MapTracker: Tracking with Strided Memory Fusion for Consistent Vector HD Mapping

Jiacheng Chen, Yuefan Wu, Tan Jiaqi et al.

ECCV 2024posterarXiv:2403.15951
54
citations
#163

Face Adapter for Pre-Trained Diffusion Models with Fine-Grained ID and Attribute Control

Yue Han, Junwei Zhu, Keke He et al.

ECCV 2024posterarXiv:2405.12970
54
citations
#164

A Comparative Study of Image Restoration Networks for General Backbone Network Design

Xiangyu Chen, Zheyuan Li, Yuandong Pu et al.

ECCV 2024posterarXiv:2310.11881
53
citations
#165

Latent Guard: a Safety Framework for Text-to-image Generation

Runtao Liu, Ashkan Khakzar, Jindong Gu et al.

ECCV 2024posterarXiv:2404.08031
53
citations
#166

Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering

Zeyu Liu, Weicong Liang, Zhanhao Liang et al.

ECCV 2024posterarXiv:2403.09622
53
citations
#167

VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion Models

Junlin Han, Filippos Kokkinos, Philip Torr

ECCV 2024posterarXiv:2403.12034
52
citations
#168

GenerateCT: Text-Conditional Generation of 3D Chest CT Volumes

Ibrahim Ethem Hamamci, Sezgin Er, Anjany Sekuboyina et al.

ECCV 2024posterarXiv:2305.16037
52
citations
#169

LaRa: Efficient Large-Baseline Radiance Fields

Anpei Chen, Haofei Xu, Stefano Esposito et al.

ECCV 2024posterarXiv:2407.04699
52
citations
#170

HiT-SR: Hierarchical Transformer for Efficient Image Super-Resolution

Xiang Zhang, Yulun Zhang, Fisher Yu

ECCV 2024posterarXiv:2407.05878
52
citations
#171

GeoGaussian: Geometry-aware Gaussian Splatting for Scene Rendering

Yanyan Li, Chenyu Lyu, Yan Di et al.

ECCV 2024posterarXiv:2403.11324
52
citations
#172

ReMoS: 3D Motion-Conditioned Reaction Synthesis for Two-Person Interactions

Anindita Ghosh, Rishabh Dabral, Vladislav Golyanik et al.

ECCV 2024posterarXiv:2311.17057
51
citations
#173

GVGEN: Text-to-3D Generation with Volumetric Representation

Xianglong He, Junyi Chen, Sida Peng et al.

ECCV 2024posterarXiv:2403.12957
51
citations
#174

FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis

Linjiang Huang, Rongyao Fang, Aiping Zhang et al.

ECCV 2024posterarXiv:2403.12963
51
citations
#175

HeadGaS: Real-Time Animatable Head Avatars via 3D Gaussian Splatting

Helisa Dhamo, Yinyu Nie, Arthur Moreau et al.

ECCV 2024posterarXiv:2312.02902
51
citations
#176

CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios

Qilang Ye, Zitong Yu, Rui Shao et al.

ECCV 2024posterarXiv:2403.04640
50
citations
#177

Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training

David Wan, Jaemin Cho, Elias Stengel-Eskin et al.

ECCV 2024posterarXiv:2403.02325
50
citations
#178

Challenging Forgets: Unveiling the Worst-Case Forget Sets in Machine Unlearning

Chongyu Fan, Jiancheng Liu, Alfred Hero et al.

ECCV 2024posterarXiv:2403.07362
50
citations
#179

Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation

Lanqing Guo, Yingqing He, Haoxin Chen et al.

ECCV 2024posterarXiv:2402.10491
50
citations
#180

Leveraging Enhanced Queries of Point Sets for Vectorized Map Construction

Zihao Liu, Xiaoyu Zhang, Guangwei Liu et al.

ECCV 2024posterarXiv:2402.17430
49
citations
#181

TIBET: Identifying and Evaluating Biases in Text-to-Image Generative Models

Aditya Aravind Chinchure, Pushkar Shukla, Gaurav Bhatt et al.

ECCV 2024posterarXiv:2312.01261
49
citations
#182

ReMamber: Referring Image Segmentation with Mamba Twister

Yuhuan Yang, Chaofan Ma, Jiangchao Yao et al.

ECCV 2024posterarXiv:2403.17839
49
citations
#183

VITATECS: A Diagnostic Dataset for Temporal Concept Understanding of Video-Language Models

Shicheng Li, Lei Li, Yi Liu et al.

ECCV 2024posterarXiv:2311.17404
49
citations
#184

LCM-Lookahead for Encoder-based Text-to-Image Personalization

Rinon Gal, Or Lichter, Elad Richardson et al.

ECCV 2024posterarXiv:2404.03620
49
citations
#185

UniDream: Unifying Diffusion Priors for Relightable Text-to-3D Generation

Zexiang Liu, Yangguang Li, Youtian Lin et al.

ECCV 2024posterarXiv:2312.08754
49
citations
#186

CoMo: Controllable Motion Generation through Language Guided Pose Code Editing

Yiming Huang, Weilin Wan, Yue Yang et al.

ECCV 2024posterarXiv:2403.13900
48
citations
#187

Cross-Domain Few-Shot Object Detection via Enhanced Open-Set Object Detector

Yuqian Fu, Yu Wang, Yixuan Pan et al.

ECCV 2024posterarXiv:2402.03094
48
citations
#188

DGInStyle: Domain-Generalizable Semantic Segmentation with Image Diffusion Models and Stylized Semantic Control

Yuru Jia, Lukas Hoyer, Shengyu Huang et al.

ECCV 2024posterarXiv:2312.03048
48
citations
#189

UniTraj: A Unified Framework for Scalable Vehicle Trajectory Prediction

Lan Feng, Mohammadhossein Bahari, Kaouther Messaoud et al.

ECCV 2024posterarXiv:2403.15098
47
citations
#190

UDiffText: A Unified Framework for High-quality Text Synthesis in Arbitrary Images via Character-aware Diffusion Models

Yiming Zhao, Zhouhui Lian

ECCV 2024posterarXiv:2312.04884
47
citations
#191

Customize-A-Video: One-Shot Motion Customization of Text-to-Video Diffusion Models

Yixuan Ren, Yang Zhou, Jimei Yang et al.

ECCV 2024posterarXiv:2402.14780
47
citations
#192

Portrait4D-v2: Pseudo Multi-View Data Creates Better 4D Head Synthesizer

Yu Deng, Duomin Wang, Baoyuan Wang

ECCV 2024posterarXiv:2403.13570
47
citations
#193

DetToolChain: A New Prompting Paradigm to Unleash Detection Ability of MLLM

Yixuan Wu, Yizhou Wang, Shixiang Tang et al.

ECCV 2024posterarXiv:2403.12488
47
citations
#194

Unveiling and Mitigating Memorization in Text-to-image Diffusion Models through Cross Attention

Jie Ren, Yaxin Li, Shenglai Zeng et al.

ECCV 2024posterarXiv:2403.11052
46
citations
#195

When Fast Fourier Transform Meets Transformer for Image Restoration

Xingyu Jiang, Xiuhui Zhang, Ning Gao et al.

ECCV 2024poster
46
citations
#196

PhysAvatar: Learning the Physics of Dressed 3D Avatars from Visual Observations

Yang Zheng, Qingqing Zhao, Guandao Yang et al.

ECCV 2024posterarXiv:2404.04421
46
citations
#197

FlashSplat: 2D to 3D Gaussian Splatting Segmentation Solved Optimally

Qiuhong Shen, Xingyi Yang, Xinchao Wang

ECCV 2024posterarXiv:2409.08270
45
citations
#198

ComboVerse: Compositional 3D Assets Creation Using Spatially-Aware Diffusion Guidance

Yongwei Chen, Tengfei Wang, Tong Wu et al.

ECCV 2024posterarXiv:2403.12409
45
citations
#199

Explore the Potential of CLIP for Training-Free Open Vocabulary Semantic Segmentation

Tong Shao, Zhuotao Tian, Hang Zhao et al.

ECCV 2024posterarXiv:2407.08268
44
citations
#200

BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models

Rizhao Cai, Zirui Song, Dayan Guan et al.

ECCV 2024posterarXiv:2312.02896
44
citations