Research Alpha Leak - Rising Stars in Research

#1

Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection

Shilong Liu, Zhaoyang Zeng, Tianhe Ren et al.

ECCV 2024

3,368 citations

#2

YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information

Chien-Yao Wang, I-Hau Yeh, Hong-Yuan Mark Liao

ECCV 2024

2,952 citations

#3

Adversarial Diffusion Distillation

Axel Sauer, Dominik Lorenz, Andreas Blattmann et al.

ECCV 2024

617 citations

#4

LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation

Jiaxiang Tang, Zhaoxi Chen, Xiaokang Chen et al.

ECCV 2024

616 citations

#5

Grounding Image Matching in 3D with MASt3R

Vincent Leroy, Yohann Cabon, Jerome Revaud

ECCV 2024

512 citations

#6

MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?

Renrui Zhang, Dongzhi Jiang, Yichi Zhang et al.

ECCV 2024

473 citations

#7

CoTracker: It is Better to Track Together

Nikita Karaev, Ignacio Rocco, Ben Graham et al.

ECCV 2024

450 citations

#8

SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers

Nanye Ma, Mark Goldstein, Michael Albergo et al.

ECCV 2024

428 citations

#9

MobileNetV4: Universal Models for the Mobile Ecosystem

Danfeng Qin, Chas Leichner, Manolis Delakis et al.

ECCV 2024

407 citations

#10

VideoMamba: State Space Model for Efficient Video Understanding

Kunchang Li, Xinhao Li, Yi Wang et al.

ECCV 2024

401 citations

#11

MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images

Yuedong Chen, Haofei Xu, Chuanxia Zheng et al.

ECCV 2024

356 citations

#12

Evaluating Text-to-Visual Generation with Image-to-Text Generation

Zhiqiu Lin, Deepak Pathak, Baiqi Li et al.

ECCV 2024

347 citations

#13

An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models

Liang Chen, Haozhe Zhao, Tianyu Liu et al.

ECCV 2024

343 citations

#14

SV3D: Novel Multi-view Synthesis and 3D Generation from a Single Image using Latent Video Diffusion

Vikram Voleti, Chun-Han Yao, Mark Boss et al.

ECCV 2024

318 citations

#15

BLINK: Multimodal Large Language Models Can See but Not Perceive

Xingyu Fu, Yushi Hu, Bangzheng Li et al.

ECCV 2024

307 citations

#16

FSGS: Real-Time Few-shot View Synthesis using Gaussian Splatting

Zehao Zhu, Zhiwen Fan, Yifan Jiang et al.

ECCV 2024

293 citations

#17

PointLLM: Empowering Large Language Models to Understand Point Clouds

Runsen Xu, Xiaolong Wang, Tai Wang et al.

ECCV 2024

289 citations

#18

DiffBIR: Toward Blind Image Restoration with Generative Diffusion Prior

Xinqi Lin, Jingwen He, Ziyan Chen et al.

ECCV 2024

279 citations

#19

Photorealistic Video Generation with Diffusion Models

Agrim Gupta, Lijun Yu, Kihyuk Sohn et al.

ECCV 2024

270 citations

#20

GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation

Yinghao Xu, Zifan Shi, Wang Yifan et al.

ECCV 2024

259 citations
