Most Cited 2024 Highlight "radiology report generation" Papers

12,324 papers found • Page 1 of 62

#1

Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection

Shilong Liu, Zhaoyang Zeng, Tianhe Ren et al.

ECCV 2024posterarXiv:2303.05499
3368
citations
#2

YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information

Chien-Yao Wang, I-Hau Yeh, Hong-Yuan Mark Liao

ECCV 2024posterarXiv:2402.13616
2952
citations
#3

DETRs Beat YOLOs on Real-time Object Detection

Yian Zhao, Wenyu Lv, Shangliang Xu et al.

CVPR 2024posterarXiv:2304.08069
2424
citations
#4

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks

Zhe Chen, Jiannan Wu, Wenhai Wang et al.

CVPR 2024posterarXiv:2312.14238
2210
citations
#5

T2I-Adapter: Learning Adapters to Dig Out More Controllable Ability for Text-to-Image Diffusion

Chong Mou, Xintao Wang, Liangbin Xie et al.

AAAI 2024paperarXiv:2302.08453
1423
citations
#6

Scaling Autoregressive Models for Content-Rich Text-to-Image Generation

Xin Li, Jing Yu Koh, Alexander Ku et al.

ICLR 2024poster
1366
citations
#7

Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection

Akari Asai, Zeqiu Wu, Yizhong Wang et al.

ICLR 2024posterarXiv:2310.11511
1356
citations
#8

MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts

Pan Lu, Hritik Bansal, Tony Xia et al.

ICLR 2024posterarXiv:2310.02255
1171
citations
#9

ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs

Yujia Qin, Shihao Liang, Yining Ye et al.

ICLR 2024spotlightarXiv:2307.16789
1128
citations
#10

4D Gaussian Splatting for Real-Time Dynamic Scene Rendering

Guanjun Wu, Taoran Yi, Jiemin Fang et al.

CVPR 2024posterarXiv:2310.08528
1061
citations
#11

Grounding Multimodal Large Language Models to the World

Zhiliang Peng, Wenhui Wang, Li Dong et al.

ICLR 2024posterarXiv:2306.14824
1032
citations
#12

VBench: Comprehensive Benchmark Suite for Video Generative Models

Ziqi Huang, Yinan He, Jiashuo Yu et al.

CVPR 2024highlightarXiv:2311.17982
996
citations
#13

A Generalist Agent

Jackie Kay, Sergio Gómez Colmenarejo, Mahyar Bordbar et al.

ICLR 2024poster
978
citations
#14

MVDream: Multi-view Diffusion for 3D Generation

Yichun Shi, Peng Wang, Jianglong Ye et al.

ICLR 2024posterarXiv:2308.16512
871
citations
#15

MVBench: A Comprehensive Multi-modal Video Understanding Benchmark

Kunchang Li, Yali Wang, Yinan He et al.

CVPR 2024highlightarXiv:2311.17005
864
citations
#16

LISA: Reasoning Segmentation via Large Language Model

Xin Lai, Zhuotao Tian, Yukang Chen et al.

CVPR 2024posterarXiv:2308.00692
721
citations
#17

Deformable 3D Gaussians for High-Fidelity Monocular Dynamic Scene Reconstruction

Ziyi Yang, Xinyu Gao, Wen Zhou et al.

CVPR 2024posterarXiv:2309.13101
686
citations
#18

VILA: On Pre-training for Visual Language Models

Ji Lin, Danny Yin, Wei Ping et al.

CVPR 2024posterarXiv:2312.07533
685
citations
#19

Adversarial Diffusion Distillation

Axel Sauer, Dominik Lorenz, Andreas Blattmann et al.

ECCV 2024posterarXiv:2311.17042
617
citations
#20

LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation

Jiaxiang Tang, Zhaoxi Chen, Xiaokang Chen et al.

ECCV 2024posterarXiv:2402.05054
616
citations
#21

mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration

Qinghao Ye, Haiyang Xu, Jiabo Ye et al.

CVPR 2024highlightarXiv:2311.04257
601
citations
#22

Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering

Tao Lu, Mulin Yu, Linning Xu et al.

CVPR 2024highlightarXiv:2312.00109
589
citations
#23

Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs

Shengbang Tong, Zhuang Liu, Yuexiang Zhai et al.

CVPR 2024posterarXiv:2401.06209
570
citations
#24

MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models

Longhui Yu, Weisen JIANG, Han Shi et al.

ICLR 2024spotlightarXiv:2309.12284
554
citations
#25

SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities

Boyuan Chen, Zhuo Xu, Sean Kirmani et al.

CVPR 2024posterarXiv:2401.12168
550
citations
#26

One-step Diffusion with Distribution Matching Distillation

Tianwei Yin, Michaël Gharbi, Richard Zhang et al.

CVPR 2024posterarXiv:2311.18828
543
citations
#27

Language Model Beats Diffusion - Tokenizer is key to visual generation

Lijun Yu, José Lezama, Nitesh Bharadwaj Gundavarapu et al.

ICLR 2024posterarXiv:2310.05737
525
citations
#28

Grounding Image Matching in 3D with MASt3R

Vincent Leroy, Yohann Cabon, Jerome Revaud

ECCV 2024posterarXiv:2406.09756
512
citations
#29

pixelSplat: 3D Gaussian Splats from Image Pairs for Scalable Generalizable 3D Reconstruction

David Charatan, Sizhe Lester Li, Andrea Tagliasacchi et al.

CVPR 2024posterarXiv:2312.12337
496
citations
#30

Patches Are All You Need?

Asher Trockman, J Kolter

ICLR 2024posterarXiv:2201.09792
487
citations
#31

SplaTAM: Splat Track & Map 3D Gaussians for Dense RGB-D SLAM

Nikhil Keetha, Jay Karhade, Krishna Murthy Jatavallabhula et al.

CVPR 2024posterarXiv:2312.02126
477
citations
#32

AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors

Weize Chen, Yusheng Su, Jingwei Zuo et al.

ICLR 2024posterarXiv:2308.10848
476
citations
#33

MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?

Renrui Zhang, Dongzhi Jiang, Yichi Zhang et al.

ECCV 2024posterarXiv:2403.14624
473
citations
#34

Eureka: Human-Level Reward Design via Coding Large Language Models

Yecheng Jason Ma, William Liang, Guanzhi Wang et al.

ICLR 2024posterarXiv:2310.12931
471
citations
#35

Benchmarking Large Language Models in Retrieval-Augmented Generation

Jiawei Chen, Hongyu Lin, Xianpei Han et al.

AAAI 2024paperarXiv:2309.01431
458
citations
#36

MovieChat: From Dense Token to Sparse Memory for Long Video Understanding

Enxin Song, Wenhao Chai, Guanhong Wang et al.

CVPR 2024posterarXiv:2307.16449
457
citations
#37

CoTracker: It is Better to Track Together

Nikita Karaev, Ignacio Rocco, Ben Graham et al.

ECCV 2024posterarXiv:2307.07635
450
citations
#38

Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding

Sicong Leng, Hang Zhang, Guanzheng Chen et al.

CVPR 2024highlightarXiv:2311.16922
449
citations
#39

SALMONN: Towards Generic Hearing Abilities for Large Language Models

Changli Tang, Wenyi Yu, Guangzhi Sun et al.

ICLR 2024posterarXiv:2310.13289
447
citations
#40

Real-time Photorealistic Dynamic Scene Representation and Rendering with 4D Gaussian Splatting

Zeyu Yang, Hongye Yang, Zijie Pan et al.

ICLR 2024oralarXiv:2310.10642
440
citations
#41

SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers

Nanye Ma, Mark Goldstein, Michael Albergo et al.

ECCV 2024posterarXiv:2401.08740
428
citations
#42

Generative Multimodal Models are In-Context Learners

Quan Sun, Yufeng Cui, Xiaosong Zhang et al.

CVPR 2024posterarXiv:2312.13286
422
citations
#43

FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects

Bowen Wen, Wei Yang, Jan Kautz et al.

CVPR 2024highlightarXiv:2312.08344
412
citations
#44

Catastrophic Jailbreak of Open-source LLMs via Exploiting Generation

Yangsibo Huang, Samyak Gupta, Mengzhou Xia et al.

ICLR 2024spotlightarXiv:2310.06987
412
citations
#45

YaRN: Efficient Context Window Extension of Large Language Models

Bowen Peng, Jeffrey Quesnelle, Honglu Fan et al.

ICLR 2024posterarXiv:2309.00071
410
citations
#46

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation

Yi Wang, Yinan He, Yizhuo Li et al.

ICLR 2024spotlightarXiv:2307.06942
408
citations
#47

MobileNetV4: Universal Models for the Mobile Ecosystem

Danfeng Qin, Chas Leichner, Manolis Delakis et al.

ECCV 2024posterarXiv:2404.10518
407
citations
#48

VideoMamba: State Space Model for Efficient Video Understanding

Kunchang Li, Xinhao Li, Yi Wang et al.

ECCV 2024posterarXiv:2403.06977
401
citations
#49

Monkey: Image Resolution and Text Label Are Important Things for Large Multi-modal Models

Zhang Li, Biao Yang, Qiang Liu et al.

CVPR 2024highlightarXiv:2311.06607
384
citations
#50

Universal Guidance for Diffusion Models

Arpit Bansal, Hong-Min Chu, Avi Schwarzschild et al.

ICLR 2024posterarXiv:2302.07121
380
citations
#51

Prometheus: Inducing Fine-Grained Evaluation Capability in Language Models

Seungone Kim, Jamin Shin, yejin cho et al.

ICLR 2024posterarXiv:2310.08491
378
citations
#52

Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs

Suyu Ge, Yunan Zhang, Liyuan Liu et al.

ICLR 2024posterarXiv:2310.01801
372
citations
#53

Large Language Models Are Not Robust Multiple Choice Selectors

Chujie Zheng, Hao Zhou, Fandong Meng et al.

ICLR 2024oralarXiv:2309.03882
370
citations
#54

InstantBooth: Personalized Text-to-Image Generation without Test-Time Finetuning

Jing Shi, Wei Xiong, Zhe Lin et al.

CVPR 2024posterarXiv:2304.03411
369
citations
#55

OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation

Qidong Huang, Xiaoyi Dong, Pan Zhang et al.

CVPR 2024highlightarXiv:2311.17911
365
citations
#56

GS-SLAM: Dense Visual SLAM with 3D Gaussian Splatting

Chi Yan, Delin Qu, Dong Wang et al.

CVPR 2024highlightarXiv:2311.11700
359
citations
#57

MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images

Yuedong Chen, Haofei Xu, Chuanxia Zheng et al.

ECCV 2024posterarXiv:2403.14627
356
citations
#58

TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding

Shuhuai Ren, Linli Yao, Shicheng Li et al.

CVPR 2024posterarXiv:2312.02051
356
citations
#59

Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding

Peng Jin, Ryuichi Takanobu, Cai Zhang et al.

CVPR 2024highlightarXiv:2311.08046
354
citations
#60

HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models

Tianrui Guan, Fuxiao Liu, Xiyang Wu et al.

CVPR 2024posterarXiv:2310.14566
354
citations
#61

Compact 3D Gaussian Representation for Radiance Field

Joo Chan Lee, Daniel Rho, Xiangyu Sun et al.

CVPR 2024highlightarXiv:2311.13681
348
citations
#62

Evaluating Text-to-Visual Generation with Image-to-Text Generation

Zhiqiu Lin, Deepak Pathak, Baiqi Li et al.

ECCV 2024posterarXiv:2404.01291
347
citations
#63

RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback

Tianyu Yu, Yuan Yao, Haoye Zhang et al.

CVPR 2024posterarXiv:2312.00849
344
citations
#64

LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment

Bin Zhu, Bin Lin, Munan Ning et al.

ICLR 2024posterarXiv:2310.01852
343
citations
#65

An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models

Liang Chen, Haozhe Zhao, Tianyu Liu et al.

ECCV 2024posterarXiv:2403.06764
343
citations
#66

Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers

Tsai-Shien Chen, Aliaksandr Siarohin, Willi Menapace et al.

CVPR 2024posterarXiv:2402.19479
341
citations
#67

Preference Ranking Optimization for Human Alignment

Feifan Song, Bowen Yu, Minghao Li et al.

AAAI 2024paperarXiv:2306.17492
334
citations
#68

Learning Interactive Real-World Simulators

Sherry Yang, Yilun Du, Seyed Ghasemipour et al.

ICLR 2024posterarXiv:2310.06114
334
citations
#69

ControlVideo: Training-free Controllable Text-to-video Generation

Yabo Zhang, Yuxiang Wei, Dongsheng jiang et al.

ICLR 2024posterarXiv:2305.13077
331
citations
#70

Text-to-3D using Gaussian Splatting

Zilong Chen, Feng Wang, Yikai Wang et al.

CVPR 2024posterarXiv:2309.16585
330
citations
#71

Human Motion Diffusion as a Generative Prior

Yonatan Shafir, Guy Tevet, Roy Kapon et al.

ICLR 2024posterarXiv:2303.01418
328
citations
#72

Feature 3DGS: Supercharging 3D Gaussian Splatting to Enable Distilled Feature Fields

Shijie Zhou, Haoran Chang, Sicheng Jiang et al.

CVPR 2024highlightarXiv:2312.03203
327
citations
#73

V?: Guided Visual Search as a Core Mechanism in Multimodal LLMs

Penghao Wu, Saining Xie

CVPR 2024posterarXiv:2312.14135
327
citations
#74

Consistency Trajectory Models: Learning Probability Flow ODE Trajectory of Diffusion

Dongjun Kim, Chieh-Hsin Lai, WeiHsiang Liao et al.

ICLR 2024posterarXiv:2310.02279
322
citations
#75

GaussianEditor: Swift and Controllable 3D Editing with Gaussian Splatting

Yiwen Chen, Zilong Chen, Chi Zhang et al.

CVPR 2024posterarXiv:2311.14521
321
citations
#76

OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models

Wenqi Shao, Mengzhao Chen, Zhaoyang Zhang et al.

ICLR 2024spotlightarXiv:2308.13137
320
citations
#77

MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model

Zhongcong Xu, Jianfeng Zhang, Jun Hao Liew et al.

CVPR 2024posterarXiv:2311.16498
318
citations
#78

SV3D: Novel Multi-view Synthesis and 3D Generation from a Single Image using Latent Video Diffusion

Vikram Voleti, Chun-Han Yao, Mark Boss et al.

ECCV 2024posterarXiv:2403.12008
318
citations
#79

Splatter Image: Ultra-Fast Single-View 3D Reconstruction

Stanislaw Szymanowicz, Christian Rupprecht, Andrea Vedaldi

CVPR 2024posterarXiv:2312.13150
316
citations
#80

Vision-Language Foundation Models as Effective Robot Imitators

Xinghang Li, Minghuan Liu, Hanbo Zhang et al.

ICLR 2024spotlightarXiv:2311.01378
310
citations
#81

Video-P2P: Video Editing with Cross-attention Control

Shaoteng Liu, Yuechen Zhang, Wenbo Li et al.

CVPR 2024posterarXiv:2303.04761
309
citations
#82

OpenChat: Advancing Open-source Language Models with Mixed-Quality Data

Guan Wang, Sijie Cheng, Xianyuan Zhan et al.

ICLR 2024posterarXiv:2309.11235
309
citations
#83

DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing

Yujun Shi, Chuhui Xue, Jun Hao Liew et al.

CVPR 2024highlightarXiv:2306.14435
308
citations
#84

BLINK: Multimodal Large Language Models Can See but Not Perceive

Xingyu Fu, Yushi Hu, Bangzheng Li et al.

ECCV 2024posterarXiv:2404.12390
307
citations
#85

Directly Fine-Tuning Diffusion Models on Differentiable Rewards

Kevin Clark, Paul Vicol, Kevin Swersky et al.

ICLR 2024posterarXiv:2309.17400
303
citations
#86

SC-GS: Sparse-Controlled Gaussian Splatting for Editable Dynamic Scenes

Yihua Huang, Yangtian Sun, Ziyi Yang et al.

CVPR 2024posterarXiv:2312.14937
302
citations
#87

TD-MPC2: Scalable, Robust World Models for Continuous Control

Nicklas Hansen, Hao Su, Xiaolong Wang

ICLR 2024spotlightarXiv:2310.16828
293
citations
#88

FSGS: Real-Time Few-shot View Synthesis using Gaussian Splatting

Zehao Zhu, Zhiwen Fan, Yifan Jiang et al.

ECCV 2024posterarXiv:2312.00451
293
citations
#89

SliceGPT: Compress Large Language Models by Deleting Rows and Columns

Saleh Ashkboos, Maximilian Croci, Marcelo Gennari do Nascimento et al.

ICLR 2024posterarXiv:2401.15024
289
citations
#90

PointLLM: Empowering Large Language Models to Understand Point Clouds

Runsen Xu, Xiaolong Wang, Tai Wang et al.

ECCV 2024posterarXiv:2308.16911
289
citations
#91

AnomalyCLIP: Object-agnostic Prompt Learning for Zero-shot Anomaly Detection

Qihang Zhou, Guansong Pang, Yu Tian et al.

ICLR 2024posterarXiv:2310.18961
288
citations
#92

ReconFusion: 3D Reconstruction with Diffusion Priors

Rundi Wu, Ben Mildenhall, Philipp Henzler et al.

CVPR 2024posterarXiv:2312.02981
283
citations
#93

DiffBIR: Toward Blind Image Restoration with Generative Diffusion Prior

Xinqi Lin, Jingwen He, Ziyan Chen et al.

ECCV 2024posterarXiv:2308.15070
279
citations
#94

Follow Your Pose: Pose-Guided Text-to-Video Generation Using Pose-Free Videos

Yue Ma, Yingqing HE, Xiaodong Cun et al.

AAAI 2024paperarXiv:2304.01186
276
citations
#95

NavGPT: Explicit Reasoning in Vision-and-Language Navigation with Large Language Models

Gengze Zhou, Yicong Hong, Qi Wu

AAAI 2024paperarXiv:2305.16986
276
citations
#96

DreamLLM: Synergistic Multimodal Comprehension and Creation

Runpei Dong, chunrui han, Yuang Peng et al.

ICLR 2024spotlightarXiv:2309.11499
275
citations
#97

Provable Robust Watermarking for AI-Generated Text

Xuandong Zhao, Prabhanjan Ananth, Lei Li et al.

ICLR 2024posterarXiv:2306.17439
271
citations
#98

Photorealistic Video Generation with Diffusion Models

Agrim Gupta, Lijun Yu, Kihyuk Sohn et al.

ECCV 2024posterarXiv:2312.06662
270
citations
#99

Understanding the Effects of RLHF on LLM Generalisation and Diversity

Robert Kirk, Ishita Mediratta, Christoforos Nalmpantis et al.

ICLR 2024posterarXiv:2310.06452
267
citations
#100

NuScenes-QA: A Multi-Modal Visual Question Answering Benchmark for Autonomous Driving

Tianwen Qian, Jingjing Chen, Linhai Zhuo et al.

AAAI 2024paperarXiv:2305.14836
266
citations
#101

Triplane Meets Gaussian Splatting: Fast and Generalizable Single-View 3D Reconstruction with Transformers

Zi-Xin Zou, Zhipeng Yu, Yuan-Chen Guo et al.

CVPR 2024posterarXiv:2312.09147
266
citations
#102

DL3DV-10K: A Large-Scale Scene Dataset for Deep Learning-based 3D Vision

Lu Ling, Yichen Sheng, Zhi Tu et al.

CVPR 2024posterarXiv:2312.16256
266
citations
#103

DeepCache: Accelerating Diffusion Models for Free

Xinyin Ma, Gongfan Fang, Xinchao Wang

CVPR 2024posterarXiv:2312.00858
265
citations
#104

SalUn: Empowering Machine Unlearning via Gradient-based Weight Saliency in Both Image Classification and Generation

Chongyu Fan, Jiancheng Liu, Yihua Zhang et al.

ICLR 2024spotlightarXiv:2310.12508
263
citations
#105

Large Language Models as Tool Makers

Tianle Cai, Xuezhi Wang, Tengyu Ma et al.

ICLR 2024posterarXiv:2305.17126
262
citations
#106

GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation

Yinghao Xu, Zifan Shi, Wang Yifan et al.

ECCV 2024posterarXiv:2403.14621
259
citations
#107

MedSegDiff-V2: Diffusion-based Medical Image Segmentation with Transformer

Junde Wu, Wei Ji, Huazhu Fu et al.

AAAI 2024paperarXiv:2301.11798
259
citations
#108

Detecting and Preventing Hallucinations in Large Vision Language Models

Anisha Gunjal, Jihan Yin, Erhan Bas

AAAI 2024paperarXiv:2308.06394
256
citations
#109

SeeSR: Towards Semantics-Aware Real-World Image Super-Resolution

Rongyuan Wu, Tao Yang, Lingchen Sun et al.

CVPR 2024posterarXiv:2311.16518
256
citations
#110

On Scaling Up a Multilingual Vision and Language Model

Xi Chen, Josip Djolonga, Piotr Padlewski et al.

CVPR 2024posterarXiv:2305.18565
254
citations
#111

EquiformerV2: Improved Equivariant Transformer for Scaling to Higher-Degree Representations

Yi-Lun Liao, Brandon Wood, Abhishek Das et al.

ICLR 2024posterarXiv:2306.12059
254
citations
#112

Adaptive Chameleon or Stubborn Sloth: Revealing the Behavior of Large Language Models in Knowledge Conflicts

Jian Xie, Kai Zhang, Jiangjie Chen et al.

ICLR 2024spotlightarXiv:2305.13300
252
citations
#113

MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training

Brandon McKinzie, Zhe Gan, Jean-Philippe Fauconnier et al.

ECCV 2024posterarXiv:2403.09611
246
citations
#114

GS-LRM: Large Reconstruction Model for 3D Gaussian Splatting

Kai Zhang, Sai Bi, Hao Tan et al.

ECCV 2024posterarXiv:2404.19702
246
citations
#115

VTimeLLM: Empower LLM to Grasp Video Moments

Bin Huang, Xin Wang, Hong Chen et al.

CVPR 2024highlightarXiv:2311.18445
244
citations
#116

Pixel-Aware Stable Diffusion for Realistic Image Super-Resolution and Personalized Stylization

Tao Yang, Rongyuan Wu, Peiran Ren et al.

ECCV 2024posterarXiv:2308.14469
242
citations
#117

GaussianDreamer: Fast Generation from Text to 3D Gaussians by Bridging 2D and 3D Diffusion Models

Taoran Yi, Jiemin Fang, Junjie Wang et al.

CVPR 2024posterarXiv:2310.08529
241
citations
#118

EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything

Yunyang Xiong, Balakrishnan Varadarajan, Lemeng Wu et al.

CVPR 2024highlightarXiv:2312.00863
241
citations
#119

AnomalyGPT: Detecting Industrial Anomalies Using Large Vision-Language Models

Zhaopeng Gu, Bingke Zhu, Guibo Zhu et al.

AAAI 2024paperarXiv:2308.15366
240
citations
#120

Emu Edit: Precise Image Editing via Recognition and Generation Tasks

Shelly Sheynin, Adam Polyak, Uriel Singer et al.

CVPR 2024highlightarXiv:2311.10089
238
citations
#121

RoMa: Robust Dense Feature Matching

Johan Edstedt, Qiyu Sun, Georg Bökman et al.

CVPR 2024posterarXiv:2305.15404
238
citations
#122

EvalCrafter: Benchmarking and Evaluating Large Video Generation Models

Yaofang Liu, Xiaodong Cun, Xuebo Liu et al.

CVPR 2024posterarXiv:2310.11440
237
citations
#123

SaProt: Protein Language Modeling with Structure-aware Vocabulary

Jin Su, Chenchen Han, Yuyang Zhou et al.

ICLR 2024spotlight
237
citations
#124

SkySense: A Multi-Modal Remote Sensing Foundation Model Towards Universal Interpretation for Earth Observation Imagery

Xin Guo, Jiangwei Lao, Bo Dang et al.

CVPR 2024posterarXiv:2312.10115
236
citations
#125

Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation

Hongtao Wu, Ya Jing, Chilam Cheang et al.

ICLR 2024posterarXiv:2312.13139
236
citations
#126

Omni-Kernel Network for Image Restoration

Yuning Cui, Wenqi Ren, Alois Knoll

AAAI 2024paper
235
citations
#127

DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving

Xiaofeng Wang, Zheng Zhu, Guan Huang et al.

ECCV 2024posterarXiv:2309.09777
234
citations
#128

Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance

Shenhao Zhu, Junming Chen, Zuozhuo Dai et al.

ECCV 2024posterarXiv:2403.14781
233
citations
#129

DNGaussian: Optimizing Sparse-View 3D Gaussian Radiance Fields with Global-Local Depth Normalization

Jiahe Li, Jiawei Zhang, Xiao Bai et al.

CVPR 2024posterarXiv:2403.06912
232
citations
#130

Language Models Represent Space and Time

Wes Gurnee, Max Tegmark

ICLR 2024oralarXiv:2310.02207
232
citations
#131

Knowledge Graph Prompting for Multi-Document Question Answering

Yu Wang, Nedim Lipka, Ryan A. Rossi et al.

AAAI 2024paperarXiv:2308.11730
231
citations
#132

Sequential Modeling Enables Scalable Learning for Large Vision Models

Yutong Bai, Xinyang Geng, Karttikeya Mangalam et al.

CVPR 2024posterarXiv:2312.00785
230
citations
#133

GaussianShader: 3D Gaussian Splatting with Shading Functions for Reflective Surfaces

Yingwenqi Jiang, Jiadong Tu, Yuan Liu et al.

CVPR 2024posterarXiv:2311.17977
228
citations
#134

DMV3D: Denoising Multi-view Diffusion Using 3D Large Reconstruction Model

Yinghao Xu, Hao Tan, Fujun Luan et al.

ICLR 2024spotlightarXiv:2311.09217
227
citations
#135

SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents

Xuhui Zhou, Hao Zhu, Leena Mathur et al.

ICLR 2024spotlightarXiv:2310.11667
226
citations
#136

Segment and Recognize Anything at Any Granularity

Feng Li, Hao Zhang, Peize Sun et al.

ECCV 2024posterarXiv:2307.04767
226
citations
#137

PixArt-Sigma: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation

Junsong Chen, Chongjian GE, Enze Xie et al.

ECCV 2024poster
223
citations
#138

GeoWizard: Unleashing the Diffusion Priors for 3D Geometry Estimation from a Single Image

Xiao Fu, Wei Yin, Mu Hu et al.

ECCV 2024posterarXiv:2403.12013
223
citations
#139

RECOMP: Improving Retrieval-Augmented LMs with Context Compression and Selective Augmentation

Fangyuan Xu, Weijia Shi, Eunsol Choi

ICLR 2024poster
222
citations
#140

Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training

Hong Liu, Zhiyuan Li, David Hall et al.

ICLR 2024posterarXiv:2305.14342
222
citations
#141

Listen, Think, and Understand

Yuan Gong, Hongyin Luo, Alexander Liu et al.

ICLR 2024posterarXiv:2305.10790
221
citations
#142

EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions

Linrui Tian, Qi Wang, Bang Zhang et al.

ECCV 2024posterarXiv:2402.17485
218
citations
#143

Data Filtering Networks

Alex Fang, Albin Madappally Jose, Amit Jain et al.

ICLR 2024posterarXiv:2309.17425
217
citations
#144

InternVideo2: Scaling Foundation Models for Multimodal Video Understanding

Yi Wang, Kunchang Li, Xinhao Li et al.

ECCV 2024posterarXiv:2403.15377
214
citations
#145

SinSR: Diffusion-Based Image Super-Resolution in a Single Step

Yufei Wang, Wenhan Yang, Xinyuan Chen et al.

CVPR 2024posterarXiv:2311.14760
214
citations
#146

CRM: Single Image to 3D Textured Mesh with Convolutional Reconstruction Model

Zhengyi Wang, Yikai Wang, Yifei Chen et al.

ECCV 2024posterarXiv:2403.05034
213
citations
#147

Style Injection in Diffusion: A Training-free Approach for Adapting Large-scale Diffusion Models for Style Transfer

Jiwoo Chung, Sangeek Hyun, Jae-Pil Heo

CVPR 2024highlightarXiv:2312.09008
211
citations
#148

SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction

Xinyuan Chen, Yaohui Wang, Lingjun Zhang et al.

ICLR 2024oralarXiv:2310.20700
209
citations
#149

A Variational Perspective on Solving Inverse Problems with Diffusion Models

Morteza Mardani, Jiaming Song, Jan Kautz et al.

ICLR 2024posterarXiv:2305.04391
207
citations
#150

Habitat 3.0: A Co-Habitat for Humans, Avatars, and Robots

Xavier Puig, Eric Undersander, Andrew Szot et al.

ICLR 2024posterarXiv:2310.13724
206
citations
#151

Agent Attention: On the Integration of Softmax and Linear Attention

Dongchen Han, Tianzhu Ye, Yizeng Han et al.

ECCV 2024posterarXiv:2312.08874
206
citations
#152

Demystifying CLIP Data

Hu Xu, Saining Xie, Xiaoqing Tan et al.

ICLR 2024spotlightarXiv:2309.16671
205
citations
#153

Zhongjing: Enhancing the Chinese Medical Capabilities of Large Language Model through Expert Feedback and Real-World Multi-Turn Dialogue

Songhua Yang, Hanjie Zhao, Senbin Zhu et al.

AAAI 2024paperarXiv:2308.03549
204
citations
#154

Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification

Aojun Zhou, Ke Wang, Zimu Lu et al.

ICLR 2024posterarXiv:2308.07921
196
citations
#155

LoftQ: LoRA-Fine-Tuning-aware Quantization for Large Language Models

Yixiao Li, Yifan Yu, Chen Liang et al.

ICLR 2024posterarXiv:2310.08659
194
citations
#156

Conformal Risk Control

Anastasios Angelopoulos, Stephen Bates, Adam Fisch et al.

ICLR 2024spotlightarXiv:2208.02814
193
citations
#157

ULIP-2: Towards Scalable Multimodal Pre-training for 3D Understanding

Le Xue, Ning Yu, Shu Zhang et al.

CVPR 2024posterarXiv:2305.08275
192
citations
#158

BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions

Wenbo Hu, Yifan Xu, Yi Li et al.

AAAI 2024paperarXiv:2308.09936
190
citations
#159

ZigMa: A DiT-style Zigzag Mamba Diffusion Model

Tao Hu, Stefan Andreas Baumann, Ming Gui et al.

ECCV 2024posterarXiv:2403.13802
188
citations
#160

Think before you speak: Training Language Models With Pause Tokens

Sachin Goyal, Ziwei Ji, Ankit Singh Rawat et al.

ICLR 2024posterarXiv:2310.02226
187
citations
#161

OctoPack: Instruction Tuning Code Large Language Models

Niklas Muennighoff, Qian Liu, Armel Zebaze et al.

ICLR 2024spotlightarXiv:2308.07124
187
citations
#162

Mixed-Type Tabular Data Synthesis with Score-based Diffusion in Latent Space

Hengrui Zhang, Jiani Zhang, Zhengyuan Shen et al.

ICLR 2024posterarXiv:2310.09656
186
citations
#163

MM-SafetyBench: A Benchmark for Safety Evaluation of Multimodal Large Language Models

Xin Liu, Yichen Zhu, Jindong Gu et al.

ECCV 2024posterarXiv:2311.17600
183
citations
#164

Putting the Object Back into Video Object Segmentation

Ho Kei Cheng, Seoung Wug Oh, Brian Price et al.

CVPR 2024highlightarXiv:2310.12982
182
citations
#165

CityGaussian: Real-time High-quality Large-Scale Scene Rendering with Gaussians

Yang Liu, Chuanchen Luo, Lue Fan et al.

ECCV 2024posterarXiv:2404.01133
180
citations
#166

HAC: Hash-grid Assisted Context for 3D Gaussian Splatting Compression

Yihang Chen, Qianyi Wu, Weiyao Lin et al.

ECCV 2024posterarXiv:2403.14530
179
citations
#167

ReLoRA: High-Rank Training Through Low-Rank Updates

Vladislav Lialin, Sherin Muckatira, Namrata Shivagunde et al.

ICLR 2024posterarXiv:2307.05695
179
citations
#168

MSGNet: Learning Multi-Scale Inter-series Correlations for Multivariate Time Series Forecasting

Wanlin Cai, Yuxuan Liang, Xianggen Liu et al.

AAAI 2024paperarXiv:2401.00423
177
citations
#169

StableVITON: Learning Semantic Correspondence with Latent Diffusion Model for Virtual Try-On

Jeongho Kim, Gyojung Gu, Minho Park et al.

CVPR 2024posterarXiv:2312.01725
176
citations
#170

On the Reliability of Watermarks for Large Language Models

John Kirchenbauer, Jonas Geiping, Yuxin Wen et al.

ICLR 2024posterarXiv:2306.04634
176
citations
#171

Mini-Splatting: Representing Scenes with a Constrained Number of Gaussians

Guangchi Fang, Bing Wang

ECCV 2024posterarXiv:2403.14166
175
citations
#172

ODTrack: Online Dense Temporal Token Learning for Visual Tracking

Yaozong Zheng, Bineng Zhong, Qihua Liang et al.

AAAI 2024paperarXiv:2401.01686
173
citations
#173

RealNet: A Feature Selection Network with Realistic Synthetic Anomaly for Anomaly Detection

Ximiao Zhang, Min Xu, Xiuzhuang Zhou

CVPR 2024posterarXiv:2403.05897
171
citations
#174

Language Embedded 3D Gaussians for Open-Vocabulary Scene Understanding

Jin-Chuan Shi, Miao Wang, Haobin Duan et al.

CVPR 2024posterarXiv:2311.18482
171
citations
#175

SparseCtrl: Adding Sparse Controls to Text-to-Video Diffusion Models

Yuwei Guo, Ceyuan Yang, Anyi Rao et al.

ECCV 2024posterarXiv:2311.16933
171
citations
#176

Sapiens: Foundation for Human Vision Models

Rawal Khirodkar, Timur Bagautdinov, Julieta Martinez et al.

ECCV 2024posterarXiv:2408.12569
170
citations
#177

LLaVA-UHD: an LMM Perceiving any Aspect Ratio and High-Resolution Images

Zonghao Guo, Ruyi Xu, Yuan Yao et al.

ECCV 2024posterarXiv:2403.11703
170
citations
#178

Fast Machine Unlearning without Retraining through Selective Synaptic Dampening

Jack Foster, Stefan Schoepf, Alexandra Brintrup

AAAI 2024paperarXiv:2308.07707
170
citations
#179

Is Ego Status All You Need for Open-Loop End-to-End Autonomous Driving?

Zhiqi Li, Zhiding Yu, Shiyi Lan et al.

CVPR 2024posterarXiv:2312.03031
169
citations
#180

4D-fy: Text-to-4D Generation Using Hybrid Score Distillation Sampling

Sherwin Bahmani, Ivan Skorokhodov, Victor Rong et al.

CVPR 2024posterarXiv:2311.17984
168
citations
#181

OccWorld: Learning a 3D Occupancy World Model for Autonomous Driving

Wenzhao Zheng, Weiliang Chen, Yuanhui Huang et al.

ECCV 2024posterarXiv:2311.16038
167
citations
#182

Compositional Chain-of-Thought Prompting for Large Multimodal Models

Chancharik Mitra, Brandon Huang, Trevor Darrell et al.

CVPR 2024posterarXiv:2311.17076
167
citations
#183

ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs

Viraj Shah, Nataniel Ruiz, Forrester Cole et al.

ECCV 2024posterarXiv:2311.13600
167
citations
#184

Can Large Language Models Infer Causation from Correlation?

Zhijing Jin, Jiarui Liu, Zhiheng LYU et al.

ICLR 2024posterarXiv:2306.05836
166
citations
#185

BioCLIP: A Vision Foundation Model for the Tree of Life

Samuel Stevens, Jiaman Wu, Matthew Thompson et al.

CVPR 2024posterarXiv:2311.18803
165
citations
#186

SplattingAvatar: Realistic Real-Time Human Avatars with Mesh-Embedded Gaussian Splatting

Zhijing Shao, Wang Zhaolong, Zhuang Li et al.

CVPR 2024posterarXiv:2403.05087
165
citations
#187

Uni3D: Exploring Unified 3D Representation at Scale

Junsheng Zhou, Jinsheng Wang, Baorui Ma et al.

ICLR 2024spotlightarXiv:2310.06773
165
citations
#188

GaussianEditor: Editing 3D Gaussians Delicately with Text Instructions

Junjie Wang, Jiemin Fang, Xiaopeng Zhang et al.

CVPR 2024posterarXiv:2311.16037
164
citations
#189

HIVE: Harnessing Human Feedback for Instructional Visual Editing

Shu Zhang, Xinyi Yang, Yihao Feng et al.

CVPR 2024posterarXiv:2303.09618
164
citations
#190

BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion

Xuan JU, Xian Liu, Xintao Wang et al.

ECCV 2024posterarXiv:2403.06976
163
citations
#191

Ring-A-Bell! How Reliable are Concept Removal Methods For Diffusion Models?

Yu-Lin Tsai, Chia-Yi Hsu, Chulin Xie et al.

ICLR 2024posterarXiv:2310.10012
162
citations
#192

GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis

Shunyuan Zheng, Boyao ZHOU, Ruizhi Shao et al.

CVPR 2024highlightarXiv:2312.02155
160
citations
#193

Is Self-Repair a Silver Bullet for Code Generation?

Theo X. Olausson, Jeevana Priya Inala, Chenglong Wang et al.

ICLR 2024posterarXiv:2306.09896
160
citations
#194

Self-contradictory Hallucinations of Large Language Models: Evaluation, Detection and Mitigation

Niels Mündler, Jingxuan He, Slobodan Jenko et al.

ICLR 2024posterarXiv:2305.15852
159
citations
#195

Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory

Niloofar Mireshghallah, Hyunwoo Kim, Xuhui Zhou et al.

ICLR 2024spotlightarXiv:2310.17884
158
citations
#196

Grounded Text-to-Image Synthesis with Attention Refocusing

Quynh Phung, Songwei Ge, Jia-Bin Huang

CVPR 2024posterarXiv:2306.05427
157
citations
#197

VadCLIP: Adapting Vision-Language Models for Weakly Supervised Video Anomaly Detection

Peng Wu, Xuerong Zhou, Guansong Pang et al.

AAAI 2024paperarXiv:2308.11681
156
citations
#198

BooookScore: A systematic exploration of book-length summarization in the era of LLMs

Yapei Chang, Kyle Lo, Tanya Goyal et al.

ICLR 2024posterarXiv:2310.00785
156
citations
#199

Multi-Scale 3D Gaussian Splatting for Anti-Aliased Rendering

Zhiwen Yan, Weng Fei Low, Yu Chen et al.

CVPR 2024posterarXiv:2311.17089
155
citations
#200

PF-LRM: Pose-Free Large Reconstruction Model for Joint Pose and Shape Prediction

Peng Wang, Hao Tan, Sai Bi et al.

ICLR 2024spotlightarXiv:2311.12024
154
citations
PreviousNext