Most Cited 2024 Poster "data-scarce regimes" Papers

12,324 papers found • Page 1 of 62

#1

Improved Baselines with Visual Instruction Tuning

Haotian Liu, Chunyuan Li, Yuheng Li et al.

CVPR 2024 • highlight • arXiv:2310.03744
4359
citations
#2

SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis

Dustin Podell, Zion English, Kyle Lacey et al.

ICLR 2024 • spotlight • arXiv:2307.01952
3991
citations
#3

Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection

Shilong Liu, Zhaoyang Zeng, Tianhe Ren et al.

ECCV 2024 • arXiv:2303.05499
3440
citations
#4

YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information

Chien-Yao Wang, I-Hau Yeh, Hong-Yuan Mark Liao

ECCV 2024 • arXiv:2402.13616
3033
citations
#5

Scaling Rectified Flow Transformers for High-Resolution Image Synthesis

Patrick Esser, Sumith Kulal, Andreas Blattmann et al.

ICML 2024 • arXiv:2403.03206
2965
citations
#6

MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models

Deyao Zhu, Jun Chen, Xiaoqian Shen et al.

ICLR 2024 • arXiv:2304.10592
2806
citations
#7

DETRs Beat YOLOs on Real-time Object Detection

Yian Zhao, Wenyu Lv, Shangliang Xu et al.

CVPR 2024 • arXiv:2304.08069
2565
citations
#8

Let's Verify Step by Step

Hunter Lightman, Vineet Kosaraju, Yuri Burda et al.

ICLR 2024 • arXiv:2305.20050
2488
citations
#9

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks

Zhe Chen, Jiannan Wu, Wenhai Wang et al.

CVPR 2024 • arXiv:2312.14238
2295
citations
#10

FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning

Tri Dao

ICLR 2024 • arXiv:2307.08691
2224
citations
#11

MMBench: Is Your Multi-Modal Model an All-around Player?

Yuan Liu, Haodong Duan, Yuanhan Zhang et al.

ECCV 2024 • arXiv:2307.06281
1745
citations
#12

MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI

Xiang Yue, Yuansheng Ni, Kai Zhang et al.

CVPR 2024 • arXiv:2311.16502
1715
citations
#13

SWE-bench: Can Language Models Resolve Real-world Github Issues?

Carlos E Jimenez, John Yang, Alexander Wettig et al.

ICLR 2024 • arXiv:2310.06770
1485
citations
#14

Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data

Lihe Yang, Bingyi Kang, Zilong Huang et al.

CVPR 2024 • arXiv:2401.10891
1479
citations
#15

T2I-Adapter: Learning Adapters to Dig Out More Controllable Ability for Text-to-Image Diffusion

Chong Mou, Xintao Wang, Liangbin Xie et al.

AAAI 2024 • paper • arXiv:2302.08453
1460
citations
#16

Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model

Lianghui Zhu, Bencheng Liao, Qian Zhang et al.

ICML 2024 • arXiv:2401.09417
1457
citations
#17

Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection

Akari Asai, Zeqiu Wu, Yizhong Wang et al.

ICLR 2024 • arXiv:2310.11511
1435
citations
#18

Efficient Streaming Language Models with Attention Sinks

Guangxuan Xiao, Yuandong Tian, Beidi Chen et al.

ICLR 2024 • arXiv:2309.17453
1396
citations
#19

MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework

Sirui Hong, Mingchen Zhuge, Jonathan Chen et al.

ICLR 2024 • arXiv:2308.00352
1367
citations
#20

Scaling Autoregressive Models for Content-Rich Text-to-Image Generation

Xin Li, Jing Yu Koh, Alexander Ku et al.

ICLR 2024
1366
citations
#21

iTransformer: Inverted Transformers Are Effective for Time Series Forecasting

Yong Liu, Tengge Hu, Haoran Zhang et al.

ICLR 2024 • oral • arXiv:2310.06625
1356
citations
#22

AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning

Yuwei Guo, Ceyuan Yang, Anyi Rao et al.

ICLR 2024 • oral • arXiv:2307.04725
1330
citations
#23

Improving Factuality and Reasoning in Language Models through Multiagent Debate

Yilun Du, Shuang Li, Antonio Torralba et al.

ICML 2024 • arXiv:2305.14325
1274
citations
#24

MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts

Pan Lu, Hritik Bansal, Tony Xia et al.

ICLR 2024 • arXiv:2310.02255
1235
citations
#25

ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs

Yujia Qin, Shihao Liang, Yining Ye et al.

ICLR 2024 • spotlight • arXiv:2307.16789
1197
citations
#26

WizardLM: Empowering Large Pre-Trained Language Models to Follow Complex Instructions

Can Xu, Qingfeng Sun, Kai Zheng et al.

ICLR 2024 • arXiv:2304.12244
1162
citations
#27

Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality

Tri Dao, Albert Gu

ICML 2024 • arXiv:2405.21060
1146
citations
#28

Graph of Thoughts: Solving Elaborate Problems with Large Language Models

Maciej Besta, Nils Blach, Ales Kubicek et al.

AAAI 2024 • paper • arXiv:2308.09687
1116
citations
#29

4D Gaussian Splatting for Real-Time Dynamic Scene Rendering

Guanjun Wu, Taoran Yi, Jiemin Fang et al.

CVPR 2024 • arXiv:2310.08528
1110
citations
#30

VBench: Comprehensive Benchmark Suite for Video Generative Models

Ziqi Huang, Yinan He, Jiashuo Yu et al.

CVPR 2024 • highlight • arXiv:2311.17982
1072
citations
#31

MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities

Weihao Yu, Zhengyuan Yang, Linjie Li et al.

ICML 2024 • arXiv:2308.02490
1066
citations
#32

Grounding Multimodal Large Language Models to the World

Zhiliang Peng, Wenhui Wang, Li Dong et al.

ICLR 2024 • arXiv:2306.14824
1059
citations
#33

Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference

Wei-Lin Chiang, Lianmin Zheng, Ying Sheng et al.

ICML 2024 • arXiv:2403.04132
1026
citations
#34

DUSt3R: Geometric 3D Vision Made Easy

Shuzhe Wang, Vincent Leroy, Yohann Cabon et al.

CVPR 2024 • arXiv:2312.14132
1005
citations
#35

A Generalist Agent

Jackie Kay, Sergio Gómez Colmenarejo, Mahyar Bordbar et al.

ICLR 2024
978
citations
#36

ShareGPT4V: Improving Large Multi-Modal Models with Better Captions

Lin Chen, Jinsong Li, Xiaoyi Dong et al.

ECCV 2024 • arXiv:2311.12793
970
citations
#37

Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!

Xiangyu Qi, Yi Zeng, Tinghao Xie et al.

ICLR 2024 • arXiv:2310.03693
966
citations
#38

Teaching Large Language Models to Self-Debug

Xinyun Chen, Maxwell Lin, Nathanael Schaerli et al.

ICLR 2024 • arXiv:2304.05128
959
citations
#39

WebArena: A Realistic Web Environment for Building Autonomous Agents

Shuyan Zhou, Frank F Xu, Hao Zhu et al.

ICLR 2024 • arXiv:2307.13854
916
citations
#40

MVBench: A Comprehensive Multi-modal Video Understanding Benchmark

Kunchang Li, Yali Wang, Yinan He et al.

CVPR 2024 • highlight • arXiv:2311.17005
902
citations
#41

DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation

Jiaxiang Tang, Jiawei Ren, Hang Zhou et al.

ICLR 2024 • arXiv:2309.16653
884
citations
#42

WizardCoder: Empowering Code Large Language Models with Evol-Instruct

Ziyang Luo, Can Xu, Pu Zhao et al.

ICLR 2024 • arXiv:2306.08568
881
citations
#43

MVDream: Multi-view Diffusion for 3D Generation

Yichun Shi, Peng Wang, Jianglong Ye et al.

ICLR 2024 • arXiv:2308.16512
880
citations
#44

Model Alignment as Prospect Theoretic Optimization

Kawin Ethayarajh, Winnie Xu, Niklas Muennighoff et al.

ICML 2024 • spotlight • arXiv:2402.01306
871
citations
#45

HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal

Mantas Mazeika, Long Phan, Xuwang Yin et al.

ICML 2024 • arXiv:2402.04249
802
citations
#46

ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate

Chi-Min Chan, Weize Chen, Yusheng Su et al.

ICLR 2024 • arXiv:2308.07201
766
citations
#47

Time-LLM: Time Series Forecasting by Reprogramming Large Language Models

Ming Jin, Shiyu Wang, Lintao Ma et al.

ICLR 2024 • arXiv:2310.01728
765
citations
#48

LISA: Reasoning Segmentation via Large Language Model

Xin Lai, Zhuotao Tian, Yukang Chen et al.

CVPR 2024 • arXiv:2308.00692
742
citations
#49

Large Language Models Cannot Self-Correct Reasoning Yet

Jie Huang, Xinyun Chen, Swaroop Mishra et al.

ICLR 2024 • arXiv:2310.01798
738
citations
#50

NExT-GPT: Any-to-Any Multimodal LLM

Shengqiong Wu, Hao Fei, Leigang Qu et al.

ICML 2024 • arXiv:2309.05519
726
citations
#51

Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs

Miao Xiong, Zhiyuan Hu, Xinyang Lu et al.

ICLR 2024 • arXiv:2306.13063
715
citations
#52

LRM: Large Reconstruction Model for Single Image to 3D

Yicong Hong, Kai Zhang, Jiuxiang Gu et al.

ICLR 2024 • arXiv:2311.04400
711
citations
#53

Deformable 3D Gaussians for High-Fidelity Monocular Dynamic Scene Reconstruction

Ziyi Yang, Xinyu Gao, Wen Zhou et al.

CVPR 2024 • arXiv:2309.13101
710
citations
#54

DoRA: Weight-Decomposed Low-Rank Adaptation

Shih-Yang Liu, Chien-Yi Wang, Hongxu Yin et al.

ICML 2024 • arXiv:2402.09353
706
citations
#55

VILA: On Pre-training for Visual Language Models

Ji Lin, Danny Yin, Wei Ping et al.

CVPR 2024 • arXiv:2312.07533
701
citations
#56

A Simple and Effective Pruning Approach for Large Language Models

Mingjie Sun, Zhuang Liu, Anna Bair et al.

ICLR 2024 • arXiv:2306.11695
700
citations
#57

Training Diffusion Models with Reinforcement Learning

Kevin Black, Michael Janner, Yilun Du et al.

ICLR 2024 • arXiv:2305.13301
691
citations
#58

Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation

Li Hu

CVPR 2024 • arXiv:2311.17117
684
citations
#59

Large Language Models as Optimizers

Chengrun Yang, Xuezhi Wang, Yifeng Lu et al.

ICLR 2024 • arXiv:2309.03409
683
citations
#60

YOLO-World: Real-Time Open-Vocabulary Object Detection

Tianheng Cheng, Lin Song, Yixiao Ge et al.

CVPR 2024 • arXiv:2401.17270
682
citations
#61

Wonder3D: Single Image to 3D using Cross-Domain Diffusion

Xiaoxiao Long, Yuan-Chen Guo, Cheng Lin et al.

CVPR 2024 • highlight • arXiv:2310.15008
672
citations
#62

SuGaR: Surface-Aligned Gaussian Splatting for Efficient 3D Mesh Reconstruction and High-Quality Mesh Rendering

Antoine Guédon, Vincent Lepetit

CVPR 2024 • arXiv:2311.12775
654
citations
#63

Vision Transformers Need Registers

Timothée Darcet, Maxime Oquab, Julien Mairal et al.

ICLR 2024 • arXiv:2309.16588
649
citations
#64

LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation

Jiaxiang Tang, Zhaoxi Chen, Xiaokang Chen et al.

ECCV 2024 • arXiv:2402.05054
639
citations
#65

CogAgent: A Visual Language Model for GUI Agents

Wenyi Hong, Weihan Wang, Qingsong Lv et al.

CVPR 2024 • highlight • arXiv:2312.08914
629
citations
#66

SyncDreamer: Generating Multiview-consistent Images from a Single-view Image

Yuan Liu, Cheng Lin, Zijiao Zeng et al.

ICLR 2024 • spotlight • arXiv:2309.03453
629
citations
#67

Adversarial Diffusion Distillation

Axel Sauer, Dominik Lorenz, Andreas Blattmann et al.

ECCV 2024 • arXiv:2311.17042
629
citations
#68

Mip-Splatting: Alias-free 3D Gaussian Splatting

Zehao Yu, Anpei Chen, Binbin Huang et al.

CVPR 2024 • arXiv:2311.16493
627
citations
#69

CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing

Zhibin Gou, Zhihong Shao, Yeyun Gong et al.

ICLR 2024 • arXiv:2305.11738
621
citations
#70

Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering

Tao Lu, Mulin Yu, Linning Xu et al.

CVPR 2024 • highlight • arXiv:2312.00109
620
citations
#71

MusicRL: Aligning Music Generation to Human Preferences

Geoffrey Cideron, Sertan Girgin, Mauro Verzetti et al.

ICML 2024 • arXiv:2301.11325
616
citations
#72

mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration

Qinghao Ye, Haiyang Xu, Jiabo Ye et al.

CVPR 2024 • highlight • arXiv:2311.04257
614
citations
#73

AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models

Xiaogeng Liu, Nan Xu, Muhao Chen et al.

ICLR 2024 • arXiv:2310.04451
604
citations
#74

Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs

Shengbang Tong, Zhuang Liu, Yuexiang Zhai et al.

CVPR 2024 • arXiv:2401.06209
593
citations
#75

Quantifying Language Models' Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting

Melanie Sclar, Yejin Choi, Yulia Tsvetkov et al.

ICLR 2024 • arXiv:2310.11324
581
citations
#76

One-step Diffusion with Distribution Matching Distillation

Tianwei Yin, Michaël Gharbi, Richard Zhang et al.

CVPR 2024 • arXiv:2311.18828
579
citations
#77

SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities

Boyuan Chen, Zhuo Xu, Sean Kirmani et al.

CVPR 2024 • arXiv:2401.12168
578
citations
#78

MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models

Longhui Yu, Weisen Jiang, Han Shi et al.

ICLR 2024 • spotlight • arXiv:2309.12284
578
citations
#79

Safe RLHF: Safe Reinforcement Learning from Human Feedback

Juntao Dai, Xuehai Pan, Ruiyang Sun et al.

ICLR 2024 • spotlight • arXiv:2310.12773
567
citations
#80

Diffusion Model Alignment Using Direct Preference Optimization

Bram Wallace, Meihua Dang, Rafael Rafailov et al.

CVPR 2024 • arXiv:2311.12908
561
citations
#81

MambaIR: A Simple Baseline for Image Restoration with State-Space Model

Hang Guo, Jinmin Li, Tao Dai et al.

ECCV 2024 • arXiv:2402.15648
560
citations
#82

Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads

Tianle Cai, Yuhong Li, Zhengyang Geng et al.

ICML 2024 • arXiv:2401.10774
549
citations
#83

Language Model Beats Diffusion - Tokenizer is key to visual generation

Lijun Yu, José Lezama, Nitesh Bharadwaj Gundavarapu et al.

ICLR 2024 • arXiv:2310.05737
548
citations
#84

AgentBench: Evaluating LLMs as Agents

Xiao Liu, Hao Yu, Hanchen Zhang et al.

ICLR 2024 • arXiv:2308.03688
543
citations
#85

Grounding Image Matching in 3D with MASt3R

Vincent Leroy, Yohann Cabon, Jerome Revaud

ECCV 2024 • arXiv:2406.09756
541
citations
#86

GAIA: a benchmark for General AI Assistants

Grégoire Mialon, Clémentine Fourrier, Thomas Wolf et al.

ICLR 2024 • arXiv:2311.12983
531
citations
#87

Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch

Le Yu, Bowen Yu, Haiyang Yu et al.

ICML 2024 • arXiv:2311.03099
531
citations
#88

RLAIF vs. RLHF: Scaling Reinforcement Learning from Human Feedback with AI Feedback

Harrison Lee, Samrat Phatale, Hassan Mansoor et al.

ICML 2024 • arXiv:2309.00267
527
citations
#89

Towards Understanding Sycophancy in Language Models

Mrinank Sharma, Meg Tong, Tomek Korbak et al.

ICLR 2024 • arXiv:2310.13548
526
citations
#90

MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning

Xiang Yue, Xingwei Qu, Ge Zhang et al.

ICLR 2024 • spotlight • arXiv:2309.05653
522
citations
#91

pixelSplat: 3D Gaussian Splats from Image Pairs for Scalable Generalizable 3D Reconstruction

David Charatan, Sizhe Lester Li, Andrea Tagliasacchi et al.

CVPR 2024 • arXiv:2312.12337
516
citations
#92

VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models

Haoxin Chen, Yong Zhang, Xiaodong Cun et al.

CVPR 2024 • arXiv:2401.09047
512
citations
#93

AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors

Weize Chen, Yusheng Su, Jingwei Zuo et al.

ICLR 2024 • arXiv:2308.10848
503
citations
#94

LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models

Yanwei Li, Chengyao Wang, Jiaya Jia

ECCV 2024 • arXiv:2311.17043
499
citations
#95

MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?

Renrui Zhang, Dongzhi Jiang, Yichi Zhang et al.

ECCV 2024 • arXiv:2403.14624
498
citations
#96

SplaTAM: Splat Track & Map 3D Gaussians for Dense RGB-D SLAM

Nikhil Keetha, Jay Karhade, Krishna Murthy Jatavallabhula et al.

CVPR 2024 • arXiv:2312.02126
497
citations
#97

Self-Rewarding Language Models

Weizhe Yuan, Richard Yuanzhe Pang, Kyunghyun Cho et al.

ICML 2024 • arXiv:2401.10020
497
citations
#98

A decoder-only foundation model for time-series forecasting

Abhimanyu Das, Weihao Kong, Rajat Sen et al.

ICML 2024 • oral • arXiv:2310.10688
495
citations
#99

Patches Are All You Need?

Asher Trockman, J Kolter

ICLR 2024 • arXiv:2201.09792
494
citations
#100

Eureka: Human-Level Reward Design via Coding Large Language Models

Yecheng Jason Ma, William Liang, Guanzhi Wang et al.

ICLR 2024 • arXiv:2310.12931
491
citations
#101

Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding

Sicong Leng, Hang Zhang, Guanzheng Chen et al.

CVPR 2024 • highlight • arXiv:2311.16922
487
citations
#102

RepViT: Revisiting Mobile CNN From ViT Perspective

Ao Wang, Hui Chen, Zijia Lin et al.

CVPR 2024 • arXiv:2307.09283
481
citations
#103

Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models

Zixiang Chen, Yihe Deng, Huizhuo Yuan et al.

ICML 2024 • arXiv:2401.01335
480
citations
#104

Benchmarking Large Language Models in Retrieval-Augmented Generation

Jiawei Chen, Hongyu Lin, Xianpei Han et al.

AAAI 2024 • paper • arXiv:2309.01431
475
citations
#105

MovieChat: From Dense Token to Sparse Memory for Long Video Understanding

Enxin Song, Wenhao Chai, Guanhong Wang et al.

CVPR 2024 • arXiv:2307.16449
471
citations
#106

CoTracker: It is Better to Track Together

Nikita Karaev, Ignacio Rocco, Ben Graham et al.

ECCV 2024 • arXiv:2307.07635
466
citations
#107

Gaussian Splatting SLAM

Hidenobu Matsuki, Riku Murai, Paul Kelly et al.

CVPR 2024 • highlight • arXiv:2312.06741
462
citations
#108

SALMONN: Towards Generic Hearing Abilities for Large Language Models

Changli Tang, Wenyi Yu, Guangzhi Sun et al.

ICLR 2024 • arXiv:2310.13289
462
citations
#109

Real-time Photorealistic Dynamic Scene Representation and Rendering with 4D Gaussian Splatting

Zeyu Yang, Hongye Yang, Zijie Pan et al.

ICLR 2024 • oral • arXiv:2310.10642
460
citations
#110

Ferret: Refer and Ground Anything Anywhere at Any Granularity

Haoxuan You, Haotian Zhang, Zhe Gan et al.

ICLR 2024 • spotlight • arXiv:2310.07704
457
citations
#111

SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers

Nanye Ma, Mark Goldstein, Michael Albergo et al.

ECCV 2024 • arXiv:2401.08740
448
citations
#112

YaRN: Efficient Context Window Extension of Large Language Models

Bowen Peng, Jeffrey Quesnelle, Honglu Fan et al.

ICLR 2024 • arXiv:2309.00071
440
citations
#113

Generative Multimodal Models are In-Context Learners

Quan Sun, Yufeng Cui, Xiaosong Zhang et al.

CVPR 2024 • arXiv:2312.13286
438
citations
#114

TimeMixer: Decomposable Multiscale Mixing for Time Series Forecasting

Shiyu Wang, Haixu Wu, Xiaoming Shi et al.

ICLR 2024 • oral • arXiv:2405.14616
438
citations
#115

FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects

Bowen Wen, Wei Yang, Jan Kautz et al.

CVPR 2024 • highlight • arXiv:2312.08344
435
citations
#116

MobileNetV4: Universal Models for the Mobile Ecosystem

Danfeng Qin, Chas Leichner, Manolis Delakis et al.

ECCV 2024 • arXiv:2404.10518
434
citations
#117

Catastrophic Jailbreak of Open-source LLMs via Exploiting Generation

Yangsibo Huang, Samyak Gupta, Mengzhou Xia et al.

ICLR 2024 • spotlight • arXiv:2310.06987
430
citations
#118

Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Priors

Guocheng Qian, Jinjie Mai, Abdullah Hamdi et al.

ICLR 2024 • arXiv:2306.17843
429
citations
#119

Unified Training of Universal Time Series Forecasting Transformers

Gerald Woo, Chenghao Liu, Akshat Kumar et al.

ICML 2024 • arXiv:2402.02592
428
citations
#120

Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning

Mengzhou Xia, Tianyu Gao, Zhiyuan Zeng et al.

ICLR 2024 • arXiv:2310.06694
426
citations
#121

DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors

Jinbo Xing, Menghan Xia, Yong Zhang et al.

ECCV 2024 • arXiv:2310.12190
424
citations
#122

GPT-4V(ision) is a Generalist Web Agent, if Grounded

Boyuan Zheng, Boyu Gou, Jihyung Kil et al.

ICML 2024 • arXiv:2401.01614
424
citations
#123

Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning

Fuxiao Liu, Kevin Lin, Linjie Li et al.

ICLR 2024 • arXiv:2306.14565
422
citations
#124

VideoPoet: A Large Language Model for Zero-Shot Video Generation

Dan Kondratyuk, Lijun Yu, Xiuye Gu et al.

ICML 2024 • arXiv:2312.14125
420
citations
#125

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation

Yi Wang, Yinan He, Yizhuo Li et al.

ICLR 2024 • spotlight • arXiv:2307.06942
419
citations
#126

Reasoning on Graphs: Faithful and Interpretable Large Language Model Reasoning

Linhao Luo, Yuan-Fang Li, Reza Haffari et al.

ICLR 2024 • arXiv:2310.01061
415
citations
#127

AnyDoor: Zero-shot Object-level Image Customization

Xi Chen, Lianghua Huang, Yu Liu et al.

CVPR 2024 • arXiv:2307.09481
415
citations
#128

Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation

Haoran Xu, Amr Sharaf, Yunmo Chen et al.

ICML 2024 • arXiv:2401.08417
414
citations
#129

WildChat: 1M ChatGPT Interaction Logs in the Wild

Wenting Zhao, Xiang Ren, Jack Hessel et al.

ICLR 2024 • oral • arXiv:2405.01470
411
citations
#130

GLaMM: Pixel Grounding Large Multimodal Model

Hanoona Rasheed, Muhammad Maaz, Sahal Shaji Mullappilly et al.

CVPR 2024 • arXiv:2311.03356
411
citations
#131

Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks

Bin Xiao, Haiping Wu, Weijian Xu et al.

CVPR 2024 • arXiv:2311.06242
409
citations
#132

VideoMamba: State Space Model for Efficient Video Understanding

Kunchang Li, Xinhao Li, Yi Wang et al.

ECCV 2024 • arXiv:2403.06977
407
citations
#133

Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision

Collin Burns, Pavel Izmailov, Jan Kirchner et al.

ICML 2024 • arXiv:2312.09390
406
citations
#134

GPT-4 Is Too Smart To Be Safe: Stealthy Chat with LLMs via Cipher

Youliang Yuan, Wenxiang Jiao, Wenxuan Wang et al.

ICLR 2024 • arXiv:2308.06463
403
citations
#135

Llemma: An Open Language Model for Mathematics

Zhangir Azerbayev, Hailey Schoelkopf, Keiran Paster et al.

ICLR 2024 • arXiv:2310.10631
402
citations
#136

LESS: Selecting Influential Data for Targeted Instruction Tuning

Mengzhou Xia, Sadhika Malladi, Suchin Gururangan et al.

ICML 2024 • arXiv:2402.04333
400
citations
#137

Universal Guidance for Diffusion Models

Arpit Bansal, Hong-Min Chu, Avi Schwarzschild et al.

ICLR 2024 • arXiv:2302.07121
399
citations
#138

Genie: Generative Interactive Environments

Jake Bruce, Michael Dennis, Ashley Edwards et al.

ICML 2024 • oral • arXiv:2402.15391
397
citations
#139

Prometheus: Inducing Fine-Grained Evaluation Capability in Language Models

Seungone Kim, Jamin Shin, Yejin Cho et al.

ICLR 2024 • arXiv:2310.08491
396
citations
#140

Q-Align: Teaching LMMs for Visual Scoring via Discrete Text-Defined Levels

Haoning Wu, Zicheng Zhang, Weixia Zhang et al.

ICML 2024 • arXiv:2312.17090
393
citations
#141

HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models

Tianrui Guan, Fuxiao Liu, Xiyang Wu et al.

CVPR 2024 • arXiv:2310.14566
392
citations
#142

Monkey: Image Resolution and Text Label Are Important Things for Large Multi-modal Models

Zhang Li, Biao Yang, Qiang Liu et al.

CVPR 2024 • highlight • arXiv:2311.06607
392
citations
#143

Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs

Suyu Ge, Yunan Zhang, Liyuan Liu et al.

ICLR 2024 • arXiv:2310.01801
390
citations
#144

TokenFlow: Consistent Diffusion Features for Consistent Video Editing

Michal Geyer, Omer Bar Tal, Shai Bagon et al.

ICLR 2024 • arXiv:2307.10373
389
citations
#145

OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation

Qidong Huang, Xiaoyi Dong, Pan Zhang et al.

CVPR 2024 • highlight • arXiv:2311.17911
385
citations
#146

Large Language Models Are Not Robust Multiple Choice Selectors

Chujie Zheng, Hao Zhou, Fandong Meng et al.

ICLR 2024 • oral • arXiv:2309.03882
383
citations
#147

Instant3D: Fast Text-to-3D with Sparse-view Generation and Large Reconstruction Model

Jiahao Li, Hao Tan, Kai Zhang et al.

ICLR 2024 • arXiv:2311.06214
381
citations
#148

Finite Scalar Quantization: VQ-VAE Made Simple

Fabian Mentzer, David Minnen, Eirikur Agustsson et al.

ICLR 2024 • arXiv:2309.15505
379
citations
#149

How Language Model Hallucinations Can Snowball

Muru Zhang, Ofir Press, William Merrill et al.

ICML 2024 • arXiv:2305.13534
378
citations
#150

InstantBooth: Personalized Text-to-Image Generation without Test-Time Finetuning

Jing Shi, Wei Xiong, Zhe Lin et al.

CVPR 2024 • arXiv:2304.03411
377
citations
#151

ExpeL: LLM Agents Are Experiential Learners

Andrew Zhao, Daniel Huang, Quentin Xu et al.

AAAI 2024 • paper • arXiv:2308.10144
376
citations
#152

GS-SLAM: Dense Visual SLAM with 3D Gaussian Splatting

Chi Yan, Delin Qu, Dong Wang et al.

CVPR 2024 • highlight • arXiv:2311.11700
376
citations
#153

DriveLM: Driving with Graph Visual Question Answering

Chonghao Sima, Katrin Renz, Kashyap Chitta et al.

ECCV 2024 • arXiv:2312.14150
376
citations
#154

MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images

Yuedong Chen, Haofei Xu, Chuanxia Zheng et al.

ECCV 2024 • arXiv:2403.14627
374
citations
#155

TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding

Shuhuai Ren, Linli Yao, Shicheng Li et al.

CVPR 2024 • arXiv:2312.02051
372
citations
#156

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection

Jiawei Zhao, Zhenyu Zhang, Beidi Chen et al.

ICML 2024 • arXiv:2403.03507
371
citations
#157

LangSplat: 3D Language Gaussian Splatting

Minghan Qin, Wanhua Li, Jiawei Zhou et al.

CVPR 2024 • highlight • arXiv:2312.16084
368
citations
#158

KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache

Zirui Liu, Jiayi Yuan, Hongye Jin et al.

ICML 2024 • arXiv:2402.02750
368
citations
#159

An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models

Liang Chen, Haozhe Zhao, Tianyu Liu et al.

ECCV 2024 • arXiv:2403.06764
368
citations
#160

Compact 3D Gaussian Representation for Radiance Field

Joo Chan Lee, Daniel Rho, Xiangyu Sun et al.

CVPR 2024 • highlight • arXiv:2311.13681
366
citations
#161

Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding

Peng Jin, Ryuichi Takanobu, Cai Zhang et al.

CVPR 2024 • highlight • arXiv:2311.08046
364
citations
#162

Promptbreeder: Self-Referential Self-Improvement via Prompt Evolution

Chrisantha Fernando, Dylan Banarse, Henryk Michalewski et al.

ICML 2024 • arXiv:2309.16797
364
citations
#163

The Linear Representation Hypothesis and the Geometry of Large Language Models

Kiho Park, Yo Joong Choe, Victor Veitch

ICML 2024 • arXiv:2311.03658
363
citations
#164

RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback

Tianyu Yu, Yuan Yao, Haoye Zhang et al.

CVPR 2024 • arXiv:2312.00849
361
citations
#165

Evaluating Text-to-Visual Generation with Image-to-Text Generation

Zhiqiu Lin, Deepak Pathak, Baiqi Li et al.

ECCV 2024 • arXiv:2404.01291
357
citations
#166

LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment

Bin Zhu, Bin Lin, Munan Ning et al.

ICLR 2024 • arXiv:2310.01852
357
citations
#167

Effective Data Augmentation With Diffusion Models

Brandon Trabucco, Kyle Doherty, Max Gurinas et al.

ICLR 2024 • arXiv:2302.07944
356
citations
#168

DrivingGaussian: Composite Gaussian Splatting for Surrounding Dynamic Autonomous Driving Scenes

Xiaoyu Zhou, Zhiwei Lin, Xiaojun Shan et al.

CVPR 2024 • arXiv:2312.07920
355
citations
#169

Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution

Aaron Lou, Chenlin Meng, Stefano Ermon

ICML 2024 • arXiv:2310.16834
354
citations
#170

MOMENT: A Family of Open Time-series Foundation Models

Mononito Goswami, Konrad Szafer, Arjun Choudhry et al.

ICML 2024 • arXiv:2402.03885
354
citations
#171

Analyzing and Improving the Training Dynamics of Diffusion Models

Tero Karras, Miika Aittala, Jaakko Lehtinen et al.

CVPR 2024 • arXiv:2312.02696
353
citations
#172

Rewrite the Stars

Xu Ma, Xiyang Dai, Yue Bai et al.

CVPR 2024 • arXiv:2403.19967
352
citations
#173

LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset

Lianmin Zheng, Wei-Lin Chiang, Ying Sheng et al.

ICLR 2024 • spotlight • arXiv:2309.11998
352
citations
#174

Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers

Tsai-Shien Chen, Aliaksandr Siarohin, Willi Menapace et al.

CVPR 2024 • arXiv:2402.19479
351
citations
#175

SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression

Tim Dettmers, Ruslan Svirschevski, Vage Egiazarian et al.

ICLR 2024 • arXiv:2306.03078
350
citations
#176

Learning Interactive Real-World Simulators

Sherry Yang, Yilun Du, Seyed Ghasemipour et al.

ICLR 2024 • arXiv:2310.06114
350
citations
#177

Wavelet Convolutions for Large Receptive Fields

Shahaf Finder, Roy Amoyal, Eran Treister et al.

ECCV 2024 • arXiv:2407.05848
348
citations
#178

V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs

Penghao Wu, Saining Xie

CVPR 2024 • arXiv:2312.14135
345
citations
#179

Executable Code Actions Elicit Better LLM Agents

Xingyao Wang, Yangyi Chen, Lifan Yuan et al.

ICML 2024 • arXiv:2402.01030
344
citations
#180

NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers

Kai Shen, Zeqian Ju, Xu Tan et al.

ICLR 2024 • spotlight • arXiv:2304.09116
344
citations
#181

Gaussian Grouping: Segment and Edit Anything in 3D Scenes

Mingqiao Ye, Martin Danelljan, Fisher Yu et al.

ECCV 2024 • arXiv:2312.00732
344
citations
#182

Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

Kristen Grauman, Andrew Westbury, Lorenzo Torresani et al.

CVPR 2024 • arXiv:2311.18259
343
citations
#183

OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models

Wenqi Shao, Mengzhao Chen, Zhaoyang Zhang et al.

ICLR 2024 • spotlight • arXiv:2308.13137
341
citations
#184

LoRA+: Efficient Low Rank Adaptation of Large Models

Soufiane Hayou, Nikhil Ghosh, Bin Yu

ICML 2024 • arXiv:2402.12354
341
citations
#185

EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty

Yuhui Li, Fangyun Wei, Chao Zhang et al.

ICML 2024 • arXiv:2401.15077
338
citations
#186

Safety-Tuned LLaMAs: Lessons From Improving the Safety of Large Language Models that Follow Instructions

Federico Bianchi, Mirac Suzgun, Giuseppe Attanasio et al.

ICLR 2024 • arXiv:2309.07875
338
citations
#187

DNABERT-2: Efficient Foundation Model and Benchmark For Multi-Species Genomes

Zhihan Zhou, Yanrong Ji, Weijian Li et al.

ICLR 2024 • arXiv:2306.15006
338
citations
#188

Preference Ranking Optimization for Human Alignment

Feifan Song, Bowen Yu, Minghao Li et al.

AAAI 2024 • paper • arXiv:2306.17492
337
citations
#189

Poly Kernel Inception Network for Remote Sensing Detection

Xinhao Cai, Qiuxia Lai, Yuwei Wang et al.

CVPR 2024 • arXiv:2403.06258
337
citations
#190

What Makes Good Data for Alignment? A Comprehensive Study of Automatic Data Selection in Instruction Tuning

Wei Liu, Weihao Zeng, Keqing He et al.

ICLR 2024 • arXiv:2312.15685
337
citations
#191

PandaLM: An Automatic Evaluation Benchmark for LLM Instruction Tuning Optimization

Yidong Wang, Zhuohao Yu, Wenjin Yao et al.

ICLR 2024 • arXiv:2306.05087
336
citations
#192

ControlVideo: Training-free Controllable Text-to-video Generation

Yabo Zhang, Yuxiang Wei, Dongsheng Jiang et al.

ICLR 2024 • arXiv:2305.13077
335
citations
#193

Feature 3DGS: Supercharging 3D Gaussian Splatting to Enable Distilled Feature Fields

Shijie Zhou, Haoran Chang, Sicheng Jiang et al.

CVPR 2024 • highlight • arXiv:2312.03203
335
citations
#194

BLINK: Multimodal Large Language Models Can See but Not Perceive

Xingyu Fu, Yushi Hu, Bangzheng Li et al.

ECCV 2024 • arXiv:2404.12390
333
citations
#195

RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval

Parth Sarthi, Salman Abdullah, Aditi Tuli et al.

ICLR 2024 • arXiv:2401.18059
333
citations
#196

Consistency Trajectory Models: Learning Probability Flow ODE Trajectory of Diffusion

Dongjun Kim, Chieh-Hsin Lai, WeiHsiang Liao et al.

ICLR 2024 • arXiv:2310.02279
333
citations
#197

Text-to-3D using Gaussian Splatting

Zilong Chen, Feng Wang, Yikai Wang et al.

CVPR 2024 • arXiv:2309.16585
333
citations
#198

The WMDP Benchmark: Measuring and Reducing Malicious Use with Unlearning

Nathaniel Li, Alexander Pan, Anjali Gopal et al.

ICML 2024 • arXiv:2403.03218
333
citations
#199

GaussianEditor: Swift and Controllable 3D Editing with Gaussian Splatting

Yiwen Chen, Zilong Chen, Chi Zhang et al.

CVPR 2024 • arXiv:2311.14521
333
citations
#200

Improved Techniques for Training Consistency Models

Yang Song, Prafulla Dhariwal

ICLR 2024 • arXiv:2310.14189
332
citations