Most Cited 2024 Poster "feature discriminability" Papers

#1

Improved Baselines with Visual Instruction Tuning

Haotian Liu, Chunyuan Li, Yuheng Li et al.

CVPR 2024 • highlight • arXiv:2310.03744

4359 citations

#2

SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis

Dustin Podell, Zion English, Kyle Lacey et al.

ICLR 2024 • spotlight • arXiv:2307.01952

3991 citations

#3

Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection

Shilong Liu, Zhaoyang Zeng, Tianhe Ren et al.

ECCV 2024 • arXiv:2303.05499

3440 citations

#4

YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information

Chien-Yao Wang, I-Hau Yeh, Hong-Yuan Mark Liao

ECCV 2024 • arXiv:2402.13616

3033 citations

#5

Scaling Rectified Flow Transformers for High-Resolution Image Synthesis

Patrick Esser, Sumith Kulal, Andreas Blattmann et al.

ICML 2024 • arXiv:2403.03206

2965 citations

#6

MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models

Deyao Zhu, Jun Chen, Xiaoqian Shen et al.

ICLR 2024 • arXiv:2304.10592

2806 citations

#7

DETRs Beat YOLOs on Real-time Object Detection

Yian Zhao, Wenyu Lv, Shangliang Xu et al.

CVPR 2024 • arXiv:2304.08069

2565 citations

#8

Let's Verify Step by Step

Hunter Lightman, Vineet Kosaraju, Yuri Burda et al.

ICLR 2024 • arXiv:2305.20050

2488 citations

#9

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks

Zhe Chen, Jiannan Wu, Wenhai Wang et al.

CVPR 2024 • arXiv:2312.14238

2295 citations

#10

FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning

Tri Dao

ICLR 2024 • arXiv:2307.08691

2224 citations

#11

MMBench: Is Your Multi-Modal Model an All-around Player?

Yuan Liu, Haodong Duan, Yuanhan Zhang et al.

ECCV 2024 • arXiv:2307.06281

1745 citations

#12

MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI

Xiang Yue, Yuansheng Ni, Kai Zhang et al.

CVPR 2024 • arXiv:2311.16502

1715 citations

#13

SWE-bench: Can Language Models Resolve Real-world GitHub Issues?

Carlos E Jimenez, John Yang, Alexander Wettig et al.

ICLR 2024 • arXiv:2310.06770

1485 citations

#14

Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data

Lihe Yang, Bingyi Kang, Zilong Huang et al.

CVPR 2024 • arXiv:2401.10891

1479 citations

#15

T2I-Adapter: Learning Adapters to Dig Out More Controllable Ability for Text-to-Image Diffusion

Chong Mou, Xintao Wang, Liangbin Xie et al.

AAAI 2024 • paper • arXiv:2302.08453

1460 citations

#16

Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model

Lianghui Zhu, Bencheng Liao, Qian Zhang et al.

ICML 2024 • arXiv:2401.09417

1457 citations

#17

Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection

Akari Asai, Zeqiu Wu, Yizhong Wang et al.

ICLR 2024 • arXiv:2310.11511

1435 citations

#18

Efficient Streaming Language Models with Attention Sinks

Guangxuan Xiao, Yuandong Tian, Beidi Chen et al.

ICLR 2024 • arXiv:2309.17453

1396 citations

#19

MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework

Sirui Hong, Mingchen Zhuge, Jonathan Chen et al.

ICLR 2024 • arXiv:2308.00352

1367 citations

#20

Scaling Autoregressive Models for Content-Rich Text-to-Image Generation

Xin Li, Jing Yu Koh, Alexander Ku et al.

#21

iTransformer: Inverted Transformers Are Effective for Time Series Forecasting

Yong Liu, Tengge Hu, Haoran Zhang et al.

ICLR 2024 • oral • arXiv:2310.06625

1356 citations

#22

AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning

Yuwei Guo, Ceyuan Yang, Anyi Rao et al.

ICLR 2024 • oral • arXiv:2307.04725

1330 citations

#23

Improving Factuality and Reasoning in Language Models through Multiagent Debate

Yilun Du, Shuang Li, Antonio Torralba et al.

ICML 2024 • arXiv:2305.14325

1274 citations

#24

MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts

Pan Lu, Hritik Bansal, Tony Xia et al.

ICLR 2024 • arXiv:2310.02255

1235 citations

#25

ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs

Yujia Qin, Shihao Liang, Yining Ye et al.

ICLR 2024 • spotlight • arXiv:2307.16789

1197 citations

#26

WizardLM: Empowering Large Pre-Trained Language Models to Follow Complex Instructions

Can Xu, Qingfeng Sun, Kai Zheng et al.

ICLR 2024 • arXiv:2304.12244

1162 citations

#27

Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality

Tri Dao, Albert Gu

ICML 2024 • arXiv:2405.21060

1146 citations

#28

Graph of Thoughts: Solving Elaborate Problems with Large Language Models

Maciej Besta, Nils Blach, Ales Kubicek et al.

AAAI 2024 • paper • arXiv:2308.09687

1116 citations

#29

4D Gaussian Splatting for Real-Time Dynamic Scene Rendering

Guanjun Wu, Taoran Yi, Jiemin Fang et al.

CVPR 2024 • arXiv:2310.08528

1110 citations

#30

VBench: Comprehensive Benchmark Suite for Video Generative Models

Ziqi Huang, Yinan He, Jiashuo Yu et al.

CVPR 2024 • highlight • arXiv:2311.17982

1072 citations

#31

MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities

Weihao Yu, Zhengyuan Yang, Linjie Li et al.

ICML 2024 • arXiv:2308.02490

1066 citations

#32

Grounding Multimodal Large Language Models to the World

Zhiliang Peng, Wenhui Wang, Li Dong et al.

ICLR 2024 • arXiv:2306.14824

1059 citations

#33

Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference

Wei-Lin Chiang, Lianmin Zheng, Ying Sheng et al.

ICML 2024 • arXiv:2403.04132

1026 citations

#34

DUSt3R: Geometric 3D Vision Made Easy

Shuzhe Wang, Vincent Leroy, Yohann Cabon et al.

CVPR 2024 • arXiv:2312.14132

1005 citations

#35

A Generalist Agent

Jackie Kay, Sergio Gómez Colmenarejo, Mahyar Bordbar et al.

#36

ShareGPT4V: Improving Large Multi-Modal Models with Better Captions

Lin Chen, Jinsong Li, Xiaoyi Dong et al.

ECCV 2024 • arXiv:2311.12793

970 citations

#37

Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!

Xiangyu Qi, Yi Zeng, Tinghao Xie et al.

ICLR 2024 • arXiv:2310.03693

966 citations

#38

Teaching Large Language Models to Self-Debug

Xinyun Chen, Maxwell Lin, Nathanael Schaerli et al.

ICLR 2024 • arXiv:2304.05128

959 citations

#39

WebArena: A Realistic Web Environment for Building Autonomous Agents

Shuyan Zhou, Frank F Xu, Hao Zhu et al.

ICLR 2024 • arXiv:2307.13854

916 citations

#40

MVBench: A Comprehensive Multi-modal Video Understanding Benchmark

Kunchang Li, Yali Wang, Yinan He et al.

CVPR 2024 • highlight • arXiv:2311.17005

902 citations

#41

DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation

Jiaxiang Tang, Jiawei Ren, Hang Zhou et al.

ICLR 2024 • arXiv:2309.16653

884 citations

#42

WizardCoder: Empowering Code Large Language Models with Evol-Instruct

Ziyang Luo, Can Xu, Pu Zhao et al.

ICLR 2024 • arXiv:2306.08568

881 citations

#43

MVDream: Multi-view Diffusion for 3D Generation

Yichun Shi, Peng Wang, Jianglong Ye et al.

ICLR 2024 • arXiv:2308.16512

880 citations

#44

Model Alignment as Prospect Theoretic Optimization

Kawin Ethayarajh, Winnie Xu, Niklas Muennighoff et al.

ICML 2024 • spotlight • arXiv:2402.01306

871 citations

#45

HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal

Mantas Mazeika, Long Phan, Xuwang Yin et al.

ICML 2024 • arXiv:2402.04249

802 citations

#46

ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate

Chi-Min Chan, Weize Chen, Yusheng Su et al.

ICLR 2024 • arXiv:2308.07201

766 citations

#47

Time-LLM: Time Series Forecasting by Reprogramming Large Language Models

Ming Jin, Shiyu Wang, Lintao Ma et al.

ICLR 2024 • arXiv:2310.01728

765 citations

#48

LISA: Reasoning Segmentation via Large Language Model

Xin Lai, Zhuotao Tian, Yukang Chen et al.

CVPR 2024 • arXiv:2308.00692

742 citations

#49

Large Language Models Cannot Self-Correct Reasoning Yet

Jie Huang, Xinyun Chen, Swaroop Mishra et al.

ICLR 2024 • arXiv:2310.01798

738 citations

#50

NExT-GPT: Any-to-Any Multimodal LLM

Shengqiong Wu, Hao Fei, Leigang Qu et al.

ICML 2024 • arXiv:2309.05519

726 citations

#51

Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs

Miao Xiong, Zhiyuan Hu, Xinyang Lu et al.

ICLR 2024 • arXiv:2306.13063

715 citations

#52

LRM: Large Reconstruction Model for Single Image to 3D

Yicong Hong, Kai Zhang, Jiuxiang Gu et al.

ICLR 2024 • arXiv:2311.04400

711 citations

#53

Deformable 3D Gaussians for High-Fidelity Monocular Dynamic Scene Reconstruction

Ziyi Yang, Xinyu Gao, Wen Zhou et al.

CVPR 2024 • arXiv:2309.13101

710 citations

#54

DoRA: Weight-Decomposed Low-Rank Adaptation

Shih-Yang Liu, Chien-Yi Wang, Hongxu Yin et al.

ICML 2024 • arXiv:2402.09353

706 citations

#55

VILA: On Pre-training for Visual Language Models

Ji Lin, Danny Yin, Wei Ping et al.

CVPR 2024 • arXiv:2312.07533

701 citations

#56

A Simple and Effective Pruning Approach for Large Language Models

Mingjie Sun, Zhuang Liu, Anna Bair et al.

ICLR 2024 • arXiv:2306.11695

700 citations

#57

Training Diffusion Models with Reinforcement Learning

Kevin Black, Michael Janner, Yilun Du et al.

ICLR 2024 • arXiv:2305.13301

691 citations

#58

Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation

Li Hu

CVPR 2024 • arXiv:2311.17117

684 citations

#59

Large Language Models as Optimizers

Chengrun Yang, Xuezhi Wang, Yifeng Lu et al.

ICLR 2024 • arXiv:2309.03409

683 citations

#60

YOLO-World: Real-Time Open-Vocabulary Object Detection

Tianheng Cheng, Lin Song, Yixiao Ge et al.

CVPR 2024 • arXiv:2401.17270

682 citations

#61

Wonder3D: Single Image to 3D using Cross-Domain Diffusion

Xiaoxiao Long, Yuan-Chen Guo, Cheng Lin et al.

CVPR 2024 • highlight • arXiv:2310.15008

672 citations

#62

SuGaR: Surface-Aligned Gaussian Splatting for Efficient 3D Mesh Reconstruction and High-Quality Mesh Rendering

Antoine Guédon, Vincent Lepetit

CVPR 2024 • arXiv:2311.12775

654 citations

#63

Vision Transformers Need Registers

Timothée Darcet, Maxime Oquab, Julien Mairal et al.

ICLR 2024 • arXiv:2309.16588

649 citations

#64

LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation

Jiaxiang Tang, Zhaoxi Chen, Xiaokang Chen et al.

ECCV 2024 • arXiv:2402.05054

639 citations

#65

CogAgent: A Visual Language Model for GUI Agents

Wenyi Hong, Weihan Wang, Qingsong Lv et al.

CVPR 2024 • highlight • arXiv:2312.08914

629 citations

#66

SyncDreamer: Generating Multiview-consistent Images from a Single-view Image

Yuan Liu, Cheng Lin, Zijiao Zeng et al.

ICLR 2024 • spotlight • arXiv:2309.03453

629 citations

#67

Adversarial Diffusion Distillation

Axel Sauer, Dominik Lorenz, Andreas Blattmann et al.

ECCV 2024 • arXiv:2311.17042

629 citations

#68

Mip-Splatting: Alias-free 3D Gaussian Splatting

Zehao Yu, Anpei Chen, Binbin Huang et al.

CVPR 2024 • arXiv:2311.16493

627 citations

#69

CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing

Zhibin Gou, Zhihong Shao, Yeyun Gong et al.

ICLR 2024 • arXiv:2305.11738

621 citations

#70

Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering

Tao Lu, Mulin Yu, Linning Xu et al.

CVPR 2024 • highlight • arXiv:2312.00109

620 citations

#71

MusicRL: Aligning Music Generation to Human Preferences

Geoffrey Cideron, Sertan Girgin, Mauro Verzetti et al.

ICML 2024 • arXiv:2301.11325

616 citations

#72

mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration

Qinghao Ye, Haiyang Xu, Jiabo Ye et al.

CVPR 2024 • highlight • arXiv:2311.04257

614 citations

#73

AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models

Xiaogeng Liu, Nan Xu, Muhao Chen et al.

ICLR 2024 • arXiv:2310.04451

604 citations

#74

Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs

Shengbang Tong, Zhuang Liu, Yuexiang Zhai et al.

CVPR 2024 • arXiv:2401.06209

593 citations

#75

Quantifying Language Models' Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting

Melanie Sclar, Yejin Choi, Yulia Tsvetkov et al.

ICLR 2024 • arXiv:2310.11324

581 citations

#76

One-step Diffusion with Distribution Matching Distillation

Tianwei Yin, Michaël Gharbi, Richard Zhang et al.

CVPR 2024 • arXiv:2311.18828

579 citations

#77

SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities

Boyuan Chen, Zhuo Xu, Sean Kirmani et al.

CVPR 2024 • arXiv:2401.12168

578 citations

#78

MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models

Longhui Yu, Weisen Jiang, Han Shi et al.

ICLR 2024 • spotlight • arXiv:2309.12284

578 citations

#79

Safe RLHF: Safe Reinforcement Learning from Human Feedback

Juntao Dai, Xuehai Pan, Ruiyang Sun et al.

ICLR 2024 • spotlight • arXiv:2310.12773

567 citations

#80

Diffusion Model Alignment Using Direct Preference Optimization

Bram Wallace, Meihua Dang, Rafael Rafailov et al.

CVPR 2024 • arXiv:2311.12908

561 citations

#81

MambaIR: A Simple Baseline for Image Restoration with State-Space Model

Hang Guo, Jinmin Li, Tao Dai et al.

ECCV 2024 • arXiv:2402.15648

560 citations

#82

Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads

Tianle Cai, Yuhong Li, Zhengyang Geng et al.

ICML 2024 • arXiv:2401.10774

549 citations

#83

Language Model Beats Diffusion - Tokenizer is key to visual generation

Lijun Yu, José Lezama, Nitesh Bharadwaj Gundavarapu et al.

ICLR 2024 • arXiv:2310.05737

548 citations

#84

AgentBench: Evaluating LLMs as Agents

Xiao Liu, Hao Yu, Hanchen Zhang et al.

ICLR 2024 • arXiv:2308.03688

543 citations

#85

Grounding Image Matching in 3D with MASt3R

Vincent Leroy, Yohann Cabon, Jerome Revaud

ECCV 2024 • arXiv:2406.09756

541 citations

#86

GAIA: a benchmark for General AI Assistants

Grégoire Mialon, Clémentine Fourrier, Thomas Wolf et al.

ICLR 2024 • arXiv:2311.12983

531 citations

#87

Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch

Le Yu, Bowen Yu, Haiyang Yu et al.

ICML 2024 • arXiv:2311.03099

531 citations

#88

RLAIF vs. RLHF: Scaling Reinforcement Learning from Human Feedback with AI Feedback

Harrison Lee, Samrat Phatale, Hassan Mansoor et al.

ICML 2024 • arXiv:2309.00267

527 citations

#89

Towards Understanding Sycophancy in Language Models

Mrinank Sharma, Meg Tong, Tomek Korbak et al.

ICLR 2024 • arXiv:2310.13548

526 citations

#90

MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning

Xiang Yue, Xingwei Qu, Ge Zhang et al.

ICLR 2024 • spotlight • arXiv:2309.05653

522 citations

#91

pixelSplat: 3D Gaussian Splats from Image Pairs for Scalable Generalizable 3D Reconstruction

David Charatan, Sizhe Lester Li, Andrea Tagliasacchi et al.

CVPR 2024 • arXiv:2312.12337

516 citations

#92

VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models

Haoxin Chen, Yong Zhang, Xiaodong Cun et al.

CVPR 2024 • arXiv:2401.09047

512 citations

#93

AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors

Weize Chen, Yusheng Su, Jingwei Zuo et al.

ICLR 2024 • arXiv:2308.10848

503 citations

#94

LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models

Yanwei Li, Chengyao Wang, Jiaya Jia

ECCV 2024 • arXiv:2311.17043

499 citations

#95

MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?

Renrui Zhang, Dongzhi Jiang, Yichi Zhang et al.

ECCV 2024 • arXiv:2403.14624

498 citations

#96

SplaTAM: Splat Track & Map 3D Gaussians for Dense RGB-D SLAM

Nikhil Keetha, Jay Karhade, Krishna Murthy Jatavallabhula et al.

CVPR 2024 • arXiv:2312.02126

497 citations

#97

Self-Rewarding Language Models

Weizhe Yuan, Richard Yuanzhe Pang, Kyunghyun Cho et al.

ICML 2024 • arXiv:2401.10020

497 citations

#98

A decoder-only foundation model for time-series forecasting

Abhimanyu Das, Weihao Kong, Rajat Sen et al.

ICML 2024 • oral • arXiv:2310.10688

495 citations

#99

Patches Are All You Need?

Asher Trockman, J Kolter

ICLR 2024 • arXiv:2201.09792

494 citations

#100

Eureka: Human-Level Reward Design via Coding Large Language Models

Yecheng Jason Ma, William Liang, Guanzhi Wang et al.

ICLR 2024 • arXiv:2310.12931

491 citations

#101

Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding

Sicong Leng, Hang Zhang, Guanzheng Chen et al.

CVPR 2024 • highlight • arXiv:2311.16922

487 citations

#102

RepViT: Revisiting Mobile CNN From ViT Perspective

Ao Wang, Hui Chen, Zijia Lin et al.

CVPR 2024 • arXiv:2307.09283

481 citations

#103

Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models

Zixiang Chen, Yihe Deng, Huizhuo Yuan et al.

ICML 2024 • arXiv:2401.01335

480 citations

#104

Benchmarking Large Language Models in Retrieval-Augmented Generation

Jiawei Chen, Hongyu Lin, Xianpei Han et al.

AAAI 2024 • paper • arXiv:2309.01431

475 citations

#105

MovieChat: From Dense Token to Sparse Memory for Long Video Understanding

Enxin Song, Wenhao Chai, Guanhong Wang et al.

CVPR 2024 • arXiv:2307.16449

471 citations

#106

CoTracker: It is Better to Track Together

Nikita Karaev, Ignacio Rocco, Ben Graham et al.

ECCV 2024 • arXiv:2307.07635

466 citations

#107

Gaussian Splatting SLAM

Hidenobu Matsuki, Riku Murai, Paul Kelly et al.

CVPR 2024 • highlight • arXiv:2312.06741

462 citations

#108

SALMONN: Towards Generic Hearing Abilities for Large Language Models

Changli Tang, Wenyi Yu, Guangzhi Sun et al.

ICLR 2024 • arXiv:2310.13289

462 citations

#109

Real-time Photorealistic Dynamic Scene Representation and Rendering with 4D Gaussian Splatting

Zeyu Yang, Hongye Yang, Zijie Pan et al.

ICLR 2024 • oral • arXiv:2310.10642

460 citations

#110

Ferret: Refer and Ground Anything Anywhere at Any Granularity

Haoxuan You, Haotian Zhang, Zhe Gan et al.

ICLR 2024 • spotlight • arXiv:2310.07704

457 citations

#111

SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers

Nanye Ma, Mark Goldstein, Michael Albergo et al.

ECCV 2024 • arXiv:2401.08740

448 citations

#112

YaRN: Efficient Context Window Extension of Large Language Models

Bowen Peng, Jeffrey Quesnelle, Honglu Fan et al.

ICLR 2024 • arXiv:2309.00071

440 citations

#113

Generative Multimodal Models are In-Context Learners

Quan Sun, Yufeng Cui, Xiaosong Zhang et al.

CVPR 2024 • arXiv:2312.13286

438 citations

#114

TimeMixer: Decomposable Multiscale Mixing for Time Series Forecasting

Shiyu Wang, Haixu Wu, Xiaoming Shi et al.

ICLR 2024 • oral • arXiv:2405.14616

438 citations

#115

FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects

Bowen Wen, Wei Yang, Jan Kautz et al.

CVPR 2024 • highlight • arXiv:2312.08344

435 citations

#116

MobileNetV4: Universal Models for the Mobile Ecosystem

Danfeng Qin, Chas Leichner, Manolis Delakis et al.

ECCV 2024 • arXiv:2404.10518

434 citations

#117

Catastrophic Jailbreak of Open-source LLMs via Exploiting Generation

Yangsibo Huang, Samyak Gupta, Mengzhou Xia et al.

ICLR 2024 • spotlight • arXiv:2310.06987

430 citations

#118

Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Priors

Guocheng Qian, Jinjie Mai, Abdullah Hamdi et al.

ICLR 2024 • arXiv:2306.17843

429 citations

#119

Unified Training of Universal Time Series Forecasting Transformers

Gerald Woo, Chenghao Liu, Akshat Kumar et al.

ICML 2024 • arXiv:2402.02592

428 citations

#120

Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning

Mengzhou Xia, Tianyu Gao, Zhiyuan Zeng et al.

ICLR 2024 • arXiv:2310.06694

426 citations

#121

DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors

Jinbo Xing, Menghan Xia, Yong Zhang et al.

ECCV 2024 • arXiv:2310.12190

424 citations

#122

GPT-4V(ision) is a Generalist Web Agent, if Grounded

Boyuan Zheng, Boyu Gou, Jihyung Kil et al.

ICML 2024 • arXiv:2401.01614

424 citations

#123

Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning

Fuxiao Liu, Kevin Lin, Linjie Li et al.

ICLR 2024 • arXiv:2306.14565

422 citations

#124

VideoPoet: A Large Language Model for Zero-Shot Video Generation

Dan Kondratyuk, Lijun Yu, Xiuye Gu et al.

ICML 2024 • arXiv:2312.14125

420 citations

#125

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation

Yi Wang, Yinan He, Yizhuo Li et al.

ICLR 2024 • spotlight • arXiv:2307.06942

419 citations

#126

Reasoning on Graphs: Faithful and Interpretable Large Language Model Reasoning

Linhao Luo, Yuan-Fang Li, Reza Haffari et al.

ICLR 2024 • arXiv:2310.01061

415 citations

#127

AnyDoor: Zero-shot Object-level Image Customization

Xi Chen, Lianghua Huang, Yu Liu et al.

CVPR 2024 • arXiv:2307.09481

415 citations

#128

Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation

Haoran Xu, Amr Sharaf, Yunmo Chen et al.

ICML 2024 • arXiv:2401.08417

414 citations

#129

WildChat: 1M ChatGPT Interaction Logs in the Wild

Wenting Zhao, Xiang Ren, Jack Hessel et al.

ICLR 2024 • oral • arXiv:2405.01470

411 citations

#130

GLaMM: Pixel Grounding Large Multimodal Model

Hanoona Rasheed, Muhammad Maaz, Sahal Shaji Mullappilly et al.

CVPR 2024 • arXiv:2311.03356

411 citations

#131

Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks

Bin Xiao, Haiping Wu, Weijian Xu et al.

CVPR 2024 • arXiv:2311.06242

409 citations

#132

VideoMamba: State Space Model for Efficient Video Understanding

Kunchang Li, Xinhao Li, Yi Wang et al.

ECCV 2024 • arXiv:2403.06977

407 citations

#133

Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision

Collin Burns, Pavel Izmailov, Jan Kirchner et al.

ICML 2024 • arXiv:2312.09390

406 citations

#134

GPT-4 Is Too Smart To Be Safe: Stealthy Chat with LLMs via Cipher

Youliang Yuan, Wenxiang Jiao, Wenxuan Wang et al.

ICLR 2024 • arXiv:2308.06463

403 citations

#135

Llemma: An Open Language Model for Mathematics

Zhangir Azerbayev, Hailey Schoelkopf, Keiran Paster et al.

ICLR 2024 • arXiv:2310.10631

402 citations

#136

LESS: Selecting Influential Data for Targeted Instruction Tuning

Mengzhou Xia, Sadhika Malladi, Suchin Gururangan et al.

ICML 2024 • arXiv:2402.04333

400 citations

#137

Universal Guidance for Diffusion Models

Arpit Bansal, Hong-Min Chu, Avi Schwarzschild et al.

ICLR 2024 • arXiv:2302.07121

399 citations

#138

Genie: Generative Interactive Environments

Jake Bruce, Michael Dennis, Ashley Edwards et al.

ICML 2024 • oral • arXiv:2402.15391

397 citations

#139

Prometheus: Inducing Fine-Grained Evaluation Capability in Language Models

Seungone Kim, Jamin Shin, Yejin Cho et al.

ICLR 2024 • arXiv:2310.08491

396 citations

#140

Q-Align: Teaching LMMs for Visual Scoring via Discrete Text-Defined Levels

Haoning Wu, Zicheng Zhang, Weixia Zhang et al.

ICML 2024 • arXiv:2312.17090

393 citations

#141

HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models

Tianrui Guan, Fuxiao Liu, Xiyang Wu et al.

CVPR 2024 • arXiv:2310.14566

392 citations

#142

Monkey: Image Resolution and Text Label Are Important Things for Large Multi-modal Models

Zhang Li, Biao Yang, Qiang Liu et al.

CVPR 2024 • highlight • arXiv:2311.06607

392 citations

#143

Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs

Suyu Ge, Yunan Zhang, Liyuan Liu et al.

ICLR 2024 • arXiv:2310.01801

390 citations

#144

TokenFlow: Consistent Diffusion Features for Consistent Video Editing

Michal Geyer, Omer Bar Tal, Shai Bagon et al.

ICLR 2024 • arXiv:2307.10373

389 citations

#145

OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation

Qidong Huang, Xiaoyi Dong, Pan Zhang et al.

CVPR 2024 • highlight • arXiv:2311.17911

385 citations

#146

Large Language Models Are Not Robust Multiple Choice Selectors

Chujie Zheng, Hao Zhou, Fandong Meng et al.

ICLR 2024 • oral • arXiv:2309.03882

383 citations

#147

Instant3D: Fast Text-to-3D with Sparse-view Generation and Large Reconstruction Model

Jiahao Li, Hao Tan, Kai Zhang et al.

ICLR 2024 • arXiv:2311.06214

381 citations

#148

Finite Scalar Quantization: VQ-VAE Made Simple

Fabian Mentzer, David Minnen, Eirikur Agustsson et al.

ICLR 2024 • arXiv:2309.15505

379 citations

#149

How Language Model Hallucinations Can Snowball

Muru Zhang, Ofir Press, William Merrill et al.

ICML 2024 • arXiv:2305.13534

378 citations

#150

InstantBooth: Personalized Text-to-Image Generation without Test-Time Finetuning

Jing Shi, Wei Xiong, Zhe Lin et al.

CVPR 2024 • arXiv:2304.03411

377 citations

#151

ExpeL: LLM Agents Are Experiential Learners

Andrew Zhao, Daniel Huang, Quentin Xu et al.

AAAI 2024 • paper • arXiv:2308.10144

376 citations

#152

GS-SLAM: Dense Visual SLAM with 3D Gaussian Splatting

Chi Yan, Delin Qu, Dong Wang et al.

CVPR 2024 • highlight • arXiv:2311.11700

376 citations

#153

DriveLM: Driving with Graph Visual Question Answering

Chonghao Sima, Katrin Renz, Kashyap Chitta et al.

ECCV 2024 • arXiv:2312.14150

376 citations

#154

MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images

Yuedong Chen, Haofei Xu, Chuanxia Zheng et al.

ECCV 2024 • arXiv:2403.14627

374 citations

#155

TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding

Shuhuai Ren, Linli Yao, Shicheng Li et al.

CVPR 2024 • arXiv:2312.02051

372 citations

#156

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection

Jiawei Zhao, Zhenyu Zhang, Beidi Chen et al.

ICML 2024 • arXiv:2403.03507

371 citations

#157

LangSplat: 3D Language Gaussian Splatting

Minghan Qin, Wanhua Li, Jiawei Zhou et al.

CVPR 2024 • highlight • arXiv:2312.16084

368 citations

#158

KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache

Zirui Liu, Jiayi Yuan, Hongye Jin et al.

ICML 2024 • arXiv:2402.02750

368 citations

#159

An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models

Liang Chen, Haozhe Zhao, Tianyu Liu et al.

ECCV 2024 • arXiv:2403.06764

368 citations

#160

Compact 3D Gaussian Representation for Radiance Field

Joo Chan Lee, Daniel Rho, Xiangyu Sun et al.

CVPR 2024 • highlight • arXiv:2311.13681

366 citations

#161

Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding

Peng Jin, Ryuichi Takanobu, Cai Zhang et al.

CVPR 2024 • highlight • arXiv:2311.08046

364 citations

#162

Promptbreeder: Self-Referential Self-Improvement via Prompt Evolution

Chrisantha Fernando, Dylan Banarse, Henryk Michalewski et al.

ICML 2024 • arXiv:2309.16797

364 citations

#163

The Linear Representation Hypothesis and the Geometry of Large Language Models

Kiho Park, Yo Joong Choe, Victor Veitch

ICML 2024 • arXiv:2311.03658

363 citations

#164

RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback

Tianyu Yu, Yuan Yao, Haoye Zhang et al.

CVPR 2024 • arXiv:2312.00849

361 citations

#165

Evaluating Text-to-Visual Generation with Image-to-Text Generation

Zhiqiu Lin, Deepak Pathak, Baiqi Li et al.

ECCV 2024 • arXiv:2404.01291

357 citations

#166

LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment

Bin Zhu, Bin Lin, Munan Ning et al.

ICLR 2024 • arXiv:2310.01852

357 citations

#167

Effective Data Augmentation With Diffusion Models

Brandon Trabucco, Kyle Doherty, Max Gurinas et al.

ICLR 2024 • arXiv:2302.07944

356 citations

#168

DrivingGaussian: Composite Gaussian Splatting for Surrounding Dynamic Autonomous Driving Scenes

Xiaoyu Zhou, Zhiwei Lin, Xiaojun Shan et al.

CVPR 2024 • arXiv:2312.07920

355 citations

#169

Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution

Aaron Lou, Chenlin Meng, Stefano Ermon

ICML 2024 • arXiv:2310.16834

354 citations

#170

MOMENT: A Family of Open Time-series Foundation Models

Mononito Goswami, Konrad Szafer, Arjun Choudhry et al.

ICML 2024 • arXiv:2402.03885

354 citations

#171

Analyzing and Improving the Training Dynamics of Diffusion Models

Tero Karras, Miika Aittala, Jaakko Lehtinen et al.

CVPR 2024 • arXiv:2312.02696

353 citations

#172

Rewrite the Stars

Xu Ma, Xiyang Dai, Yue Bai et al.

CVPR 2024 • arXiv:2403.19967

352 citations

#173

LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset

Lianmin Zheng, Wei-Lin Chiang, Ying Sheng et al.

ICLR 2024 • spotlight • arXiv:2309.11998

352 citations

#174

Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers

Tsai-Shien Chen, Aliaksandr Siarohin, Willi Menapace et al.

CVPR 2024 • arXiv:2402.19479

351 citations

#175

SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression

Tim Dettmers, Ruslan Svirschevski, Vage Egiazarian et al.

ICLR 2024 • arXiv:2306.03078

350 citations

#176

Learning Interactive Real-World Simulators

Sherry Yang, Yilun Du, Seyed Ghasemipour et al.

ICLR 2024 • arXiv:2310.06114

350 citations

#177

Wavelet Convolutions for Large Receptive Fields

Shahaf Finder, Roy Amoyal, Eran Treister et al.

ECCV 2024 • arXiv:2407.05848

348 citations

#178

V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs

Penghao Wu, Saining Xie

CVPR 2024 • arXiv:2312.14135

345 citations

#179

Executable Code Actions Elicit Better LLM Agents

Xingyao Wang, Yangyi Chen, Lifan Yuan et al.

ICML 2024 • arXiv:2402.01030

344 citations

#180

NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers

Kai Shen, Zeqian Ju, Xu Tan et al.

ICLR 2024 • spotlight • arXiv:2304.09116

344 citations

#181

Gaussian Grouping: Segment and Edit Anything in 3D Scenes

Mingqiao Ye, Martin Danelljan, Fisher Yu et al.

ECCV 2024 • arXiv:2312.00732

344 citations

#182

Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

Kristen Grauman, Andrew Westbury, Lorenzo Torresani et al.

CVPR 2024 • arXiv:2311.18259

343 citations

#183

OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models

Wenqi Shao, Mengzhao Chen, Zhaoyang Zhang et al.

ICLR 2024 • spotlight • arXiv:2308.13137

341 citations

#184

LoRA+: Efficient Low Rank Adaptation of Large Models

Soufiane Hayou, Nikhil Ghosh, Bin Yu

ICML 2024 • arXiv:2402.12354

341 citations

#185

EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty

Yuhui Li, Fangyun Wei, Chao Zhang et al.

ICML 2024 • arXiv:2401.15077

338 citations

#186

Safety-Tuned LLaMAs: Lessons From Improving the Safety of Large Language Models that Follow Instructions

Federico Bianchi, Mirac Suzgun, Giuseppe Attanasio et al.

ICLR 2024 • arXiv:2309.07875

338 citations

#187

DNABERT-2: Efficient Foundation Model and Benchmark For Multi-Species Genomes

Zhihan Zhou, Yanrong Ji, Weijian Li et al.

ICLR 2024 • arXiv:2306.15006

338 citations

#188

Preference Ranking Optimization for Human Alignment

Feifan Song, Bowen Yu, Minghao Li et al.

AAAI 2024 • paper • arXiv:2306.17492

337 citations

#189

Poly Kernel Inception Network for Remote Sensing Detection

Xinhao Cai, Qiuxia Lai, Yuwei Wang et al.

CVPR 2024 • arXiv:2403.06258

337 citations

#190

What Makes Good Data for Alignment? A Comprehensive Study of Automatic Data Selection in Instruction Tuning

Wei Liu, Weihao Zeng, Keqing He et al.

ICLR 2024 • arXiv:2312.15685

337 citations

#191

PandaLM: An Automatic Evaluation Benchmark for LLM Instruction Tuning Optimization

Yidong Wang, Zhuohao Yu, Wenjin Yao et al.

ICLR 2024 • arXiv:2306.05087

336 citations

#192

ControlVideo: Training-free Controllable Text-to-video Generation

Yabo Zhang, Yuxiang Wei, Dongsheng Jiang et al.

ICLR 2024 • arXiv:2305.13077

335 citations

#193

Feature 3DGS: Supercharging 3D Gaussian Splatting to Enable Distilled Feature Fields

Shijie Zhou, Haoran Chang, Sicheng Jiang et al.

CVPR 2024 • highlight • arXiv:2312.03203

335 citations

#194

BLINK: Multimodal Large Language Models Can See but Not Perceive

Xingyu Fu, Yushi Hu, Bangzheng Li et al.

ECCV 2024 • arXiv:2404.12390

333 citations

#195

RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval

Parth Sarthi, Salman Abdullah, Aditi Tuli et al.

ICLR 2024 • arXiv:2401.18059

333 citations

#196

Consistency Trajectory Models: Learning Probability Flow ODE Trajectory of Diffusion

Dongjun Kim, Chieh-Hsin Lai, Wei-Hsiang Liao et al.

ICLR 2024 • arXiv:2310.02279

333 citations

#197

Text-to-3D using Gaussian Splatting

Zilong Chen, Feng Wang, Yikai Wang et al.

CVPR 2024 • arXiv:2309.16585

333 citations

#198

The WMDP Benchmark: Measuring and Reducing Malicious Use with Unlearning

Nathaniel Li, Alexander Pan, Anjali Gopal et al.

ICML 2024 • arXiv:2403.03218

333 citations

#199

GaussianEditor: Swift and Controllable 3D Editing with Gaussian Splatting

Yiwen Chen, Zilong Chen, Chi Zhang et al.

CVPR 2024 • arXiv:2311.14521

333 citations

#200

Improved Techniques for Training Consistency Models

Yang Song, Prafulla Dhariwal

ICLR 2024 • arXiv:2310.14189

332 citations
