ICML 2024 "speculative decoding" Papers
7 papers found
Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
Yichao Fu, Peter Bailis, Ion Stoica et al.
ICML 2024 poster · arXiv:2402.02057
GliDe with a CaPE: A Low-Hassle Method to Accelerate Speculative Decoding
Cunxiao Du, Jing Jiang, Xu Yuanchen et al.
ICML 2024 poster · arXiv:2402.02082
Guiding LLMs The Right Way: Fast, Non-Invasive Constrained Generation
Luca Beurer-Kellner, Marc Fischer, Martin Vechev
ICML 2024 poster · arXiv:2403.06988
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
Tianle Cai, Yuhong Li, Zhengyang Geng et al.
ICML 2024 poster · arXiv:2401.10774
Online Speculative Decoding
Xiaoxuan Liu, Lanxiang Hu, Peter Bailis et al.
ICML 2024 poster · arXiv:2310.07177
Tandem Transformers for Inference Efficient LLMs
Aishwarya P S, Pranav Nair, Yashas Samaga et al.
ICML 2024 poster · arXiv:2402.08644
When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models
Haoran You, Yichao Fu, Zheng Wang et al.
ICML 2024 poster · arXiv:2406.07368