ICML 2024 "speculative decoding" Papers
7 papers found
Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
Yichao Fu, Peter Bailis, Ion Stoica et al.
ICML 2024 poster · arXiv:2402.02057
GliDe with a CaPE: A Low-Hassle Method to Accelerate Speculative Decoding
Cunxiao Du, Jing Jiang, Xu Yuanchen et al.
ICML 2024 poster · arXiv:2402.02082
Guiding LLMs The Right Way: Fast, Non-Invasive Constrained Generation
Luca Beurer-Kellner, Marc Fischer, Martin Vechev
ICML 2024 poster · arXiv:2403.06988
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
Tianle Cai, Yuhong Li, Zhengyang Geng et al.
ICML 2024 poster · arXiv:2401.10774
Online Speculative Decoding
Xiaoxuan Liu, Lanxiang Hu, Peter Bailis et al.
ICML 2024 poster · arXiv:2310.07177
Tandem Transformers for Inference Efficient LLMs
Aishwarya P S, Pranav Nair, Yashas Samaga et al.
ICML 2024 poster · arXiv:2402.08644
When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models
Haoran You, Yichao Fu, Zheng Wang et al.
ICML 2024 poster · arXiv:2406.07368