"speculative decoding" Papers
17 papers found
Approximately Aligned Decoding
Daniel Melcer, Sujan Kumar Gonugondla, Pramuditha Perera et al.
Block Verification Accelerates Speculative Decoding
Ziteng Sun, Uri Mendlovic, Yaniv Leviathan et al.
EasySpec: Layer-Parallel Speculative Decoding for Efficient Multi-GPU Utilization
Yize Wu, Ke Gao, Ling Li et al.
Grouped Speculative Decoding for Autoregressive Image Generation
Junhyuk So, Juncheol Shin, Hyunho Kook et al.
MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding
Ranajoy Sadhukhan, Jian Chen, Zhuoming Chen et al.
SpecEM: Training-Free LLM Ensembling via Iterative Drafting, Verification, and Online Feedback
Bo Lv, Nayu Liu, Chen Tang et al.
SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning
Rui Pan, Yinwei Dai, Zhihao Zhang et al.
Speculative RAG: Enhancing Retrieval Augmented Generation through Drafting
Zilong Wang, Zifeng Wang, Long Le et al.
SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration
Heming Xia, Yongqi Li, Jun Zhang et al.
ViSpec: Accelerating Vision-Language Models with Vision-Aware Speculative Decoding
Jialiang Kang, Han Shu, Wenshuo Li et al.
Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
Yichao Fu, Peter Bailis, Ion Stoica et al.
GliDe with a CaPE: A Low-Hassle Method to Accelerate Speculative Decoding
Cunxiao Du, Jing Jiang, Xu Yuanchen et al.
Guiding LLMs The Right Way: Fast, Non-Invasive Constrained Generation
Luca Beurer-Kellner, Marc Fischer, Martin Vechev
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
Tianle Cai, Yuhong Li, Zhengyang Geng et al.
Online Speculative Decoding
Xiaoxuan Liu, Lanxiang Hu, Peter Bailis et al.
Tandem Transformers for Inference Efficient LLMs
Aishwarya P S, Pranav Nair, Yashas Samaga et al.
When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models
Haoran You, Yichao Fu, Zheng Wang et al.