ICLR "speculative decoding" Papers
5 papers found
Block Verification Accelerates Speculative Decoding
Ziteng Sun, Uri Mendlovic, Yaniv Leviathan et al.
ICLR 2025posterarXiv:2403.10444
18
citations
MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding
Ranajoy Sadhukhan, Jian Chen, Zhuoming Chen et al.
ICLR 2025posterarXiv:2408.11049
61
citations
Speculative RAG: Enhancing Retrieval Augmented Generation through Drafting
Zilong (Ryan) Wang, Zifeng Wang, Long Le et al.
ICLR 2025posterarXiv:2407.08223
75
citations
SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration
Heming Xia, Yongqi Li, Jun Zhang et al.
ICLR 2025posterarXiv:2410.06916
39
citations
Towards Optimal Multi-draft Speculative Decoding
Zhengmian Hu, Tong Zheng, Vignesh Viswanathan et al.
ICLR 2025posterarXiv:2502.18779
11
citations