"llm inference acceleration" Papers
4 papers found
EasySpec: Layer-Parallel Speculative Decoding for Efficient Multi-GPU Utilization
Yize Wu, Ke Gao, Ling Li et al.
NeurIPS 2025 poster · arXiv:2502.02493
1 citation
SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration
Heming Xia, Yongqi Li, Jun Zhang et al.
ICLR 2025 poster · arXiv:2410.06916
39 citations
KV-Runahead: Scalable Causal LLM Inference by Parallel Key-Value Cache Generation
Minsik Cho, Mohammad Rastegari, Devang Naik
ICML 2024 poster
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
Tianle Cai, Yuhong Li, Zhengyang Geng et al.
ICML 2024 poster