2025 "llm inference acceleration" Papers
3 papers found
EasySpec: Layer-Parallel Speculative Decoding for Efficient Multi-GPU Utilization
Yize Wu, Ke Gao, Ling Li et al.
NeurIPS 2025 (poster) · arXiv:2502.02493 · 1 citation
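For context on what EasySpec accelerates: below is a minimal sketch of the baseline greedy draft-then-verify loop that speculative decoding methods build on. EasySpec's contribution is parallelizing the draft model's layers across GPUs, which this sketch does not attempt; `draft_step` and `target_step` are hypothetical stand-ins for a small draft model and the large target model.

```python
# A minimal sketch of greedy speculative decoding (the baseline, not
# EasySpec's layer-parallel variant). The toy models at the bottom are
# hypothetical stand-ins for real LLM forward passes.

from typing import Callable, List

def speculative_decode(
    prefix: List[int],
    draft_step: Callable[[List[int]], int],   # cheap model: next token id
    target_step: Callable[[List[int]], int],  # expensive model: next token id
    num_draft: int = 4,
    max_new: int = 16,
) -> List[int]:
    tokens = list(prefix)
    while len(tokens) - len(prefix) < max_new:
        # 1) Draft: the cheap model proposes num_draft tokens autoregressively.
        proposal, ctx = [], list(tokens)
        for _ in range(num_draft):
            t = draft_step(ctx)
            proposal.append(t)
            ctx.append(t)
        # 2) Verify: the target model checks each proposed token; a real
        #    engine scores all positions in one batched forward pass.
        accepted = 0
        for i, t in enumerate(proposal):
            if target_step(tokens + proposal[:i]) == t:
                accepted += 1
            else:
                break
        tokens += proposal[:accepted]
        # 3) On rejection (or full acceptance) emit one target token, so the
        #    output is identical to pure greedy decoding with the target.
        tokens.append(target_step(tokens))
    return tokens[: len(prefix) + max_new]

# Toy usage: the draft agrees with the target most of the time.
target = lambda ctx: sum(ctx) % 100
draft = lambda ctx: sum(ctx) % 100 if len(ctx) % 5 else (sum(ctx) + 1) % 100
print(speculative_decode([1, 2, 3], draft, target))
```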
MUSTAFAR: Promoting Unstructured Sparsity for KV Cache Pruning in LLM Inference
Donghyeon Joo, Helya Hosseini, Ramyad Hadidi et al.
NeurIPS 2025 (poster) · arXiv:2505.22913 · 2 citations
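"Unstructured sparsity" in the MUSTAFAR title means zeroing individual KV cache entries rather than dropping whole tokens or heads. The snippet below is a generic magnitude-based illustration of that idea, not the paper's actual pruning criterion; `prune_kv_unstructured` and its global-threshold rule are assumptions for illustration.

```python
# A hypothetical illustration of unstructured KV cache pruning: zero out
# the smallest-magnitude entries element-wise. Generic magnitude criterion
# only, not MUSTAFAR's pruning policy.

import numpy as np

def prune_kv_unstructured(kv: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero the `sparsity` fraction of smallest-|x| entries; keep the rest."""
    flat = np.abs(kv).ravel()
    k = int(flat.size * sparsity)
    if k == 0:
        return kv.copy()
    threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
    mask = np.abs(kv) > threshold  # ties at the threshold are also dropped
    return kv * mask

# Toy usage: a [num_tokens, num_heads, head_dim] key cache at 70% sparsity.
keys = np.random.randn(128, 8, 64).astype(np.float32)
pruned = prune_kv_unstructured(keys, sparsity=0.7)
print(f"nonzero fraction: {np.count_nonzero(pruned) / pruned.size:.2f}")
```

The pruned cache only saves memory if it is then stored in a sparse format; the dense mask above just shows where the zeros land.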
SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration
Heming Xia, Yongqi Li, Jun Zhang et al.
ICLR 2025 (poster) · arXiv:2410.06916 · 39 citations
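Self-speculative decoding drafts with a subset of the target model's own layers instead of a separate draft model. The toy sketch below shows only that layer-subset idea, using a list of layer functions as a stand-in model; the every-other-layer selection is an illustrative assumption, not SWIFT's adaptive, on-the-fly layer-skipping policy. The resulting draft path would plug into a draft-then-verify loop like the one shown after the EasySpec entry.

```python
# Hypothetical sketch of the self-speculative idea: build a cheap draft
# path from a subset of the full model's own layers. The fixed stride
# below is illustrative, not SWIFT's learned/adaptive layer selection.

from typing import Callable, List

Layer = Callable[[float], float]

def run_layers(layers: List[Layer], x: float) -> float:
    for layer in layers:
        x = layer(x)
    return x

def make_self_draft(layers: List[Layer], keep_every: int = 2) -> List[Layer]:
    """Draft path: keep every keep_every-th layer of the same model."""
    return [l for i, l in enumerate(layers) if i % keep_every == 0]

# Toy usage: 8 "layers"; the draft path runs only 4 of them, so drafting
# costs roughly half a full forward pass and needs no extra weights.
full = [lambda x, i=i: x + 0.1 * i for i in range(8)]
draft = make_self_draft(full)
h = 1.0
print("full:", run_layers(full, h), "draft:", run_layers(draft, h))
```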