2024 Poster "llm inference acceleration" Papers
2 papers found
KV-Runahead: Scalable Causal LLM Inference by Parallel Key-Value Cache Generation
Minsik Cho, Mohammad Rastegari, Devang Naik
ICML 2024 · poster · arXiv:2405.05329
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
Tianle Cai, Yuhong Li, Zhengyang Geng et al.
ICML 2024 · poster · arXiv:2401.10774