2024 "llm inference efficiency" Papers
3 papers found
CLLMs: Consistency Large Language Models
Siqi Kou, Lanxiang Hu, Zhezhi He et al.
ICML 2024 (poster), arXiv:2403.00835
Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference
Harry Dong, Xinyu Yang, Zhenyu Zhang et al.
ICML 2024 (poster), arXiv:2402.09398
Online Cascade Learning for Efficient Inference over Streams
Lunyiu Nie, Zhimin Ding, Erdong Hu et al.
ICML 2024 (poster), arXiv:2402.04513