Poster "llm inference optimization" Papers
4 papers found
DynaPipe: Dynamic Layer Redistribution for Efficient Serving of LLMs with Pipeline Parallelism
HongXin Xu, Tianyu Guo, Xianwei Zhang
NeurIPS 2025 (poster)
KeyDiff: Key Similarity-Based KV Cache Eviction for Long-Context LLM Inference in Resource-Constrained Environments
Junyoung Park, Dalton Jones, Matthew Morse et al.
NeurIPS 2025 (poster) · arXiv:2504.15364 · 11 citations
Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
Yichao Fu, Peter Bailis, Ion Stoica et al.
ICML 2024 (poster)
CaM: Cache Merging for Memory-efficient LLMs Inference
Yuxin Zhang, Yuxuan Du, Gen Luo et al.
ICML 2024 (poster)