2025 Poster "llm inference optimization" Papers
3 papers found
DynaPipe: Dynamic Layer Redistribution for Efficient Serving of LLMs with Pipeline Parallelism
HongXin Xu, Tianyu Guo, Xianwei Zhang
NeurIPS 2025 (poster)
KeyDiff: Key Similarity-Based KV Cache Eviction for Long-Context LLM Inference in Resource-Constrained Environments
Junyoung Park, Dalton Jones, Matthew Morse et al.
NeurIPS 2025 (poster) · arXiv:2504.15364 · 11 citations
Learned Prefix Caching for Efficient LLM Inference
Dongsheng Yang, Austin Li, Kai Li et al.
NeurIPS 2025 (poster)