NeurIPS 2025 "llm inference optimization" Papers
5 papers found
Accurate KV Cache Eviction via Anchor Direction Projection for Efficient LLM Inference
Zijie Geng, Jie Wang, Ziqi Liu et al.
NeurIPS 2025 poster
ClusterFusion: Expanding Operator Fusion Scope for LLM Inference via Cluster-Level Collective Primitive
Xinhao Luo, Zihan Liu, Yangjie Zhou et al.
NeurIPS 2025 poster · arXiv:2508.18850
2 citations
DynaPipe: Dynamic Layer Redistribution for Efficient Serving of LLMs with Pipeline Parallelism
HongXin Xu, Tianyu Guo, Xianwei Zhang
NeurIPS 2025 poster
KeyDiff: Key Similarity-Based KV Cache Eviction for Long-Context LLM Inference in Resource-Constrained Environments
Junyoung Park, Dalton Jones, Matthew Morse et al.
NeurIPS 2025 poster · arXiv:2504.15364
11 citations
Learned Prefix Caching for Efficient LLM Inference
Dongsheng Yang, Austin Li, Kai Li et al.
NeurIPS 2025 poster