Poster "distributed llm serving" Papers
2 papers found
Preble: Efficient Distributed Prompt Scheduling for LLM Serving
Vikranth Srivatsa, Zijian He, Reyna Abhyankar et al.
ICLR 2025, poster, arXiv:2407.00023
41 citations
DéjàVu: KV-cache Streaming for Fast, Fault-tolerant Generative LLM Serving
Foteini Strati, Sara McAllister, Amar Phanishayee et al.
ICML 2024, poster, arXiv:2403.01876