Poster "kv state caching" Papers
2 papers found
APE: Faster and Longer Context-Augmented Generation via Adaptive Parallel Encoding
Xinyu Yang, Tianqi Chen, Beidi Chen
ICLR 2025 poster arXiv:2502.05431
16 citations
Preble: Efficient Distributed Prompt Scheduling for LLM Serving
Vikranth Srivatsa, Zijian He, Reyna Abhyankar et al.
ICLR 2025 poster arXiv:2407.00023
41 citations