ICML 2024 "kv cache compression" Papers
3 papers found
CaM: Cache Merging for Memory-efficient LLMs Inference
Yuxin Zhang, Yuxuan Du, Gen Luo et al.
ICML 2024 poster
Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference
Harry Dong, Xinyu Yang, Zhenyu Zhang et al.
ICML 2024 poster
LoCoCo: Dropping In Convolutions for Long Context Compression
Ruisi Cai, Yuandong Tian, Zhangyang “Atlas” Wang et al.
ICML 2024 poster