NEURIPS Poster "kv cache reduction" Papers
2 papers found
Improving Model Representation and Reducing KV Cache via Skip Connections with First Value Heads
Zhoutong Wu, Yuan Zhang, Yiming Dong et al.
NEURIPS 2025 poster arXiv:2510.16807
Zebra-Llama: Towards Extremely Efficient Hybrid Models
Mingyu Yang, Mehdi Rezagholizadeh, Guihong Li et al.
NEURIPS 2025 poster arXiv:2505.17272
7 citations