2025 "multi-head latent attention" Papers
2 papers found
HELM: Hyperbolic Large Language Models via Mixture-of-Curvature Experts
Neil He, Rishabh Anand, Hiren Madhu et al.
NeurIPS 2025 poster · arXiv:2505.24722 · 8 citations
Zebra-Llama: Towards Extremely Efficient Hybrid Models
Mingyu Yang, Mehdi Rezagholizadeh, Guihong Li et al.
NeurIPS 2025 poster · arXiv:2505.17272 · 6 citations