"multi-head attention" Papers
5 papers found
On the Optimization and Generalization of Multi-head Attention
Christos Thrampoulidis, Rouzbeh Ghaderi, Hossein Taheri et al.
ICLR 2025 poster, arXiv:2310.12680
44 citations
SAS: Simulated Attention Score
Chuanyang Zheng, Jiankai Sun, Yihang Gao et al.
NeurIPS 2025 poster, arXiv:2507.07694
2 citations
CHAI: Clustered Head Attention for Efficient LLM Inference
Saurabh Agarwal, Bilge Acun, Basil Hosmer et al.
ICML 2024 poster
Evolving Subnetwork Training for Large Language Models
Hanqi Li, Lu Chen, Da Ma et al.
ICML 2024 poster
Improving Transformers with Dynamically Composable Multi-Head Attention
Da Xiao, Qingye Meng, Shengping Li et al.
ICML 2024 poster