Poster "multi-head attention" Papers
5 papers found
On the Optimization and Generalization of Multi-head Attention
Christos Thrampoulidis, Rouzbeh Ghaderi, Hossein Taheri et al.
ICLR 2025, poster, arXiv:2310.12680, 44 citations
SAS: Simulated Attention Score
Chuanyang Zheng, Jiankai Sun, Yihang Gao et al.
NeurIPS 2025, poster, arXiv:2507.07694, 2 citations
CHAI: Clustered Head Attention for Efficient LLM Inference
Saurabh Agarwal, Bilge Acun, Basil Hosmer et al.
ICML 2024, poster, arXiv:2403.08058
Evolving Subnetwork Training for Large Language Models
Hanqi Li, Lu Chen, Da Ma et al.
ICML 2024, poster, arXiv:2406.06962
Improving Transformers with Dynamically Composable Multi-Head Attention
Da Xiao, Qingye Meng, Shengping Li et al.
ICML 2024, poster, arXiv:2405.08553