2024 "multi-head attention" Papers
3 papers found
CHAI: Clustered Head Attention for Efficient LLM Inference
Saurabh Agarwal, Bilge Acun, Basil Hosmer et al.
ICML 2024 (poster) · arXiv:2403.08058
Evolving Subnetwork Training for Large Language Models
Hanqi Li, Lu Chen, Da Ma et al.
ICML 2024 (poster) · arXiv:2406.06962
Improving Transformers with Dynamically Composable Multi-Head Attention
Da Xiao, Qingye Meng, Shengping Li et al.
ICML 2024 (poster) · arXiv:2405.08553