ICLR Poster "self-attention layer" Papers
2 papers found
Attention layers provably solve single-location regression
Pierre Marion, Raphaël Berthier, Gérard Biau et al.
ICLR 2025 poster · arXiv:2410.01537
10 citations
What Does It Mean to Be a Transformer? Insights from a Theoretical Hessian Analysis
Weronika Ormaniec, Felix Dangel, Sidak Pal Singh
ICLR 2025 poster · arXiv:2410.10986
10 citations