"self-attention layer" Papers
4 papers found
Attention layers provably solve single-location regression
Pierre Marion, Raphaël Berthier, Gérard Biau et al.
ICLR 2025 (poster) · arXiv:2410.01537
10 citations
Attention Mechanism, Max-Affine Partition, and Universal Approximation
Hude Liu, Jerry Yao-Chieh Hu, Zhao Song et al.
NeurIPS 2025 (poster) · arXiv:2504.19901
6 citations
What Does It Mean to Be a Transformer? Insights from a Theoretical Hessian Analysis
Weronika Ormaniec, Felix Dangel, Sidak Pal Singh
ICLR 2025 (poster) · arXiv:2410.10986
10 citations
SpecFormer: Guarding Vision Transformer Robustness via Maximum Singular Value Penalization
Xixu Hu, Runkai Zheng, Jindong Wang et al.
ECCV 2024 (poster) · arXiv:2402.03317
5 citations