"sigmoid attention" Papers
2 papers found
Theory, Analysis, and Best Practices for Sigmoid Self-Attention
Jason Ramapuram, Federico Danieli, Eeshan Gunesh Dhekane et al.
ICLR 2025 poster | arXiv:2409.04431
34 citations
When Attention Sink Emerges in Language Models: An Empirical View
Xiangming Gu, Tianyu Pang, Chao Du et al.
ICLR 2025 poster | arXiv:2410.10781
90 citations