"exploding gradients" Papers
2 papers found
Revisiting Glorot Initialization for Long-Range Linear Recurrences
Noga Bar, Mariia Seleznova, Yotam Alexander et al.
NeurIPS 2025, poster, arXiv:2505.19827
Transformers Get Stable: An End-to-End Signal Propagation Theory for Language Models
Akhil Kedia, Mohd Abbas Zaidi, Sushil Khyalia et al.
ICML 2024, poster, arXiv:2403.09635