"vanishing gradients" Papers
3 papers found
Polygonal Unadjusted Langevin Algorithms: Creating stable and efficient adaptive algorithms for neural networks
Dongyoung Lim, Sotirios Sabanis
ICML 2024 poster
Transformers Get Stable: An End-to-End Signal Propagation Theory for Language Models
Akhil Kedia, Mohd Abbas Zaidi, Sushil Khyalia et al.
ICML 2024 poster
Universality of Linear Recurrences Followed by Non-linear Projections: Finite-Width Guarantees and Benefits of Complex Eigenvalues
Antonio Orvieto, Soham De, Caglar Gulcehre et al.
ICML 2024 poster