Poster papers matching "preconditioned gradient descent"
2 papers found
Optimality and NP-Hardness of Transformers in Learning Markovian Dynamical Functions
Yanna Ding, Songtao Lu, Yingdong Lu et al.
NeurIPS 2025 · poster · arXiv:2510.18638
Can Looped Transformers Learn to Implement Multi-step Gradient Descent for In-context Learning?
Khashayar Gatmiry, Nikunj Saunshi, Sashank J. Reddi et al.
ICML 2024 · poster