"gradient descent" Papers

13 papers found
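For readers scanning the query term itself, here is a minimal sketch of plain gradient descent on a toy quadratic. Everything in it (function names, step size, objective) is illustrative only and is not drawn from any paper listed below.

import numpy as np

def gradient_descent(grad, x0, lr=0.1, steps=100):
    # Iterate the update x_{t+1} = x_t - lr * grad(x_t).
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

# Minimize f(x) = ||x - 1||^2, whose gradient is 2 * (x - 1).
x_min = gradient_descent(lambda x: 2 * (x - 1.0), x0=np.zeros(3))
print(x_min)  # approaches [1., 1., 1.]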

Learning High-Degree Parities: The Crucial Role of the Initialization

Emmanuel Abbe, Elisabetta Cornacchia, Jan Hązła et al.

ICLR 2025 · poster · arXiv:2412.04910 · 3 citations

Transformers are almost optimal metalearners for linear classification

Roey Magen, Gal Vardi

NeurIPS 2025 · poster · arXiv:2510.19797 · 1 citation

Asymptotics of feature learning in two-layer networks after one gradient-step

Hugo Cui, Luca Pesce, Yatin Dandi et al.

ICML 2024 · spotlight

Beyond Implicit Bias: The Insignificance of SGD Noise in Online Learning

Nikhil Vyas, Depen Morwani, Rosie Zhao et al.

ICML 2024 · spotlight

Can Looped Transformers Learn to Implement Multi-step Gradient Descent for In-context Learning?

Khashayar Gatmiry, Nikunj Saunshi, Sashank J. Reddi et al.

ICML 2024 · poster

Convergence and Trade-Offs in Riemannian Gradient Descent and Riemannian Proximal Point

David Martínez-Rubio, Christophe Roux, Sebastian Pokutta

ICML 2024 · poster

Differentiability and Optimization of Multiparameter Persistent Homology

Luis Scoccola, Siddharth Setlur, David Loiseaux et al.

ICML 2024 · poster

Interpreting and Improving Diffusion Models from an Optimization Perspective

Frank Permenter, Chenyang Yuan

ICML 2024 · poster

Learning Associative Memories with Gradient Descent

Vivien Cabannes, Berfin Simsek, Alberto Bietti

ICML 2024 · poster

Non-stationary Online Convex Optimization with Arbitrary Delays

Yuanyu Wan, Chang Yao, Mingli Song et al.

ICML 2024 · poster

Position: Do pretrained Transformers Learn In-Context by Gradient Descent?

Lingfeng Shen, Aayush Mishra, Daniel Khashabi

ICML 2024 · poster

Transformers Implement Functional Gradient Descent to Learn Non-Linear Functions In Context

Xiang Cheng, Yuxin Chen, Suvrit Sra

ICML 2024 · poster

Transformers Provably Learn Sparse Token Selection While Fully-Connected Nets Cannot

Zixuan Wang, Stanley Wei, Daniel Hsu et al.

ICML 2024 · poster