"gradient descent" Papers

18 papers found

Convergence and Implicit Bias of Gradient Descent on Continual Linear Classification

Hyunji Jung, Hanseul Cho, Chulhee Yun

ICLR 2025 (poster) · arXiv:2504.12712 · 4 citations

Hamiltonian Descent Algorithms for Optimization: Accelerated Rates via Randomized Integration Time

Qiang Fu, Andre Wibisono

NeurIPS 2025 (spotlight) · arXiv:2505.12553 · 2 citations

Learning High-Degree Parities: The Crucial Role of the Initialization

Emmanuel Abbe, Elisabetta Cornacchia, Jan Hązła et al.

ICLR 2025 (poster) · arXiv:2412.04910 · 3 citations

Simple and Optimal Sublinear Algorithms for Mean Estimation

Beatrice Bertolotti, Matteo Russo, Chris Schwiegelshohn et al.

NeurIPS 2025 (poster) · arXiv:2406.05254

Transformer Learns Optimal Variable Selection in Group-Sparse Classification

Chenyang Zhang, Xuran Meng, Yuan Cao

ICLR 2025 (poster) · arXiv:2504.08638 · 4 citations

Transformers Are Almost Optimal Metalearners for Linear Classification

Roey Magen, Gal Vardi

NeurIPS 2025 (poster) · arXiv:2510.19797 · 1 citation

Transformers Learn to Implement Multi-step Gradient Descent with Chain of Thought

Jianhao Huang, Zixuan Wang, Jason Lee

ICLR 2025 (poster) · arXiv:2502.21212 · 18 citations

Asymptotics of feature learning in two-layer networks after one gradient-step

Hugo Cui, Luca Pesce, Yatin Dandi et al.

ICML 2024 (spotlight)

Beyond Implicit Bias: The Insignificance of SGD Noise in Online Learning

Nikhil Vyas, Depen Morwani, Rosie Zhao et al.

ICML 2024 (spotlight)

Can Looped Transformers Learn to Implement Multi-step Gradient Descent for In-context Learning?

Khashayar Gatmiry, Nikunj Saunshi, Sashank J. Reddi et al.

ICML 2024 (poster)

Convergence and Trade-Offs in Riemannian Gradient Descent and Riemannian Proximal Point

David Martínez-Rubio, Christophe Roux, Sebastian Pokutta

ICML 2024 (poster)

Differentiability and Optimization of Multiparameter Persistent Homology

Luis Scoccola, Siddharth Setlur, David Loiseaux et al.

ICML 2024 (poster)

Interpreting and Improving Diffusion Models from an Optimization Perspective

Frank Permenter, Chenyang Yuan

ICML 2024 (poster)

Learning Associative Memories with Gradient Descent

Vivien Cabannes, Berfin Simsek, Alberto Bietti

ICML 2024 (poster)

Non-stationary Online Convex Optimization with Arbitrary Delays

Yuanyu Wan, Chang Yao, Mingli Song et al.

ICML 2024 (poster)

Position: Do pretrained Transformers Learn In-Context by Gradient Descent?

Lingfeng Shen, Aayush Mishra, Daniel Khashabi

ICML 2024 (poster)

Transformers Implement Functional Gradient Descent to Learn Non-Linear Functions In Context

Xiang Cheng, Yuxin Chen, Suvrit Sra

ICML 2024 (poster)

Transformers Provably Learn Sparse Token Selection While Fully-Connected Nets Cannot

Zixuan Wang, Stanley Wei, Daniel Hsu et al.

ICML 2024 (poster)