"gradient descent" Papers
18 papers found
Convergence and Implicit Bias of Gradient Descent on Continual Linear Classification
Hyunji Jung, Hanseul Cho, Chulhee Yun
Hamiltonian Descent Algorithms for Optimization: Accelerated Rates via Randomized Integration Time
Qiang Fu, Andre Wibisono
Learning High-Degree Parities: The Crucial Role of the Initialization
Emmanuel Abbe, Elisabetta Cornacchia, Jan Hązła et al.
Simple and Optimal Sublinear Algorithms for Mean Estimation
Beatrice Bertolotti, Matteo Russo, Chris Schwiegelshohn et al.
Transformer Learns Optimal Variable Selection in Group-Sparse Classification
Chenyang Zhang, Xuran Meng, Yuan Cao
Transformers are almost optimal metalearners for linear classification
Roey Magen, Gal Vardi
Transformers Learn to Implement Multi-step Gradient Descent with Chain of Thought
Jianhao Huang, Zixuan Wang, Jason Lee
Asymptotics of feature learning in two-layer networks after one gradient-step
Hugo Cui, Luca Pesce, Yatin Dandi et al.
Beyond Implicit Bias: The Insignificance of SGD Noise in Online Learning
Nikhil Vyas, Depen Morwani, Rosie Zhao et al.
Can Looped Transformers Learn to Implement Multi-step Gradient Descent for In-context Learning?
Khashayar Gatmiry, Nikunj Saunshi, Sashank J. Reddi et al.
Convergence and Trade-Offs in Riemannian Gradient Descent and Riemannian Proximal Point
David Martínez-Rubio, Christophe Roux, Sebastian Pokutta
Differentiability and Optimization of Multiparameter Persistent Homology
Luis Scoccola, Siddharth Setlur, David Loiseaux et al.
Interpreting and Improving Diffusion Models from an Optimization Perspective
Frank Permenter, Chenyang Yuan
Learning Associative Memories with Gradient Descent
Vivien Cabannes, Berfin Simsek, Alberto Bietti
Non-stationary Online Convex Optimization with Arbitrary Delays
Yuanyu Wan, Chang Yao, Mingli Song et al.
Position: Do pretrained Transformers Learn In-Context by Gradient Descent?
Lingfeng Shen, Aayush Mishra, Daniel Khashabi
Transformers Implement Functional Gradient Descent to Learn Non-Linear Functions In Context
Xiang Cheng, Yuxin Chen, Suvrit Sra
Transformers Provably Learn Sparse Token Selection While Fully-Connected Nets Cannot
Zixuan Wang, Stanley Wei, Daniel Hsu et al.