🧬 Optimization

Stochastic Optimization

SGD and related optimization methods

376 papers (showing top 100) · 749 total citations
[Papers-over-time chart: 326 papers, Mar '24 – Feb '26]

Related Topics (Optimization)

Also includes: stochastic optimization, stochastic gradient descent, sgd, adam, optimizer
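
For orientation, here is a minimal sketch (illustrative only, not tied to any specific paper below) of the plain SGD-with-momentum update that most of the listed work analyzes or extends:

```python
import numpy as np

def sgd_momentum_step(theta, grad, velocity, lr=0.01, beta=0.9):
    """One SGD-with-momentum step: v <- beta*v + g, theta <- theta - lr*v."""
    velocity = beta * velocity + grad   # accumulate a running descent direction
    return theta - lr * velocity, velocity

# Toy usage on f(theta) = 0.5 * ||theta||^2, whose gradient is theta itself;
# a stochastic gradient would add minibatch noise here.
theta, velocity = np.array([1.0, -2.0]), np.zeros(2)
for _ in range(200):
    theta, velocity = sgd_momentum_step(theta, theta, velocity)
print(theta)  # approaches the minimizer at the origin
```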

Top Papers

#1

Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training

Hong Liu, Zhiyuan Li, David Hall et al.

ICLR 2024 · arXiv:2305.14342
222
citations
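
As a rough, hedged illustration of the update family Sophia belongs to: an EMA of gradients divided by a clipped, diagonally estimated Hessian. The sketch below is not the authors' implementation; the paper's Hessian estimators (Hutchinson / Gauss-Newton-Bartlett), refresh schedule, and constants differ.

```python
import numpy as np

def sophia_like_step(theta, grad, m, h, lr=1e-4, beta1=0.96, gamma=0.01, eps=1e-12, rho=1.0):
    """Sketch: momentum / max(gamma * diag-Hessian, eps), clipped elementwise.

    `h` is assumed to be a nonnegative diagonal Hessian estimate that the caller
    refreshes every few steps (the paper describes the estimators); all
    hyperparameters here are illustrative.
    """
    m = beta1 * m + (1 - beta1) * grad            # EMA of gradients
    precond = m / np.maximum(gamma * h, eps)      # diagonal second-order preconditioning
    update = np.clip(precond, -rho, rho)          # elementwise clip bounds the per-coordinate step
    return theta - lr * update, m
```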
#2

Test-time Alignment of Diffusion Models without Reward Over-optimization

Sunwoo Kim, Minkyu Kim, Dongmin Park

ICLR 2025
39
citations
#3

How to Fine-Tune Vision Models with SGD

Ananya Kumar, Ruoqi Shen, Sebastien Bubeck et al.

ICLR 2024 · arXiv:2211.09359
35
citations
#4

Methods for Convex $(L_0,L_1)$-Smooth Optimization: Clipping, Acceleration, and Adaptivity

Eduard Gorbunov, Nazarii Tupitsa, Sayantan Choudhury et al.

ICLR 2025
27
citations
#5

ASGO: Adaptive Structured Gradient Optimization

Kang An, Yuxing Liu, Rui Pan et al.

NeurIPS 2025 · arXiv:2503.20762
26
citations
#6

Self-Improvement for Neural Combinatorial Optimization: Sample Without Replacement, but Improvement

Dominik Grimm, Jonathan Pirnay

ICLR 2025 · arXiv:2403.15180
26
citations
#7

ZO-AdaMU Optimizer: Adapting Perturbation by the Momentum and Uncertainty in Zeroth-Order Optimization

Shuoran Jiang, Qingcai Chen, Yang Xiang et al.

AAAI 2024 · arXiv:2312.15184
zeroth-order optimization, memory-efficient training, large language models, momentum adaptation (+3 more)
20
citations
#8

B2Opt: Learning to Optimize Black-box Optimization with Little Budget

Xiaobin Li, Kai Wu, Xiaoyu Zhang et al.

AAAI 2025 · arXiv:2304.11787
18
citations
#9

Standard Gaussian Process is All You Need for High-Dimensional Bayesian Optimization

Zhitong Xu, Haitao Wang, Jeff Phillips et al.

ICLR 2025
18
citations
#10

Does SGD really happen in tiny subspaces?

Minhak Song, Kwangjun Ahn, Chulhee Yun

ICLR 2025
16
citations
#11

AdaGrad under Anisotropic Smoothness

Yuxing Liu, Rui Pan, Tong Zhang

ICLR 2025 · arXiv:2406.15244
adaptive gradient methods, anisotropic smoothness, convergence guarantees, large batch training (+3 more)
14
citations
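
For reference, #11 concerns the textbook diagonal AdaGrad step; a minimal sketch with illustrative constants:

```python
import numpy as np

def adagrad_step(theta, grad, accum, lr=0.1, eps=1e-8):
    """Per-coordinate AdaGrad: divide by the root of the running sum of squared gradients."""
    accum = accum + grad ** 2
    return theta - lr * grad / (np.sqrt(accum) + eps), accum
```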
#12

Scalable Discrete Diffusion Samplers: Combinatorial Optimization and Statistical Physics

Sebastian Sanokowski, Wilhelm Berghammer, Haoyu Wang et al.

ICLR 2025 · arXiv:2502.08696
discrete diffusion models, combinatorial optimization, statistical physics, policy gradient theorem (+4 more)
14
citations
#13

Trust Region Methods for Nonconvex Stochastic Optimization beyond Lipschitz Smoothness

Chenghan Xie, Chenxi Li, Chuwen Zhang et al.

AAAI 2024 · arXiv:2310.17319
trust region methods, nonconvex stochastic optimization, generalized smoothness, distributionally robust optimization (+4 more)
13
citations
#14

Symmetric Mean-field Langevin Dynamics for Distributional Minimax Problems

Juno Kim, Kakei Yamamoto, Kazusato Oko et al.

ICLR 2024 · arXiv:2312.01127
13
citations
#15

Beyond Stationarity: Convergence Analysis of Stochastic Softmax Policy Gradient Methods

Sara Klein, Simon Weissmann, Leif Döring

ICLR 2024 · arXiv:2310.02671
12
citations
#16

Emergence and scaling laws in SGD learning of shallow neural networks

Yunwei Ren, Eshaan Nichani, Denny Wu et al.

NeurIPS 2025 · arXiv:2504.19983
stochastic gradient descent, shallow neural networks, scaling laws, extensive-width regime (+4 more)
12
citations
#17

Stochastic Polyak Step-sizes and Momentum: Convergence Guarantees and Practical Performance

Dimitris Oikonomou, Nicolas Loizou

ICLR 2025
11
citations
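
A minimal sketch of the stochastic Polyak step-size (the SPS_max form) underlying #17, assuming interpolation so the per-sample optimal loss is roughly zero; the paper's variant also incorporates momentum, and the constants below are illustrative.

```python
import numpy as np

def sps_step(theta, loss_i, grad_i, c=0.5, eta_max=1.0, eps=1e-12):
    """SGD step with a per-sample Polyak step size: eta = min(f_i / (c * ||g_i||^2), eta_max)."""
    eta = min(loss_i / (c * float(np.dot(grad_i, grad_i)) + eps), eta_max)
    return theta - eta * grad_i
```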
#18

Butterfly Effects of SGD Noise: Error Amplification in Behavior Cloning and Autoregression

Adam Block, Dylan Foster, Akshay Krishnamurthy et al.

ICLR 2024 · arXiv:2310.11428
11
citations
#19

Relaxing the Additivity Constraints in Decentralized No-Regret High-Dimensional Bayesian Optimization

Anthony Bardou, Patrick Thiran, Thomas Begin

ICLR 2024 · arXiv:2305.19838
10
citations
#20

Sharpness-Aware Minimization Enhances Feature Quality via Balanced Learning

Jacob Springer, Vaishnavh Nagarajan, Aditi Raghunathan

ICLR 2024 · arXiv:2405.20439
10
citations
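
Both SAM entries in this list (#20 and #22) start from the standard two-step sharpness-aware minimization update; a minimal sketch, not either paper's specific variant:

```python
import numpy as np

def sam_step(theta, grad_fn, lr=0.1, rho=0.05, eps=1e-12):
    """Perturb toward the locally worst-case direction, then descend with the gradient taken there."""
    g = grad_fn(theta)                            # gradient at the current weights
    e = rho * g / (np.linalg.norm(g) + eps)       # ascent step of radius rho
    return theta - lr * grad_fn(theta + e)        # sharpness-aware descent step
```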
#21

The Optimization Landscape of SGD Across the Feature Learning Strength

Alexander Atanasov, Alexandru Meterez, James Simon et al.

ICLR 2025 · arXiv:2410.04642
10
citations
#22

Momentum-SAM: Sharpness Aware Minimization without Computational Overhead

Marlon Becker, Frederick Altrock, Benjamin Risse

NeurIPS 2025
10
citations
#23

DGPO: Discovering Multiple Strategies with Diversity-Guided Policy Optimization

Wenze Chen, Shiyu Huang, Yuan Chiang et al.

AAAI 2024 · arXiv:2207.05631
reinforcement learning, diverse strategy discovery, policy optimization, information-theoretic diversity (+3 more)
9
citations
#24

Implicit Bias of Spectral Descent and Muon on Multiclass Separable Data

Chen Fan, Mark Schmidt, Christos Thrampoulidis

NeurIPS 2025 · arXiv:2502.04664
8
citations
#25

Understanding the robustness difference between stochastic gradient descent and adaptive gradient methods

Avery Ma, Yangchen Pan, Amir-massoud Farahmand

ICLR 2024 · arXiv:2308.06703
8
citations
#26

Accelerated Convergence of Stochastic Heavy Ball Method under Anisotropic Gradient Noise

Rui Pan, Yuxing Liu, Xiaoyu Wang et al.

ICLR 2024 · arXiv:2312.14567
7
citations
#27

Ringmaster ASGD: The First Asynchronous SGD with Optimal Time Complexity

Artavazd Maranjyan, Alexander Tyurin, Peter Richtarik

ICML 2025 · arXiv:2501.16168
7
citations
#28

Expensive Multi-Objective Bayesian Optimization Based on Diffusion Models

Bingdong Li, Zixiang Di, Yongfan Lu et al.

AAAI 2025 · arXiv:2405.08674
7
citations
#29

Stochastic Semi-Gradient Descent for Learning Mean Field Games with Population-Aware Function Approximation

Chenyu Zhang, Xu Chen, Xuan Di

ICLR 2025 · arXiv:2408.08192
7
citations
#30

Optimizing Posterior Samples for Bayesian Optimization via Rootfinding

Taiwo Adebiyi, Bach Do, Ruda Zhang

ICLR 2025
6
citations
#31

Efficient Distributed Optimization under Heavy-Tailed Noise

Su Hyeong Lee, Manzil Zaheer, Tian Li

ICML 2025 · arXiv:2502.04164
6
citations
#32

Distributionally Robust Optimization with Bias and Variance Reduction

Ronak Mehta, Vincent Roulet, Krishna Pillutla et al.

ICLR 2024 · arXiv:2310.13863
6
citations
#33

General framework for online-to-nonconvex conversion: Schedule-free SGD is also effective for nonconvex optimization

Kwangjun Ahn, Gagik Magakyan, Ashok Cutkosky

ICML 2025 · arXiv:2411.07061
5
citations
#34

PSMGD: Periodic Stochastic Multi-Gradient Descent for Fast Multi-Objective Optimization

Mingjing Xu, Peizhong Ju, Jia Liu et al.

AAAI 2025 · arXiv:2412.10961
5
citations
#35

Large-Scale Non-convex Stochastic Constrained Distributionally Robust Optimization

Qi Zhang, Yi Zhou, Ashley Prater-Bennette et al.

AAAI 2024 · arXiv:2404.01200
distributionally robust optimization, non-convex optimization, stochastic constrained optimization, cressie-read divergence (+4 more)
4
citations
#36

MAST: model-agnostic sparsified training

Yury Demidovich, Grigory Malinovsky, Egor Shulgin et al.

ICLR 2025
4
citations
#37

Convergence of Clipped SGD on Convex $(L_0,L_1)$-Smooth Functions

Ofir Gaash, Kfir Y. Levy, Yair Carmon

NeurIPS 2025 · arXiv:2502.16492
stochastic gradient descent, gradient clipping, convex optimization, generalized smoothness (+4 more)
4
citations
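
#37 (and #4 above) study gradient clipping for (L0, L1)-smooth problems; the basic clipped-SGD step they analyze looks roughly like the sketch below, with an illustrative threshold:

```python
import numpy as np

def clipped_sgd_step(theta, grad, lr=0.1, clip=1.0, eps=1e-12):
    """Rescale the gradient so its norm never exceeds `clip`, then take a plain SGD step."""
    scale = min(1.0, clip / (np.linalg.norm(grad) + eps))
    return theta - lr * scale * grad
```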
#38

Fast Last-Iterate Convergence of SGD in the Smooth Interpolation Regime

Amit Attia, Matan Schliserman, Uri Sherman et al.

NeurIPS 2025
4
citations
#39

Nesterov acceleration in benignly non-convex landscapes

Kanan Gupta, Stephan Wojtowytsch

ICLR 2025 · arXiv:2410.08395
4
citations
#40

Direct Distributional Optimization for Provable Alignment of Diffusion Models

Ryotaro Kawata, Kazusato Oko, Atsushi Nitanda et al.

ICLR 2025
3
citations
#41

Gradient-Variation Online Adaptivity for Accelerated Optimization with Hölder Smoothness

Yuheng Zhao, Yu-Hu Yan, Kfir Y. Levy et al.

NeurIPS 2025 · arXiv:2511.02276
3
citations
#42

Gradient Multi-Normalization for Efficient LLM Training

Meyer Scetbon, Chao Ma, Wenbo Gong et al.

NeurIPS 2025
gradient normalization, stateless optimizers, memory-efficient training, large language models (+3 more)
3
citations
#43

Complexity Lower Bounds of Adaptive Gradient Algorithms for Non-convex Stochastic Optimization under Relaxed Smoothness

Michael Crawshaw, Mingrui Liu

ICLR 2025
3
citations
#44

Towards Stability and Generalization Bounds in Decentralized Minibatch Stochastic Gradient Descent

Jiahuan Wang, Hong Chen

AAAI 2024
3
citations
#45

Convergence of Distributed Adaptive Optimization with Local Updates

Ziheng Cheng, Margalit Glasgow

ICLR 2025 · arXiv:2409.13155
distributed adaptive optimization, local updates, intermittent communication, communication complexity (+4 more)
3
citations
#46

Incremental Quasi-Newton Methods with Faster Superlinear Convergence Rates

Zhuanghua Liu, Luo Luo, Bryan Kian Hsiang Low

AAAI 2024 · arXiv:2402.02359
incremental quasi-newton methods, finite-sum optimization, bfgs update, symmetric rank-1 update (+4 more)
3
citations
#47

Pareto-Optimality, Smoothness, and Stochasticity in Learning-Augmented One-Max-Search

Ziyad Benomar, Lorenzo Croissant, Vianney Perchet et al.

ICML 2025 · arXiv:2502.05720
3
citations
#48

Aligned Multi Objective Optimization

Yonathan Efroni, Ben Kretzu, Daniel Jiang et al.

ICML 2025 · arXiv:2502.14096
3
citations
#49

Quantum Optimization via Gradient-Based Hamiltonian Descent

Jiaqi Leng, Bin Shi

ICML 2025 · arXiv:2505.14670
3
citations
#50

Regularized Langevin Dynamics for Combinatorial Optimization

Shengyu Feng, Yiming Yang

ICML 2025 · arXiv:2502.00277
2
citations
#51

Global Optimization with a Power-Transformed Objective and Gaussian Smoothing

Chen Xu

ICML 2025 · arXiv:2412.05204
2
citations
#52

Gradient correlation is a key ingredient to accelerate SGD with momentum

Julien Hermant, Marien Renaud, Jean-François Aujol et al.

ICLR 2025 · arXiv:2410.07870
2
citations
#53

Hamiltonian Descent Algorithms for Optimization: Accelerated Rates via Randomized Integration Time

Qiang Fu, Andre Wibisono

NeurIPS 2025 · arXiv:2505.12553
hamiltonian dynamics, optimization algorithms, accelerated convergence rates, randomized integration time (+4 more)
2
citations
#54

Towards Faster Decentralized Stochastic Optimization with Communication Compression

Rustem Islamov, Yuan Gao, Sebastian Stich

ICLR 2025
2
citations
#55

Second-Order Convergence in Private Stochastic Non-Convex Optimization

Youming Tao, Zuyuan Zhang, Dongxiao Yu et al.

NeurIPS 2025
2
citations
#56

Optimal Rates in Continual Linear Regression via Increasing Regularization

Ran Levinstein, Amit Attia, Matan Schliserman et al.

NeurIPS 2025 · arXiv:2506.06501
continual linear regression, random task orderings, isotropic regularization, implicit regularization (+4 more)
2
citations
#57

Long-time asymptotics of noisy SVGD outside the population limit

Victor Priser, Pascal Bianchi, Adil Salim

ICLR 2025
2
citations
#58

Newton Meets Marchenko-Pastur: Massively Parallel Second-Order Optimization with Hessian Sketching and Debiasing

Elad Romanov, Fangzhao Zhang, Mert Pilanci

ICLR 2025 · arXiv:2410.01374
second-order optimization, hessian sketching, newton method, distributed optimization (+4 more)
2
citations
#59

Zeroth-Order Methods for Nonconvex Stochastic Problems with Decision-Dependent Distributions

Yuya Hikima, Akiko Takeda

AAAI 2025 · arXiv:2412.20330
2
citations
#60

A Stochastic Approach to Bi-Level Optimization for Hyperparameter Optimization and Meta Learning

Minyoung Kim, Timothy Hospedales

AAAI 2025 · arXiv:2410.10417
1
citations
#61

Differential Private Stochastic Optimization with Heavy-tailed Data: Towards Optimal Rates

Puning Zhao, Jiafei Wu, Zhe Liu et al.

AAAI 2025 · arXiv:2408.09891
1
citations
#62

Sample-and-Bound for Non-convex Optimization

Yaoguang Zhai, Zhizhen Qin, Sicun Gao

AAAI 2024 · arXiv:2401.04812
non-convex optimization, global optimization, monte carlo tree search, branch-and-bound (+2 more)
1
citations
#63

Coupling-based Convergence Diagnostic and Stepsize Scheme for Stochastic Gradient Descent

Xiang Li, Qiaomin Xie

AAAI 2025 · arXiv:2412.11341
1
citations
#64

Incremental Gradient Descent with Small Epoch Counts is Surprisingly Slow on Ill-Conditioned Problems

Yujun Kim, Jaeyoung Cha, Chulhee Yun

ICML 2025 · arXiv:2506.04126
1
citations
#65

Learning Curves of Stochastic Gradient Descent in Kernel Regression

Haihan Zhang, Weicheng Lin, Yuanshi Liu et al.

ICML 2025 · arXiv:2505.22048
1
citations
#66

Consensus Based Stochastic Optimal Control

Liyao Lyu, Jingrun Chen

ICML 2025
1
citations
#67

A Near-Optimal Single-Loop Stochastic Algorithm for Convex Finite-Sum Coupled Compositional Optimization

Bokun Wang, Tianbao Yang

ICML 2025 · arXiv:2312.02277
1
citations
#68

Controlling the Flow: Stability and Convergence for Stochastic Gradient Descent with Decaying Regularization

Sebastian Kassing, Simon Weissmann, Leif Döring

NeurIPS 2025 · arXiv:2505.11434
1
citations
#69

SGD with memory: fundamental properties and stochastic acceleration

Dmitry Yarotsky, Maksim Velikanov

ICLR 2025
1
citations
#70

Sequential Stochastic Combinatorial Optimization Using Hierarchal Reinforcement Learning

Xinsong Feng, Zihan Yu, Yanhai Xiong et al.

ICLR 2025 · arXiv:2502.05537
1
citations
#71

Optimization by Parallel Quasi-Quantum Annealing with Gradient-Based Sampling

Yuma Ichikawa, Yamato Arai

ICLR 2025 · arXiv:2409.02135
combinatorial optimization, continuous relaxation, discrete langevin dynamics, parallel optimization (+4 more)
not collected
#72

Adam Reduces a Unique Form of Sharpness: Theoretical Insights Near the Minimizer Manifold

Xinghan Li, Haodong Wen, Kaifeng Lyu

NeurIPS 2025 · arXiv:2511.02773
adaptive gradient methods, sharpness minimization, stochastic differential equations, overparameterized models (+2 more)
not collected
#73

MGDA Converges under Generalized Smoothness, Provably

Qi Zhang, Peiyao Xiao, Shaofeng Zou et al.

ICLR 2025
not collected
#74

Solving hidden monotone variational inequalities with surrogate losses

Ryan D'Orazio, Danilo Vucetic, Zichu Liu et al.

ICLR 2025 · arXiv:2411.05228
variational inequalities, surrogate losses, min-max optimization, projected bellman error (+4 more)
not collected
#75

Exploiting Hidden Symmetry to Improve Objective Perturbation for DP Linear Learners with a Nonsmooth L1-Norm

Du Chen, Geoffrey A. Chua

ICLR 2025
not collected
#76

Nonlinearly Preconditioned Gradient Methods: Momentum and Stochastic Analysis

Konstantinos Oikonomidis, Jan Quan, Panagiotis Patrinos

NeurIPS 2025
not collected
#77

Preference Optimization on Pareto Sets: On a Theory of Multi-Objective Optimization

Abhishek Roy, Geelon So, Yian Ma

NeurIPS 2025
multi-objective optimization, pareto-optimal solutions, preference optimization, constrained optimization (+3 more)
not collected
#78

A Near-Optimal Algorithm for Decentralized Convex-Concave Finite-Sum Minimax Optimization

Hongxu Chen, Ke Wei, Haishan Ye et al.

NeurIPS 2025
not collected
#79

Learning from A Single Markovian Trajectory: Optimality and Variance Reduction

Zhenyu Sun, Ermin Wei

NeurIPS 2025
stochastic non-convex optimization, markov chain sampling, variance reduction methods, single trajectory learning (+4 more)
not collected
#80

Nonasymptotic Analysis of Stochastic Gradient Descent with the Richardson–Romberg Extrapolation

Marina Sheshukova, Denis Belomestny, Alain Oliviero Durmus et al.

ICLR 2025 · arXiv:2410.05106
stochastic gradient descent, richardson-romberg extrapolation, strongly convex minimization, polyak-ruppert averaging (+4 more)
not collected
#81

Asymptotic theory of SGD with a general learning-rate

Or Goldreich, Ziyang Wei, Soham Bonnerjee et al.

NeurIPS 2025
not collected
#82

A Unified Analysis of Stochastic Gradient Descent with Arbitrary Data Permutations and Beyond

Yipeng Li, Xinchen Lyu, Zhenyu Liu

NeurIPS 2025
not collected
#83

Revisiting Zeroth-Order Optimization: Minimum-Variance Two-Point Estimators and Directionally Aligned Perturbations

Shaocong Ma, Heng Huang

ICLR 2025 · arXiv:2510.19975
not collected
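
Zeroth-order entries such as #7 and #83 build on the classical symmetric two-point gradient estimator; a minimal sketch (the smoothing parameter and Gaussian probe direction are illustrative choices):

```python
import numpy as np

def two_point_grad_estimate(f, x, mu=1e-3, rng=np.random.default_rng(0)):
    """Estimate the gradient of the Gaussian-smoothed objective from two function values."""
    u = rng.standard_normal(x.shape)                      # random probe direction
    fd = (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu)     # directional finite difference
    return fd * u

# Usage: for f(x) = ||x||^2 at x = (1, 1), the true gradient is (2, 2).
est = two_point_grad_estimate(lambda x: float(np.dot(x, x)), np.array([1.0, 1.0]))
```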
#84

A Regularized Newton Method for Nonconvex Optimization with Global and Local Complexity Guarantees

Yuhao Zhou, Jintao Xu, Bingrui Li et al.

NeurIPS 2025
not collected
#85

Revisiting Large-Scale Non-convex Distributionally Robust Optimization

Qi Zhang, Yi Zhou, Simon Khan et al.

ICLR 2025
distributionally robust optimization, non-convex optimization, stochastic gradient descent, generalized smoothness (+3 more)
not collected
#86

PROFIT: A Specialized Optimizer for Deep Fine Tuning

Anirudh Chakravarthy, Shuai Zheng, Xin Huang et al.

NeurIPS 2025
not collected
#87

Revisiting Consensus Error: A Fine-grained Analysis of Local SGD under Second-order Data Heterogeneity

Kumar Kshitij Patel, Ali Zindari, Sebastian Stich et al.

NeurIPS 2025
distributed optimization, local sgd, federated averaging, data heterogeneity (+4 more)
not collected
#88

Searching for Optimal Solutions with LLMs via Bayesian Optimization

Dhruv Agarwal, Manoj Ghuhan Arivazhagan, Rajarshi Das et al.

ICLR 2025
not collected
#89

On the Almost Sure Convergence of the Stochastic Three Points Algorithm

Taha El Bakkali El Kadi, Omar Saadi

ICLR 2025 · arXiv:2501.13886
stochastic three points algorithm, derivative-free optimization, unconstrained optimization, almost sure convergence (+4 more)
not collected
#90

Decreasing Entropic Regularization Averaged Gradient for Semi-Discrete Optimal Transport

Ferdinand Genans, Antoine Godichon-Baggioni, François-Xavier Vialard et al.

NeurIPS 2025
not collected
#91

A Gradient Guided Diffusion Framework for Chance Constrained Programming

Boyang Zhang, Zhiguo Wang, Ya-Feng Liu

NeurIPS 2025
not collected
#92

Optimistic Online-to-Batch Conversions for Accelerated Convergence and Universality

Yu-Hu Yan, Peng Zhao, Zhi-Hua Zhou

NeurIPS 2025
not collected
#93

The ODE Method for Stochastic Approximation and Reinforcement Learning with Markovian Noise

Shuze Daniel Liu, Shuhang Chen, Shangtong Zhang

NeurIPS 2025 · arXiv:2401.07844
not collected
#94

Nonconvex Stochastic Optimization under Heavy-Tailed Noises: Optimal Convergence without Gradient Clipping

Zijian Liu, Zhengyuan Zhou

ICLR 2025 · arXiv:2412.19529
stochastic optimization, heavy-tailed noise, nonconvex optimization, gradient clipping (+4 more)
not collected
#95

Adaptive Methods through the Lens of SDEs: Theoretical Insights on the Role of Noise

Enea Monzio Compagnoni, Tianlin Liu, Rustem Islamov et al.

ICLR 2025 · arXiv:2411.15958
adaptive optimization methods, stochastic differential equations, gradient noise analysis, curvature adaptation (+4 more)
not collected
#96

Large Stepsizes Accelerate Gradient Descent for Regularized Logistic Regression

Jingfeng Wu, Pierre Marion, Peter Bartlett

NeurIPS 2025
not collected
#97

Tight High-Probability Bounds for Nonconvex Heavy-Tailed Scenario under Weaker Assumptions

Weixin An, Yuanyuan Liu, Fanhua Shang et al.

NeurIPS 2025
not collected
#98

Gaussian Approximation and Concentration of Constant Learning-Rate Stochastic Gradient Descent

Ziyang Wei, Jiaqi Li, Zhipeng Lou et al.

NeurIPS 2025
stochastic gradient descent, constant learning rate, gaussian approximation, central limit theorem (+3 more)
not collected
#99

A Gradient Guidance Perspective on Stepwise Preference Optimization for Diffusion Models

Joshua Tian Jin Tee, Hee Suk Yoon, Abu Hanif Muhammad Syarubany et al.

NeurIPS 2025
preference optimization, diffusion models, gradient guidance, text-to-image models (+3 more)
not collected
#100

Leveraging Variable Sparsity to Refine Pareto Stationarity in Multi-Objective Optimization

Zeou Hu, Yaoliang Yu

ICLR 2025
not collected