Stochastic Optimization
Stochastic gradient descent (SGD) and related stochastic optimization methods
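For orientation, here is a minimal sketch of the plain minibatch SGD update that most of the papers listed below build on or analyze. It is illustrative only and not taken from any listed paper; the least-squares setup, step size, and variable names are assumptions made for the example.

```python
# Minimal sketch of minibatch SGD on a least-squares problem (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))               # synthetic features (assumed setup)
w_true = rng.normal(size=10)
y = X @ w_true + 0.1 * rng.normal(size=1000)  # noisy targets

theta = np.zeros(10)
lr, batch_size = 0.1, 32
for step in range(500):
    idx = rng.integers(0, len(X), size=batch_size)       # draw a random minibatch
    Xb, yb = X[idx], y[idx]
    grad = 2.0 / batch_size * Xb.T @ (Xb @ theta - yb)   # stochastic gradient of the MSE
    theta -= lr * grad                                    # SGD update: theta <- theta - lr * grad

print("parameter error:", np.linalg.norm(theta - w_true))
```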
Related Topics (Optimization)
Top Papers
Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training
Hong Liu, Zhiyuan Li, David Hall et al.
Decomposed Diffusion Sampler for Accelerating Large-Scale Inverse Problems
Hyungjin Chung, Suhyeon Lee, Jong Chul Ye
Offline Actor-Critic for Average Reward MDPs
William Powell, Jeongyeol Kwon, Qiaomin Xie et al.
End-to-End Rate-Distortion Optimized 3D Gaussian Representation
Henan Wang, Hanxin Zhu, Tianyu He et al.
The Blessing of Randomness: SDE Beats ODE in General Diffusion-based Image Editing
Shen Nie, Hanzhong Guo, Cheng Lu et al.
FlashSplat: 2D to 3D Gaussian Splatting Segmentation Solved Optimally
Qiuhong Shen, Xingyi Yang, Xinchao Wang
Test-time Alignment of Diffusion Models without Reward Over-optimization
Sunwoo Kim, Minkyu Kim, Dongmin Park
Stable Neural Stochastic Differential Equations in Analyzing Irregular Time Series Data
YongKyung Oh, Dongyoung Lim, Sungil Kim
How to Fine-Tune Vision Models with SGD
Ananya Kumar, Ruoqi Shen, Sebastien Bubeck et al.
Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization
Junkang Wu, Yuexiang Xie, Zhengyi Yang et al.
Methods for Convex $(L_0,L_1)$-Smooth Optimization: Clipping, Acceleration, and Adaptivity
Eduard Gorbunov, Nazarii Tupitsa, Sayantan Choudhury et al.
Self-Improvement for Neural Combinatorial Optimization: Sample Without Replacement, but Improvement
Dominik Grimm, Jonathan Pirnay
ASGO: Adaptive Structured Gradient Optimization
Kang An, Yuxing Liu, Rui Pan et al.
Runtime Analysis of the SMS-EMOA for Many-Objective Optimization
Weijie Zheng, Benjamin Doerr
Quasi-Monte Carlo for 3D Sliced Wasserstein
Khai Nguyen, Nicola Bariletto, Nhat Ho
The AdEMAMix Optimizer: Better, Faster, Older
Matteo Pagliardini, Pierre Ablin, David Grangier
Nonconvex Stochastic Optimization under Heavy-Tailed Noises: Optimal Convergence without Gradient Clipping
Zijian Liu, Zhengyuan Zhou
Implicit bias of SGD in $L_2$-regularized linear DNNs: One-way jumps from high to low rank
Zihan Wang, Arthur Jacot
Self-Consistency Preference Optimization
Archiki Prasad, Weizhe Yuan, Richard Yuanzhe Pang et al.
ZO-AdaMU Optimizer: Adapting Perturbation by the Momentum and Uncertainty in Zeroth-Order Optimization
Shuoran Jiang, Qingcai Chen, Yang Xiang et al.
Domain Randomization via Entropy Maximization
Gabriele Tiboni, Pascal Klink, Jan Peters et al.
Adversarial Adaptive Sampling: Unify PINN and Optimal Transport for the Approximation of PDEs
Kejun Tang, Jiayu Zhai, Xiaoliang Wan et al.
Efficient Diversity-Preserving Diffusion Alignment via Gradient-Informed GFlowNets
Zhen Liu, Tim Xiao, Weiyang Liu et al.
Constrained Bayesian Optimization under Partial Observations: Balanced Improvements and Provable Convergence
Shengbo Wang, Ke Li
Standard Gaussian Process is All You Need for High-Dimensional Bayesian Optimization
Zhitong Xu, Haitao Wang, Jeff Phillips et al.
B2Opt: Learning to Optimize Black-box Optimization with Little Budget
Xiaobin Li, Kai Wu, Xiaoyu Zhang et al.
Temporally and Distributionally Robust Optimization for Cold-Start Recommendation
Xinyu Lin, Wenjie Wang, Jujia Zhao et al.
Understanding Optimization in Deep Learning with Central Flows
Jeremy Cohen, Alex Damian, Ameet Talwalkar et al.
Provable Benefit of Annealed Langevin Monte Carlo for Non-log-concave Sampling
Wei Guo, Molei Tao, Yongxin Chen
Grokking at the Edge of Numerical Stability
Lucas Prieto, Melih Barsbey, Pedro Mediano et al.
No Preference Left Behind: Group Distributional Preference Optimization
Binwei Yao, Zefan Cai, Yun-Shiuan Chuang et al.
Learning to Optimize Permutation Flow Shop Scheduling via Graph-Based Imitation Learning
Longkang Li, Siyuan Liang, Zihao Zhu et al.
Does SGD really happen in tiny subspaces?
Minhak Song, Kwangjun Ahn, Chulhee Yun
Decentralized Riemannian Conjugate Gradient Method on the Stiefel Manifold
Jun Chen, Haishan Ye, Mengmeng Wang et al.
Adaptive teachers for amortized samplers
Minsu Kim, Sanghyeok Choi, Taeyoung Yun et al.
Scalable Discrete Diffusion Samplers: Combinatorial Optimization and Statistical Physics
Sebastian Sanokowski, Wilhelm Berghammer, Haoyu Wang et al.
Improved Regret Bounds for Linear Adversarial MDPs via Linear Optimization
XiangCheng Zhang, Fang Kong, Baoxiang Wang et al.
FunBO: Discovering Acquisition Functions for Bayesian Optimization with FunSearch
Virginia Aglietti, Ira Ktena, Jessica Schrouff et al.
Deep Distributed Optimization for Large-Scale Quadratic Programming
Augustinos Saravanos, Hunter Kuperman, Alex Oshin et al.
AdaGrad under Anisotropic Smoothness
Yuxing Liu, Rui Pan, Tong Zhang
Light Schrödinger Bridge
Alexander Korotin, Nikita Gushchin, Evgeny Burnaev
Symmetric Mean-field Langevin Dynamics for Distributional Minimax Problems
Juno Kim, Kakei Yamamoto, Kazusato Oko et al.
Trust Region Methods for Nonconvex Stochastic Optimization beyond Lipschitz Smoothness
Chenghan Xie, Chenxi Li, Chuwen Zhang et al.
Emergence and scaling laws in SGD learning of shallow neural networks
Yunwei Ren, Eshaan Nichani, Denny Wu et al.
Beyond Stationarity: Convergence Analysis of Stochastic Softmax Policy Gradient Methods
Sara Klein, Simon Weissmann, Leif Döring
In Search of Adam’s Secret Sauce
Antonio Orvieto, Robert Gower
Mitigating the Curse of Dimensionality for Certified Robustness via Dual Randomized Smoothing
Song Xia, Yi Yu, Xudong Jiang et al.
Butterfly Effects of SGD Noise: Error Amplification in Behavior Cloning and Autoregression
Adam Block, Dylan Foster, Akshay Krishnamurthy et al.
SDGMNet: Statistic-Based Dynamic Gradient Modulation for Local Descriptor Learning
Yuxin Deng, Jiayi Ma
Stochastic Polyak Step-sizes and Momentum: Convergence Guarantees and Practical Performance
Dimitris Oikonomou, Nicolas Loizou
Addax: Utilizing Zeroth-Order Gradients to Improve Memory Efficiency and Performance of SGD for Fine-Tuning Language Models
Zeman Li, Xinwei Zhang, Peilin Zhong et al.
Cumulative Regret Analysis of the Piyavskii–Shubert Algorithm and Its Variants for Global Optimization
Kaan Gokcesu, Hakan Gokcesu
Noise Stability Optimization for Finding Flat Minima: A Hessian-based Regularization Approach
Haotian Ju, Hongyang Zhang, Dongyue Li
On the Crucial Role of Initialization for Matrix Factorization
Bingcong Li, Liang Zhang, Aryan Mokhtari et al.
Momentum-SAM: Sharpness Aware Minimization without Computational Overhead
Marlon Becker, Frederick Altrock, Benjamin Risse
Sharpness-Aware Minimization Enhances Feature Quality via Balanced Learning
Jacob Springer, Vaishnavh Nagarajan, Aditi Raghunathan
Variational Inference for SDEs Driven by Fractional Noise
Rembert Daems, Manfred Opper, Guillaume Crevecoeur et al.
Relaxing the Additivity Constraints in Decentralized No-Regret High-Dimensional Bayesian Optimization
Anthony Bardou, Patrick Thiran, Thomas Begin
The Optimization Landscape of SGD Across the Feature Learning Strength
Alexander Atanasov, Alexandru Meterez, James Simon et al.
Training-Free Guidance Beyond Differentiability: Scalable Path Steering with Tree Search in Diffusion and Flow Models
Yingqing Guo, Yukang Yang, Hui Yuan et al.
Neural structure learning with stochastic differential equations
Benjie Wang, Joel Jennings, Wenbo Gong
Improved Active Learning via Dependent Leverage Score Sampling
Atsushi Shimizu, Xiaoou Cheng, Christopher Musco et al.
Transition Path Sampling with Improved Off-Policy Training of Diffusion Path Samplers
Kiyoung Seong, Seonghyun Park, Seonghwan Kim et al.
Few for Many: Tchebycheff Set Scalarization for Many-Objective Optimization
Xi Lin, Yilu Liu, Xiaoyuan Zhang et al.
InPO: Inversion Preference Optimization with Reparametrized DDIM for Efficient Diffusion Model Alignment
Yunhong Lu, Qichao Wang, Hengyuan Cao et al.
Efficient Alternating Minimization with Applications to Weighted Low Rank Approximation
Zhao Song, Mingquan Ye, Junze Yin et al.
On the Optimization and Generalization of Two-layer Transformers with Sign Gradient Descent
Bingrui Li, Wei Huang, Andi Han et al.
Deep Nonlinear Sufficient Dimension Reduction
Yinfeng Chen, Yuling Jiao, Rui Qiu et al.
Provable Convergence and Limitations of Geometric Tempering for Langevin Dynamics
Omar Chehab, Anna Korba, Austin Stromme et al.
DGPO: Discovering Multiple Strategies with Diversity-Guided Policy Optimization
Wenze Chen, Shiyu Huang, Yuan Chiang et al.
Pareto Front-Diverse Batch Multi-Objective Bayesian Optimization
Alaleh Ahmadianshalchi, Syrine Belakaria, Janardhan Rao Doppa
Implicit Bias of Spectral Descent and Muon on Multiclass Separable Data
Chen Fan, Mark Schmidt, Christos Thrampoulidis
Sample complexity of data-driven tuning of model hyperparameters in neural networks with structured parameter-dependent dual function
Maria-Florina Balcan, Anh Nguyen, Dravyansh Sharma
Learning Semantic Latent Directions for Accurate and Controllable Human Motion Prediction
Guowei Xu, Jiale Tao, Wen Li et al.
Right Now, Wrong Then: Non-Stationary Direct Preference Optimization under Preference Drift
Seongho Son, William Bankes, Sayak Ray Chowdhury et al.
Fast and Robust: Task Sampling with Posterior and Diversity Synergies for Adaptive Decision-Makers in Randomized Environments
Yun Qu, Cheems Wang, Yixiu Mao et al.
Offline-to-Online Hyperparameter Transfer for Stochastic Bandits
Dravyansh Sharma, Arun Suggala
DC-Solver: Improving Predictor-Corrector Diffusion Sampler via Dynamic Compensation
Wenliang Zhao, Haolin Wang, Jie Zhou et al.
Colored Noise in PPO: Improved Exploration and Performance through Correlated Action Sampling
Jakob Hollenstein, Georg Martius, Justus Piater
Understanding the robustness difference between stochastic gradient descent and adaptive gradient methods
Avery Ma, Yangchen Pan, Amir-massoud Farahmand
Sharpness-Aware Minimization: General Analysis and Improved Rates
Dimitris Oikonomou, Nicolas Loizou
Improved Metric Distortion via Threshold Approvals
Elliot Anshelevich, Aris Filos-Ratsikas, Christopher Jerrett et al.
DisCO: Reinforcing Large Reasoning Models with Discriminative Constrained Optimization
Gang Li, Ming Lin, Tomer Galanti et al.
On the Limitations of Temperature Scaling for Distributions with Overlaps
Muthu Chidambaram, Rong Ge
Second Order Bounds for Contextual Bandits with Function Approximation
Aldo Pacchiano
POp-GS: Next Best View in 3D-Gaussian Splatting with P-Optimality
Joey Wilson, Marcelino M. de Almeida, Sachit Mahajan et al.
Decision Tree Induction Through LLMs via Semantically-Aware Evolution
Tennison Liu, Nicolas Huynh, Mihaela van der Schaar
Universal generalization guarantees for Wasserstein distributionally robust models
Tam Le, Jerome Malick
Error Bounds for Gaussian Process Regression Under Bounded Support Noise with Applications to Safety Certification
Robert Reed, Luca Laurenti, Morteza Lahijanian
Robust and Conjugate Spatio-Temporal Gaussian Processes
William Laplante, Matias Altamirano, Andrew Duncan et al.
Finite-Sample Analysis of Policy Evaluation for Robust Average Reward Reinforcement Learning
Yang Xu, Washim Mondal, Vaneet Aggarwal
Ringmaster ASGD: The First Asynchronous SGD with Optimal Time Complexity
Artavazd Maranjyan, Alexander Tyurin, Peter Richtarik
Two-timescale Extragradient for Finding Local Minimax Points
Jiseok Chae, Kyuwon Kim, Donghwan Kim
Accelerated Convergence of Stochastic Heavy Ball Method under Anisotropic Gradient Noise
Rui Pan, Yuxing Liu, Xiaoyu Wang et al.
Regret Analysis of Repeated Delegated Choice
Suho Shin, Keivan Rezaei, Mohammad Hajiaghayi et al.
Error Feedback under $(L_0,L_1)$-Smoothness: Normalization and Momentum
Sarit Khirirat, Abdurakhmon Sadiev, Artem Riabinin et al.
Expensive Multi-Objective Bayesian Optimization Based on Diffusion Models
Bingdong Li, Zixiang Di, Yongfan Lu et al.
Last-Iterate Convergence Properties of Regret-Matching Algorithms in Games
Yang Cai, Gabriele Farina, Julien Grand-Clément et al.
Stochastic Online Instrumental Variable Regression: Regrets for Endogeneity and Bandit Feedback
Riccardo Della Vecchia, Debabrota Basu
Stochastic Semi-Gradient Descent for Learning Mean Field Games with Population-Aware Function Approximation
Chenyu Zhang, Xu Chen, Xuan Di