AdaFisher: Adaptive Second Order Optimization via Fisher Information

ICLR 2025 · 5 Authors

Abstract

First-order optimization methods are currently the mainstream in training deep neural networks (DNNs). Optimizers like Adam incorporate limited curvature information by applying diagonal preconditioning to the stochastic gradient during training. Despite the widespread use of first-order methods, second-order optimization algorithms exhibit superior convergence properties compared to their first-order counterparts, e.g., Adam and SGD. However, their practicality in training DNNs is still limited due to increased per-iteration computation compared to first-order methods. We present AdaFisher -- an adaptive second-order optimizer that leverages a diagonal block-Kronecker approximation of the Fisher information matrix for adaptive gradient preconditioning. AdaFisher aims to bridge the gap between enhanced convergence/generalization capabilities and computational efficiency in second-order optimization frameworks for training DNNs. Despite the traditionally slow pace of second-order optimizers, we showcase that AdaFisher can be reliably adopted for image classification and language modeling, and that it stands out for its stability and robustness to hyper-parameter tuning. We demonstrate that AdaFisher outperforms SOTA optimizers in terms of both accuracy and convergence speed. Code is available at https://github.com/AtlasAnalyticsLab/AdaFisher.
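To make the core idea concrete: a Kronecker-factored (K-FAC-style) curvature estimate approximates a layer's Fisher block as the Kronecker product of an input-activation covariance and a pre-activation-gradient covariance. A *diagonal* block-Kronecker preconditioner then only needs the diagonals of the two factors, since the diagonal of a Kronecker product is the outer product of the factor diagonals. The sketch below is illustrative only, assuming a single linear layer and an Adam-style elementwise division; the function name, update rule, and damping term are assumptions, not the paper's exact algorithm.

```python
import numpy as np

def diag_kron_precondition(grad, a_diag, g_diag, eps=1e-8):
    """Precondition a linear layer's weight gradient with a diagonal
    Kronecker-factored curvature estimate (illustrative sketch).

    grad   : (out_dim, in_dim) weight gradient
    a_diag : (in_dim,)  diagonal of the input-activation covariance
    g_diag : (out_dim,) diagonal of the pre-activation-gradient covariance
    """
    # diag(G ⊗ A) is the outer product of the two diagonals, so the
    # full (out_dim*in_dim)^2 Fisher block is never materialized.
    fisher_diag = np.outer(g_diag, a_diag)        # shape (out_dim, in_dim)
    return grad / (fisher_diag + eps)             # elementwise preconditioning

# Toy usage with hypothetical statistics
rng = np.random.default_rng(0)
grad = rng.standard_normal((3, 4))
a_diag = np.full(4, 2.0)
g_diag = np.full(3, 0.5)
step = diag_kron_precondition(grad, a_diag, g_diag)
```

The key efficiency point is that the preconditioner costs O(out_dim · in_dim) memory and time per layer, versus the quadratic cost of storing a full Fisher block.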
