Yuanzhi Li

38 Papers · 995 Total Citations

Papers (38)

Convergence Analysis of Two-layer Neural Networks with ReLU Activation
NeurIPS 2017 · arXiv · 674 citations

LazySVD: Even Faster SVD Decomposition Yet Without Agonizing Pain
NeurIPS 2016 · arXiv · 133 citations

Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process
ICLR 2025 · 98 citations

Linear Convergence of a Frank-Wolfe Type Algorithm over Trace-Norm Balls
NeurIPS 2017 · arXiv · 54 citations

Recovery Guarantee of Non-negative Matrix Factorization via Alternating Updates
NeurIPS 2016 · arXiv · 30 citations

Approximate maximum entropy principles via Goemans-Williamson with applications to provable variational methods
NeurIPS 2016 · arXiv · 6 citations

Physics of Language Models: Part 3.1, Knowledge Storage and Extraction
ICML 2024 · 0 citations

Algorithms and matching lower bounds for approximately-convex optimization
NeurIPS 2016 · 0 citations

Revisiting Disentanglement in Downstream Tasks: A Study on Its Necessity for Abstract Visual Reasoning
AAAI 2024 · arXiv · 0 citations

Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks
NeurIPS 2019 · 0 citations

When Is Generalizable Reinforcement Learning Tractable?
NeurIPS 2021 · 0 citations

Local Signal Adaptivity: Provable Feature Learning in Neural Networks Beyond Kernels
NeurIPS 2021 · 0 citations

Towards Understanding the Mixture-of-Experts Layer in Deep Learning
NeurIPS 2022 · 0 citations

The Mechanism of Prediction Head in Non-contrastive Self-supervised Learning
NeurIPS 2022 · 0 citations

Learning (Very) Simple Generative Models Is Hard
NeurIPS 2022 · 0 citations

Vision Transformers provably learn spatial structure
NeurIPS 2022 · 0 citations

Read and Reap the Rewards: Learning to Play Atari with the Help of Instruction Manuals
NeurIPS 2023 · 0 citations

How Does Adaptive Optimization Impact Local Neural Network Geometry?
NeurIPS 2023 · 0 citations

SPRING: Studying Papers and Reasoning to play Games
NeurIPS 2023 · 0 citations

The probability flow ODE is provably fast
NeurIPS 2023 · 0 citations

Recovery guarantee of weighted low-rank approximation via alternating minimization
ICML 2016 · 0 citations

Doubly Accelerated Methods for Faster CCA and Generalized Eigendecomposition
ICML 2017 · 0 citations

Faster Principal Component Regression and Stable Matrix Chebyshev Approximation
ICML 2017 · 0 citations

Follow the Compressed Leader: Faster Online Learning of Eigenvectors and Faster MMWU
ICML 2017 · 0 citations

Near-Optimal Design of Experiments via Regret Minimization
ICML 2017 · 0 citations

Provable Alternating Gradient Descent for Non-negative Matrix Factorization with Strong Correlations
ICML 2017 · 0 citations

Make the Minority Great Again: First-Order Regret Bound for Contextual Bandits
ICML 2018 · 0 citations

An Alternative View: When Does SGD Escape Local Minima?
ICML 2018 · 0 citations

The Well-Tempered Lasso
ICML 2018 · 0 citations

A Convergence Theory for Deep Learning via Over-Parameterization
ICML 2019 · 0 citations

Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data
NeurIPS 2018 · 0 citations

Online Improper Learning with an Approximation Oracle
NeurIPS 2018 · 0 citations

NEON2: Finding Local Minima via First-Order Oracles
NeurIPS 2018 · 0 citations

On the Convergence Rate of Training Recurrent Neural Networks
NeurIPS 2019 · 0 citations

Complexity of Highly Parallel Non-Smooth Convex Optimization
NeurIPS 2019 · 0 citations

What Can ResNet Learn Efficiently, Going Beyond Kernels?
NeurIPS 2019 · 0 citations

Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers
NeurIPS 2019 · 0 citations

Can SGD Learn Recurrent Neural Networks with Provable Generalization?
NeurIPS 2019 · 0 citations