Yuanzhi Li

38 Papers · 995 Total Citations

Papers (38)

Convergence Analysis of Two-layer Neural Networks with ReLU Activation
NeurIPS 2017 · arXiv · 674 citations

LazySVD: Even Faster SVD Decomposition Yet Without Agonizing Pain
NeurIPS 2016 · arXiv · 133 citations

Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process
ICLR 2025 · 98 citations

Linear Convergence of a Frank-Wolfe Type Algorithm over Trace-Norm Balls
NeurIPS 2017 · arXiv · 54 citations

Recovery Guarantee of Non-negative Matrix Factorization via Alternating Updates
NeurIPS 2016 · arXiv · 30 citations

Approximate maximum entropy principles via Goemans-Williamson with applications to provable variational methods
NeurIPS 2016 · arXiv · 6 citations

Physics of Language Models: Part 3.1, Knowledge Storage and Extraction
ICML 2024 · 0 citations

Algorithms and matching lower bounds for approximately-convex optimization
NeurIPS 2016 · 0 citations

Revisiting Disentanglement in Downstream Tasks: A Study on Its Necessity for Abstract Visual Reasoning
AAAI 2024 · arXiv · 0 citations

Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks
NeurIPS 2019 · 0 citations

When Is Generalizable Reinforcement Learning Tractable?
NeurIPS 2021 · 0 citations

Local Signal Adaptivity: Provable Feature Learning in Neural Networks Beyond Kernels
NeurIPS 2021 · 0 citations

Towards Understanding the Mixture-of-Experts Layer in Deep Learning
NeurIPS 2022 · 0 citations

The Mechanism of Prediction Head in Non-contrastive Self-supervised Learning
NeurIPS 2022 · 0 citations

Learning (Very) Simple Generative Models Is Hard
NeurIPS 2022 · 0 citations

Vision Transformers provably learn spatial structure
NeurIPS 2022 · 0 citations

Read and Reap the Rewards: Learning to Play Atari with the Help of Instruction Manuals
NeurIPS 2023 · 0 citations

How Does Adaptive Optimization Impact Local Neural Network Geometry?
NeurIPS 2023 · 0 citations

SPRING: Studying Papers and Reasoning to play Games
NeurIPS 2023 · 0 citations

The probability flow ODE is provably fast
NeurIPS 2023 · 0 citations

Recovery guarantee of weighted low-rank approximation via alternating minimization
ICML 2016 · 0 citations

Doubly Accelerated Methods for Faster CCA and Generalized Eigendecomposition
ICML 2017 · 0 citations

Faster Principal Component Regression and Stable Matrix Chebyshev Approximation
ICML 2017 · 0 citations

Follow the Compressed Leader: Faster Online Learning of Eigenvectors and Faster MMWU
ICML 2017 · 0 citations

Near-Optimal Design of Experiments via Regret Minimization
ICML 2017 · 0 citations

Provable Alternating Gradient Descent for Non-negative Matrix Factorization with Strong Correlations
ICML 2017 · 0 citations

Make the Minority Great Again: First-Order Regret Bound for Contextual Bandits
ICML 2018 · 0 citations

An Alternative View: When Does SGD Escape Local Minima?
ICML 2018 · 0 citations

The Well-Tempered Lasso
ICML 2018 · 0 citations

A Convergence Theory for Deep Learning via Over-Parameterization
ICML 2019 · 0 citations

Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data
NeurIPS 2018 · 0 citations

Online Improper Learning with an Approximation Oracle
NeurIPS 2018 · 0 citations

NEON2: Finding Local Minima via First-Order Oracles
NeurIPS 2018 · 0 citations

On the Convergence Rate of Training Recurrent Neural Networks
NeurIPS 2019 · 0 citations

Complexity of Highly Parallel Non-Smooth Convex Optimization
NeurIPS 2019 · 0 citations

What Can ResNet Learn Efficiently, Going Beyond Kernels?
NeurIPS 2019 · 0 citations

Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers
NeurIPS 2019 · 0 citations

Can SGD Learn Recurrent Neural Networks with Provable Generalization?
NeurIPS 2019 · 0 citations