Kaifeng Lyu

11 Papers

344 Total Citations

1 Affiliation

Affiliations

Tsinghua University

Papers (11)

Safety Alignment Should be Made More Than Just a Few Tokens Deep

RNNs are not Transformers (Yet): The Key Bottleneck on In-Context Retrieval

A Multi-Power Law for Loss Curve Prediction Across Learning Rate Schedules

A Quadratic Synchronization Rule for Distributed Deep Learning

Data Mixing Can Induce Phase Transitions in Knowledge Acquisition

NeurIPS 2025, arXiv

Adam Reduces a Unique Form of Sharpness: Theoretical Insights Near the Minimizer Manifold

Reconciling Modern Deep Learning with Traditional Optimization Analyses: The Intrinsic Learning Rate

Gradient Descent on Two-layer Nets: Margin Maximization and Simplicity Bias

On the SDEs and Scaling Rules for Adaptive Gradient Algorithms

New Definitions and Evaluations for Saliency Methods: Staying Intrinsic, Complete and Sound

Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction