Lu Yin
9 Papers
38 Total Citations

Papers (9)
Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN
ICLR 2025
22 citations
From Low Rank Gradient Subspace Stabilization to Low-Rank Weights: Observations, Theories, and Applications
ICML 2025
15 citations
AlphaDecay: Module-wise Weight Decay for Heavy-Tailed Balancing in LLMs
NeurIPS 2025
1 citation
Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity
ICML 2024
0 citations
MagShield: Towards Better Robustness in Sparse Inertial Motion Capture Under Magnetic Disturbances
ICCV 2025
0 citations
Advancing Dynamic Sparse Training by Exploring Optimization Opportunities
ICML 2024
0 citations
Junk DNA Hypothesis: Pruning Small Pre-Trained Weights Irreversibly and Monotonically Impairs "Difficult" Downstream Tasks in LLMs
ICML 2024
0 citations
Sparse Training via Boosting Pruning Plasticity with Neuroregeneration
NeurIPS 2021
0 citations
Dynamic Sparsity Is Channel-Level Sparsity Learner
NeurIPS 2023
0 citations