Shiwei Liu

Papers: 9

Total Citations: 37

Papers (9)

Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN

From Low Rank Gradient Subspace Stabilization to Low-Rank Weights: Observations, Theories, and Applications

Mask-Enhanced Autoregressive Prediction: Pay Less Attention to Learn More

Advancing Dynamic Sparse Training by Exploring Optimization Opportunities

Sparse Cocktail: Every Sparse Pattern Every Sparse Ratio All At Once

Junk DNA Hypothesis: Pruning Small Pre-Trained Weights Irreversibly and Monotonically Impairs "Difficult" Downstream Tasks in LLMs

CaM: Cache Merging for Memory-efficient LLMs Inference

Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity

Visual Prompting Upgrades Neural Network Sparsification: A Data-Model Perspective