Shiwei Liu

17 Papers · 37 Total Citations

Papers (17)

Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN
ICLR 2025 · 22 citations

From Low Rank Gradient Subspace Stabilization to Low-Rank Weights: Observations, Theories, and Applications
ICML 2025 · 15 citations

Mask-Enhanced Autoregressive Prediction: Pay Less Attention to Learn More
ICML 2025 · 0 citations

Advancing Dynamic Sparse Training by Exploring Optimization Opportunities
ICML 2024 · 0 citations

Sparse Cocktail: Every Sparse Pattern Every Sparse Ratio All At Once
ICML 2024 · 0 citations

Junk DNA Hypothesis: Pruning Small Pre-Trained Weights Irreversibly and Monotonically Impairs "Difficult" Downstream Tasks in LLMs
ICML 2024 · 0 citations

CaM: Cache Merging for Memory-efficient LLMs Inference
ICML 2024 · 0 citations

Data Augmented Flatness-aware Gradient Projection for Continual Learning
ICCV 2023 · 0 citations

Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity
ICML 2024 · 0 citations

Visual Prompting Upgrades Neural Network Sparsification: A Data-Model Perspective
AAAI 2025 · 0 citations

Sparse Training via Boosting Pruning Plasticity with Neuroregeneration
NeurIPS 2021 · 0 citations

Dynamic Sparse Network for Time Series Classification: Learning What to "See"
NeurIPS 2022 · 0 citations

The Emergence of Essential Sparsity in Large Pre-trained Models: The Weights that Matter
NeurIPS 2023 · 0 citations

Predicting Mutational Effects on Protein-Protein Binding via a Side-Chain Diffusion Probabilistic Model
NeurIPS 2023 · 0 citations

Don't Just Prune by Magnitude! Your Mask Topology Is a Secret Weapon
NeurIPS 2023 · 0 citations

Dynamic Sparsity Is Channel-Level Sparsity Learner
NeurIPS 2023 · 0 citations

Towards Data-Agnostic Pruning At Initialization: What Makes a Good Sparse Mask?
NeurIPS 2023 · 0 citations