Shiwei Liu
17 Papers, 37 Total Citations

Papers (17)
Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN (ICLR 2025, 22 citations)
From Low Rank Gradient Subspace Stabilization to Low-Rank Weights: Observations, Theories, and Applications (ICML 2025, 15 citations)
Mask-Enhanced Autoregressive Prediction: Pay Less Attention to Learn More (ICML 2025, 0 citations)
Advancing Dynamic Sparse Training by Exploring Optimization Opportunities (ICML 2024, 0 citations)
Sparse Cocktail: Every Sparse Pattern Every Sparse Ratio All At Once (ICML 2024, 0 citations)
Junk DNA Hypothesis: Pruning Small Pre-Trained Weights Irreversibly and Monotonically Impairs "Difficult" Downstream Tasks in LLMs (ICML 2024, 0 citations)
CaM: Cache Merging for Memory-efficient LLMs Inference (ICML 2024, 0 citations)
Data Augmented Flatness-aware Gradient Projection for Continual Learning (ICCV 2023, 0 citations)
Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity (ICML 2024, 0 citations)
Visual Prompting Upgrades Neural Network Sparsification: A Data-Model Perspective (AAAI 2025, 0 citations)
Sparse Training via Boosting Pruning Plasticity with Neuroregeneration (NeurIPS 2021, 0 citations)
Dynamic Sparse Network for Time Series Classification: Learning What to "See" (NeurIPS 2022, 0 citations)
The Emergence of Essential Sparsity in Large Pre-trained Models: The Weights that Matter (NeurIPS 2023, 0 citations)
Predicting mutational effects on protein-protein binding via a side-chain diffusion probabilistic model (NeurIPS 2023, 0 citations)
Don't just prune by magnitude! Your mask topology is a secret weapon (NeurIPS 2023, 0 citations)
Dynamic Sparsity Is Channel-Level Sparsity Learner (NeurIPS 2023, 0 citations)
Towards Data-Agnostic Pruning At Initialization: What Makes a Good Sparse Mask? (NeurIPS 2023, 0 citations)