Shiwei Liu
17 Papers, 37 Total Citations

Papers (17)
Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN (ICLR 2025, 22 citations)
From Low Rank Gradient Subspace Stabilization to Low-Rank Weights: Observations, Theories, and Applications (ICML 2025, 15 citations)
Mask-Enhanced Autoregressive Prediction: Pay Less Attention to Learn More (ICML 2025, 0 citations)
Advancing Dynamic Sparse Training by Exploring Optimization Opportunities (ICML 2024, 0 citations)
Sparse Cocktail: Every Sparse Pattern Every Sparse Ratio All At Once (ICML 2024, 0 citations)
Junk DNA Hypothesis: Pruning Small Pre-Trained Weights Irreversibly and Monotonically Impairs "Difficult" Downstream Tasks in LLMs (ICML 2024, 0 citations)
CaM: Cache Merging for Memory-efficient LLMs Inference (ICML 2024, 0 citations)
Data Augmented Flatness-aware Gradient Projection for Continual Learning (ICCV 2023, 0 citations)
Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity (ICML 2024, 0 citations)
Visual Prompting Upgrades Neural Network Sparsification: A Data-Model Perspective (AAAI 2025, 0 citations)
Sparse Training via Boosting Pruning Plasticity with Neuroregeneration (NeurIPS 2021, 0 citations)
Dynamic Sparse Network for Time Series Classification: Learning What to "See" (NeurIPS 2022, 0 citations)
The Emergence of Essential Sparsity in Large Pre-trained Models: The Weights that Matter (NeurIPS 2023, 0 citations)
Predicting mutational effects on protein-protein binding via a side-chain diffusion probabilistic model (NeurIPS 2023, 0 citations)
Don't just prune by magnitude! Your mask topology is a secret weapon (NeurIPS 2023, 0 citations)
Dynamic Sparsity Is Channel-Level Sparsity Learner (NeurIPS 2023, 0 citations)
Towards Data-Agnostic Pruning At Initialization: What Makes a Good Sparse Mask? (NeurIPS 2023, 0 citations)