"model compression" Papers

72 papers found • Page 2 of 2

ExCP: Extreme LLM Checkpoint Compression via Weight-Momentum Joint Shrinking

Wenshuo Li, Xinghao Chen, Han Shu et al.

ICML 2024poster

Exploring Intrinsic Dimension for Vision-Language Model Pruning

Hanzhang Wang, Jiawen Zhang, Qingyuan Ma

ICML 2024poster

Exploring Post-training Quantization in LLMs from Comprehensive Study to Low Rank Compensation

Zhewei Yao, Xiaoxia Wu, Cheng Li et al.

AAAI 2024paperarXiv:2303.08302
70
citations

Extreme Compression of Large Language Models via Additive Quantization

Vage Egiazarian, Andrei Panferov, Denis Kuznedelev et al.

ICML 2024poster

Flextron: Many-in-One Flexible Large Language Model

Ruisi Cai, Saurav Muralidharan, Greg Heinrich et al.

ICML 2024poster

Fluctuation-Based Adaptive Structured Pruning for Large Language Models

Yongqi An, Xu Zhao, Tao Yu et al.

AAAI 2024paperarXiv:2312.11983
96
citations

FrameQuant: Flexible Low-Bit Quantization for Transformers

Harshavardhan Adepu, Zhanpeng Zeng, Li Zhang et al.

ICML 2024poster

Generative Model-Based Feature Knowledge Distillation for Action Recognition

Guiqin Wang, Peng Zhao, Yanjiang Shi et al.

AAAI 2024paperarXiv:2312.08644
6
citations

Good Teachers Explain: Explanation-Enhanced Knowledge Distillation

Amin Parchami, Moritz Böhle, Sukrut Rao et al.

ECCV 2024posterarXiv:2402.03119
18
citations

Junk DNA Hypothesis: Pruning Small Pre-Trained Weights $\textit{Irreversibly}$ and $\textit{Monotonically}$ Impairs ``Difficult" Downstream Tasks in LLMs

Lu Yin, Ajay Jaiswal, Shiwei Liu et al.

ICML 2024poster

KernelWarehouse: Rethinking the Design of Dynamic Convolution

Chao Li, Anbang Yao

ICML 2024poster

Lightweight Image Super-Resolution via Flexible Meta Pruning

Yulun Zhang, Kai Zhang, Luc Van Gool et al.

ICML 2024poster

Localizing Task Information for Improved Model Merging and Compression

Ke Wang, Nikolaos Dimitriadis, Guillermo Ortiz-Jimenez et al.

ICML 2024poster

OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Models

Changhun Lee, Jungyu Jin, Taesu Kim et al.

AAAI 2024paperarXiv:2306.02272
100
citations

Progressive Distillation Based on Masked Generation Feature Method for Knowledge Graph Completion

Cunhang Fan, Yujie Chen, Jun Xue et al.

AAAI 2024paperarXiv:2401.12997
5
citations

Pruner-Zero: Evolving Symbolic Pruning Metric From Scratch for Large Language Models

Peijie Dong, Lujun Li, Zhenheng Tang et al.

ICML 2024poster

Rethinking Optimization and Architecture for Tiny Language Models

Yehui Tang, Kai Han, Fangcheng Liu et al.

ICML 2024poster

Reweighted Solutions for Weighted Low Rank Approximation

David Woodruff, Taisuke Yasuda

ICML 2024poster

SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks

Jiwon Song, Kyungseok Oh, Taesu Kim et al.

ICML 2024poster

Soft Prompt Recovers Compressed LLMs, Transferably

Zhaozhuo Xu, Zirui Liu, Beidi Chen et al.

ICML 2024poster

Towards efficient deep spiking neural networks construction with spiking activity based pruning

Yaxin Li, Qi Xu, Jiangrong Shen et al.

ICML 2024poster

Transferring Knowledge From Large Foundation Models to Small Downstream Models

Shikai Qiu, Boran Han, Danielle Robinson et al.

ICML 2024poster