ICML 2024 "model compression" Papers

27 papers found

Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs

Yeonhong Park, Jake Hyun, SangLyul Cho et al.

ICML 2024 poster

Bayesian Knowledge Distillation: A Bayesian Perspective of Distillation with Uncertainty Quantification

Luyang Fang, Yongkai Chen, Wenxuan Zhong et al.

ICML 2024 poster

BiLLM: Pushing the Limit of Post-Training Quantization for LLMs

Wei Huang, Yangdong Liu, Haotong Qin et al.

ICML 2024 poster

CHAI: Clustered Head Attention for Efficient LLM Inference

Saurabh Agarwal, Bilge Acun, Basil Hosmer et al.

ICML 2024 poster

Compressing Large Language Models by Joint Sparsification and Quantization

Jinyang Guo, Jianyu Wu, Zining Wang et al.

ICML 2024 poster

Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression

Junyuan Hong, Jinhao Duan, Chenhui Zhang et al.

ICML 2024 poster

DFD: Distilling the Feature Disparity Differently for Detectors

Kang Liu, Yingyi Zhang, Jingyun Zhang et al.

ICML 2024 poster

DistiLLM: Towards Streamlined Distillation for Large Language Models

Jongwoo Ko, Sungnyun Kim, Tianyi Chen et al.

ICML 2024 poster

Do Topological Characteristics Help in Knowledge Distillation?

Jungeun Kim, Junwon You, Dongjin Lee et al.

ICML 2024 poster

Enhancing Vision Transformer: Amplifying Non-Linearity in Feedforward Network Module

Yixing Xu, Chao Li, Dong Li et al.

ICML 2024 poster

ERQ: Error Reduction for Post-Training Quantization of Vision Transformers

Yunshan Zhong, Jiawei Hu, You Huang et al.

ICML 2024 spotlight

ExCP: Extreme LLM Checkpoint Compression via Weight-Momentum Joint Shrinking

Wenshuo Li, Xinghao Chen, Han Shu et al.

ICML 2024 poster

Exploring Intrinsic Dimension for Vision-Language Model Pruning

Hanzhang Wang, Jiawen Zhang, Qingyuan Ma

ICML 2024 poster

Extreme Compression of Large Language Models via Additive Quantization

Vage Egiazarian, Andrei Panferov, Denis Kuznedelev et al.

ICML 2024 poster

Flextron: Many-in-One Flexible Large Language Model

Ruisi Cai, Saurav Muralidharan, Greg Heinrich et al.

ICML 2024 poster

FrameQuant: Flexible Low-Bit Quantization for Transformers

Harshavardhan Adepu, Zhanpeng Zeng, Li Zhang et al.

ICML 2024 poster

Junk DNA Hypothesis: Pruning Small Pre-Trained Weights Irreversibly and Monotonically Impairs "Difficult" Downstream Tasks in LLMs

Lu Yin, Ajay Jaiswal, Shiwei Liu et al.

ICML 2024 poster

KernelWarehouse: Rethinking the Design of Dynamic Convolution

Chao Li, Anbang Yao

ICML 2024 poster

Lightweight Image Super-Resolution via Flexible Meta Pruning

Yulun Zhang, Kai Zhang, Luc Van Gool et al.

ICML 2024 poster

Localizing Task Information for Improved Model Merging and Compression

Ke Wang, Nikolaos Dimitriadis, Guillermo Ortiz-Jimenez et al.

ICML 2024 poster

Pruner-Zero: Evolving Symbolic Pruning Metric From Scratch for Large Language Models

Peijie Dong, Lujun Li, Zhenheng Tang et al.

ICML 2024 poster

Rethinking Optimization and Architecture for Tiny Language Models

Yehui Tang, Kai Han, Fangcheng Liu et al.

ICML 2024 poster

Reweighted Solutions for Weighted Low Rank Approximation

David Woodruff, Taisuke Yasuda

ICML 2024 poster

SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks

Jiwon Song, Kyungseok Oh, Taesu Kim et al.

ICML 2024 poster

Soft Prompt Recovers Compressed LLMs, Transferably

Zhaozhuo Xu, Zirui Liu, Beidi Chen et al.

ICML 2024 poster

Towards efficient deep spiking neural networks construction with spiking activity based pruning

Yaxin Li, Qi Xu, Jiangrong Shen et al.

ICML 2024 poster

Transferring Knowledge From Large Foundation Models to Small Downstream Models

Shikai Qiu, Boran Han, Danielle Robinson et al.

ICML 2024 poster