2024 Poster Papers on "model compression"
31 papers found
Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs
Yeonhong Park, Jake Hyun, SangLyul Cho et al.
Bayesian Knowledge Distillation: A Bayesian Perspective of Distillation with Uncertainty Quantification
Luyang Fang, Yongkai Chen, Wenxuan Zhong et al.
BiLLM: Pushing the Limit of Post-Training Quantization for LLMs
Wei Huang, Yangdong Liu, Haotong Qin et al.
BK-SDM: A Lightweight, Fast, and Cheap Version of Stable Diffusion
Bo-Kyeong Kim, Hyoung-Kyu Song, Thibault Castells et al.
CHAI: Clustered Head Attention for Efficient LLM Inference
Saurabh Agarwal, Bilge Acun, Basil Hosmer et al.
CompGS: Smaller and Faster Gaussian Splatting with Vector Quantization
K L Navaneet, Kossar Pourahmadi, Soroush Abbasi Koohpayegani et al.
Compressing Large Language Models by Joint Sparsification and Quantization
Jinyang Guo, Jianyu Wu, Zining Wang et al.
Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression
Junyuan Hong, Jinhao Duan, Chenhui Zhang et al.
DFD: Distilling the Feature Disparity Differently for Detectors
Kang Liu, Yingyi Zhang, Jingyun Zhang et al.
Distilling Knowledge from Large-Scale Image Models for Object Detection
Gang Li, Wenhai Wang, Xiang Li et al.
DistiLLM: Towards Streamlined Distillation for Large Language Models
Jongwoo Ko, Sungnyun Kim, Tianyi Chen et al.
Do Topological Characteristics Help in Knowledge Distillation?
Jungeun Kim, Junwon You, Dongjin Lee et al.
DεpS: Delayed ε-Shrinking for Faster Once-For-All Training
Aditya Annavajjala, Alind Khare, Animesh Agrawal et al.
Enhancing Vision Transformer: Amplifying Non-Linearity in Feedforward Network Module
Yixing Xu, Chao Li, Dong Li et al.
ExCP: Extreme LLM Checkpoint Compression via Weight-Momentum Joint Shrinking
Wenshuo Li, Xinghao Chen, Han Shu et al.
Exploring Intrinsic Dimension for Vision-Language Model Pruning
Hanzhang Wang, Jiawen Zhang, Qingyuan Ma
Extreme Compression of Large Language Models via Additive Quantization
Vage Egiazarian, Andrei Panferov, Denis Kuznedelev et al.
Flextron: Many-in-One Flexible Large Language Model
Ruisi Cai, Saurav Muralidharan, Greg Heinrich et al.
FrameQuant: Flexible Low-Bit Quantization for Transformers
Harshavardhan Adepu, Zhanpeng Zeng, Li Zhang et al.
Good Teachers Explain: Explanation-Enhanced Knowledge Distillation
Amin Parchami-Araghi, Moritz Böhle, Sukrut Rao et al.
Junk DNA Hypothesis: Pruning Small Pre-Trained Weights Irreversibly and Monotonically Impairs "Difficult" Downstream Tasks in LLMs
Lu Yin, Ajay Jaiswal, Shiwei Liu et al.
KernelWarehouse: Rethinking the Design of Dynamic Convolution
Chao Li, Anbang Yao
Lightweight Image Super-Resolution via Flexible Meta Pruning
Yulun Zhang, Kai Zhang, Luc Van Gool et al.
Localizing Task Information for Improved Model Merging and Compression
Ke Wang, Nikolaos Dimitriadis, Guillermo Ortiz-Jimenez et al.
Pruner-Zero: Evolving Symbolic Pruning Metric From Scratch for Large Language Models
Peijie Dong, Lujun Li, Zhenheng Tang et al.
Rethinking Optimization and Architecture for Tiny Language Models
Yehui Tang, Kai Han, Fangcheng Liu et al.
Reweighted Solutions for Weighted Low Rank Approximation
David Woodruff, Taisuke Yasuda
SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks
Jiwon Song, Kyungseok Oh, Taesu Kim et al.
Soft Prompt Recovers Compressed LLMs, Transferably
Zhaozhuo Xu, Zirui Liu, Beidi Chen et al.
Towards Efficient Deep Spiking Neural Networks Construction with Spiking Activity Based Pruning
Yaxin Li, Qi Xu, Jiangrong Shen et al.
Transferring Knowledge From Large Foundation Models to Small Downstream Models
Shikai Qiu, Boran Han, Danielle Robinson et al.