ICML 2024 "low-bit quantization" Papers
6 papers found
Accurate LoRA-Finetuning Quantization of LLMs via Information Retention
Haotong Qin, Xudong Ma, Xingyu Zheng et al.
ICML 2024 poster
Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs
Yeonhong Park, Jake Hyun, SangLyul Cho et al.
ICML 2024 poster
BiE: Bi-Exponent Block Floating-Point for Large Language Models Quantization
Lancheng Zou, Wenqian Zhao, Shuo Yin et al.
ICML 2024 poster
Extreme Compression of Large Language Models via Additive Quantization
Vage Egiazarian, Andrei Panferov, Denis Kuznedelev et al.
ICML 2024 poster
FrameQuant: Flexible Low-Bit Quantization for Transformers
Harshavardhan Adepu, Zhanpeng Zeng, Li Zhang et al.
ICML 2024 poster
Sharpness-Aware Data Generation for Zero-shot Quantization
Hoang Dung, Cuong Pham, Trung Le et al.
ICML 2024 poster