2024 Poster "low-bit quantization" Papers
8 papers found
Accurate LoRA-Finetuning Quantization of LLMs via Information Retention
Haotong Qin, Xudong Ma, Xingyu Zheng et al.
ICML 2024 (poster)
Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs
Yeonhong Park, Jake Hyun, SangLyul Cho et al.
ICML 2024 (poster)
BiE: Bi-Exponent Block Floating-Point for Large Language Models Quantization
Lancheng Zou, Wenqian Zhao, Shuo Yin et al.
ICML 2024 (poster)
Extreme Compression of Large Language Models via Additive Quantization
Vage Egiazarian, Andrei Panferov, Denis Kuznedelev et al.
ICML 2024 (poster) · arXiv:2401.06118
FrameQuant: Flexible Low-Bit Quantization for Transformers
Harshavardhan Adepu, Zhanpeng Zeng, Li Zhang et al.
ICML 2024 (poster) · arXiv:2403.06082
GenQ: Quantization in Low Data Regimes with Generative Synthetic Data
Yuhang Li, Youngeun Kim, Donghyun Lee et al.
ECCV 2024 (poster) · arXiv:2312.05272 · 6 citations
Sharpness-Aware Data Generation for Zero-shot Quantization
Hoang Dung, Cuong Pham, Trung Le et al.
ICML 2024 (poster)
Towards Robust Full Low-bit Quantization of Super Resolution Networks
Denis Makhov, Irina Zhelavskaya, Ruslan Ostapets et al.
ECCV 2024 (poster) · 1 citation