LowRA: Accurate and Efficient LoRA Fine-Tuning of LLMs under 2 Bits

5 citations · ranked #987 of 3340 papers in ICML 2025

Abstract

Fine-tuning large language models (LLMs) is increasingly costly as models scale to hundreds of billions of parameters, and even parameter-efficient fine-tuning (PEFT) methods like LoRA remain resource-intensive. We introduce LowRA, the first framework to enable LoRA fine-tuning below 2 bits per parameter with minimal performance loss. LowRA optimizes fine-grained quantization—mapping, threshold selection, and precision assignment—while leveraging efficient CUDA kernels for scalable deployment. Extensive evaluations across 4 LLMs and 4 datasets show that LowRA achieves a superior performance–precision trade-off above 2 bits and remains accurate down to 1.15 bits, reducing memory usage by up to 50%. Our results highlight the potential of ultra-low-bit LoRA fine-tuning for resource-constrained environments.
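The abstract names three quantization knobs: mapping, threshold selection, and precision assignment. The sketch below illustrates only the last of these in PyTorch, under stated assumptions: `quantize_group` and `assign_precisions` are hypothetical helpers (not from the paper), uniform round-to-nearest stands in for LowRA's learned mapping and thresholds, and per-group variance stands in for its sensitivity criterion. It shows how mixing 1-bit and 2-bit groups yields a sub-2-bit average such as the 1.15 bits reported above.

```python
import torch

def quantize_group(w: torch.Tensor, bits: int) -> torch.Tensor:
    """Uniform symmetric quantization of one weight group to `bits` bits.
    Illustrative only: a plain round-to-nearest scheme stands in for
    LowRA's optimized mapping and threshold selection."""
    levels = 2 ** bits
    scale = w.abs().max() / (levels / 2) + 1e-12   # guard against all-zero groups
    q = torch.clamp(torch.round(w / scale), -(levels // 2), levels // 2 - 1)
    return q * scale   # dequantized weights, as consumed during LoRA fine-tuning

def assign_precisions(groups, avg_bits=1.15, hi=2, lo=1):
    """Give `hi` bits to the most sensitive groups and `lo` bits to the rest,
    so the average lands at `avg_bits`. Per-group variance is a hypothetical
    sensitivity proxy, not the criterion used by LowRA."""
    n_hi = round(len(groups) * (avg_bits - lo) / (hi - lo))
    sensitivity = torch.stack([g.var() for g in groups])
    order = torch.argsort(sensitivity, descending=True).tolist()
    bits = [lo] * len(groups)
    for i in order[:n_hi]:
        bits[i] = hi
    return bits

# Example: quantize a frozen base-weight matrix group-wise; LoRA adapters
# would then be fine-tuned on top and kept in full precision.
W = torch.randn(256, 256)
groups = list(W.reshape(-1, 64))                  # 64-element quantization groups
bits = assign_precisions(groups, avg_bits=1.15)
W_q = torch.stack([quantize_group(g, b) for g, b in zip(groups, bits)]).reshape(256, 256)
print(f"average bits/param: {sum(bits) / len(bits):.3f}")   # ~1.150
```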

Citation History

Jan 28, 2026: 0 citations
Feb 13, 2026: 5 citations