LowRA: Accurate and Efficient LoRA Fine-Tuning of LLMs under 2 Bits

5 citations · ranked #987 of 3340 papers in ICML 2025

Abstract

Fine-tuning large language models (LLMs) is increasingly costly as models scale to hundreds of billions of parameters, and even parameter-efficient fine-tuning (PEFT) methods like LoRA remain resource-intensive. We introduce LowRA, the first framework to enable LoRA fine-tuning below 2 bits per parameter with minimal performance loss. LowRA optimizes fine-grained quantization—mapping, threshold selection, and precision assignment—while leveraging efficient CUDA kernels for scalable deployment. Extensive evaluations across 4 LLMs and 4 datasets show that LowRA achieves a superior performance–precision trade-off above 2 bits and remains accurate down to 1.15 bits, reducing memory usage by up to 50%. Our results highlight the potential of ultra-low-bit LoRA fine-tuning for resource-constrained environments.
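The abstract names three quantization knobs: mapping, threshold selection, and precision assignment. The sketch below illustrates only the last of these in PyTorch, under stated assumptions: `quantize_group` and `assign_precisions` are hypothetical helpers (not from the paper), uniform round-to-nearest stands in for LowRA's learned mapping and thresholds, and per-group variance stands in for its sensitivity criterion. It shows how mixing 1-bit and 2-bit groups yields a sub-2-bit average such as the 1.15 bits reported above.

```python
import torch

def quantize_group(w: torch.Tensor, bits: int) -> torch.Tensor:
    """Uniform symmetric quantization of one weight group to `bits` bits.
    Illustrative only: a plain round-to-nearest scheme stands in for
    LowRA's optimized mapping and threshold selection."""
    levels = 2 ** bits
    scale = w.abs().max() / (levels / 2) + 1e-12   # guard against all-zero groups
    q = torch.clamp(torch.round(w / scale), -(levels // 2), levels // 2 - 1)
    return q * scale   # dequantized weights, as consumed during LoRA fine-tuning

def assign_precisions(groups, avg_bits=1.15, hi=2, lo=1):
    """Give `hi` bits to the most sensitive groups and `lo` bits to the rest,
    so the average lands at `avg_bits`. Per-group variance is a hypothetical
    sensitivity proxy, not the criterion used by LowRA."""
    n_hi = round(len(groups) * (avg_bits - lo) / (hi - lo))
    sensitivity = torch.stack([g.var() for g in groups])
    order = torch.argsort(sensitivity, descending=True).tolist()
    bits = [lo] * len(groups)
    for i in order[:n_hi]:
        bits[i] = hi
    return bits

# Example: quantize a frozen base-weight matrix group-wise; LoRA adapters
# would then be fine-tuned on top and kept in full precision.
W = torch.randn(256, 256)
groups = list(W.reshape(-1, 64))                  # 64-element quantization groups
bits = assign_precisions(groups, avg_bits=1.15)
W_q = torch.stack([quantize_group(g, b) for g, b in zip(groups, bits)]).reshape(256, 256)
print(f"average bits/param: {sum(bits) / len(bits):.3f}")   # ~1.150
```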

Citation History

Jan 28, 2026: 0 citations
Feb 13, 2026: 5 citations