The Kaitchup – AI on a Budget

LQ-LoRA: Jointly Fine-tune and Quantize Large Language Models

Better quantization and better fine-tuned adapters

Nov 30, 2023
LQ-LoRA decomposes the pre-trained LLM into quantized parameters and a LoRA adapter.

QLoRA is one of the most popular methods for fine-tuning adapters on top of quantized LLMs. While QLoRA is very effective, it also has several drawbacks that we have discussed in previous articles:

Don't Merge Your LoRA Adapter Into a 4-bit LLM (Benjamin Marie, PhD, November 13, 2023)

There are alternatives to QLoRA. For instance, we have tried QA-LoRA, which fine-tunes quantization-aware LoRA adapters. QA-LoRA is a good alternative to QLoRA, but its official implementation didn't support recent LLMs and has since been removed from GitHub by its authors.

Fine-tune Quantized Llama 2 on Your GPU with QA-LoRA (Benjamin Marie, PhD, October 12, 2023)

We need another alternative.


In this article, I present LQ-LoRA: a method that decomposes a pre-trained LLM into fixed quantized parameters and a trainable LoRA adapter. We will see how it works and why it performs better than QLoRA.
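To make the idea concrete before diving in, here is a minimal, illustrative sketch of such a decomposition. It is not the paper's implementation: it alternates between quantizing the residual W − AB and taking a truncated SVD of W − Q, so that each weight matrix ends up as W ≈ Q + AB, with Q fixed (quantized) and A, B forming the trainable LoRA adapter. The round-to-nearest quantizer, the rank, and the iteration count are simplifications I chose for illustration.

```python
import torch

def fake_quantize(w: torch.Tensor, n_bits: int = 4) -> torch.Tensor:
    # Toy round-to-nearest quantizer, standing in for a real 4-bit scheme (e.g. NF4).
    scale = w.abs().max() / (2 ** (n_bits - 1) - 1)
    return torch.round(w / scale) * scale

def lq_decompose(w: torch.Tensor, rank: int = 32, n_iters: int = 10):
    """Alternate between quantizing the residual and low-rank-approximating it,
    so that w ≈ q + a @ b, with q fixed and (a, b) used to initialize the adapter."""
    a = torch.zeros(w.shape[0], rank)
    b = torch.zeros(rank, w.shape[1])
    for _ in range(n_iters):
        # Quantize the part of w that the low-rank component does not explain.
        q = fake_quantize(w - a @ b)
        # Best rank-r approximation of the remaining error, via truncated SVD.
        u, s, vh = torch.linalg.svd(w - q, full_matrices=False)
        a = u[:, :rank] * s[:rank]
        b = vh[:rank, :]
    return q, a, b

# Usage: decompose a toy weight matrix and check the relative reconstruction error.
w = torch.randn(256, 256)
q, a, b = lq_decompose(w)
print(torch.linalg.norm(w - (q + a @ b)) / torch.linalg.norm(w))
```

The point of the alternation is that the low-rank pair absorbs part of the quantization error instead of leaving it all in the frozen quantized weights, which is what plain QLoRA does.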

Reducing the Impact of Quantization Errors
