LQ-LoRA: Jointly Fine-tune and Quantize Large Language Models
Better quantization and better fine-tuned adapters
QLoRA is one of the most popular methods for fine-tuning adapters on top of quantized LLMs. While QLoRA is very effective, it also has several drawbacks that we have discussed in previous articles.
There are alternatives to QLoRA. For instance, we have tried QA-LoRA, which fine-tunes quantization-aware LoRA adapters. QA-LoRA is a good alternative to QLoRA, but its official implementation didn't support recent LLMs and has since been removed from GitHub by its authors.
We need another alternative.
In this article, I present LQ-LoRA: a method that decomposes each weight matrix of a pre-trained LLM into a fixed quantized component and a trainable low-rank (LoRA) component. We will see how it works and why it performs better than QLoRA.
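To make the core idea concrete, here is a minimal sketch of the decomposition LQ-LoRA relies on: each pre-trained matrix W is split so that W ≈ Q + L1 @ L2, where Q is quantized and frozen while L1 and L2 are the trainable low-rank factors. This is not the official implementation; it uses a simple round-to-nearest quantizer in place of the paper's quantization scheme, and the function names (`lq_decompose`, `fake_quantize`) are my own placeholders.

```python
# Minimal sketch of the LQ decomposition idea (assumptions: round-to-nearest
# quantizer as a stand-in for the paper's quantization; hypothetical helper names).
import torch

def fake_quantize(W: torch.Tensor, bits: int = 4) -> torch.Tensor:
    # Symmetric round-to-nearest quantization, used here only as a placeholder.
    scale = W.abs().max() / (2 ** (bits - 1) - 1)
    return torch.round(W / scale) * scale

def lq_decompose(W: torch.Tensor, rank: int = 16, n_iters: int = 10, bits: int = 4):
    """Alternate between quantizing the residual and refitting a low-rank term
    so that W ≈ Q + L1 @ L2, with Q frozen and L1, L2 trainable afterwards."""
    L1 = torch.zeros(W.shape[0], rank)
    L2 = torch.zeros(rank, W.shape[1])
    for _ in range(n_iters):
        # Quantize the part of W that the low-rank term does not capture.
        Q = fake_quantize(W - L1 @ L2, bits=bits)
        # Best rank-r approximation of the remaining residual via truncated SVD.
        U, S, Vh = torch.linalg.svd(W - Q, full_matrices=False)
        L1 = U[:, :rank] * S[:rank]
        L2 = Vh[:rank, :]
    return Q, L1, L2

W = torch.randn(512, 512)
Q, L1, L2 = lq_decompose(W)
print("relative reconstruction error:", (torch.norm(W - (Q + L1 @ L2)) / torch.norm(W)).item())
```

The key difference from QLoRA is visible here: instead of quantizing W directly and initializing the adapter at zero, the low-rank factors are fitted to absorb the quantization error, so fine-tuning starts from a better approximation of the original model.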