The Kaitchup – AI on a Budget
Fine-tune Quantized Llama 2 on Your GPU with QA-LoRA
Perfectly merge your fine-tuned adapters with quantized LLMs

Benjamin Marie
Oct 12, 2023
QA-LoRA is a new approach for fine-tuning “quantization-aware” LoRA on top of quantized LLMs. I wrote a review of QA-LoRA in this article:

QA-LoRA: Quantization-Aware Fine-tuning for Large Language Models (October 9, 2023)

Now that we know how it works, we will see in this tutorial how to fine-tune Llama 2, quantized with GPTQ, using QA-LoRA. I will also show you how to merge the fine-tuned adapter.
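As a preview of why this merge is exact: QA-LoRA average-pools the LoRA input over each quantization group, so the effective weight update is constant within a group and can be folded directly into that group's zero point, leaving the integer weights and scales untouched. Here is a toy sketch of that arithmetic in plain Python (my own illustration, not the QA-LoRA authors' code; all shapes and values are made up):

```python
# Toy demo: a group-wise constant LoRA update merges exactly into the
# zero points of an asymmetrically quantized weight matrix.
import random

random.seed(0)

OUT, IN, GROUP = 2, 8, 4          # 2 output rows, 8 inputs, group size 4
N_GROUPS = IN // GROUP

# Per-(row, group) quantization parameters: w_hat = scale * (q - zero)
scale = [[0.1 + 0.05 * g for g in range(N_GROUPS)] for _ in range(OUT)]
zero = [[3.0, 5.0] for _ in range(OUT)]
q = [[random.randint(0, 15) for _ in range(IN)] for _ in range(OUT)]

def dequant(scale, zero, q):
    """Dequantize group-wise: column c belongs to group c // GROUP."""
    return [[scale[r][c // GROUP] * (q[r][c] - zero[r][c // GROUP])
             for c in range(IN)] for r in range(OUT)]

w_hat = dequant(scale, zero, q)

# QA-LoRA pools the input over each group before the LoRA projection,
# so the weight update is constant inside a group:
#   delta[r][c] = (B @ A)[r][group(c)] / GROUP
BA = [[0.8, -0.4], [0.2, 0.6]]    # toy (B @ A), shape OUT x N_GROUPS
delta = [[BA[r][c // GROUP] / GROUP for c in range(IN)] for r in range(OUT)]

# Merge by shifting the zero points, since
#   scale*(q - zero) + delta == scale*(q - (zero - delta/scale)):
new_zero = [[zero[r][g] - BA[r][g] / GROUP / scale[r][g]
             for g in range(N_GROUPS)] for r in range(OUT)]

merged = dequant(scale, new_zero, q)
expected = [[w_hat[r][c] + delta[r][c] for c in range(IN)] for r in range(OUT)]
assert all(abs(merged[r][c] - expected[r][c]) < 1e-9
           for r in range(OUT) for c in range(IN))
print("exact merge: adapter absorbed into zero points")
```

The merged model keeps its integer weights and scales, so there is no extra quantization error, which is what a naive dequantize-merge-requantize pipeline cannot guarantee.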


QA-LoRA is still a very young project. I had to make two small corrections to the code to get it working with Llama 2. I released a patch, along with an adapter fine-tuned with QA-LoRA for Llama 2 quantized to 4-bit with AutoGPTQ.

Here is the notebook to reproduce my fine-tuning and merging using QA-LoRA:

Get the notebook (#21)

Since we will experiment with LoRA and with Llama 2 quantized with GPTQ, I recommend reading these two other articles before this one:

GPTQ or bitsandbytes: Which Quantization Method to Use for LLMs — Examples with Llama 2 (August 22, 2023)
LoRA Adapters: When a Naive Merge Leads to Poor Performance (September 7, 2023)

Overview of the Implementation of QA-LoRA

This post is for paid subscribers