The Kaitchup – AI on a Budget

Mistral 7B: QLoRA Fine-tuning + 4-bit (Q4/NF4/INT4) VRAM Usage

VRAM requirements, LoRA target_modules, and 4-bit quantization (NF4/INT4)

Benjamin Marie
Oct 23, 2023
∙ Paid
The mistral is a wind blowing in the northern Mediterranean Sea.

Mistral 7B is now a very popular large language model (LLM). It outperforms other pre-trained LLMs of similar size and even larger models such as Llama 2 13B.

It is also very well optimized for fast decoding, especially for long contexts, thanks to the use of a sliding window to compute attention and grouped-query attention (GQA). You can find more details in the arXiv paper presenting Mistral 7B.
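To give an intuition for why sliding-window attention speeds up long-context decoding, here is a minimal sketch of the attention mask it implies. This is an illustration, not Mistral's implementation: the window size of 4 is chosen for readability (Mistral 7B uses a window of 4,096 tokens).

```python
# Sketch of a sliding-window causal attention mask (pure Python, no framework).
# Assumption: window=4 for illustration; Mistral 7B's actual window is 4096.
def sliding_window_mask(seq_len, window):
    # mask[i][j] is True when position i may attend to position j:
    # j must not be in the future (j <= i) and must be within the window.
    return [
        [(j <= i) and (i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]

mask = sliding_window_mask(6, 4)
```

Because each position attends to at most `window` previous tokens, the attention cost and the KV cache grow linearly with sequence length instead of quadratically.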

Mistral 7B is good but small enough to run on affordable hardware (VRAM requirements: ~15 GB to load in 16-bit; ~4 GB with 4-bit (Q4/NF4) quantization).
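These VRAM figures follow from simple arithmetic on the weights alone. A back-of-the-envelope sketch (the 7.24B parameter count is approximate, and activations, KV cache, and optimizer states come on top of this):

```python
# Estimate the memory occupied by model weights at a given precision.
# Assumption: ~7.24B parameters for Mistral 7B (approximate).
def weight_memory_gb(n_params, bits_per_param):
    # bits -> bytes -> GiB
    return n_params * bits_per_param / 8 / 1024**3

n = 7.24e9
fp16_gb = weight_memory_gb(n, 16)  # ~13.5 GB, consistent with the ~15 GB figure once overhead is added
nf4_gb = weight_memory_gb(n, 4)    # ~3.4 GB, consistent with the ~4 GB figure
```

Quantizing from 16-bit to 4-bit divides the weight footprint by 4, which is what brings Mistral 7B within reach of consumer GPUs.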

In this article, I will show you how to fine-tune Mistral 7B with QLoRA. We will use the “ultrachat” dataset, which I modified for this article. Ultrachat was used by Hugging Face to create the very good Zephyr 7B. We will also see how to quantize Mistral 7B with AutoGPTQ.
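One reason `target_modules` matters for QLoRA is that it controls how many trainable parameters the adapters add. The sketch below counts them from Mistral 7B's published dimensions (hidden size 4096, intermediate size 14336, 8 KV heads due to GQA, 32 layers); the rank `r=16` is an assumption for illustration, not the value used later in the article.

```python
# (in_features, out_features) of the linear layers LoRA commonly targets,
# taken from Mistral 7B's published config.
MODULES = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),   # grouped-query attention: smaller K/V projections
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
    "gate_proj": (4096, 14336),
    "up_proj": (4096, 14336),
    "down_proj": (14336, 4096),
}

def lora_params(target_modules, r=16, n_layers=32):
    # Each adapted module gains two low-rank matrices:
    # A (in_features x r) and B (r x out_features).
    per_layer = sum(r * (MODULES[m][0] + MODULES[m][1]) for m in target_modules)
    return per_layer * n_layers

attention_only = lora_params(["q_proj", "k_proj", "v_proj", "o_proj"])  # ~13.6M
all_linear = lora_params(list(MODULES))                                  # ~41.9M
```

Targeting all linear layers roughly triples the adapter size compared with attention-only, yet both remain a tiny fraction of the 7B frozen base weights.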

I wrote notebooks implementing all the sections. You can find them here:

Get the notebooks (#22, #23)

Last update: March 8th, 2024

Mistral 7B QLoRA fine-tuning (TRL): VRAM requirements + LoRA target_modules
