Fine-tune Llama 2 on Your Computer with QLoRA and TRL
On Guanaco and with the correct padding
Llama 2 is a state-of-the-art large language model (LLM) released by Meta.
In the paper presenting the model, Llama 2 demonstrates impressive capabilities on public benchmarks for various natural language generation and coding tasks.
Meta also released Chat versions of Llama 2. These chat models can be used as chatbots. They mimic OpenAI’s ChatGPT capabilities and can solve many problems with the right prompts.
Both versions of Llama 2 are currently available in different sizes: 7B, 13B, and 70B parameters. Note: A 34B parameter version is presented in the paper but has not been released yet.
The 7B and 13B models are especially interesting if you want to run Llama 2 on your computer. Thanks to recent advances in quantization, such as GPTQ and QLoRA, you can fine-tune and run these models on consumer hardware.
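To see why quantization matters here, a rough back-of-the-envelope estimate helps. The sketch below only counts the memory needed to store the model weights at a given precision. It ignores activations, optimizer states, and the small overhead of quantization constants, so real usage will be higher. The function name `weight_memory_gb` is my own and not part of any library.

```python
def weight_memory_gb(n_params_billion: float, bits_per_param: float) -> float:
    """Approximate memory (in GB, 1 GB = 1e9 bytes) to store the weights only."""
    total_bytes = n_params_billion * 1e9 * bits_per_param / 8
    return total_bytes / 1e9


# Compare 16-bit weights with 4-bit quantized weights for each Llama 2 size.
for size in (7, 13, 70):
    fp16 = weight_memory_gb(size, 16)
    int4 = weight_memory_gb(size, 4)
    print(f"Llama 2 {size}B: {fp16:.1f} GB in 16-bit, {int4:.1f} GB in 4-bit")
```

Under these assumptions, the 7B model goes from about 14 GB of weights in 16-bit to about 3.5 GB in 4-bit, which is what brings it within reach of a consumer GPU.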
I have written about Llama 2 and GPTQ here:
In this article, I go through all the steps to fine-tune Llama 2 with QLoRA on instruction datasets. I use Hugging Face’s TRL library, which simplifies LLM fine-tuning with instruction datasets. After following this article, you will have your own Llama 2 chat model running on your computer.
Note: Llama 2 is distributed with a license allowing commercial use. However, the license explicitly states that you cannot use Llama 2 to improve another LLM that is not Llama 2. I wrote about this limit of the license in this other article:
How to get Llama 2?
Note: If you already have access to Llama 2 on Hugging Face, you may skip this part.
You must register with Meta to get the model by filling out its access request form. You should receive an email from Meta within one hour.
Since I’ll use the Hugging Face Hub, you will also need a Hugging Face account. The email address for this account must be the same one you used to request the Llama 2 weights from Meta.
Next, go to a Llama 2 model card and follow the instructions: while logged in to your account, you will see a checkbox to check and a button to click at the top of the model card. This step takes longer, but you should get access to Llama 2 on the Hugging Face Hub within one day.
You will also need to create an access token from your Hugging Face account. Go to “settings” in your Hugging Face account and generate one.
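Once you have the token, one way to use it from Python is sketched below. It assumes that you stored the token in an environment variable named `HF_TOKEN` (my choice of name for this example) and that the `huggingface_hub` package is installed. The helper names `read_hf_token` and `hf_login` are hypothetical; only `huggingface_hub.login` is a library function.

```python
import os


def read_hf_token(env_var: str = "HF_TOKEN") -> str:
    """Read the access token from an environment variable (name is an assumption)."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"No token found in {env_var}. Generate one in your Hugging Face "
            "account settings and export it before running this script."
        )
    return token


def hf_login(env_var: str = "HF_TOKEN") -> None:
    """Authenticate this machine against the Hugging Face Hub."""
    # Imported here so the rest of the script works without the package.
    from huggingface_hub import login  # pip install huggingface_hub

    login(token=read_hf_token(env_var))
```

Calling `hf_login()` once at the start of your script should be enough for later downloads of the gated Llama 2 weights. Keeping the token in an environment variable avoids hard-coding it in a notebook or script that you might share.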