Simple and Quick Fine-Tuning of Falcon Models with QLoRA
A one-command tool to adapt a Falcon model to your data
The Falcon models are among the most popular large language models right now, for several reasons:
They are very good, especially at problem-solving
They are smaller than many other LLMs while performing better
They are entirely free (Apache 2.0 license)
They are available in several versions, including instruct versions that mimic the behavior of ChatGPT
With recent techniques like QLoRA, you can fine-tune Falcon models on consumer hardware. I’ve already discussed QLoRA and Falcon fine-tuning in previous articles.
Fine-tuning Falcon models with QLoRA is relatively easy with the Hugging Face libraries. Yet, there is an even easier way that requires less coding: Falcontune.
Falcontune is an open-source project (Apache 2.0 license) developed by Rumen Mihaylov. We can read on the project page:
falcontune allows finetuning FALCONs (e.g., falcon-40b-4bit) on as little as one consumer-grade A100 40GB
Fine-tuning a 40B-parameter model on 40 GB of VRAM sounds great. “4bit” tells us the base model is quantized to 4-bit and only small LoRA adapters are trained on top of it, i.e., a QLoRA-style setup. But I wouldn’t call the A100 40GB a “consumer-grade” GPU: that’s still a $5,000+ GPU. On the other hand, the 7B-parameter version of Falcon that we will use here definitely fits on a consumer GPU, e.g., an RTX 3060 with 12 GB of VRAM (at 4-bit, the 7B weights alone take only about 3.5 GB).
Fine-tuning Falcon-7B and Falcon-40B with one command line
Note: The following commands are written for Falcon-7B. Replace “7b” with “40b” (and use the corresponding 40B model file) if you want to run them for Falcon-40B.
Requirements
I ran and tested everything on a free Google Colab instance.
We first need to get Falcontune:
git clone https://github.com/rmihaylov/falcontune
Then install it and all its dependencies:
cd falcontune
pip install -r requirements.txt
python setup.py install
Finally, we will need the Falcon model itself. I used the 4-bit GPTQ version of Falcon-7B Instruct quantized by TheBloke for this article:
wget https://huggingface.co/TheBloke/falcon-7b-instruct-GPTQ/resolve/main/gptq_model-4bit-64g.safetensors
(The 40B version is here: https://huggingface.co/TheBloke/falcon-40b-instruct-GPTQ/resolve/main/gptq_model-4bit--1g.safetensors)
Let’s also get a toy dataset:
wget https://github.com/gururise/AlpacaDataCleaned/raw/main/alpaca_data_cleaned.json
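Before launching a long fine-tuning run, it helps to peek at the data and, optionally, carve out a small subset for a quick test. Here is a minimal sketch in plain Python (the subset file name is just an example):
import json

# Load the Alpaca dataset we just downloaded
with open("alpaca_data_cleaned.json") as f:
    data = json.load(f)

print(len(data), "examples")
print(data[0])  # each record has "instruction", "input", and "output" fields

# Optional: keep only the first 1,000 examples for a quick test run
with open("alpaca_data_small.json", "w") as f:
    json.dump(data[:1000], f, indent=2)
If you create such a subset, point the --dataset argument (shown below) at it instead of the full file.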
And we are now ready.
The command line for fine-tuning
We ran “setup.py install” earlier, which gave us the “falcontune” command.
You simply need to run the following command to fine-tune Falcon-7B on the Alpaca data:
falcontune finetune \
--model=falcon-7b-instruct-4bit \
--weights=./gptq_model-4bit-64g.safetensors \
--dataset=./alpaca_data_cleaned.json \
--data_type=alpaca \
--lora_out_dir=./falcon-7b-instruct-4bit-alpaca/ \
--mbatch_size=1 \
--batch_size=2 \
--epochs=3 \
--lr=3e-4 \
--cutoff_len=256 \
--lora_r=8 \
--lora_alpha=16 \
--lora_dropout=0.05 \
--warmup_steps=5 \
--save_steps=50 \
--save_total_limit=3 \
--logging_steps=5 \
--target_modules='["query_key_value"]' \
--backend=triton
This should be quite slow: about 24 hours on a free Google Colab instance, split across two runtimes since Colab disconnects after 12 hours. The Alpaca dataset is large, so you may want to reduce its size for testing, for instance with the small subset created above. Thanks to LoRA, we actually fine-tune only 2,359,296 parameters.
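The 2,359,296 figure is easy to sanity-check. With --lora_r=8 and --target_modules='["query_key_value"]', LoRA adds two small matrices around the fused query_key_value projection of each of Falcon-7B's 32 layers. Assuming the published Falcon-7B configuration (hidden size 4544, 71 query heads plus one shared key/value head of dimension 64, hence a fused output of 4672), the arithmetic works out:
# Back-of-the-envelope count of LoRA trainable parameters for Falcon-7B
n_layers = 32    # transformer layers in Falcon-7B
d_in = 4544      # hidden size (input of query_key_value)
d_out = 4672     # fused query/key/value output: 71 * 64 + 2 * 64
r = 8            # --lora_r

# LoRA adds A (r x d_in) and B (d_out x r) per targeted module
trainable = n_layers * (r * d_in + d_out * r)
print(trainable)  # 2359296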
If you want to use your own dataset, have a look at “alpaca_data_cleaned.json” to see the data format expected by falcontune; a minimal sketch follows.
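An Alpaca-style file is a single JSON list of records, each with "instruction", "input" (possibly empty), and "output" fields. Here is a small sketch for writing your own dataset in that format (the two examples are made up):
import json

# Two toy records in the Alpaca format expected with --data_type=alpaca
my_data = [
    {
        "instruction": "Summarize the following text.",
        "input": "Falcon is a family of open large language models released under the Apache 2.0 license.",
        "output": "Falcon is a family of open LLMs with a permissive license.",
    },
    {
        "instruction": "How do you prepare pasta?",
        "input": "",
        "output": "Boil salted water, cook the pasta until al dente, then drain and serve.",
    },
]

with open("my_dataset.json", "w") as f:
    json.dump(my_data, f, indent=2)
Then pass --dataset=./my_dataset.json to the fine-tuning command above, keeping --data_type=alpaca.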
During fine-tuning, CPU RAM and GPU VRAM consumption peaked at 4.0 GB and 8.3 GB, respectively. This is a very affordable configuration for homemade fine-tuning.
Remember that if you use the 40B version of Falcon, you will need a much bigger machine.
To test inference you can run:
falcontune generate \
--interactive \
--model=falcon-7b-instruct-4bit \
--weights=./gptq_model-4bit-64g.safetensors \
--lora_apply_dir falcon-7b-instruct-4bit-alpaca/ \
--max_new_tokens 50 \
--use_cache \
--do_sample \
--instruction "How to prepare pasta?" \
--backend triton
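The directory falcon-7b-instruct-4bit-alpaca/ contains only the small LoRA weights. If falcontune saves them as a standard PEFT adapter, which I assume here but have not verified, you should also be able to apply them to the full-precision Falcon base outside of falcontune with Hugging Face's peft library, for example:
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the original (non-quantized) Falcon-7B Instruct base model
base = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-7b-instruct", trust_remote_code=True, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-7b-instruct")

# Apply the LoRA adapter produced by falcontune (assumption: standard PEFT format)
model = PeftModel.from_pretrained(base, "./falcon-7b-instruct-4bit-alpaca/")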
And that’s it! You now have a very cheap chat model on your machine.