Fine-tune Gemma 2 on Your Computer with LoRA and QLoRA
Using Hugging Face libraries and Unsloth
Gemma 2 is a family of two LLMs, with 9B and 27B parameters, released by Google. A 2.6B model will be released later. These LLMs have been shown to perform very well in many language generation tasks. For specialized domains and tasks, this performance can be further improved with fine-tuning.
The 9B and 27B models are particularly suitable for high-end consumer GPUs (24 GB of VRAM). The 9B model can be fine-tuned with LoRA, i.e., without quantization, while the 27B model requires quantization (QLoRA) but still leaves some memory available for batch training and longer sequences.
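Before getting to the full setup, the core idea behind LoRA can be sketched numerically: instead of updating the full weight matrix, we train two small low-rank matrices whose product forms the update. The dimensions and hyperparameters below are illustrative, not Gemma 2's actual sizes:

```python
import numpy as np

# LoRA replaces the full weight update with a low-rank one:
# W' = W + (alpha / r) * B @ A, where A is (r x d_in) and B is (d_out x r).
# Only A and B are trained; W stays frozen.
d_in, d_out, r, alpha = 64, 64, 8, 16  # illustrative sizes, not Gemma's
rng = np.random.default_rng(0)

W = rng.standard_normal((d_out, d_in))      # frozen base weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable, small random init
B = np.zeros((d_out, r))                    # trainable, zero init

def lora_forward(x):
    # Base path plus the scaled low-rank path.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
# At initialization B is zero, so the adapted model exactly matches the base model.
assert np.allclose(lora_forward(x), W @ x)
```

The trainable parameter count drops from `d_out * d_in` to `r * (d_in + d_out)`, which is why LoRA (and its quantized variant, QLoRA) fits large models into consumer GPU memory.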
In this article, I present Gemma 2 with a particular focus on its differences from the first version of Gemma (let’s call it Gemma v1). Then, we will see how to fine-tune Gemma 2 on consumer hardware using LoRA and QLoRA, with code relying on Hugging Face libraries (Transformers, PEFT, and TRL) and on Unsloth (for better efficiency).
The code for fine-tuning Gemma 2 with Unsloth and Transformers is also ready to run in this notebook: