Along with Llama 3.2 Vision, Meta has released new 1B and 3B parameter models. These models were created by distilling Llama 3.1 8B, yielding some of the best-performing models in their size categories.
Moreover, these compact models are perfectly suited for budget configurations with limited GPU resources. For instance, the 3B model can be loaded onto an 8 GB GPU, while the 1B model fits on a 4 GB GPU.
Their small size even allows for full fine-tuning on a consumer GPU with 24 GB of memory.
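These GPU figures follow from simple arithmetic on the weight footprint: memory ≈ parameter count × bytes per parameter. The sketch below is a back-of-envelope estimate, assuming rounded parameter counts of 1e9 and 3e9 and ignoring activations, gradients, optimizer states, and framework overhead, which is why fine-tuning needs considerably more than these numbers.

```python
# Rough GPU memory needed just to hold the model weights.
# Assumptions: rounded parameter counts (1e9 and 3e9), 1 GB = 1e9 bytes,
# and no overhead for activations, gradients, or optimizer states.
BYTES_PER_PARAM = {
    "fp16/bf16": 2.0,       # standard half-precision loading
    "int8": 1.0,            # 8-bit quantization
    "int4 (QLoRA)": 0.5,    # 4-bit quantization, as used by QLoRA
}

def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Memory for the weights alone, in gigabytes."""
    return n_params * bytes_per_param / 1e9

for name, n_params in [("Llama 3.2 1B", 1e9), ("Llama 3.2 3B", 3e9)]:
    for dtype, nbytes in BYTES_PER_PARAM.items():
        gb = weight_memory_gb(n_params, nbytes)
        print(f"{name} in {dtype}: ~{gb:.1f} GB")
```

In 16-bit precision, the 3B weights alone take roughly 6 GB (hence the 8 GB GPU) and the 1B weights roughly 2 GB (hence 4 GB); quantization shrinks these further, which is what makes QLoRA attractive on small GPUs.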
In this article, we will begin by reviewing how Meta developed the Llama 3.2 1B and 3B models. Then, we will implement QLoRA, LoRA, and full fine-tuning for Llama 3.2, and compare the memory consumption of each method to determine the GPU requirements for the 1B and 3B models.
The notebook implementing Llama 3.2 fine-tuning is here: