Fine-tune Phi-3 Medium on Your Computer
With QLoRA fine-tuning that is more memory-efficient than fine-tuning Llama 3 8B
Microsoft’s Phi models, from Phi-1 to Phi-3 mini, were all small models with fewer than 4 billion parameters. Microsoft even used to call the Phi models “SLMs”, for “small language models”.
With the release of Phi-3 medium and its 14 billion parameters, Phi now also includes large models. According to Microsoft’s evaluation of Phi-3, this medium version achieves performance close to that of much larger, state-of-the-art large language models (LLMs) such as Llama 3 70B.
However, given this larger size, can we still fine-tune Phi-3 medium on consumer hardware? And can quantization sufficiently reduce the model’s memory footprint while preserving its accuracy?
In this article, we will answer these questions. I first briefly review Phi-3 medium and its architecture. Then, we will see how to fine-tune the model with QLoRA. Fine-tuning Phi-3 medium is possible on your computer if you have a GPU with 16 GB of VRAM. It might even be possible on a 12 GB GPU with a small batch size.
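To give a first idea of what this looks like in practice, here is a minimal QLoRA sketch using Hugging Face’s transformers, peft, bitsandbytes, and trl. The model ID, dataset, target modules, and hyperparameters below are illustrative assumptions rather than the exact configuration used in the notebook, and the SFTTrainer arguments vary across trl versions:

```python
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
)
from peft import LoraConfig, prepare_model_for_kbit_training
from trl import SFTTrainer

# Assumed Hugging Face model ID for Phi-3 medium
model_id = "microsoft/Phi-3-medium-4k-instruct"

# 4-bit NF4 quantization with bfloat16 compute: the core of QLoRA
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map={"": 0},
    # trust_remote_code=True may be needed with older transformers releases
)
model = prepare_model_for_kbit_training(model)

# LoRA adapters on Phi-3's fused attention and MLP projections
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["qkv_proj", "o_proj", "gate_up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

# An example instruction dataset with a "text" column; swap in your own
dataset = load_dataset("timdettmers/openassistant-guanaco", split="train")

training_args = TrainingArguments(
    output_dir="./phi3-medium-qlora",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=1e-4,
    num_train_epochs=1,
    logging_steps=25,
    optim="paged_adamw_8bit",  # paged 8-bit AdamW keeps optimizer memory low
    bf16=True,
    gradient_checkpointing=True,
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    peft_config=peft_config,
    tokenizer=tokenizer,
    dataset_text_field="text",
    max_seq_length=512,
)
trainer.train()
trainer.save_model("./phi3-medium-qlora-adapter")  # saves only the adapter
```

The paged 8-bit AdamW optimizer, 4-bit quantization, and gradient checkpointing are what keep the memory consumption within reach of a 16 GB GPU; increasing the batch size or the maximum sequence length quickly raises the activation memory, which is why a 12 GB GPU would require a small batch size.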
All the code for fine-tuning and merging adapters for Phi-3 medium is implemented in this notebook: