Fine-tune Phi-3 Medium on Your Computer
With QLoRA fine-tuning that is more memory-efficient than fine-tuning Llama 3 8B
Microsoft’s Phi models, from Phi-1 to Phi-3 mini, were all small models with fewer than 4 billion parameters. Microsoft even used to call the Phi models “SLMs”, for “small language models”.
With the release of Phi-3 medium and its 14 billion parameters, Phi now also includes large models. According to Microsoft’s evaluation of Phi-3, this medium version achieves performance close to that of much larger, state-of-the-art large language models (LLMs) such as Llama 3 70B.
However, given this larger size, can we still fine-tune Phi-3 medium on consumer hardware? And can quantization sufficiently reduce the model’s memory footprint while preserving its accuracy?
In this article, we will answer these questions. I first briefly review Phi-3 medium and its architecture. Then, we will see how to fine-tune the model with QLoRA. Fine-tuning Phi-3 medium is possible on your computer if you have a GPU with 16 GB of VRAM. It might even be possible on a 12 GB GPU with a small batch size.
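To give a first idea of what this looks like in practice, here is a minimal QLoRA sketch using Hugging Face’s transformers, peft, bitsandbytes, and trl. The model ID, dataset, target modules, and hyperparameters below are illustrative assumptions rather than the exact configuration used in the notebook, and the SFTTrainer arguments vary across trl versions:

```python
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
)
from peft import LoraConfig, prepare_model_for_kbit_training
from trl import SFTTrainer

# Assumed Hugging Face model ID for Phi-3 medium
model_id = "microsoft/Phi-3-medium-4k-instruct"

# 4-bit NF4 quantization with bfloat16 compute: the core of QLoRA
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map={"": 0},
    # trust_remote_code=True may be needed with older transformers releases
)
model = prepare_model_for_kbit_training(model)

# LoRA adapters on Phi-3's fused attention and MLP projections
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["qkv_proj", "o_proj", "gate_up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

# An example instruction dataset with a "text" column; swap in your own
dataset = load_dataset("timdettmers/openassistant-guanaco", split="train")

training_args = TrainingArguments(
    output_dir="./phi3-medium-qlora",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=1e-4,
    num_train_epochs=1,
    logging_steps=25,
    optim="paged_adamw_8bit",  # paged 8-bit AdamW keeps optimizer memory low
    bf16=True,
    gradient_checkpointing=True,
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    peft_config=peft_config,
    tokenizer=tokenizer,
    dataset_text_field="text",
    max_seq_length=512,
)
trainer.train()
trainer.save_model("./phi3-medium-qlora-adapter")  # saves only the adapter
```

The paged 8-bit AdamW optimizer, 4-bit quantization, and gradient checkpointing are what keep the memory consumption within reach of a 16 GB GPU; increasing the batch size or the maximum sequence length quickly raises the activation memory, which is why a 12 GB GPU would require a small batch size.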
All the code for fine-tuning and merging adapters for Phi-3 medium is implemented in this notebook: