Fine-tuning Phi-3.5 MoE and Mini on Your Computer
With code to quantize the models with bitsandbytes and AutoRound
Microsoft released Phi-3.5. For now, it includes a new Mini version, a mixture of experts (MoE), and a vision language model (VLM):
They are all available with an MIT license.
We don’t know much about these models yet. Phi-3.5 Mini seems to outperform the previous version, especially on multilingual tasks, while its architecture remains the same.
Phi-3.5 MoE is a mixture of 16 Phi-3.5 Mini experts, 2 of which are activated during inference. The model has 41.9B parameters in total, of which 6.6B are active during inference. According to the public benchmarks, it is better than Gemma 2 9B and Llama 3.1 8B.
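As a quick check of this architecture, the expert counts can be read directly from the model’s configuration. This is a minimal sketch; the field names (num_local_experts, num_experts_per_tok) are assumptions based on the Mixtral-style MoE configs used in transformers and may differ depending on your version.

```python
from transformers import AutoConfig

# Read the MoE hyperparameters without downloading the weights.
# Field names are assumed; print(config) to see what your version exposes.
config = AutoConfig.from_pretrained(
    "microsoft/Phi-3.5-MoE-instruct", trust_remote_code=True
)
print(config.num_local_experts)    # expected: 16 experts per MoE layer
print(config.num_experts_per_tok)  # expected: 2 experts routed per token
```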
The vision model has the same capabilities as Microsoft’s Florence-2 but is larger (4.15B parameters).
In this article, we will see how to quantize and fine-tune Phi-3.5 Mini and Phi-3.5 MoE. For Phi-3.5 Mini, we will use both QLoRA and LoRA fine-tuning, with two different quantization algorithms: AutoRound and bitsandbytes. QLoRA and LoRA fine-tuning of Phi-3.5 MoE are not possible on consumer hardware. I provide the fine-tuning code, but you will need a GPU with at least 32 GB of VRAM.
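To give a concrete idea of the QLoRA setup before diving into the notebook, here is a minimal sketch of loading Phi-3.5 Mini with bitsandbytes 4-bit quantization and attaching LoRA adapters. The hyperparameters and target module names are illustrative assumptions (they follow the Phi-3 architecture but should be checked against the model); the full, tested version is in the notebook.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_name = "microsoft/Phi-3.5-mini-instruct"

# NF4 4-bit quantization with bfloat16 compute, the usual QLoRA configuration
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# LoRA adapters on the attention and MLP projections; the module names below
# are assumed from the Phi-3 architecture (check model.named_modules() if needed)
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["qkv_proj", "o_proj", "gate_up_proj", "down_proj"],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

For AutoRound, here is a sketch of quantizing Phi-3.5 Mini to 4-bit and saving the result. The arguments shown (bits, group_size, sym) are common settings rather than a prescription, and the exact API may vary slightly between versions of the auto-round library.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "microsoft/Phi-3.5-mini-instruct"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# 4-bit weights with a group size of 128 (illustrative settings)
autoround = AutoRound(model, tokenizer, bits=4, group_size=128, sym=True)
autoround.quantize()
autoround.save_quantized("./phi-3.5-mini-autoround")
```

The checkpoint saved by AutoRound can then be loaded like a regular model and used as the base for LoRA fine-tuning, which is the approach taken for the LoRA runs in the notebook.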
The code for quantization and QLoRA/LoRA fine-tuning of Phi-3.5 Mini and MoE is implemented in this notebook: