Fine-tuning Phi-3.5 MoE and Mini on Your Computer
With code to quantize the models with bitsandbytes and AutoRound
Microsoft released Phi-3.5. For now, it includes a new Mini version, a mixture of experts (MoE), and a vision language model (VLM):
They are all available with an MIT license.
We don’t know much about these models yet. Phi-3.5 Mini seems to outperform the previous version, especially on multilingual tasks, while its architecture remains the same.
Phi-3.5 MoE is a mixture of 16 Phi-3.5 Mini experts, 2 of which are activated during inference. The model has 41.9B parameters in total, of which 6.6B are active during inference. According to the public benchmarks, it is better than Gemma 2 9B and Llama 3.1 8B.
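As a quick check of this architecture, the expert counts can be read directly from the model’s configuration. This is a minimal sketch; the field names (num_local_experts, num_experts_per_tok) are assumptions based on the Mixtral-style MoE configs used in transformers and may differ depending on your version.

```python
from transformers import AutoConfig

# Read the MoE hyperparameters without downloading the weights.
# Field names are assumed; print(config) to see what your version exposes.
config = AutoConfig.from_pretrained(
    "microsoft/Phi-3.5-MoE-instruct", trust_remote_code=True
)
print(config.num_local_experts)    # expected: 16 experts per MoE layer
print(config.num_experts_per_tok)  # expected: 2 experts routed per token
```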
The vision model has the same capabilities as Microsoft’s Florence-2 but is larger (4.15B parameters).
In this article, we will see how to quantize and fine-tune Phi-3.5 Mini and Phi-3.5 MoE. For Phi-3.5 Mini, we will use both QLoRA and LoRA fine-tuning, with two different quantization algorithms: AutoRound and bitsandbytes. QLoRA and LoRA fine-tuning of Phi-3.5 MoE are not possible on consumer hardware. I provide the fine-tuning code, but you will need a GPU with at least 32 GB of VRAM.
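To give a concrete idea of the QLoRA setup before diving into the notebook, here is a minimal sketch of loading Phi-3.5 Mini with bitsandbytes 4-bit quantization and attaching LoRA adapters. The hyperparameters and target module names are illustrative assumptions (they follow the Phi-3 architecture but should be checked against the model); the full, tested version is in the notebook.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_name = "microsoft/Phi-3.5-mini-instruct"

# NF4 4-bit quantization with bfloat16 compute, the usual QLoRA configuration
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# LoRA adapters on the attention and MLP projections; the module names below
# are assumed from the Phi-3 architecture (check model.named_modules() if needed)
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["qkv_proj", "o_proj", "gate_up_proj", "down_proj"],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

For AutoRound, here is a sketch of quantizing Phi-3.5 Mini to 4-bit and saving the result. The arguments shown (bits, group_size, sym) are common settings rather than a prescription, and the exact API may vary slightly between versions of the auto-round library.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "microsoft/Phi-3.5-mini-instruct"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# 4-bit weights with a group size of 128 (illustrative settings)
autoround = AutoRound(model, tokenizer, bits=4, group_size=128, sym=True)
autoround.quantize()
autoround.save_quantized("./phi-3.5-mini-autoround")
```

The checkpoint saved by AutoRound can then be loaded like a regular model and used as the base for LoRA fine-tuning, which is the approach taken for the LoRA runs in the notebook.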
The code for quantization and QLoRA/LoRA fine-tuning of Phi-3.5 Mini and MoE is implemented in this notebook: