In December 2024, Microsoft released Phi-4, a powerful 14B parameter model. While impressive, models of this size are difficult to run or fine-tune on consumer hardware.
To address this, Microsoft introduced Phi-4 Mini, a smaller 3.8B parameter version. It can be fine-tuned on a 24GB GPU and runs smoothly on a 12GB GPU. Alongside Phi-4 Mini, Microsoft also released Phi-4 Multimodal, which can process audio, image, and text inputs, a rare capability among open models.
Microsoft achieved this by integrating and fine-tuning multimodal LoRA adapters on top of Phi-4 Mini, which add only 1.73B parameters.
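To make the mechanism concrete, here is a minimal sketch of how LoRA adapters are typically attached to a base model with the PEFT library. This is not Microsoft's actual multimodal adapter recipe; the model id `microsoft/Phi-4-mini-instruct`, the rank, and the target module names are assumptions for illustration only.

```python
# Minimal sketch: attaching LoRA adapters to Phi-4 Mini with PEFT.
# NOT Microsoft's multimodal adapter configuration, just an illustration of
# how LoRA adds a small number of trainable parameters on top of a frozen
# base model. Model id and target module names are assumptions.
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-4-mini-instruct",        # assumed model id
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

lora_config = LoraConfig(
    r=16,                                   # adapter rank (illustrative value)
    lora_alpha=32,
    target_modules=["qkv_proj", "o_proj"],  # assumed projection layer names
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()          # shows how few parameters the adapters add
```

The key point is that the base model's weights stay frozen; only the low-rank adapter matrices are trained, which is why the multimodal capabilities come at a comparatively small parameter cost.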
In this article, we'll explore how Microsoft developed this multimodal LoRA model. We'll also evaluate Phi-4 Mini's performance and examine its efficiency after 8-bit, 4-bit, 3-bit, and 2-bit quantization. The model can be quantized to 4-bit with minimal accuracy loss, shrinking it to roughly 3 GB!
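As a preview of the quantization step, the sketch below loads the model in 4-bit with bitsandbytes through Transformers. The model id and the NF4 settings are assumptions for illustration; the benchmarks later in the article may use a different configuration.

```python
# Minimal sketch: loading Phi-4 Mini with 4-bit (NF4) quantization via
# bitsandbytes. Model id and exact settings are assumptions; the article's
# benchmarks may rely on a different quantization setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
)

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-4-mini-instruct",        # assumed model id
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-4-mini-instruct")

# Rough arithmetic: 3.8B parameters at ~0.5 byte each is about 1.9 GB of
# weights; layers kept in higher precision and framework overhead push the
# total toward the ~3 GB figure mentioned above.
print(f"Model footprint: {model.get_memory_footprint() / 1e9:.2f} GB")
```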
If you want to fine-tune the model, you can use the code shared in this article and in the accompanying notebook: