Phi-4: What's New and How to Fine-Tune It on Your Computer (+ quantized version)
The good student of GPT-4
With Phi-4, Microsoft continues the trend of increasing size for the Phi model series. The first Phi model had 1.3 billion parameters, which Microsoft referred to as a "small model." Phi-4 now has 14 billion parameters, yet Microsoft still categorizes it as small, even though it cannot run out of the box on a consumer-grade GPU.
As is typical for Phi models, Microsoft has published remarkable results on public benchmarks, mainly thanks to high-quality synthetic training datasets targeting the domains and tasks found in these benchmarks.
In this article, we will first explore how Microsoft developed Phi-4. Microsoft has disclosed significantly more details for this iteration, particularly regarding the synthetic data that constitutes the majority of its pre-training dataset. Next, we will examine how to use and fine-tune the model and discuss its known limitations.
Additionally, I’ve created a highly accurate quantized version of Phi-4. You can find it here:
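If you would rather quantize on the fly than download a pre-quantized checkpoint, here is a minimal sketch of loading the official Phi-4 weights in 4-bit with bitsandbytes so the model fits on a single consumer-grade GPU. The settings below are illustrative assumptions, not the configuration used for the quantized version mentioned above.

```python
# Minimal sketch: on-the-fly 4-bit quantization of Phi-4 with bitsandbytes.
# These settings are illustrative, not the recipe behind the quantized version above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "microsoft/phi-4"  # official checkpoint on the Hugging Face Hub

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# Quick generation check using the model's chat template
messages = [{"role": "user", "content": "Explain LoRA in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
print(tokenizer.decode(model.generate(inputs, max_new_tokens=64)[0]))
```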
Since Phi-4 uses the same architecture as Phi-3/3.5, you can use the same code that I published and explained in this article:
The code is in this notebook:
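The notebook linked above is the reference. For orientation, here is a minimal sketch of the general QLoRA fine-tuning recipe it follows, assuming recent versions of Transformers, PEFT, and TRL. The dataset and hyperparameters below are placeholder assumptions, not the notebook's actual settings.

```python
# Minimal QLoRA fine-tuning sketch for Phi-4 (placeholder dataset and hyperparameters).
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

model_id = "microsoft/phi-4"

# Load the base model in 4-bit so the 14B parameters fit in consumer GPU memory
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# LoRA adapters on the linear projections of the Phi-3/Phi-4 architecture
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["qkv_proj", "o_proj", "gate_up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

# Placeholder instruction dataset with a plain "text" column
dataset = load_dataset("timdettmers/openassistant-guanaco", split="train")

training_args = SFTConfig(
    output_dir="./phi4-qlora",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=1e-4,
    num_train_epochs=1,
    logging_steps=10,
    gradient_checkpointing=True,
)

trainer = SFTTrainer(
    model=model,
    processing_class=tokenizer,
    train_dataset=dataset,
    peft_config=peft_config,
    args=training_args,
)
trainer.train()
```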
Note: The configuration of the tokenizer must be changed. I use this one:
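The configuration linked above is the one I actually use. As a rough illustration of the kind of change involved (an assumption on my part, not that exact configuration): Phi-4's tokenizer, as released, reuses <|endoftext|> as BOS, EOS, and PAD, while its chat template closes each turn with <|im_end|>, so a common adjustment for fine-tuning is to make <|im_end|> the EOS token and keep a distinct token for padding.

```python
# Illustrative sketch only, not the exact tokenizer configuration linked above.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-4")
tokenizer.eos_token = "<|im_end|>"     # end-of-turn marker from the chat template
tokenizer.pad_token = "<|endoftext|>"  # padding token, now distinct from the EOS token
print(tokenizer.eos_token_id, tokenizer.pad_token_id)
```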