SmolLM: Full Fine-tuning and Aligning Tiny LLMs on Your Computer
With supervised fine-tuning and distilled DPO
Recent large language models (LLMs) have billions of parameters. They are usually too large to run efficiently on low-cost hardware or small devices. Among smaller LLMs, we currently have capable models, such as Phi-3 mini (3.8B parameters), which performs well for language generation tasks but is still too large for many use cases.
Apple's OpenELM models are smaller alternatives. The smallest OpenELM has 270M parameters and can be quickly fine-tuned on-device. However, this model struggles to follow instructions, even after fine-tuning.
Hugging Face released even smaller models: SmolLM, a family of LLMs with 135M, 360M, and 1.7B parameters.
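To give a sense of how lightweight these models are to work with, here is a minimal sketch that loads the 135M base model from the Hugging Face Hub and generates text. The Hub ID "HuggingFaceTB/SmolLM-135M" is my assumption for the base checkpoint; swap in the 360M or 1.7B variant as needed.

```python
# Minimal sketch: load SmolLM-135M and generate a short completion.
# "HuggingFaceTB/SmolLM-135M" is an assumed Hub ID for the base model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceTB/SmolLM-135M"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tokenizer("Tiny language models are", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Even in bfloat16 with no quantization, the 135M model fits comfortably in the memory of any recent consumer GPU, which is what makes full fine-tuning (rather than parameter-efficient methods) practical here.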
In this article, I review SmolLM and how Hugging Face made these models. I show how to fine-tune SmolLM for chat applications, focusing on the 135M and 360M versions, and how to align the models with human preferences using the DPO technique. The main purpose of this article is to show how to fully fine-tune and align tiny LLMs on consumer hardware. If you have high-quality training data for a specific domain that doesn't require complex reasoning, you can successfully fine-tune SmolLM within a few hours on your computer.
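The full recipe follows later in the article. As a rough preview, the alignment step can be expressed with TRL's DPOTrainer; the sketch below is illustrative only, and the model ID, preference dataset, and hyperparameters are my assumptions, not the article's exact settings.

```python
# Hedged sketch of DPO alignment with TRL; dataset and hyperparameters
# are illustrative assumptions, not the article's recipe.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "HuggingFaceTB/SmolLM-135M-Instruct"  # assumed Hub ID
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# A preference dataset with "prompt", "chosen", and "rejected" columns.
dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

training_args = DPOConfig(
    output_dir="smollm-135m-dpo",
    per_device_train_batch_size=8,
    learning_rate=5e-7,
    num_train_epochs=1,
)

trainer = DPOTrainer(
    model=model,             # the reference model is created automatically
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,  # older TRL versions use tokenizer= instead
)
trainer.train()
```

Supervised fine-tuning for chat follows the same pattern with TRL's SFTTrainer on an instruction dataset; DPO then nudges the fine-tuned model toward the preferred responses in the preference pairs.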
My notebook for fine-tuning and aligning tiny LLMs on consumer GPUs is available here: