The Kaitchup – AI on a Budget

Gemma 3 270M: Can Tiny Models Learn New Tasks?

A case study with machine translation

Benjamin Marie
Sep 01, 2025

At one end of the spectrum are giant open-weight models like DeepSeek and Kimi, which require multiple GPU nodes to keep their full weights in GPU memory. They’re among the strongest open-weight LLMs available.

At the other end are tiny, edge-friendly models, such as Qwen3-0.6B, SmolLM2-360M, and the recent LFM2-350M, built to run on devices with tight memory and compute budgets, from smartphones to smartwatches. Google has also joined this camp with Gemma 3 270M, the smallest Gemma 3 variant.

That said, these ultralight models aren’t drop-in replacements for their larger counterparts. Used the same way, they often feel constrained, especially on instruction following, and their performance tends to degrade as prompts grow longer.

Google reports ~50 points of accuracy on IFEval for Gemma 3 270M, which is impressive for such a small model.

Still, it highlights clear limits: a ~50% score on an old benchmark, one that has likely contaminated the pre-training data,1 trails well-known older baselines such as Mistral 7B Instruct and Llama 2 7B. In practice, Gemma 3 270M often fails to follow instructions and tends to produce low-utility answers.

Used “as is” for general-purpose tasks, tiny models like Gemma 3 270M can be impressive, but they fail too often for users to trust them. To be more useful, they should be fine-tuned for a narrowly defined task and domain. Constraining the scope to a specific input format, a specific objective, and a specific domain vocabulary optimized into the weights can dramatically improve reliability.
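
To make this concrete, here is a minimal sketch of what a fixed, single-task format could look like. The template, field names, and example sentence are purely illustrative; they are not the format used later in this article.

```python
# A fixed instruction and layout shared by every training and inference
# example: the model only ever sees this one task in this one format.
# Hypothetical template, not the one used in the fine-tuning notebook.
PROMPT_TEMPLATE = (
    "Translate the following English sentence into French.\n"
    "English: {source}\n"
    "French:"
)

def build_example(source: str, target: str) -> dict:
    """Pair a fixed-format prompt with its reference translation."""
    return {
        "prompt": PROMPT_TEMPLATE.format(source=source),
        "completion": " " + target,
    }

print(build_example("The weather is nice today.", "Il fait beau aujourd'hui."))
```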

In this article, I show how to teach Gemma 3 270M a task it initially can’t perform, English→French translation, using an inexpensive full fine-tune you can run on a laptop. Out of the box, the model often produces broken French or misinterprets the instruction. After a targeted fine-tune, it becomes serviceable for this narrow task, yielding better translations than much larger models.

The fine-tuning notebook is here:

Get the notebook (#180)

You can adapt the notebook to other Gemma 3 variants and translation directions by changing the model and language names. I recommend a 6 GB GPU; with a reduced sequence length, a 4 GB GPU can work.
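
To show where those knobs live, here is a minimal loading sketch, assuming a recent Unsloth release that supports full fine-tuning; the model ID and values below are illustrative and may differ from the notebook's.

```python
# Minimal sketch: the two settings to change when adapting the setup are the
# model ID and the maximum sequence length. Assumes a recent Unsloth version
# that exposes the full_finetuning flag; values here are illustrative.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="google/gemma-3-270m-it",  # swap in another Gemma 3 variant here
    max_seq_length=1024,                  # reduce (e.g., to 512) to fit a 4 GB GPU
    load_in_4bit=False,                   # keep full-precision weights for a full fine-tune
    full_finetuning=True,                 # update all weights instead of training LoRA adapters
)
```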

Structure of the article:

  1. Brief review of Gemma 3 270M: its architecture and pretraining.

  2. Dataset preparation for fine-tuning.

  3. Fine-tuning with Unsloth, as sketched below.

  4. Learning-curve comparison and evaluation.

  5. Practical tips for reading curves and iterating to improve results.
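
As a rough preview of step 3, the sketch below continues from the loading example above and wires the model into TRL's SFTTrainer. It is an illustration, not the notebook's code: the toy dataset, prompt format, and hyperparameters are placeholders, and exact argument names can vary across TRL versions.

```python
# Continues from the Unsloth loading sketch above (reuses model and tokenizer).
# The toy dataset and hyperparameters are placeholders, not the notebook's.
from datasets import Dataset
from trl import SFTConfig, SFTTrainer

# Each row is a fully formatted prompt followed by its reference translation.
train_dataset = Dataset.from_list([
    {"text": "Translate the following English sentence into French.\n"
             "English: The weather is nice today.\n"
             "French: Il fait beau aujourd'hui."},
])

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,  # newer TRL versions call this argument processing_class
    train_dataset=train_dataset,
    args=SFTConfig(
        dataset_text_field="text",   # column holding the formatted examples
        per_device_train_batch_size=8,
        gradient_accumulation_steps=4,
        learning_rate=1e-5,          # full fine-tunes typically use a lower LR than LoRA
        num_train_epochs=1,
        logging_steps=10,
        output_dir="gemma3-270m-en-fr",
    ),
)
trainer.train()
```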
