The Kaitchup – AI on a Budget

Gemma 3n: Fine-Tuning, Inference, and Submodel Extraction

Running Gemma 3n with vLLM and fine-tuning with TRL

Benjamin Marie
Jun 30, 2025

Image generated with ChatGPT

The Gemma 3n models are optimized for low-resource environments through selective parameter activation, allowing efficient inference with just 2B or 4B active parameters. Despite their compact footprint, they support a wide range of multimodal inputs, including text, images, audio, and video, and generate text outputs with a context window of up to 32K tokens. These models have also been trained across 140 languages.

Google released both base and instruct variants under the commercial-use-friendly Gemma license.

First released as a preview in May 2025, Gemma 3n has been trending on the Hugging Face Hub ever since, despite limited framework support at launch. That changed last week when Google released safetensors versions of the models, along with official support in the Transformers library. As a result, popular inference frameworks like vLLM and SGLang, which can use Transformers as a backend, can now run Gemma 3n, although not without a few hiccups, as we'll see.
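Since vLLM can load Gemma 3n through its Transformers backend, a minimal text-only inference script looks roughly like the sketch below. The model ID, sampling settings, and hand-written chat template are illustrative assumptions; in real code, prefer the tokenizer's `apply_chat_template` over hardcoding turn markers.

```python
# Minimal sketch: text-only inference with Gemma 3n on vLLM.
# Assumptions: model ID "google/gemma-3n-E4B-it", a recent vLLM build with
# Transformers-backend support, and a GPU with enough memory for the model.

def build_prompt(user_message: str) -> str:
    # Gemma-style chat turns, written out by hand for illustration;
    # tokenizer.apply_chat_template is the safer choice in practice.
    return (
        "<start_of_turn>user\n"
        f"{user_message}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )

def main():
    # Imported lazily: vLLM requires a GPU environment to initialize.
    from vllm import LLM, SamplingParams

    llm = LLM(model="google/gemma-3n-E4B-it")
    params = SamplingParams(temperature=0.7, top_p=0.95, max_tokens=256)
    outputs = llm.generate(
        [build_prompt("What is the MatFormer architecture?")], params
    )
    print(outputs[0].outputs[0].text)

# To run on a GPU machine:
# main()
```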

The Kaitchup – AI on a Budget is a reader-supported publication. To get access to 170+ notebooks and even more tutorials, subscribe.

In this article, we'll explore how Gemma 3n and its MatFormer architecture work. We'll walk through running inference on the instruct (*-it) variant with vLLM and demonstrate how to fine-tune the base model. I'll also show how to extract specific subsets of weights, leveraging the MatFormer's modular design.
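For the fine-tuning side, a bare-bones supervised fine-tuning (SFT) loop with TRL might look like the following. The base-model ID, dataset schema, and hyperparameters are illustrative assumptions, not the exact recipe from the notebook.

```python
# Hedged sketch: supervised fine-tuning of a Gemma 3n base model with TRL.
# Assumptions: model ID "google/gemma-3n-E2B", and a dataset with
# "prompt" and "response" columns; adapt both to your setup.

def format_example(example: dict) -> dict:
    # Flatten a prompt/response pair into one "text" field using
    # Gemma-style turn markers (a common choice for Gemma SFT).
    text = (
        "<start_of_turn>user\n"
        f"{example['prompt']}<end_of_turn>\n"
        "<start_of_turn>model\n"
        f"{example['response']}<end_of_turn>\n"
    )
    return {"text": text}

def main():
    # Imported lazily: training requires a GPU environment.
    from datasets import load_dataset
    from trl import SFTConfig, SFTTrainer

    dataset = load_dataset("your_org/your_sft_dataset", split="train")  # placeholder
    dataset = dataset.map(format_example)

    config = SFTConfig(
        output_dir="gemma-3n-sft",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
        bf16=True,
    )
    trainer = SFTTrainer(
        model="google/gemma-3n-E2B",  # base variant; ID is an assumption
        args=config,
        train_dataset=dataset,
    )
    trainer.train()

# To run on a GPU machine:
# main()
```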

For hands-on experiments with Gemma 3n, including inference with vLLM and fine-tuning workflows, check out this notebook:

Get the notebook (#174)

Gemma 3n and the MatFormer: Models within Models

This post is for paid subscribers

© 2025 The Kaitchup