The Kaitchup – AI on a Budget

Google's Gemma: Fine-tuning, Quantization, and Inference on Your Computer

More training tokens and a huge vocabulary

Benjamin Marie
Feb 26, 2024 ∙ Paid

Gemini — Generated by DALL-E

The new Gemma models from Google are the first open LLMs built with the same research and technology used to create the Gemini models. They are available in only two sizes, 2B and 7B, each as a base version and an instruction-tuned version for chat applications.

The models are already supported by numerous deep learning frameworks and are small enough to be used on consumer hardware.
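To see why these sizes fit consumer hardware, a quick back-of-the-envelope memory estimate helps. The sketch below uses the nominal parameter counts (2B and 7B) as a simplifying assumption; the actual counts are somewhat higher because of Gemma's very large vocabulary, and it counts weights only, ignoring activations and the KV cache.

```python
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Memory needed to hold the model weights alone, in GiB.

    n_params: number of parameters (nominal count, an assumption here).
    bits_per_param: 16 for fp16/bf16, 4 for 4-bit quantization.
    """
    return n_params * bits_per_param / 8 / 1024**3

# Nominal 7B model in fp16 vs. 4-bit quantized:
fp16_gb = weight_memory_gb(7e9, 16)   # ~13 GiB: needs a 16 GB+ GPU
int4_gb = weight_memory_gb(7e9, 4)    # ~3.3 GiB: fits an 8 GB consumer GPU
print(f"fp16: {fp16_gb:.1f} GiB, 4-bit: {int4_gb:.1f} GiB")
```

With 4-bit quantization, even the 7B model's weights fit comfortably on an 8 GB consumer GPU, which is what makes the quantization workflow below practical.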


In this article, I present the main characteristics of the Gemma models. We will see that some of these characteristics are not standard and that Google seems to have learned from Llama 2 and Mistral 7B to propose a strong 7B model. In the second part of this article, I show how to use the Gemma models: fine-tuning with QLoRA, inference, and quantization.

I made a notebook showing how to use, fine-tune, and quantize the Gemma models. It’s available here:

Get the notebook (#48)
