Google's Gemma: Fine-tuning, Quantization, and Inference on Your Computer
More training tokens and a huge vocabulary
The new Gemma models by Google are the first open LLMs built from the same research and technology used to create the Gemini models. They are available in two sizes, 2B and 7B, each with a base version and an instruct version, the latter tuned for chat applications.
The models are already supported by numerous deep learning frameworks and are small enough to be used on consumer hardware.
In this article, I present the main characteristics of the Gemma models. We will see that some of these characteristics are not standard and that Google seems to have learned from Llama 2 and Mistral 7B to propose a strong 7B model. In the second part of this article, I show how to use the Gemma models: fine-tuning with QLoRA, inference, and quantization.
I made a notebook showing how to use, fine-tune, and quantize the Gemma models. It’s available here: