A Cheap Zephyr 7B Beta: Distilled DPO on Consumer Hardware
The recipe for training a Zephyr-like model without using A100 GPUs
Hugging Face’s Zephyr 7B Beta is a 7-billion-parameter chat model that outperforms much larger LLMs. In the previous issue of The Kaitchup, we saw what makes the model so good: knowledge distillation.
Hugging Face trained Zephyr with DPO to align it with human preferences. In another article, we also saw why DPO is much simpler than standard reinforcement learning from human feedback (RLHF) while performing just as well.
While Zephyr 7B Beta was relatively cheap to make thanks to DPO and distillation, Hugging Face still needed 16 A100 80 GB GPUs for a few hours to train it. Note: In the cloud, training at that scale would cost a few hundred dollars.
In this article, we will see how to turn Mistral 7B into a Zephyr 7B Beta on consumer hardware, using parameter-efficient fine-tuning combined with quantization (QLoRA). I also adapt the training data created and used by Hugging Face, along with the training hyperparameters, to speed up training and reduce memory consumption.
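To give an idea of what QLoRA looks like in practice, here is a minimal sketch using the standard Hugging Face stack (transformers, bitsandbytes, peft): the base Mistral 7B weights are loaded in 4-bit and frozen, and only a small LoRA adapter is trained, which is what makes DPO fit on a single consumer GPU. The LoRA hyperparameters and target modules below are illustrative assumptions, not the exact values of the recipe, which are detailed later and in the notebook.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig

model_name = "mistralai/Mistral-7B-v0.1"

# 4-bit NF4 quantization (the "Q" in QLoRA) shrinks the frozen base weights.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map={"": 0},  # load everything on a single GPU
)

# Only the LoRA adapter (a few million parameters) is trained;
# these values are placeholders, not the recipe's final hyperparameters.
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```

The quantized model and the LoRA configuration are then handed to the DPO trainer, so the GPU only has to hold the 4-bit base weights plus the adapter's gradients and optimizer states.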
My recipe for a cheap Zephyr 7B is implemented in this notebook: