The Kaitchup – AI on a Budget

The Kaitchup – AI on a Budget

Share this post

The Kaitchup – AI on a Budget
The Kaitchup – AI on a Budget
Fine-tune Your Own Instruct Version of Mistral 7B with Direct Preference Optimization (DPO)
Copy link
Facebook
Email
Notes
More

Fine-tune Your Own Instruct Version of Mistral 7B with Direct Preference Optimization (DPO)

Making a cheap Zephyr 7B

Benjamin Marie's avatar
Benjamin Marie
Oct 26, 2023
∙ Paid
24

Share this post

The Kaitchup – AI on a Budget
The Kaitchup – AI on a Budget
Fine-tune Your Own Instruct Version of Mistral 7B with Direct Preference Optimization (DPO)
Copy link
Facebook
Email
Notes
More
25
2
Share
Mistral and Zephyr are winds. Are we entering a new era where LLMs don’t have animal names anymore? — Picture generated by Substack’s AI with the prompt ‘stormy weather‘

Hugging Face recently published Zephyr 7B, a chat model outperforming Llama 2 70B. Currently, Zephyr 7B is ranked first on the OpenLLM leaderboard.

How did they do?

Zephyr 7B is based on Mistral 7B fine-tuned with Direct Preference Optimization (DPO), a simple but effective alternative to reinforcement learning with human feedback (RLHF) that is usually used to fine-tune instruct LLMs.

In this article, I first present DPO and highlight its advantages over RLHF. Then, we will see how to fine-tune Mistral 7B with DPO using Hugging Face’s TRL. I adapted the settings and hyperparameters used by Hugging Face to train Zephyr 7B so that we can do it on consumer hardware.

The Kaitchup – AI on a Budget is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.

The notebook implementing DPO training for Mistral 7B is available here:

Get the notebook (#24)

Last update: March 13th, 2024

Keep reading with a 7-day free trial

Subscribe to The Kaitchup – AI on a Budget to keep reading this post and get 7 days of free access to the full post archives.

Already a paid subscriber? Sign in
© 2025 The Kaitchup
Privacy ∙ Terms ∙ Collection notice
Start writingGet the app
Substack is the home for great culture

Share

Copy link
Facebook
Email
Notes
More