The Kaitchup – AI on a Budget

Fine-tuning Base LLMs vs. Fine-tuning Their Instruct Version

Should you fine-tune Llama 3 or Llama 3 Instruct, Gemma 2 or Gemma 2 it (the instruction-tuned version)?

Benjamin Marie
Aug 15, 2024

Instruct large language models (LLMs) are specialized versions of base LLMs that have been fine-tuned on instruction datasets. These datasets consist of pairs of instructions or questions and corresponding answers, which are either written by humans or generated by AI. Instruct LLMs are commonly used in chat applications, with GPT-4 being a prominent example.
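To make the structure of such instruction datasets concrete, here is a minimal sketch of a single record. The field names (`instruction`, `input`, `output`) follow the common Alpaca-style convention and are illustrative, not tied to any specific dataset:

```python
# A hypothetical instruction-dataset record (field names vary by dataset;
# "instruction"/"input"/"output" follow the common Alpaca-style convention).
record = {
    "instruction": "Summarize the following sentence in five words or fewer.",
    "input": "The quick brown fox jumps over the lazy dog.",
    "output": "Fox jumps over lazy dog.",
}

# Supervised fine-tuning concatenates the prompt and the expected answer
# into a single training sequence.
training_text = f"{record['instruction']}\n{record['input']}\n{record['output']}"
```

During supervised fine-tuning, the model learns to generate the `output` portion conditioned on the instruction and input.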

Instruct LLMs undergo several post-training stages, which may include supervised fine-tuning, reinforcement learning from human feedback (RLHF), and Direct Preference Optimization (DPO). Through these processes, the models are trained to respond more effectively to human prompts by adhering to a specific format or chat template, often defined by special tokens added to their vocabulary.
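As an illustration of what a chat template does, the sketch below formats a conversation with Llama 3's documented special tokens in pure Python. In practice you would not write this by hand: Hugging Face Transformers applies the model's own template via `tokenizer.apply_chat_template()`.

```python
# Minimal sketch of a Llama-3-style chat template, using Llama 3's
# documented special tokens. Real code should call
# tokenizer.apply_chat_template() so the model's actual template is used.
def apply_llama3_template(messages, add_generation_prompt=True):
    parts = ["<|begin_of_text|>"]
    for msg in messages:
        parts.append(
            f"<|start_header_id|>{msg['role']}<|end_header_id|>\n\n"
            f"{msg['content']}<|eot_id|>"
        )
    if add_generation_prompt:
        # Cue the model to produce the assistant's turn next.
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

prompt = apply_llama3_template(
    [{"role": "user", "content": "What is supervised fine-tuning?"}]
)
```

A base model sees none of these special tokens during pre-training, which is exactly why its outputs are unconstrained free-form continuations.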

Related: Fine-tune Your Own Instruct Version of Mistral 7B with Direct Preference Optimization (DPO) — Benjamin Marie, October 26, 2023

As a result, instruct LLMs produce outputs formatted according to the templates they learned during fine-tuning, whereas base LLMs generate text by predicting the next token without these learned constraints.

When we want to fine-tune an LLM on our own data, these differences raise an important question: Should we fine-tune the base version or the instruct version?


In this article, we will first explore the process of creating instruct LLMs. Then, we will discuss the potential drawbacks of fine-tuning an instruct model and why fine-tuning base models is almost always preferable.

The following notebook implements fine-tuning and examples of inference with both types of models:

Get the notebook (#95)

This post is for paid subscribers
