Instruct large language models (LLMs), a.k.a. chat models, are base LLMs fine-tuned on instruction datasets. In an instruction dataset, each training example pairs an instruction with a possible answer to that instruction. In other words, each example has two columns: an instruction and an answer.
Nonetheless, chat models remain causal LLMs: they take a sequence of tokens as input and predict the next token. This means that we have to format each training example so that the “instruction” and “answer” columns become a single sequence of tokens.
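To make this concrete, here is a minimal sketch of that formatting step. The column names “instruction” and “answer” match the description above, but the “### Instruction/### Answer” markers are arbitrary placeholders used only for illustration; in practice, the format should come from the chat template discussed next.

```python
# Minimal sketch: flatten one two-column training example into a single string.
# The markers below are placeholders; the chat template defines the real format.
def format_example(example: dict) -> str:
    return (
        "### Instruction:\n" + example["instruction"] + "\n\n"
        + "### Answer:\n" + example["answer"]
    )

example = {
    "instruction": "Translate 'hello' into French.",
    "answer": "Bonjour.",
}
print(format_example(example))
```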
The best way to do this is to create and apply a chat template. With a chat template, we can systematically apply the same format to all the training examples and make sure the same format is used at inference time. Templates are also very handy for making sure that we don’t forget any special tokens (BOS, EOS, etc.), always include the same “system” instructions, and don’t introduce any unwanted spaces that would significantly reduce the quality of the output.
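For instance, with Hugging Face Transformers, the template that ships with a tokenizer can be applied with apply_chat_template. Here is a rough sketch using Llama 3 Instruct (a gated repository, so access must be requested on the Hub first):

```python
from transformers import AutoTokenizer

# Llama 3 Instruct's tokenizer ships with its chat template.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is a chat template?"},
]

# Render the conversation as one string; the template inserts the special tokens
# (<|begin_of_text|>, <|start_header_id|>, <|eot_id|>, ...) and spacing for us.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```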
In this article, I explain how to create and modify a chat template. We will see how Llama 3’s chat template works and apply a new one through fine-tuning. For demonstration, I fine-tune Llama 3 8B with a chat template and a dataset of pirate speak to turn it into a pirate chat model.
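As a rough preview of what modifying a template looks like, a tokenizer’s template can simply be overridden before fine-tuning. The Jinja string below is only a sketch that mimics Llama 3’s header/eot structure, not the exact template used in the notebook:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

# Illustrative Jinja template following Llama 3's header/eot structure.
CUSTOM_TEMPLATE = (
    "{{ bos_token }}"
    "{% for message in messages %}"
    "<|start_header_id|>{{ message['role'] }}<|end_header_id|>\n\n"
    "{{ message['content'] }}<|eot_id|>"
    "{% endfor %}"
    "{% if add_generation_prompt %}"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
    "{% endif %}"
)

# From now on, apply_chat_template (and thus the formatted training examples)
# follows the new format; saving the tokenizer with the fine-tuned model keeps
# the same format at inference time.
tokenizer.chat_template = CUSTOM_TEMPLATE
```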
I also made a notebook showing this fine-tuning and how to create/modify chat templates for Llama 3. The same method can be applied to any other LLM: