After fine-tuning a large language model (LLM) on your data, you might observe that the model doesn’t know when to stop generating tokens. Although the first tokens answer your prompt appropriately, the model continues to produce irrelevant tokens until it reaches the maximum sequence length.
This is a very common issue, and it arises when the end-of-sequence (EOS) token is not properly configured. It can occur with models like Llama 3, Qwen2, and many other LLMs.
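As a quick first check, you can compare the EOS token recorded in the tokenizer, the model config, and the generation config; if they disagree, generation may never emit the token that `generate()` is watching for. The sketch below assumes a Hugging Face `transformers` checkpoint, and the model name is only illustrative (substitute your own fine-tuned model's path):

```python
from transformers import AutoConfig, AutoTokenizer, GenerationConfig

# Illustrative checkpoint; replace with your fine-tuned model's path.
model_name = "meta-llama/Meta-Llama-3-8B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
config = AutoConfig.from_pretrained(model_name)
# Note: this call assumes the checkpoint ships a generation_config.json,
# which is the case for most recent models on the Hugging Face Hub.
gen_config = GenerationConfig.from_pretrained(model_name)

# All three EOS token IDs should be consistent.
print("Tokenizer EOS token:", tokenizer.eos_token, "->", tokenizer.eos_token_id)
print("Model config eos_token_id:", config.eos_token_id)
print("Generation config eos_token_id:", gen_config.eos_token_id)
```

A mismatch here, for example a tokenizer whose EOS token differs from the ID the generation config stops on, is one of the most common reasons a fine-tuned model keeps generating until it hits the maximum sequence length.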
Properly configuring the EOS token for fine-tuning can be challenging and may require several iterations to find an effective solution.
In this article, I present and implement three simple tests to diagnose issues with the EOS token. Using Llama 3 as an example, we will see how to address each issue effectively. These tests and solutions apply to any generative LLM, so you can resolve similar problems regardless of the model you're working with.
The following notebook implements the tests and shows how to teach LLMs when to stop generating: