After fine-tuning a large language model (LLM) on your data, you might observe that the model doesn’t know when to stop generating tokens. Although the first tokens answer your prompt appropriately, the model continues to produce irrelevant tokens until it reaches the maximum sequence length.
This is a very common issue, and it arises when the end-of-sequence (EOS) token is not properly configured. It can occur with models like Llama 3, Qwen2, and many other LLMs.
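As a quick first check, you can compare the EOS token recorded in the tokenizer, the model config, and the generation config; if they disagree, generation may never emit the token that `generate()` is watching for. The sketch below assumes a Hugging Face `transformers` checkpoint, and the model name is only illustrative (substitute your own fine-tuned model's path):

```python
from transformers import AutoConfig, AutoTokenizer, GenerationConfig

# Illustrative checkpoint; replace with your fine-tuned model's path.
model_name = "meta-llama/Meta-Llama-3-8B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
config = AutoConfig.from_pretrained(model_name)
# Note: this call assumes the checkpoint ships a generation_config.json,
# which is the case for most recent models on the Hugging Face Hub.
gen_config = GenerationConfig.from_pretrained(model_name)

# All three EOS token IDs should be consistent.
print("Tokenizer EOS token:", tokenizer.eos_token, "->", tokenizer.eos_token_id)
print("Model config eos_token_id:", config.eos_token_id)
print("Generation config eos_token_id:", gen_config.eos_token_id)
```

A mismatch here, for example a tokenizer whose EOS token differs from the ID the generation config stops on, is one of the most common reasons a fine-tuned model keeps generating until it hits the maximum sequence length.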
Properly configuring the EOS token for fine-tuning can be challenging and may require several iterations to find an effective solution.
In this article, I present and implement three simple tests to diagnose issues with the EOS token. Using Llama 3 as an example, we will see how to address each issue effectively. These tests and solutions apply to any generative LLM, so you can resolve similar problems regardless of the model you're working with.
The following notebook implements the tests and shows how to teach LLMs when to stop generating: