Recent research shows that enhancing the reasoning capabilities of large language models (LLMs) can be surprisingly affordable. Works like LIMO and s1 demonstrate that fine-tuning on a small but carefully curated dataset can be enough to outperform GPT-4 on tasks requiring advanced reasoning.
In this article, we’ll train a 7B-parameter model to reason using just 1,000 supervised fine-tuning samples, with no reinforcement learning involved. At inference time, we’ll apply s1’s budget-forcing technique, which encourages the model to "think" longer before committing to an answer.
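To make the idea concrete, here is a minimal sketch of budget forcing with Hugging Face transformers. The base model, prompt, and 512-token budget are illustrative assumptions, not the exact setup from the s1 paper: whenever the model tries to stop before spending its thinking budget, we strip the stop token and append "Wait" so it keeps reasoning.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed base model; the article fine-tunes a 7B model, so we use one here too.
model_name = "Qwen/Qwen2.5-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "How many positive divisors does 360 have? Think step by step."
ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
wait_ids = tokenizer(
    "Wait", add_special_tokens=False, return_tensors="pt"
).input_ids.to(model.device)

min_thinking_tokens = 512  # the reasoning "budget" we force the model to spend
spent = 0
while spent < min_thinking_tokens:
    remaining = min_thinking_tokens - spent
    out = model.generate(ids, max_new_tokens=remaining, do_sample=False)
    new_tokens = out.shape[1] - ids.shape[1]
    spent += new_tokens
    if new_tokens < remaining:
        # generate() stopped on an end-of-sequence token before the budget was
        # spent: drop that token and append "Wait" so the model keeps reasoning.
        ids = torch.cat([out[:, :-1], wait_ids], dim=-1)
    else:
        ids = out

print(tokenizer.decode(ids[0], skip_special_tokens=True))
```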
While previous studies relied on full fine-tuning, which requires multiple high-end GPUs when working with long sequences, we’ll take a more cost-effective route: LoRA fine-tuning, which updates only a small set of low-rank adapter weights, significantly reducing computational costs while still unlocking strong reasoning capabilities.
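As a rough picture of what this looks like with Hugging Face PEFT (the rank, alpha, and target modules below are illustrative starting points, not tuned values):

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

# Rank, alpha, dropout, and target modules are illustrative starting points.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the low-rank adapters are trainable
```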
Check out the notebook below for a step-by-step guide: fine-tuning an LLM with LoRA on the s1 dataset, then running inference with vLLM using the adapter and budget forcing.
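To preview the inference side, loading a trained LoRA adapter in vLLM looks roughly like this; the model name and adapter path are placeholders, and budget forcing would be layered on top of this generation step:

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Model name and adapter path are placeholders; point them at your own run.
llm = LLM(model="Qwen/Qwen2.5-7B-Instruct", enable_lora=True)
params = SamplingParams(temperature=0.0, max_tokens=2048)

outputs = llm.generate(
    ["How many positive divisors does 360 have?"],
    params,
    lora_request=LoRARequest("s1-lora", 1, "path/to/lora_adapter"),
)
print(outputs[0].outputs[0].text)
```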