Recent research shows that enhancing the reasoning capabilities of large language models (LLMs) can be surprisingly affordable. Works like LIMO and s1 demonstrate that fine-tuning on a small but carefully curated dataset can be enough to outperform GPT-4 on tasks requiring advanced reasoning.
In this article, we’ll train a 7B-parameter model to reason using just 1,000 supervised fine-tuning samples, with no reinforcement learning involved. At inference time, we’ll apply s1’s budget-forcing technique, which encourages the model to "think" longer before committing to an answer.
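To make the idea concrete, here is a minimal sketch of budget forcing with Hugging Face transformers. The base model, prompt, and 512-token budget are illustrative assumptions, not the exact setup from the s1 paper: whenever the model tries to stop before spending its thinking budget, we strip the stop token and append "Wait" so it keeps reasoning.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed base model; the article fine-tunes a 7B model, so we use one here too.
model_name = "Qwen/Qwen2.5-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "How many positive divisors does 360 have? Think step by step."
ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
wait_ids = tokenizer(
    "Wait", add_special_tokens=False, return_tensors="pt"
).input_ids.to(model.device)

min_thinking_tokens = 512  # the reasoning "budget" we force the model to spend
spent = 0
while spent < min_thinking_tokens:
    remaining = min_thinking_tokens - spent
    out = model.generate(ids, max_new_tokens=remaining, do_sample=False)
    new_tokens = out.shape[1] - ids.shape[1]
    spent += new_tokens
    if new_tokens < remaining:
        # generate() stopped on an end-of-sequence token before the budget was
        # spent: drop that token and append "Wait" so the model keeps reasoning.
        ids = torch.cat([out[:, :-1], wait_ids], dim=-1)
    else:
        ids = out

print(tokenizer.decode(ids[0], skip_special_tokens=True))
```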
While previous studies relied on full fine-tuning, which requires multiple high-end GPUs when working with long sequences, we’ll take a more cost-effective route: LoRA fine-tuning, which updates only a small set of low-rank adapter weights, significantly reducing computational costs while still unlocking strong reasoning capabilities.
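As a rough picture of what this looks like with Hugging Face PEFT (the rank, alpha, and target modules below are illustrative starting points, not tuned values):

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

# Rank, alpha, dropout, and target modules are illustrative starting points.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the low-rank adapters are trainable
```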
Check out the notebook below for a step-by-step guide: fine-tuning an LLM with LoRA on the s1 dataset, then running inference with vLLM using the adapter and budget forcing.
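To preview the inference side, loading a trained LoRA adapter in vLLM looks roughly like this; the model name and adapter path are placeholders, and budget forcing would be layered on top of this generation step:

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Model name and adapter path are placeholders; point them at your own run.
llm = LLM(model="Qwen/Qwen2.5-7B-Instruct", enable_lora=True)
params = SamplingParams(temperature=0.0, max_tokens=2048)

outputs = llm.generate(
    ["How many positive divisors does 360 have?"],
    params,
    lora_request=LoRARequest("s1-lora", 1, "path/to/lora_adapter"),
)
print(outputs[0].outputs[0].text)
```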