The Kaitchup – AI on a Budget
Fine-Tuning Your LLM to "Think" Like DeepSeek R1, on Your Computer

Experiments with SFT, Llama 3.2 3B, and Training Data Generated by DeepSeek R1

Benjamin Marie
Feb 03, 2025

(Header image generated with ChatGPT)

DeepSeek-R1 is a massive model that requires multiple high-end GPUs to run locally, making it impractical for most users. However, as we explored in a previous article, DeepSeek AI has also released distilled versions that are significantly smaller and based on more widely adopted models like Llama 3.1 and Qwen2.5.

Related: DeepSeek-R1: Reinforcement Learning with Simple and Verifiable Rewards (Benjamin Marie, Jan 22)

The AI community is actively working on replicating DeepSeek-R1, and we are now seeing datasets generated by R1 that could be leveraged to train other models to "think" in a similar way.


With these datasets, fine-tuning existing LLMs to emulate R1’s reasoning becomes much easier. In this article, we will fine-tune an adapter that enhances an LLM’s reasoning capabilities using community-generated R1 datasets. We will use supervised fine-tuning (SFT), which is more accessible and cost-effective than the reinforcement learning method (GRPO) used by DeepSeek AI. The whole process runs on consumer-grade hardware, requiring only a 24 GB GPU.
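To make the SFT setup concrete, the snippet below formats a hypothetical R1-style record (prompt, reasoning trace, final answer) into a single training string, wrapping the reasoning in `<think>` tags the way the distilled R1 models emit it. The field names and the exact tag convention are illustrative assumptions, not the schema of any particular community dataset.

```python
def format_r1_sample(sample: dict) -> str:
    """Turn one R1-style record into a training string.

    The reasoning trace is wrapped in <think> ... </think> so the
    model learns to produce its chain of thought before the answer.
    The keys below are hypothetical; adapt them to your dataset.
    """
    return (
        f"{sample['prompt']}\n"
        f"<think>\n{sample['reasoning']}\n</think>\n"
        f"{sample['answer']}"
    )

sample = {
    "prompt": "What is 17 * 24?",
    "reasoning": "17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408.",
    "answer": "408",
}
print(format_r1_sample(sample))
```

A mapping function like this is typically applied over the whole dataset (e.g. with `datasets.Dataset.map`) before handing the text column to the trainer.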

For demonstration purposes, we will fine-tune Llama 3.2 3B, though the same approach can be applied to other models like Qwen2.5 or Gemma 2. The fine-tuning follows a standard LoRA method but incorporates special considerations for embeddings and tokenization to better handle the “thinking” part.

The following notebook provides a step-by-step guide on fine-tuning Llama 3.2 3B to "think" like R1.

Get the notebook (#141)

Instruction Datasets Generated by R1

(The remainder of this post is for paid subscribers.)