The Kaitchup – AI on a Budget
Fine-Tuning Your LLM to "Think" Like DeepSeek R1, on Your Computer

Experiments with SFT, Llama 3.2 3B, and Training Data Generated by DeepSeek R1

Benjamin Marie
Feb 03, 2025

(Header image generated with ChatGPT)

DeepSeek-R1 is a massive model that requires multiple high-end GPUs to run locally, making it impractical for most users. However, as we explored in a previous article, DeepSeek AI has also released distilled versions that are significantly smaller and based on more widely adopted models like Llama 3.1 and Qwen2.5.

Related: DeepSeek-R1: Reinforcement Learning with Simple and Verifiable Rewards (Benjamin Marie, Jan 22)

The AI community is actively working on replicating DeepSeek-R1, and we are now seeing datasets generated by R1 that could be leveraged to train other models to "think" in a similar way.


With these datasets, fine-tuning existing LLMs to emulate R1’s reasoning becomes much easier. In this article, we will fine-tune an adapter that enhances an LLM’s reasoning capabilities using community-generated R1 datasets. We will use supervised fine-tuning (SFT), which is more accessible and cost-effective than the reinforcement learning method (GRPO) used by DeepSeek AI. The whole process runs on consumer-grade hardware, requiring only a 24 GB GPU.
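To make the SFT setup concrete, the snippet below formats a hypothetical R1-style record (prompt, reasoning trace, final answer) into a single training string, wrapping the reasoning in `<think>` tags the way the distilled R1 models emit it. The field names and the exact tag convention are illustrative assumptions, not the schema of any particular community dataset.

```python
def format_r1_sample(sample: dict) -> str:
    """Turn one R1-style record into a training string.

    The reasoning trace is wrapped in <think> ... </think> so the
    model learns to produce its chain of thought before the answer.
    The keys below are hypothetical; adapt them to your dataset.
    """
    return (
        f"{sample['prompt']}\n"
        f"<think>\n{sample['reasoning']}\n</think>\n"
        f"{sample['answer']}"
    )

sample = {
    "prompt": "What is 17 * 24?",
    "reasoning": "17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408.",
    "answer": "408",
}
print(format_r1_sample(sample))
```

A mapping function like this is typically applied over the whole dataset (e.g. with `datasets.Dataset.map`) before handing the text column to the trainer.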

For demonstration purposes, we will fine-tune Llama 3.2 3B, though the same approach can be applied to other models like Qwen2.5 or Gemma 2. The fine-tuning follows a standard LoRA method but incorporates special considerations for embeddings and tokenization to better handle the “thinking” part.

The following notebook provides a step-by-step guide on fine-tuning Llama 3.2 3B to "think" like R1.

Get the notebook (#141)

Instruction Datasets Generated by R1

(The remainder of this post is for paid subscribers.)