The Kaitchup – AI on a Budget

DeepSeek-R1: Reinforcement Learning with Simple and Verifiable Rewards

Qwen2.5 and Llama 3.x are good students

Benjamin Marie
Jan 22, 2025
∙ Paid

(Header image generated with ChatGPT)

In a previous article, I wrote that we would see smaller models trained by DeepSeek-V3 “in the coming months.” I was wrong; it only took a few weeks.

DeepSeek-V3: Understanding and Running the Best Open LLM Locally (Benjamin Marie, Jan 6)

DeepSeek AI rapidly post-trained DeepSeek-V3 (the base version) with a straightforward reinforcement learning (RL) pipeline to create a new model, DeepSeek-R1. R1 achieves state-of-the-art results across a range of benchmarks, outperforming even commercial models such as GPT-4o.
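What makes this pipeline simple is that the rewards are verifiable by rules rather than by a learned reward model. As a minimal sketch of the idea (my own illustration, not DeepSeek's actual implementation), suppose the model is asked to put its reasoning inside `<think>` tags and its final answer inside `\boxed{}`; then the reward can be computed with a few lines of string matching:

```python
import re

def accuracy_reward(completion: str, reference: str) -> float:
    """1.0 if the final \\boxed{} answer matches the reference, else 0.0."""
    match = re.search(r"\\boxed\{([^}]*)\}", completion)
    if match is None:
        return 0.0
    return 1.0 if match.group(1).strip() == reference.strip() else 0.0

def format_reward(completion: str) -> float:
    """Small bonus if the reasoning is wrapped in <think>...</think> tags."""
    return 0.5 if re.search(r"<think>.*?</think>", completion, re.DOTALL) else 0.0

def total_reward(completion: str, reference: str) -> float:
    return accuracy_reward(completion, reference) + format_reward(completion)

completion = "<think>2 + 2 is 4.</think> The answer is \\boxed{4}."
print(total_reward(completion, "4"))  # 1.5
```

The reward values and tag conventions above are placeholders; the point is that correctness and formatting can be checked deterministically, which sidesteps reward hacking of a learned reward model.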

With 685 billion parameters, R1 remains prohibitively expensive to run yourself. However, DeepSeek AI offers an affordable API for accessing the model, and it has also released distilled R1 models based on Llama 3.1/3.3 and Qwen2.5. These distilled models are very impressive and can run on consumer hardware.
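A back-of-the-envelope estimate shows why self-hosting the full model is out of reach for most (weights only, ignoring activations and the KV cache; the per-parameter byte counts are the standard figures for each precision):

```python
PARAMS = 685e9  # approximate parameter count of DeepSeek-R1

BYTES_PER_PARAM = {"FP16": 2.0, "FP8": 1.0, "INT4": 0.5}

for precision, nbytes in BYTES_PER_PARAM.items():
    gb = PARAMS * nbytes / 1e9
    print(f"{precision}: ~{gb:,.1f} GB of memory for the weights alone")
```

Even at 4-bit precision, the weights alone need roughly 340 GB, i.e., several data-center GPUs.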


In this article, we will explore the simple RL pipeline used to turn DeepSeek-V3 into R1 and review the distillation process used to train Qwen2.5 and Llama 3 models. I also quantized some of the released models to 4-bit precision. Since they are based on Qwen2.5 and Llama 3, these models can run with most inference frameworks. Additionally, we’ll check their reasoning capabilities and output quality.

The following notebook shows how I quantized the models and ran them with Transformers on a single GPU. It also compares outputs from the original Llama 3.1/Qwen2.5 before and after training by R1.
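When comparing outputs, it helps to separate the model's chain of thought from its final answer: the R1 family emits its reasoning inside `<think>` tags before answering. A small helper for this (my own, not part of any library) might look like:

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Split an R1-style completion into (reasoning, final answer)."""
    match = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    if match is None:
        return "", text.strip()  # no reasoning block found
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer

output = "<think>12 * 12 = 144, so sqrt(144) is 12.</think>\nThe answer is 12."
reasoning, answer = split_reasoning(output)
print(answer)  # The answer is 12.
```

This makes it easy to score only the final answer while logging the reasoning trace separately.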

Get the notebook (#138)

This post is for paid subscribers

© 2025 The Kaitchup