Fine-Tuning Qwen3: Base vs. Reasoning Models
Is it reasonable to fine-tune a "reasoning" model?
Qwen3 LLMs are both very capable and easy to run. Some of the models are small enough that you can fine-tune them, or run inference, on a single GPU.
The Qwen team released two types of models: Qwen3 and Qwen3-Base. The naming is a bit different from what you might be used to. For example, with Llama models, the name without any suffix (like Llama 3.1 8B) refers to the base, pre-trained version, while Llama 3.1 8B Instruct is the post-trained one. For Qwen3, it's the opposite:
- Qwen3 is the post-trained model (chat/instruction-tuned, with reasoning).
- Qwen3-Base is the raw pre-trained model, without alignment or instruction tuning.
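To make the naming concrete, here is a minimal sketch of loading each variant with Hugging Face transformers. The repo IDs below are the Hub names for the 14B variants discussed in this post; everything else is boilerplate loading code, not the notebook's exact setup.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Post-trained model: chat template, instruction following, reasoning mode.
chat_id = "Qwen/Qwen3-14B"
# Raw pre-trained model: plain next-token prediction, no alignment.
base_id = "Qwen/Qwen3-14B-Base"

tokenizer = AutoTokenizer.from_pretrained(chat_id)
model = AutoModelForCausalLM.from_pretrained(
    chat_id,
    torch_dtype="auto",   # pick the checkpoint's native precision (bf16 here)
    device_map="auto",    # spread layers across available GPUs/CPU
)
```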
So, if you want to fine-tune one of these models on your own data, which should you choose?
In a previous article, I explained why fine-tuning a post-trained (instruction-tuned) model isn't always a good idea, and why the base model is usually a better place to start. That argument holds up even more when working with models designed for reasoning.
In this post, I fine-tune both Qwen3-14B and Qwen3-14B-Base with Unsloth on a single GPU, and then compare how the resulting models behave at inference time with reasoning turned on and off. I’ll also show how much GPU memory you need to get it working.
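To give a feel for the setup before opening the notebook, here is a rough sketch. It assumes Unsloth's FastLanguageModel API and the enable_thinking flag that Qwen3's chat template exposes; the exact hyperparameters and training loop are in the notebook.

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen3-14B",  # swap in "Qwen/Qwen3-14B-Base" for the base run
    max_seq_length=2048,
    load_in_4bit=True,            # 4-bit loading keeps the 14B model on one GPU
)

# Attach LoRA adapters so only a small fraction of the weights is trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# At inference time, Qwen3's chat template lets you toggle reasoning mode:
messages = [{"role": "user", "content": "What is 17 * 23?"}]
prompt_reasoning_on = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)
prompt_reasoning_off = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False
)
```

Note that the enable_thinking switch is a feature of the post-trained Qwen3's chat template; Qwen3-Base ships without a chat template, so there is no reasoning mode to toggle until you impose a format of your own during fine-tuning.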
The code and setup are in this notebook: