The Kaitchup – AI on a Budget

How Well Does Qwen3 Handle 4-bit and 2-bit Quantization?

Let's review Qwen3 and check which one you should use

Benjamin Marie
May 01, 2025

Image generated with ChatGPT

Qwen3 models have finally arrived, and they don’t disappoint!

Despite their compact sizes, they perform remarkably well across benchmarks. The 14B and 32B models are very promising and well suited to consumer-grade hardware. But perhaps the most intriguing is Qwen3-30B-A3B: a 30-billion-parameter mixture-of-experts (MoE) model with only 3 billion parameters active per token at inference. This MoE design makes it very light at inference: a quantized version fits comfortably on a 24 GB GPU and runs efficiently when paired with GPU-friendly formats and kernels such as GPTQ with Marlin.
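
As a rough sanity check on the 24 GB figure, the weight footprint can be estimated as parameters × bits per weight / 8. The sketch below assumes a nominal 30 billion parameters and ignores the KV cache, activations, and any layers left unquantized; the `overhead_bits` argument is a hypothetical allowance for per-group scales and zero points, not a value taken from GPTQ or Marlin.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float, overhead_bits: float = 0.0) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes).

    overhead_bits is an assumed extra cost per weight for quantization
    metadata such as group-wise scales and zero points (hypothetical,
    it depends on the format and group size).
    """
    total_bits = n_params * (bits_per_weight + overhead_bits)
    return total_bits / 8 / 1e9


# Nominal total parameter count of Qwen3-30B-A3B: every expert must be stored,
# even though only ~3B parameters are used for each token.
N_PARAMS = 30e9

for bits in (16, 4, 2):
    print(f"{bits:>2}-bit: {weight_memory_gb(N_PARAMS, bits):.1f} GB")
```

Under these assumptions, 16-bit weights need about 60 GB, 4-bit weights about 15 GB, and 2-bit weights about 7.5 GB. Note that the small number of active parameters reduces the compute per token, not the memory: all 30 billion parameters still have to be resident on the GPU, which is why quantization matters for this model.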

In this article, we’ll explore how well the Qwen3 models handle quantization, and the short answer is: surprisingly well. These models are particularly quantization-friendly, with even 2-bit versions showing strong performance. I’ll walk through the quantization process, share evaluation results, and demonstrate how to run the models efficiently using vLLM, both with and without the reasoning mode enabled.
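
To build some intuition for why 2-bit is much harder than 4-bit, the sketch below applies plain round-to-nearest (RTN) asymmetric quantization, group by group, to a toy vector of Gaussian "weights" and measures the reconstruction error. This is deliberately simpler than GPTQ, which additionally uses calibration data to compensate for rounding errors; the group size of 128 and the toy weight distribution are illustrative assumptions, not the settings used in this article.

```python
import math
import random


def quantize_group(values: list[float], bits: int) -> list[float]:
    """Round-to-nearest asymmetric quantization of one group, returned dequantized."""
    levels = 2 ** bits - 1  # 15 steps at 4-bit, only 3 steps at 2-bit
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels if hi > lo else 1.0
    # Map each value to an integer code in [0, levels], then back to a float.
    codes = [min(levels, max(0, round((v - lo) / scale))) for v in values]
    return [lo + c * scale for c in codes]


def quantize(weights: list[float], bits: int, group_size: int) -> list[float]:
    """Quantize a flat weight list group by group; each group gets its own scale and offset."""
    out: list[float] = []
    for start in range(0, len(weights), group_size):
        out.extend(quantize_group(weights[start:start + group_size], bits))
    return out


def rmse(a: list[float], b: list[float]) -> float:
    """Root-mean-square error between two equally long lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(4096)]  # toy "layer" weights

for bits in (8, 4, 3, 2):
    err = rmse(weights, quantize(weights, bits, group_size=128))
    print(f"{bits}-bit RTN, group size 128: RMSE = {err:.6f}")
```

The error grows quickly as bits are removed because each group is squeezed onto fewer levels. Whether a model stays usable at 2-bit therefore depends heavily on the quantization algorithm and on how robust the model itself is, which is what the evaluation in this article measures for Qwen3.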

Since this is my first deep dive into the Qwen3 series, I’ll also briefly introduce the models.
You’ll find a companion notebook below that walks through quantization, evaluation, and inference, with and without reasoning:

Get the notebook (#161)
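
When comparing runs with and without reasoning, it helps to separate the model's reasoning trace from its final answer. Qwen3 wraps its reasoning in `<think>...</think>` tags. The helper below is a minimal, framework-agnostic sketch that splits generated text at the last `</think>`, assuming the tags survive decoding (this can depend on tokenizer and decoding settings). `split_reasoning` is a hypothetical name; it is not part of vLLM or of the companion notebook.

```python
def split_reasoning(text: str) -> tuple[str, str]:
    """Split Qwen3-style output into (reasoning, answer).

    Assumes that reasoning, if present, ends with a literal "</think>" tag in
    the decoded text. Output without the tag (e.g. reasoning disabled) is
    returned entirely as the answer.
    """
    end_tag = "</think>"
    idx = text.rfind(end_tag)
    if idx == -1:
        return "", text.strip()
    reasoning = text[:idx].replace("<think>", "", 1).strip()
    answer = text[idx + len(end_tag):].strip()
    return reasoning, answer


thinking_output = "<think>\n2 + 2 is basic addition.\n</think>\n\nThe answer is 4."
print(split_reasoning(thinking_output))     # ('2 + 2 is basic addition.', 'The answer is 4.')
print(split_reasoning("The answer is 4."))  # ('', 'The answer is 4.')
```

Reasoning itself is toggled upstream, for example through the `enable_thinking` argument that Qwen3's chat template accepts in the tokenizer's `apply_chat_template`; a helper like this only post-processes whatever the model produced.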

© 2025 The Kaitchup