Qwen3-30B-A3B vs Qwen3-32B: Is the MoE Model Really Worth It?

Qwen3 MoE is a good choice, but don't quantize it

Benjamin Marie
May 26, 2025
Image generated with ChatGPT

In previous articles, we saw how to quantize and fine-tune Qwen3 models.

How Well Does Qwen3 Handle 4-bit and 2-bit Quantization? (Benjamin Marie, May 1)

Fine-Tuning Qwen3: Base vs. Reasoning Models (Benjamin Marie, May 8)

Qwen3 is available in a range of sizes, from 0.6B to 235B parameters, including two Mixture-of-Experts (MoE) variants. Among these, Qwen3-32B and Qwen3-30B-A3B stand out as models of similar size. The latter is an MoE model that activates only 3B parameters during inference, making it significantly faster than the dense Qwen3-32B, though with a slight trade-off in accuracy on most tasks.
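To see where those 3B active parameters come from, you can inspect the model's configuration with Hugging Face Transformers. The sketch below downloads only the config file, not the weights; the field names (num_experts, num_experts_per_tok) are assumptions based on the Qwen3-MoE config class and may vary across transformers versions.

```python
from transformers import AutoConfig

# Inspect the MoE configuration of Qwen3-30B-A3B (config only, no weights).
# Field names follow the Hugging Face Qwen3-MoE config class and may differ
# slightly across transformers versions.
config = AutoConfig.from_pretrained("Qwen/Qwen3-30B-A3B")

print("Total experts per MoE layer:", getattr(config, "num_experts", "n/a"))
print("Experts activated per token:", getattr(config, "num_experts_per_tok", "n/a"))
print("Hidden size:", config.hidden_size)
print("Number of layers:", config.num_hidden_layers)
```

Only the experts selected by the router for a given token participate in the forward pass, which is why the compute cost per token is closer to a 3B dense model than to the full 30B parameter count.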


But how significant is this accuracy gap? Should Qwen3-30B-A3B be your default choice for faster inference? And what happens to this performance difference after quantization? While dense models like Qwen3-32B tend to quantize well, MoE models, especially those with very small expert networks like Qwen3-30B-A3B, may be significantly more challenging to quantize.

In this article, we’ll answer these questions. We’ll start by explaining how Qwen3-30B-A3B works. Then, we’ll compare its performance with Qwen3-32B before and after 2-bit and 4-bit quantization. Finally, we’ll look at the actual speedup delivered by the MoE architecture and how well it holds up after quantization.
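For a rough idea of the kind of measurement involved (the accompanying notebook contains the actual evaluation and quantization code), here is a minimal sketch that loads the MoE model with on-the-fly 4-bit bitsandbytes quantization and times a naive generation run; the prompt and generation settings are placeholders, not the benchmark setup used in the article.

```python
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "Qwen/Qwen3-30B-A3B"

# 4-bit NF4 quantization on the fly with bitsandbytes. This is only a sketch;
# the article's results use a dedicated quantization and evaluation pipeline.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# Naive decoding-speed estimate: generate a fixed number of tokens and time it.
prompt = "Explain the difference between dense and Mixture-of-Experts transformers."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

start = time.time()
output = model.generate(**inputs, max_new_tokens=256, do_sample=False)
elapsed = time.time() - start

new_tokens = output.shape[1] - inputs["input_ids"].shape[1]
print(f"{new_tokens / elapsed:.1f} tokens/second")
```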

You can find all evaluation and quantization code in the accompanying notebook:

Get the notebook (#166)
