The Kaitchup – AI on a Budget


How Good is NVFP4 for Reasoning and with Small Models?

And should you care about the calibration step?

Benjamin Marie
Sep 15, 2025

In a previous article, we saw that NVFP4 performs on par with other quantization techniques, with its main advantage being hardware-accelerated support on Blackwell GPUs, enabling more than 2x faster inference.

NVFP4: Same Accuracy with 2.3x Higher Throughput for 4-Bit LLMs (Benjamin Marie, Aug 25)

In those experiments, I evaluated NVFP4 only on Llama 3.3, a 70B-parameter model, with relatively short sequences (<2,500 tokens). NVIDIA’s own results indicate that NVFP4 also works well on very large models and reasoning tasks, performing slightly below FP8 in these configurations. That’s notable because reasoning typically produces very long sequences and, with quantized activations, errors tend to accumulate with each newly generated token.

But does NVFP4 remain accurate for much smaller models and on reasoning tasks?

Quantization often causes a noticeable drop in downstream accuracy for smaller models.


In this article, we set out to answer that question so you know whether it’s safe to use NVFP4 on small models when running on Blackwell GPUs. I first evaluated NVFP4 across the Qwen3 family, from 8B down to 0.6B parameters. Then, I experimented with Qwen3-1.7B and Qwen3-4B on math-reasoning benchmarks such as MATH-500 and AIME. I also varied the calibration datasets and sequence lengths to see whether calibration choices improve accuracy.

Here is the code I used to calibrate NVFP4 with a specified minimum sequence length:

Get the notebook (#182)
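Since the notebook itself is behind the paywall, here is a minimal sketch of what such a calibration step could look like, assuming llm-compressor’s `oneshot` API with a `QuantizationModifier` recipe and the NVFP4 scheme. The calibration dataset, the `min_seq_len` default, and helper names like `filter_by_min_length` are illustrative assumptions, not necessarily the article’s exact setup.

```python
# Hypothetical sketch, not the paywalled notebook: one-shot NVFP4 quantization
# with llm-compressor, keeping only calibration samples whose tokenized length
# reaches a minimum.

def filter_by_min_length(samples, count_tokens, min_len):
    """Keep samples with at least `min_len` tokens.

    `count_tokens` maps a raw text sample to its token count, so this helper
    stays independent of any particular tokenizer.
    """
    return [s for s in samples if count_tokens(s) >= min_len]


def quantize_nvfp4(model_id="Qwen/Qwen3-1.7B", min_seq_len=2048, num_samples=512):
    # Heavy imports are kept local so the helper above works without a GPU.
    from datasets import Dataset, load_dataset
    from llmcompressor import oneshot
    from llmcompressor.modifiers.quantization import QuantizationModifier
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)

    # Illustrative calibration corpus; any chat-style dataset would do.
    raw = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft")
    texts = [
        tokenizer.apply_chat_template(ex["messages"], tokenize=False)
        for ex in raw.select(range(num_samples * 8))
    ]
    # Drop short samples so calibration sees sequences of the target length.
    texts = filter_by_min_length(
        texts, lambda t: len(tokenizer(t).input_ids), min_seq_len
    )[:num_samples]

    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
    recipe = QuantizationModifier(
        targets="Linear", scheme="NVFP4", ignore=["lm_head"]
    )
    oneshot(
        model=model,
        dataset=Dataset.from_dict({"text": texts}),
        recipe=recipe,
        max_seq_length=min_seq_len,
        num_calibration_samples=len(texts),
    )

    save_dir = model_id.split("/")[-1] + "-NVFP4"
    model.save_pretrained(save_dir)
    tokenizer.save_pretrained(save_dir)
```

The filtering helper is the part that implements the “specified minimum sequence length” idea: without it, short chat samples would dominate calibration, and the quantizer’s activation statistics would never reflect the long sequences that reasoning workloads actually produce.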

NVFP4 for Small Models: Experiments with Qwen3

This post is for paid subscribers

© 2025 The Kaitchup