Qwen3.5 9B, 4B, 2B & 0.8B: GPU Requirements, VRAM Usage & KV Cache Breakdown (262K Context)
How much memory do Qwen3.5 small models really need?
After the large 397B models and the medium models, Qwen3.5 released the small variants: 9B, 4B, 2B, and 0.8B.
Unlike last week’s medium checkpoints, these are finally in the “actually runnable” category on consumer hardware in full precision, i.e., without quantization.
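
As a quick sanity check on "runnable in full precision", weight memory is roughly parameter count times bytes per parameter. A minimal sketch, assuming the checkpoints ship in BF16 (2 bytes per parameter):

```python
# Back-of-the-envelope weight footprint, assuming BF16 weights
# (2 bytes per parameter). Activations, the CUDA context, and the
# KV cache add overhead on top of this figure.
for params_b in (9, 4, 2, 0.8):
    print(f"{params_b}B -> ~{params_b * 2:.1f} GB of weights")
```

By this estimate, even the 9B model's weights (~18 GB) fit on a 24 GB consumer GPU, with the remaining headroom going to the cache and activations.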
In this article, I'll focus on their deployment-time memory footprint. Again, thanks to the use of Gated DeltaNet (a form of "linear attention") in 75% of the layers, these models have a small KV cache, keeping memory usage low even at long context lengths.
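
To make the "small KV cache" claim concrete, here is a minimal estimator. The config values below are placeholders I chose for illustration, not the official Qwen3.5 9B hyperparameters: only the full-attention layers (1 in 4) store per-token K/V tensors, while the Gated DeltaNet layers carry a fixed-size recurrent state whose memory does not grow with context length.

```python
def kv_cache_gib(num_layers, full_attn_ratio, num_kv_heads, head_dim,
                 seq_len, bytes_per_elem=2):
    """Estimate the KV cache size in GiB for a hybrid-attention model.

    Only the full-attention layers keep a per-token KV cache; the
    linear-attention (Gated DeltaNet) layers are excluded because
    their recurrent state is constant-size.
    """
    full_attn_layers = round(num_layers * full_attn_ratio)
    # 2 tensors (K and V) per layer, each of shape
    # [num_kv_heads, seq_len, head_dim]
    total_bytes = (full_attn_layers * 2 * num_kv_heads * head_dim
                   * seq_len * bytes_per_elem)
    return total_bytes / 1024**3

# Hypothetical config -- NOT the official Qwen3.5 9B values:
# 48 layers with 1 in 4 using full attention, GQA with 4 KV heads
# of dim 128, BF16 cache (2 bytes/element), 262,144-token context.
print(f"{kv_cache_gib(48, 0.25, 4, 128, 262_144):.1f} GiB")  # -> 6.0 GiB
```

Under these assumed values, a full 262K-token context costs about 6 GiB of cache; a comparable model with full attention in every layer would need roughly four times as much.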



