GLM-5 Memory Requirements Explained: MLA + DeepSeek Sparse Attention (DSA)
How GLM-5 fits 200K context without terabytes of KV cache, and what GPUs you need.
GLM 4.7 was released only two months ago, but Zhipu AI (Z.ai) has already followed up with a stronger successor: GLM 5.
One of the headline changes is the introduction of DeepSeek Sparse Attention (DSA), layered on top of Multi-Head Latent Attention (MLA) to further speed up long-context inference.
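The core idea behind this kind of sparse attention is simple: instead of attending over every cached token, a cheap scoring pass picks the top-k most relevant ones and full attention runs only over that subset. Here is a minimal single-query sketch of that selection step; the shapes and `k` are illustrative, and for simplicity the query-key dot product itself plays the role of the lightweight indexer rather than a separate scoring module.

```python
import numpy as np

def sparse_attention(q, keys, values, k):
    """Attend over only the k best-scoring cached tokens (illustrative sketch)."""
    # Cheap relevance scores for every cached token; a real DSA-style
    # implementation uses a lightweight indexer for this step.
    scores = keys @ q                            # shape: (seq_len,)
    topk = np.argsort(scores)[-k:]               # indices of the k top tokens
    # Softmax only over the selected tokens, with the usual 1/sqrt(d) scaling.
    sel = scores[topk] / np.sqrt(q.shape[-1])
    weights = np.exp(sel - sel.max())
    weights /= weights.sum()
    # Full attention output, computed over k tokens instead of seq_len.
    return weights @ values[topk]

rng = np.random.default_rng(0)
seq_len, d = 1024, 64
out = sparse_attention(rng.normal(size=d),
                       rng.normal(size=(seq_len, d)),
                       rng.normal(size=(seq_len, d)),
                       k=32)
print(out.shape)  # one output vector of dimension d
```

The payoff is that the expensive softmax-weighted sum scales with `k` rather than with the full sequence length, which is where the long-context speedup comes from.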
GLM 5 is also substantially larger, growing from 355B parameters in GLM 4.7 to 744B.
In this article, I’ll use this release as an opportunity to explain what DSA brings to the table and why it may become the default going forward. Then we’ll look at what’s new in GLM 5, what it takes to run it in practice, and what the hardware requirements look like. Finally, we’ll break down memory consumption and compare the available quantized variants.
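To see why KV cache compression matters at 200K context in the first place, a back-of-envelope sizing helps. The sketch below compares a standard per-head KV cache against an MLA-style latent cache; every dimension here (layer count, head count, head size, latent size) is an illustrative assumption, not GLM 5's published configuration.

```python
# Rough KV cache sizing in FP16 (2 bytes per element).
# All model dimensions below are illustrative assumptions, NOT GLM 5's config.

def kv_cache_bytes_mha(seq_len, num_layers, num_heads, head_dim, bytes_per_elem=2):
    # Standard attention caches a key AND a value vector per head, per layer, per token.
    return seq_len * num_layers * 2 * num_heads * head_dim * bytes_per_elem

def kv_cache_bytes_mla(seq_len, num_layers, latent_dim, bytes_per_elem=2):
    # MLA caches one compressed latent vector per layer, per token,
    # shared across all heads, instead of full per-head keys and values.
    return seq_len * num_layers * latent_dim * bytes_per_elem

GB = 1024**3
mha = kv_cache_bytes_mha(seq_len=200_000, num_layers=60, num_heads=64, head_dim=128)
mla = kv_cache_bytes_mla(seq_len=200_000, num_layers=60, latent_dim=576)
print(f"per-head KV cache: {mha / GB:.1f} GiB")   # hundreds of GiB
print(f"MLA latent cache:  {mla / GB:.1f} GiB")   # tens of GiB
```

Even with these made-up dimensions, the ratio is the point: the latent cache is over an order of magnitude smaller, and DSA then reduces how much of it each new token actually reads.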
If you want a refresher on how MLA works, I covered it here:


