The release of DeepSeek-R1 sparked significant interest in large language models (LLMs) with advanced reasoning capabilities, a category now commonly referred to as “Reasoning Language Models” (RLMs). Before answering, these models generate a “thinking” phase, essentially a self-prompting step in which the model decomposes and analyzes the user’s input before delivering a final response that is typically more accurate than what a standard LLM would produce.
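Concretely, R1-style models emit the thinking phase between `<think>` tags in the raw completion. A minimal sketch of separating the reasoning trace from the final answer (the tag convention below is DeepSeek-R1's; other RLMs may format their traces differently):

```python
import re

def split_reasoning(completion: str) -> tuple[str, str]:
    """Split an R1-style completion into (thinking, answer).

    Assumes the reasoning trace is wrapped in <think>...</think>;
    returns an empty trace if no tags are found.
    """
    match = re.search(r"<think>(.*?)</think>", completion, re.DOTALL)
    if match is None:
        return "", completion.strip()
    thinking = match.group(1).strip()
    answer = completion[match.end():].strip()
    return thinking, answer

# Toy example (not real model output):
raw = "<think>2 + 2: add the units, giving 4.</think>The answer is 4."
thinking, answer = split_reasoning(raw)
# thinking -> "2 + 2: add the units, giving 4."
# answer   -> "The answer is 4."
```

Keeping the two parts separate like this is also what makes the traces reusable later, e.g. when building a distillation dataset.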
Today, many open RLMs are available. Most are distilled versions of DeepSeek-R1, meaning they were trained on outputs generated by DeepSeek-R1 itself.
GLM-4 models have recently emerged as strong alternatives. Despite their relatively small size (only 32 billion parameters), they achieve impressive results across benchmarks.
In this article, we’ll explore how to use these models effectively and how much time and money it takes to produce reasoning traces at scale. The resulting dataset can then be used to fine-tune other models to think like GLM-4.
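To get a rough feel for the scale involved, here is a back-of-envelope estimate. Every figure below is a placeholder assumption for illustration, not a measurement of GLM-4:

```python
# Hypothetical cost estimate; all numbers are assumptions, not benchmarks.
num_samples = 10_000          # assumed size of the target dataset
tokens_per_trace = 2_000      # assumed reasoning + answer tokens per sample
throughput_tok_s = 50         # assumed single-GPU generation speed
gpu_cost_per_hour = 2.0       # assumed cloud GPU price in USD

total_tokens = num_samples * tokens_per_trace
hours = total_tokens / throughput_tok_s / 3600
cost = hours * gpu_cost_per_hour

print(f"{total_tokens:,} tokens ~ {hours:.0f} GPU-hours ~ ${cost:.0f}")
```

Under these assumed numbers, 10,000 samples works out to roughly a hundred GPU-hours, which is why inference cost (and tricks like 4-bit quantization) matters at this scale.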
The following notebook shows how to use GLM-4 and generate a dataset with it. I also made a 4-bit version to reduce the inference cost: