The Kaitchup – AI on a Budget

bitnet.cpp: Efficient Inference with 1-Bit LLMs on your CPU

How to run "1-bit" (actually 1.58-bit) LLMs whose ternary weights are packed into 2 bits

Benjamin Marie
Oct 28, 2024

(Header image generated with Grok)

BitNet is a specialized transformer architecture developed by Microsoft Research in which each weight takes one of only three values: -1, 0, or 1. This drastically reduces the memory required, since a ternary weight carries only log2(3) ≈ 1.58 bits of information instead of the standard 16 bits. Microsoft refers to these models as "1-bit LLMs."
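A quick back-of-the-envelope check of that figure (the 7B parameter count below is just an illustrative assumption, and the estimate ignores activations, any layers kept in higher precision, and packing overhead):

```python
import math

# A ternary weight takes one of 3 values (-1, 0, 1), so its
# information content is log2(3) bits.
bits_per_ternary_weight = math.log2(3)
print(f"{bits_per_ternary_weight:.2f} bits per weight")  # ~1.58

# Memory for the weights of a hypothetical 7B-parameter model.
params = 7e9
fp16_gb = params * 16 / 8 / 1e9                       # 16 bits per weight
ternary_gb = params * bits_per_ternary_weight / 8 / 1e9
print(f"FP16: {fp16_gb:.1f} GB, ternary: {ternary_gb:.1f} GB")  # ~14 GB vs ~1.4 GB
```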

These ternary LLMs are pre-trained from scratch to find optimal weights, using a training process specifically designed for low-precision parameters. Despite working with such limited precision, Microsoft has demonstrated that these models still achieve competitive performance compared to traditional LLMs with higher bit precision.
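As a rough illustration of the idea (not the full training recipe, which also quantizes activations and relies on straight-through gradient estimation), the absmean weight quantization described in the BitNet b1.58 paper can be sketched as: scale the weights by their mean absolute value, round, then clip to [-1, 1].

```python
import numpy as np

def absmean_ternary(w: np.ndarray, eps: float = 1e-8):
    """Quantize a weight matrix to {-1, 0, 1} with the absmean scheme:
    divide by the mean absolute value, round, clip to [-1, 1].
    Dequantize approximately as w_q * gamma."""
    gamma = np.abs(w).mean() + eps
    w_q = np.clip(np.round(w / gamma), -1, 1)
    return w_q.astype(np.int8), gamma

w = np.array([[0.4, -0.05, 1.2],
              [-0.9, 0.02, 0.3]])
w_q, gamma = absmean_ternary(w)
print(w_q)  # [[ 1  0  1]
            #  [-1  0  1]]
```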


Today, several of these ternary LLMs are publicly available on the Hugging Face Hub, making them accessible to researchers and developers, and they can run on consumer hardware.

To make these models even more practical, Microsoft released bitnet.cpp. This open-source software includes optimized kernels for efficient inference with ternary LLMs on standard CPUs. It works very similarly to llama.cpp and also uses the GGUF format.
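Since two bits can encode four states, a ternary weight fits in two bits with one state unused; four weights then fit in a byte. The sketch below is illustrative only: the actual bit layout used by bitnet.cpp's GGUF kernels may differ.

```python
def pack_ternary(weights):
    """Pack ternary weights {-1, 0, 1} into bytes, 4 weights per byte,
    mapping -1 -> 0b00, 0 -> 0b01, 1 -> 0b10 (2 bits each).
    Illustrative layout, not necessarily bitnet.cpp's on-disk format."""
    packed = bytearray()
    for i in range(0, len(weights), 4):
        byte = 0
        for j, w in enumerate(weights[i:i + 4]):
            byte |= (w + 1) << (2 * j)  # shift each 2-bit code into place
        packed.append(byte)
    return bytes(packed)

def unpack_ternary(packed, n):
    """Recover the first n ternary weights from the packed bytes."""
    return [((b >> (2 * j)) & 0b11) - 1
            for b in packed for j in range(4)][:n]

ws = [1, -1, 0, 0, 1, 1, -1]
assert unpack_ternary(pack_ternary(ws), len(ws)) == ws
```

With this scheme a weight costs exactly 2 bits of storage, slightly above the 1.58-bit information-theoretic minimum, in exchange for byte-aligned access that CPU kernels can decode cheaply.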

Related: GGUF Quantization with Imatrix and K-Quantization to Run LLMs on Your CPU (Benjamin Marie, September 9, 2024)

In this article, we will first explore how these 1-bit LLMs work and then experiment with some of them using bitnet.cpp on a CPU.

I made a notebook showing how to use bitnet.cpp with 1-bit LLMs here:

Get the notebook (#116)

Training from Scratch "1-bit" Ternary LLMs

This post is for paid subscribers

© 2025 The Kaitchup