Hi everyone,
In this article, I explore how to improve the accuracy of a quantized LLM using EoRA (Eigenspace Low-Rank Approximation), a simple yet effective method proposed by NVIDIA. EoRA works by calibrating a lightweight adapter on top of a quantized LLM to compensate for the quantization error.
With an EoRA adapter, even 2-bit models can perform remarkably close to their original full-precision counterparts. I ran experiments on Qwen3 and Qwen2.5, and the results were impressive.
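To give a feel for the core idea, here is a minimal, self-contained sketch: it quantizes a toy weight matrix, then fits a low-rank adapter to the quantization error. Note the simplifications, which are my own for illustration: the quantizer is a naive round-to-nearest stand-in for a real 2-bit method, and the adapter is a plain SVD of the error, whereas EoRA additionally projects the error into an eigenspace derived from calibration activations before truncating.

```python
import numpy as np

def fake_quant(W, bits=2):
    # Toy symmetric round-to-nearest quantizer; a real 2-bit method
    # (GPTQ, etc.) would be used in practice. EoRA is quantizer-agnostic.
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(W).max() / qmax
    return np.clip(np.round(W / scale), -qmax - 1, qmax) * scale

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))
Wq = fake_quant(W)

# Low-rank adapter B @ A approximating the quantization error W - Wq.
# EoRA would weight this decomposition by calibration-data statistics;
# here we simply truncate the SVD of the raw error at rank r.
r = 8
U, S, Vt = np.linalg.svd(W - Wq, full_matrices=False)
B = U[:, :r] * S[:r]   # (64, r)
A = Vt[:r]             # (r, 64)

err_quant = np.linalg.norm(W - Wq)
err_eora = np.linalg.norm(W - (Wq + B @ A))
print(err_eora < err_quant)  # the adapter shrinks the quantization error
```

At inference time the adapter runs alongside the quantized layer (output = quantized matmul + low-rank correction), so the memory overhead is only the two thin matrices per layer.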
This time, I’ve published the article on Towards Data Science; you can read it here:
Boost 2-Bit LLM Accuracy with EoRA (on TDS)
This article is free.
I’ve also made a notebook that you can find here: