7 Comments
Giampiero Recco:

Hi Benjamin, thanks for the valuable index and content! Is there a note/notebook describing how to serve each model in the index? More specifically, for example, I'm having trouble serving google/gemma-3-27b-it-qat-q4_0-gguf on vLLM, and it would be convenient to know what configuration/version you used, or whether you perform any "preprocessing". Thank you

Benjamin Marie:

Do you have issues with GGUF models + vLLM specifically, or do other GGUF models work with your config? I didn't perform any specific preprocessing; all the models are evaluated with the same vLLM code.

Note that I don't "serve" the models. I run them directly, locally, with Python code. I load the model like this:

llm = LLM(model=model_id, tokenizer=tokenizer, load_format="gguf")
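For context, a minimal offline-inference sketch along those lines might look like the following. This is not the exact script used for the index; the model/tokenizer IDs and the prompt are illustrative placeholders, and running it requires a GPU and the model download.

```python
from vllm import LLM, SamplingParams

# Illustrative IDs; swap in the GGUF repo and matching tokenizer you actually use.
model_id = "google/gemma-3-27b-it-qat-q4_0-gguf"
tokenizer = "google/gemma-3-27b-it"

# Load the GGUF checkpoint directly (no server process involved).
llm = LLM(model=model_id, tokenizer=tokenizer, load_format="gguf")

# Run a single prompt offline and print the generated text.
outputs = llm.generate(["Hello!"], SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)
```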

Giampiero Recco:

I tried other GGUF models but never successfully, so I always reverted to different versions. I tried multiple versions of vLLM, including building from source, but had no success (on a dual RTX 4090 server). I also tried different tokenizers and options, but all returned (different) errors.

Some of the options I tried, in various combinations:

import os
from vllm import LLM

# Loading from a local folder (path truncated)
gguf_path = os.path.expanduser("~/.cache/huggingface/hub/.../gemma-3-27b-it-q4_0.gguf")

# Load with vLLM
llm = LLM(
    model=gguf_path,
    tokenizer="google/gemma-3-27b-it",
    # tokenizer="unsloth/gemma-3-27b-it",
    # tokenizer="unsloth/gemma-3-27b-it-qat-unsloth-bnb-4bit",
    tensor_parallel_size=2,
    load_format="gguf",
    dtype="bfloat16",
    trust_remote_code=True,
    # hf_config_path="google/gemma-3-27b-it",
    # hf_config_path="unsloth/gemma-3-27b-it",
    # hf_config_path="unsloth/gemma-3-27b-it-qat-unsloth-bnb-4bit",
    # hf_overrides={"architectures": ["Gemma3ForCausalLM"]},
    # hf_overrides={"architectures": ["Gemma3ForConditionalGeneration"]},
)

Thank you anyhow,

g.

Benjamin Marie:

It might be because of your multi-GPU setting; this is the only difference I see compared to my config. You can also try disabling the V1 engine.
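If it helps, falling back from the V1 engine is typically done via an environment variable, assuming a vLLM version where the legacy engine is still available (the script name below is a placeholder):

```shell
# Fall back to the legacy vLLM engine before launching the script
export VLLM_USE_V1=0
python run_gemma_gguf.py  # your inference script
```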

Giampiero Recco:

Interesting, thank you very much for confirming the config is not too far off. This helps. I'll keep investigating.

Max:

Awesome, this is perfect! More than a year ago we started with your guidance for local hardware configs, and your work has become foundational to our knowledge base for deploying LLMs. I just browsed your index and am in total agreement, because the models we selected as best performing for our app-development requirements are all on your index!! I definitely agree that google/gemma-3-27b-it-qat-q4_0-gguf is a good one. We also chose the Unsloth quantized versions as credible for fine-tuning, so seeing them on your list affirms our decision! Credibility behind the quantizing is extremely important, including protecting IP locally by not introducing security concerns, i.e. making unexpected outbound or telemetry connections. A "Quantization Fidelity" metric would be valuable.

Benjamin Marie:

Thank you! I was actually surprised by how good the QAT version of Gemma 3 is. I was very suspicious at first because Google didn't publish any official benchmark results when they released it. But I confirmed that this is a very good model and that QAT techniques are still useful.
