Make Your Own Optimized GGUFs with AutoRound
Build optimized GGUF models for llama.cpp and LM Studio using AutoScheme, custom bit-widths, and layer protection.
GGUF is the standard format for running LLMs locally with tools such as llama.cpp and LM Studio. It stores both model weights and inference metadata in a compact binary format, making GGUF models easy to download, load, and serve.
GGUF is especially popular because it supports many quantization levels. Instead of using a full BF16 or FP16 checkpoint, users can pick a smaller 2-bit to 8-bit model that better fits their RAM, VRAM, and quality requirements.
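To make the size argument concrete, here is a rough back-of-the-envelope sketch. The 8B parameter count is a hypothetical example, and the bit-widths are nominal: real GGUF quantization types also store scales and other per-block data, so actual files are somewhat larger than these estimates.

```python
def weight_size_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GiB (ignores metadata, KV cache, activations)."""
    return n_params * bits_per_weight / 8 / 2**30


n_params = 8e9  # hypothetical 8B-parameter model

for label, bpw in [("16-bit", 16.0), ("8-bit", 8.0), ("4-bit", 4.0), ("2-bit", 2.0)]:
    print(f"{label}: ~{weight_size_gib(n_params, bpw):.1f} GiB")
```

Halving the bit-width roughly halves the memory needed for the weights, which is why the choice of quantization level largely determines whether a model fits on a given machine.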
The most interesting GGUFs today increasingly rely on mixed-precision quantization. Unsloth Dynamic, or UD, helped popularize this approach by using lower precision for less sensitive tensors and higher precision for important ones.
AutoRound’s AutoScheme is one of the best practical alternatives for creating your own mixed-precision GGUFs today. Given a target average bit-width and a list of candidate schemes, AutoScheme automatically chooses a GGUF quantization type for each layer, for example GGUF:Q2_K_S or GGUF:Q4_K_S.
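The idea of a "target average bit-width" can be illustrated with a small calculation. This is only a sketch of the arithmetic, not AutoScheme's selection algorithm: the layer names and parameter counts are hypothetical, and the 2.0 and 4.0 values are nominal bit-widths assumed for the two candidate types rather than their exact on-disk cost.

```python
from typing import Dict


def average_bits(param_counts: Dict[str, int], bits: Dict[str, float]) -> float:
    """Parameter-weighted average bit-width over all layers."""
    total = sum(param_counts.values())
    return sum(param_counts[name] * bits[name] for name in param_counts) / total


# Hypothetical layers and parameter counts, for illustration only.
param_counts = {
    "attn": 1_000_000,
    "mlp_up": 3_000_000,
    "mlp_down": 3_000_000,
    "lm_head": 1_000_000,
}

# Assumed nominal bit-widths: 2.0 for the low-precision candidate, 4.0 for the higher one.
all_low = {name: 2.0 for name in param_counts}
mixed = dict(all_low, attn=4.0, lm_head=4.0)  # keep two layers at higher precision

print(average_bits(param_counts, all_low))  # 2.0
print(average_bits(param_counts, mixed))    # 2.5
```

Because the average is weighted by parameter count, keeping small layers at higher precision costs little in overall size, while large layers dominate the budget.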
This article is therefore a step toward learning how to build, control, and evaluate mixed-precision GGUF recipes. Since most open-weight models already exist in some high-quality GGUF version, making your own GGUF is especially relevant if you have a fine-tuned version of a model and want to convert it to GGUF.
I also expect MoQ-like strategies to eventually be adapted and supported by tools such as AutoRound, which could further improve the kind of recipe built in this article.
In this article, you will learn how to:
create your own GGUF model with AutoRound AutoScheme;
control the target average bit-width and candidate GGUF quantization types;
protect important layers;
compare the quality-size trade-off.
Here is a notebook you can use to make your own optimized GGUFs for Qwen3.5/3.6 models:

