The Kaitchup – AI on a Budget
Posts tagged: autoround
Fine-Tuning 2-Bit Qwen3 Models on Your Computer
Code and best practices
Jun 9 • Benjamin Marie

Accurate 2-bit Quantization: Run Massive LLMs on a Single Consumer GPU
70B models for consumer hardware
May 5 • Benjamin Marie

How Well Does Qwen3 Handle 4-bit and 2-bit Quantization?
Let's review Qwen3 and check which quantization you should use
May 1 • Benjamin Marie

Mistral Small 3: An Excellent 24B-Parameter Wide-Shallow LLM
Fine-tuning, quantization, and evaluation
Feb 17 • Benjamin Marie

Quantize and Run Llama 3.3 70B Instruct on Your GPU
4-bit 👍, 3-bit 👎, and 2-bit 👎 quantization
Dec 9, 2024 • Benjamin Marie

The Recipe for Extremely Accurate and Cheap Quantization of 70B+ LLMs
Cost and accuracy for quantizing large models to 4-bit and 2-bit
Nov 25, 2024 • Benjamin Marie

The Impact of the Calibration Dataset for AutoRound and AWQ Quantization
Should you choose the calibration dataset?
Oct 31, 2024 • Benjamin Marie

Mistral-NeMo: 4.1x Smaller with Quantized Minitron
How Pruning, Knowledge Distillation, and 4-Bit Quantization Can Make Advanced AI Models More Accessible and Cost-Effective
Aug 26, 2024 • Benjamin Marie

Fine-tuning Phi-3.5 MoE and Mini on Your Computer
With code to quantize the models with bitsandbytes and AutoRound
Aug 22, 2024 • Benjamin Marie

QLoRA with AutoRound: Cheaper and Better LLM Fine-tuning on Your GPU
Bitsandbytes is not your only option
Aug 19, 2024 • Benjamin Marie

Intel AutoRound: Accurate Low-bit Quantization for LLMs
Between quantization-aware training and post-training quantization
Jun 27, 2024 • Benjamin Marie