The Kaitchup – AI on a Budget
Tutorials
Accelerate Models with Quantization: Recipes for NVFP4, GPTQ, AWQ, SmoothQuant, AutoRound, and FP8
Focus on 4-bit and 8-bit quantization, plus vLLM benchmarks of accuracy and inference throughput
Nov 24 • Benjamin Marie
Unsloth's Quantization-Aware Training (QAT) vs Post-Training Quantization (PTQ) for Small Models
Can a tiny LLM stay accurate under quantization thanks to QAT?
Nov 10 • Benjamin Marie
Advanced LoRA Fine-Tuning: How to Pick LoRA, QLoRA, DoRA, PiSSA, OLoRA, EVA, and LoftQ for LLMs
A practical guide to parameter-efficient LLM adaptation on 16-bit and 4-bit models
Nov 3 • Benjamin Marie
Generate Better Synthetic Datasets with a "User" LLM
User LLM + Qwen3 to generate fully synthetic dialogues
Oct 27 • Benjamin Marie
Qwen3-VL Fine-Tuning on Your Computer
Model review, GPU requirements, and code explained step by step
Oct 20 • Benjamin Marie
Choosing a GGUF Model: K-Quants, I-Quants, and Legacy Formats
Reviewing how each type differs and how it affects accuracy, throughput, and memory.
Oct 13 • Benjamin Marie
Why Increasing Batch Size Doesn’t Always Speed Up Training
The 5 most common issues that decrease batch training efficiency
Oct 7 • Benjamin Marie
Serve Multiple LoRA Adapters with vLLM and Custom Chat Templates
Swap adapters per request, reuse your chat template, and run offline or via an OpenAI-compatible server.
Sep 23 • Benjamin Marie
DenseMixer: Smarter MoE Routing That Doesn’t Break LoRA and QLoRA
Better MoE training for a slightly higher cost
Sep 8 • Benjamin Marie
Gemma 3 270M: Can Tiny Models Learn New Tasks?
A case study with machine translation
Sep 1 • Benjamin Marie
NVFP4: Same Accuracy with 2.3x Higher Throughput for 4-Bit LLMs
How to quantize LLMs with NVFP4
Aug 25 • Benjamin Marie
How to Run Unsloth on Multi-GPU Setups: Data-Parallel or Model-Parallel
Step-by-step fixes for running Unsloth across GPUs
Aug 11 • Benjamin Marie