The Kaitchup – AI on a Budget
Tutorials
The KV-Cache of Small MoEs: Qwen3, Qwen3.5, GLM 4.7 Flash, and Nemotron 3 Nano Compared
A memory-first look at four efficient open LLM architectures.
Mar 18 • Benjamin Marie
Qwen3.5 Quantization: Similar Accuracy, More Thinking — Best Models and Recipes
INT4, NVFP4, and FP8 evaluations — Thinking off and on
Mar 12 • Benjamin Marie
How to Deploy Your LLM in the Cloud
The simple recipe to choose your GPU and anticipate costs
Feb 23 • Benjamin Marie
GLM-5 Memory Requirements Explained: MLA + DeepSeek Sparse Attention (DSA)
How GLM-5 fits 200K context without terabytes of KV cache, and what GPUs you need.
Feb 16 • Benjamin Marie
Serving ExLlamaV3 Models with tabbyAPI: Accuracy, Speed, and Recommendations
With comparisons against AutoRound and GGUF models served with vLLM
Jan 19 • Benjamin Marie
4-bit GLM-4.7 (358B) on a Single NVIDIA B300 with vLLM: AWQ vs NVFP4 vs INT4
Just give it enough tokens to think
Jan 12 • Benjamin Marie
Eagle 3 Speculators: When To Use Them?
Easier and faster speculative decoding, if you are in the right settings
Dec 9, 2025 • Benjamin Marie
Accelerate Models with Quantization: Recipes for NVFP4, GPTQ, AWQ, SmoothQuant, AutoRound, and FP8
Focus on 4-bit and 8-bit quantization + vLLM benchmarking with accuracy and inference throughput
Nov 24, 2025 • Benjamin Marie
Unsloth's Quantization-Aware Training (QAT) vs Post-Training Quantization (PTQ) for Small Models
Can a tiny LLM stay accurate under quantization thanks to QAT?
Nov 10, 2025 • Benjamin Marie
Advanced LoRA Fine-Tuning: How to Pick LoRA, QLoRA, DoRA, PiSSA, OLoRA, EVA, and LoftQ for LLMs
A practical guide to parameter-efficient LLM adaptation on 16-bit and 4-bit models
Nov 3, 2025 • Benjamin Marie
Generate Better Synthetic Datasets with a "User" LLM
User LLM + Qwen3 to generate fully synthetic dialogues
Oct 27, 2025 • Benjamin Marie
Qwen3-VL Fine-Tuning on Your Computer
Unsloth guide + VRAM requirements
Oct 20, 2025 • Benjamin Marie