The Kaitchup – AI on a Budget
Tutorials
Serving ExLlamaV3 Models with tabbyAPI: Accuracy, Speed, and Recommendations
With comparisons against AutoRound and GGUF models served with vLLM
Jan 19 • Benjamin Marie
4-bit GLM-4.7 (358B) on a Single NVIDIA B300 with vLLM: AWQ vs NVFP4 vs INT4
Just give it enough tokens to think
Jan 12 • Benjamin Marie
Eagle 3 Speculators: When To Use Them?
Easier and faster speculative decoding, if your settings are right
Dec 9, 2025 • Benjamin Marie
Accelerate Models with Quantization: Recipes for NVFP4, GPTQ, AWQ, SmoothQuant, AutoRound, and FP8
Focus on 4-bit and 8-bit quantization + vLLM benchmarking with accuracy and inference throughput
Nov 24, 2025 • Benjamin Marie
Unsloth's Quantization-Aware Training (QAT) vs Post-Training Quantization (PTQ) for Small Models
Can a tiny LLM stay accurate under quantization thanks to QAT?
Nov 10, 2025 • Benjamin Marie
Advanced LoRA Fine-Tuning: How to Pick LoRA, QLoRA, DoRA, PiSSA, OLoRA, EVA, and LoftQ for LLMs
A practical guide to parameter-efficient LLM adaptation on 16-bit and 4-bit models
Nov 3, 2025 • Benjamin Marie
Generate Better Synthetic Datasets with a "User" LLM
User LLM + Qwen3 to generate fully synthetic dialogues
Oct 27, 2025 • Benjamin Marie
Qwen3-VL Fine-Tuning on Your Computer
Unsloth guide + VRAM requirements
Oct 20, 2025 • Benjamin Marie
Choosing a GGUF Model: K-Quants, IQ Variants, and Legacy Formats
Reviewing how each quantization type differs and its impact on accuracy, throughput, and memory.
Oct 13, 2025 • Benjamin Marie
Why Increasing Batch Size Doesn’t Always Speed Up Training
The 5 most common issues that reduce batch training efficiency
Oct 7, 2025 • Benjamin Marie
Serve Multiple LoRA Adapters with vLLM and Custom Chat Templates
Swap adapters per request, reuse your chat template, and run offline or via an OpenAI-compatible server.
Sep 23, 2025 • Benjamin Marie
DenseMixer: Smarter MoE Routing That Doesn’t Break LoRA and QLoRA
Better MoE training at a slightly higher cost
Sep 8, 2025 • Benjamin Marie