The Kaitchup – AI on a Budget
New DiffusionGemma and MoQ GGUFs for Gemma 4 12B and LFM2.5 8B A1B
The Weekly Kaitchup #146
Jun 12 • Benjamin Marie
Make Your Own Optimized GGUFs with AutoRound
Build optimized GGUF models for llama.cpp and LM Studio using AutoScheme, custom bit-widths, and layer protection.
Jun 10 • Benjamin Marie
MoQ GGUFs and GSQ: Low-Bit GGUFs Are About to Get Much Better
The Weekly Kaitchup #145
Jun 5 • Benjamin Marie
DFlash vs MTP: Qwen3.6 Speculative Decoding Benchmarks with vLLM and llama.cpp
Up to 4x faster inference: benchmarking speed on coding, chat, and math tasks with optimal hyperparameters
Jun 2 • Benjamin Marie
May 2026
Qwen3.5 9B MoQ: Inside a Strong 3.6-bit GGUF
The Weekly Kaitchup #144
May 29 • Benjamin Marie
Reasoning Budgets vs. Structured CoT: Controlling LLM Thinking Tokens
Evaluations of BNF grammars and reasoning budgets with Qwen3.6 27B
May 25 • Benjamin Marie
Gated DeltaNet-2: Better Memory Editing for Linear Attention
The Weekly Kaitchup #143
May 22 • Benjamin Marie
Train and Run DFlash Speculative Decoding
A simple method to make your local model much faster
May 18 • Benjamin Marie
SlimQwen Compression, Elastic Models, and Aurora Optimization
The Weekly Kaitchup #142
May 15 • Benjamin Marie
Qwen3.6 27B Quantization: FP8 vs INT4 vs NVFP4
Testing accuracy, latency, memory usage, and MTP efficiency after quantization.
May 12 • Benjamin Marie
MTP Layers for Gemma 4 and My Projects in Progress
The Weekly Kaitchup #141
May 8 • Benjamin Marie
Qwen3.6 27B vs Qwen3.5 27B vs Gemma 4 31B: Accuracy, Latency, Memory, and Token Efficiency Tested
Qwen3.6 improves on Qwen3.5, but Gemma 4 remains surprisingly competitive.
May 5 • Benjamin Marie