Start Here: Learn How to Adapt LLMs to Your Tasks and Budget
Fine-Tuning, Inference, Quantization, Dataset Processing and Generation, and Evaluation
If you’re new to LLMs or just want to know whether The Kaitchup is for you, start with the articles below. They’re categorized for a smooth learning path and cover the core techniques for adapting LLMs to your data and hardware at low cost while preserving quality.
You’ll find hands-on tutorials for fine-tuning and running models on your own GPU. All tutorials and notebooks are regularly updated to match current releases of PyTorch, Transformers, and popular models.
Each article begins with a short overview of what you’ll learn and when to use it.
Fine-Tuning LLMs
Did you know that LoRA fine-tuning can be as good as full fine-tuning? Not only can it be done on consumer GPUs for most LLMs, but it is also very simple. Here are a few articles showing you how to do it right:
Tests how LoRA holds up beyond toy setups, including large ranks, dataset sizes, and adapter placement.
Walks through QLoRA’s recipe (a frozen base model quantized to 4-bit plus trainable low-rank adapters) to enable SFT on a single consumer GPU. Covers setup, memory budget, and expected quality/cost trade-offs.
Adds AutoRound to the QLoRA stack to improve quantization quality and stability. Shows how to configure it and where it reliably beats vanilla QLoRA for the same hardware budget.
Compares SFT behavior and outcomes between Qwen3 base and reasoning variants. Details data choices, hyperparameters, and where each model type is preferable.
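To give you a feel for the recipe these articles cover, here is a minimal QLoRA sketch using transformers, peft, and trl. The model name, dataset, and hyperparameters are placeholders, and argument names can vary slightly across trl versions; the articles above walk through the real choices.

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

model_id = "meta-llama/Llama-3.1-8B"  # placeholder; any causal LM works

# QLoRA step 1: load the frozen base model quantized to 4-bit (NF4)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# QLoRA step 2: train only small low-rank adapters on top of the frozen weights
peft_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# Placeholder SFT dataset in the chat "messages" format (recent trl applies the chat template)
dataset = load_dataset("HuggingFaceH4/no_robots", split="train")

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    processing_class=tokenizer,  # `tokenizer=` in older trl versions
    args=SFTConfig(
        output_dir="./qlora-out",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,
        learning_rate=1e-4,
        num_train_epochs=1,
    ),
)
trainer.train()
```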
Local Inference
Compares serving stacks across throughput, latency, model coverage, and operational complexity. Provides guidance on when to pick vLLM (server inference, scaling) vs. Ollama (local/dev, simple deployment).
Shows how to host several adapters side-by-side on one base model with vLLM. Explains routing, template configuration, and pitfalls when swapping adapters at request time.
Step-by-step deployment of LoRA adapters using Ollama’s Modelfile format and CLI.
Practical tuning guide for GGUF inference.
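As a taste of the multi-adapter serving covered above, here is a minimal offline-inference sketch with vLLM’s LoRA support. The base model and adapter paths are placeholder assumptions; the server-side setup (routing, templates, per-request adapter selection) is what the article goes into.

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# The base model is loaded once; adapters are attached per request (paths are placeholders)
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct", enable_lora=True, max_loras=2)

sampling = SamplingParams(temperature=0.7, max_tokens=128)

# Each LoRARequest is (name, unique integer id, path to the adapter)
sql_adapter = LoRARequest("sql-adapter", 1, "/path/to/sql-lora")
chat_adapter = LoRARequest("chat-adapter", 2, "/path/to/chat-lora")

# The same base weights serve both adapters; only the per-request routing changes
print(llm.generate(["Write a SQL query that counts users by country."],
                   sampling, lora_request=sql_adapter)[0].outputs[0].text)
print(llm.generate(["Summarize the benefits of multi-LoRA serving."],
                   sampling, lora_request=chat_adapter)[0].outputs[0].text)
```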
Quantization: Compress LLMs to Run the Largest Models on Your Computer
Explains how to quantize to GGUF with imatrix and K-quants. Includes the commands, the expected speed and accuracy trade-offs, and guidance on when lower bit-widths are acceptable.
Benchmarks Qwen3 across low-bit settings, highlighting where accuracy holds and where it drops. Offers configuration tips for stable 4-bit and experimental 2-bit runs.
Summarizes NVFP4’s kernel/format advantages on Blackwell-class GPUs. Presents throughput gains and parity targets versus existing 4-bit formats.
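If you just want to try a quantized GGUF model locally before diving into the articles above, here is a minimal sketch with llama-cpp-python; the file name, quant type, and settings are placeholder assumptions.

```python
from llama_cpp import Llama

# Load a 4-bit K-quant GGUF file (path and quant type are placeholders)
llm = Llama(
    model_path="./qwen3-8b.Q4_K_M.gguf",
    n_ctx=4096,        # context window
    n_gpu_layers=-1,   # offload all layers to the GPU if it fits, else lower this
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain K-quants in one sentence."}],
    max_tokens=128,
    temperature=0.7,
)
print(out["choices"][0]["message"]["content"])
```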
Datasets
Describes a controlled pipeline to generate persona-driven dialogues for SFT.
Uses GLM-Z1 to generate chain-of-thought style data with budget controls. Details sampling settings, verification passes, and dataset assembly for reasoning tasks.
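As a simplified illustration of this kind of synthetic-data pipeline, here is a sketch that prompts a small local generator model with a few personas and collects the results into a Hugging Face Dataset. The generator model, personas, and column names are assumptions for illustration, not the exact recipes from the articles above, which add stronger models and verification passes.

```python
from datasets import Dataset
from transformers import pipeline

# Placeholder generator model; assumes a recent transformers version that
# accepts chat-format messages in the text-generation pipeline
generator = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B-Instruct", max_new_tokens=256)

personas = [
    "a patient high-school physics teacher",
    "a blunt senior software engineer",
]
questions = ["Why is the sky blue?", "How should I structure a Python project?"]

rows = []
for persona in personas:
    for question in questions:
        messages = [
            {"role": "system", "content": f"You are {persona}. Answer in that voice."},
            {"role": "user", "content": question},
        ]
        # The pipeline returns the full conversation; the last message is the generated reply
        reply = generator(messages)[0]["generated_text"][-1]["content"]
        rows.append({"persona": persona, "prompt": question, "response": reply})

# Assemble an SFT-ready dataset (column names are an assumption; adapt to your trainer)
dataset = Dataset.from_list(rows)
dataset.save_to_disk("persona_sft_dataset")
```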
Preference Optimization and Reinforcement Learning (RL)
Compares full-parameter DPO with LoRA-based DPO on cost, stability, and final preference metrics. Shows where LoRA matches full training and where it needs rank/LR adjustments.
Introduces GRPO, its objective, and differences from PPO-style RLHF. Provides a runnable setup and guidance on rewards, sampling, and stability.
Evaluates GSPO against GRPO, especially on MoE architectures. Focuses on stability, efficiency, and why per-expert dynamics change the training recipe.
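To ground the full versus LoRA-based DPO comparison above, here is a minimal LoRA-DPO sketch with trl. The model, dataset, and hyperparameters are placeholders, and argument names can differ slightly across trl versions; the articles cover the settings that actually matter.

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "Qwen/Qwen2.5-0.5B-Instruct"  # placeholder; start from your SFT checkpoint in practice
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Preference dataset with "prompt", "chosen", and "rejected" columns
dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

# LoRA-based DPO: only the adapters are trained, and the frozen base model
# doubles as the reference policy, so no second full model sits in memory
peft_config = LoraConfig(
    r=16, lora_alpha=32, task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

trainer = DPOTrainer(
    model=model,
    args=DPOConfig(output_dir="./dpo-lora-out", beta=0.1,
                   per_device_train_batch_size=2, learning_rate=5e-6),
    train_dataset=dataset,
    processing_class=tokenizer,  # `tokenizer=` in older trl versions
    peft_config=peft_config,
)
trainer.train()
```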