The Kaitchup – AI on a Budget | Benjamin Marie | Substack

Gemma 3n: Fine-Tuning, Inference, and Submodel Extraction

Running Gemma 3n with vLLM and fine-tuning with TRL

The Kaitchup – AI on a Budget

Weekly tutorials, tips, and news on fine-tuning, running, and serving large language models on your computer. The Kaitchup also publishes two new AI notebooks every week.

Recent posts

vLLM vs Ollama: How They Differ and When To Use Them

With Examples of Offline and Online Inference

Jul 7

The Weekly Kaitchup #99

IFBench - ERNIE 4.5 - Gemma 3n

Jul 4

The Weekly Kaitchup #98

Survey - Mistral 3.2 - And More News

Jun 27

Get the Best from GGUF Models: Optimize Your Inference Hyperparameters

The default hyperparameters are suboptimal for quantized models

Jun 23

The Weekly Kaitchup #97

Survey - Axolotl + LLM Compressor

Jun 20

RAG with Qwen3 Embedding and Qwen3 Reranker

How to use embedding and reranker models to efficiently retrieve only the most relevant chunks or documents given a user query

Jun 19

Top posts

Multimodal RAG with ColPali and Qwen2-VL on Your Computer

Sep 16, 2024 • Benjamin Marie

RAG with Qwen3 Embedding and Qwen3 Reranker

Jun 19 • Benjamin Marie

GRPO: Train LLMs with DeepSeek-R1's Reinforcement Learning Method

Feb 10 • Benjamin Marie

LoRA Adapters: When a Naive Merge Leads to Poor Performance

Sep 7, 2023 • Benjamin Marie

Combine Multiple LoRA Adapters for Llama 2

Nov 27, 2023 • Benjamin Marie

Recommendations

Why Try AI

Daniel Nest

Generative AI Publication

Jim Clyde Monge

The Poor GPU Guy's Substack

Fabio

AI Horizon Forecast

Nikos Kafritsas

AI Disruption

Meng Li

150+ AI Notebooks Now + 2 Each Week

Tutorials

Gemma 3n: Fine-Tuning, Inference, and Submodel Extraction

Running Gemma 3n with vLLM and fine-tuning with TRL

Jun 30

RAG with Qwen3 Embedding and Qwen3 Reranker

How to use embedding and reranker models to efficiently retrieve only the most relevant chunks or documents given a user query

Jun 19

RTX 6000 Pro vs H100 & A100: Best Single-GPU Choice for Fast, Low-Cost LLM Fine-Tuning

Faster, cheaper single-GPU training

Jun 16

Fine-Tuning 2-Bit Qwen3 Models on Your Computer

Code and best practices

Jun 9

Qwulu 3: Fine-Tuning Qwen3 Base with LoRA and TULU 3's Supervised Fine-Tuning Recipe

Can a supervised fine-tuning recipe that works effectively on Llama 3.1 be applied directly to Qwen3?

Jun 5
