The Kaitchup – AI on a Budget
Llama 4 with 10M Tokens: How Much Does It Cost and Is It Worth It?
A KV Cache Story
Apr 8
•
Benjamin Marie
Fine-Tuning Gemma 3 on Your Computer with LoRA and QLoRA (+model review)
The efficiency of global-local attention with QK-Norm and no more soft-capping
Mar 13
•
Benjamin Marie
Make LLMs Faster and Lighter with W8A8 Quantization
Efficient Weight and Activation Quantization with llm-compressor
23 hrs ago
•
Benjamin Marie
The Weekly Kaitchup #88
Nemotron-H - 1-bit LLM
Apr 18
•
Benjamin Marie
Run Llama 3.3 70B on Your GPU with ExLlamaV3
Fast Llama 3.3 70B at 1.75 bits per weight, using only 19 GB!
Apr 17
•
Benjamin Marie
Recent posts
Fast and Memory-Efficient Full Fine-Tuning with Unsloth (single-GPU)
With the best hyperparameters for a cost-effective full fine-tuning
Apr 14
•
Benjamin Marie
The Weekly Kaitchup #87
Llama 4 - MoE-Quant - Nemotron Ultra
Apr 11
•
Benjamin Marie
Llama 4 Scout, Maverick, and Behemoth: MoE, VLMs, and Very Long Context
How to burn GPUs
Apr 5
•
Benjamin Marie
The Weekly Kaitchup #86
Gemma 3 QAT - Dream 7B
Apr 4
•
Benjamin Marie
LoRA Trainable Tokens: Save Memory, Improve Accuracy for Your Domain
How to teach an LLM to use your new tokens without fully retraining the token embeddings
Apr 3
•
Benjamin Marie
vLLM v1 Engine: How Faster Is It for RTX and Mid-Range GPUs?
Better design and faster with H100s but what about the smaller GPUs?
Mar 31
•
Benjamin Marie
The Weekly Kaitchup #85
Qwen2.5-Omni-7B - DPO vs. PPO - DeepSeek-V3 Update - GPTQ Models
Mar 28
•
Benjamin Marie
Fine-Tuning and Inference with an RTX 5090
Is the RTX 5090 noticeably faster than the previous generation for LLMs?
Mar 24
•
Benjamin Marie