The Kaitchup – AI on a Budget
Tutorials
Mistral Small 3: An Excellent 24B-Parameter Wide-Shallow LLM
Fine-tuning, quantization, and evaluation
Feb 17 • Benjamin Marie
"Thinking" LLMs with Simple Fine-tuning and Budget Forcing
How to activate "reasoning" in LLMs
Feb 13 • Benjamin Marie
GRPO: Train LLMs with DeepSeek-R1's Reinforcement Learning Method
With a single consumer GPU!
Feb 10 • Benjamin Marie
Estimating Memory Usage for LLMs During Inference (V2)
KV cache, GQA, FlashAttention, activations, batching, and more
Jan 20 • Benjamin Marie
Local Agentic AI with smolagents and Qwen2.5 Coder
When to use it and when it fails
Jan 13 • Benjamin Marie
Deploy Your Fine-Tuned LoRA Adapters with Ollama
Probably the easiest way to run adapters offline and online
Dec 30, 2024 • Benjamin Marie
Fast and Memory-Efficient Text-to-SQL with Qwen2.5 Coder 32B Instruct on Your GPU
Quantization and prompting with vLLM
Dec 23, 2024 • Benjamin Marie
Schedule-Free Optimizer: Does It Work for LLMs?
Experiments with Llama 3.2: schedule-free vs. standard AdamW
Dec 16, 2024 • Benjamin Marie
Fine-Tuning Llama 3.3 70B with a Single GPU
And how to fix an inaccurate 2-bit model
Dec 12, 2024 • Benjamin Marie
Quantize and Run Llama 3.3 70B Instruct on Your GPU
4-bit 👍, 3-bit 👎, and 2-bit 👎 quantization
Dec 9, 2024 • Benjamin Marie
LLM Alignment: Searching for Optimal ORPO Hyperparameters
A higher learning rate and beta work best
Dec 2, 2024 • Benjamin Marie
The Recipe for Extremely Accurate and Cheap Quantization of 70B+ LLMs
Cost and accuracy of quantizing large models to 4-bit and 2-bit
Nov 25, 2024 • Benjamin Marie