The Kaitchup – AI on a Budget
Tutorials
Boost 2-Bit LLM Accuracy with EoRA
A training-free solution for extreme LLM compression
May 19 • Benjamin Marie
LoRA at Scale on a Consumer GPU: Does It Work?
Reproducing TULU 3 SFT on Consumer Hardware Using LoRA and Unsloth
May 12 • Benjamin Marie
Fine-Tuning Qwen3: Base vs. Reasoning Models
Is it reasonable to fine-tune a "reasoning" model?
May 8 • Benjamin Marie
Accurate 2-bit Quantization: Run Massive LLMs on a Single Consumer GPU
70B models for consumer hardware
May 5 • Benjamin Marie
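The "70B on a consumer GPU" claim comes down to simple arithmetic: weight memory scales linearly with bits per weight. A minimal sketch of that estimate, ignoring KV cache, activations, and quantization metadata (scales, zero points), which add real overhead in practice:

```python
def quantized_weight_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Rough weight-only memory estimate in decimal GB.

    Ignores KV cache, activations, and quantization metadata
    (per-group scales and zero points), so real usage is higher.
    """
    return n_params * bits_per_weight / 8 / 1e9

# A 70B model at 2 bits per weight fits the weight budget of a 24 GB GPU:
print(quantized_weight_size_gb(70e9, 2.0))   # 17.5 (GB)
# The same model at FP16, for comparison:
print(quantized_weight_size_gb(70e9, 16.0))  # 140.0 (GB)
```

At 2 bits per weight, the weights alone of a 70B model come to roughly 17.5 GB, which is why a single 24 GB consumer GPU becomes feasible; at FP16 the same weights need about 140 GB.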
Make LLMs Faster and Lighter with W8A8 Quantization
Efficient Weight and Activation Quantization with llm-compressor
Apr 21 • Benjamin Marie
Run Llama 3.3 70B on Your GPU with ExLlamaV3
Fast Llama 3.3 70B at 1.75 bits per weight, using only 19 GB!
Apr 17 • Benjamin Marie
Fast and Memory-Efficient Full Fine-Tuning with Unsloth (single-GPU)
With the best hyperparameters for a cost-effective full fine-tuning
Apr 14 • Benjamin Marie
Llama 4 with 10M Tokens: How Much Does It Cost and Is It Worth It?
A KV Cache Story
Apr 8 • Benjamin Marie
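Why a 10M-token context is "a KV cache story": the cache stores one key and one value vector per layer, per KV head, per token, so its size grows linearly with context length. A minimal sketch using the standard formula, with hypothetical model dimensions chosen for illustration (they are not Llama 4's actual configuration):

```python
def kv_cache_size_gb(seq_len: int, n_layers: int, n_kv_heads: int,
                     head_dim: int, bytes_per_elem: int = 2) -> float:
    """KV cache size in decimal GB: 2 tensors (K and V) per layer,
    one vector of head_dim per KV head per token.
    bytes_per_elem=2 assumes FP16/BF16 cache entries."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem / 1e9

# Hypothetical dimensions (48 layers, 8 KV heads, head_dim 128, BF16):
print(kv_cache_size_gb(10_000_000, 48, 8, 128))  # ~1966 GB for 10M tokens
```

Even with grouped-query attention keeping the KV head count low, a dense cache at this scale runs into the terabytes, which is why serving very long contexts is dominated by cache cost rather than weight memory.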
LoRA Trainable Tokens: Save Memory, Improve Accuracy for Your Domain
How to teach an LLM to use your new tokens without fully retraining the token embeddings
Apr 3 • Benjamin Marie
BF16 vs. FP16 vs. FP32 for Gemma 3 Inference — Mind Your Data Type
Mitigating Numerical Issues When Converting a Model from BF16 to FP16
Mar 17 • Benjamin Marie
Fine-Tuning Gemma 3 on Your Computer with LoRA and QLoRA (+model review)
The efficiency of global-local attention with QK-Norm and no more soft-capping
Mar 13 • Benjamin Marie
TransMLA: Improve Qwen2.5 and Llama 3x LLMs with DeepSeek's Multi-Head Latent Attention
Give your LLMs significantly more learning power
Mar 5 • Benjamin Marie