The Kaitchup – AI on a Budget
Tutorials
Fine-tune Llama 3 70B on Your GPU with AQLM 2-bit
It's possible to fine-tune Llama 3 70B with only 24 GB of GPU RAM
10 hrs ago • Benjamin Marie
Fine-tune Tiny Chat Models with Apple OpenELM and ORPO
Can we make a good chat model with a 270M LLM?
May 9 • Benjamin Marie
Run Llama 3 70B on Your GPU with ExLlamaV2
2.5 bits per weight, on average, is good enough
May 6 • Benjamin Marie
Phi-3: Fine-tuning and Quantization on Your Computer
Larger and better than Phi-2
May 2 • Benjamin Marie
Turn Llama 3 into an Embedding Model with LLM2Vec
RAG with Llama 3 for both generation and retrieval
Apr 29 • Benjamin Marie
Estimate the Memory Consumption of LLMs for Inference and Fine-tuning
A close look at the memory consumption of Command-R+, Mixtral-8x22B, and Llama 3 70B
Apr 25 • Benjamin Marie
Fine-tune Llama 3 on Your Computer
With code to merge QLoRA adapters and quantize the model
Apr 22 • Benjamin Marie
Neural Speed: Fast Inference on CPU for 4-bit Large Language Models
Up to 40x faster than llama.cpp?
Apr 15 • Benjamin Marie
ORPO: Preference Optimization without the Supervised Fine-tuning (SFT) Step
A much cheaper alignment method that performs as well as DPO
Apr 8 • Benjamin Marie
GaLore: Full Fine-tuning on Your GPU
And pre-training!
Apr 4 • Benjamin Marie
A Guide on Hyperparameters and Training Arguments for Fine-tuning LLMs
Batch size, optimizers, learning rate schedulers, bfloat16, ...
Apr 1 • Benjamin Marie
Marlin: Nearly Ideal Inference Speed for 4-bit Models with vLLM (1k+ tokens/sec)
Up to 4x faster inference
Mar 28 • Benjamin Marie