The Kaitchup – AI on a Budget
Fast and Memory-Efficient Full Fine-Tuning with Unsloth (single-GPU)

With the best hyperparameters for a cost-effective full fine-tuning

Benjamin Marie
Apr 14, 2025
Image generated with ChatGPT

Fine-tuning large language models (LLMs) for specific tasks and domains can be extremely expensive. This process typically requires multiple high-end GPUs due to the significant memory demands of LLMs.

Unsloth, known for being one of the fastest and most memory-efficient frameworks for fine-tuning, was previously limited to the LoRA and QLoRA methods, meaning it only supported adapter-based fine-tuning, not full-model fine-tuning.

However, Unsloth now supports full fine-tuning as well. You can fully fine-tune models with 7–8 billion parameters, such as Llama 3.1 and Qwen2.5, using a single GPU with 48 GB of VRAM.
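
As a rough sanity check (my own back-of-the-envelope arithmetic, not figures from the article), here is why 48 GB is roughly the threshold for an 8B model:

```python
# Back-of-the-envelope VRAM budget for fully fine-tuning an 8B model
# (illustrative arithmetic, not measurements from the notebook).
params = 8e9

weights_gb = params * 2 / 1e9      # bf16 weights: 2 bytes/param   -> ~16 GB
grads_gb   = params * 2 / 1e9      # bf16 gradients: 2 bytes/param -> ~16 GB
optim_gb   = params * 2 * 1 / 1e9  # 8-bit AdamW: 2 moments x 1 B  -> ~16 GB

total_gb = weights_gb + grads_gb + optim_gb
print(f"~{total_gb:.0f} GB before activations")  # ~48 GB

# The static tensors alone nearly fill a 48 GB card, which is why tricks
# like paging the optimizer states to CPU RAM and gradient checkpointing
# for the activations are what make single-GPU full fine-tuning fit.
```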


In this article, we'll explore how to use Unsloth for full fine-tuning of LLMs. I’ll walk through example code for fine-tuning LLMs like Llama 3.1, analyze memory usage during training, and examine how different hyperparameters affect both memory consumption and training speed. Interestingly, due to Unsloth’s extensive optimizations, some hyperparameter changes that would normally accelerate training, such as increasing the batch size or not paging the optimizer states, might actually slow it down. We’ll dive into why that happens.
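
To make the setup concrete before the full walkthrough, here is a minimal sketch of what full fine-tuning with Unsloth looks like. This is not the article's notebook: the model name, dataset, and hyperparameter values are placeholders, and it assumes Unsloth's `full_finetuning` flag together with TRL's `SFTTrainer`.

```python
# Minimal sketch of full fine-tuning with Unsloth; the model, dataset, and
# hyperparameters below are illustrative placeholders, not the notebook's.
from unsloth import FastLanguageModel
from datasets import load_dataset
from trl import SFTTrainer, SFTConfig

# full_finetuning=True loads the model for full-parameter training instead
# of attaching LoRA/QLoRA adapters; 4-bit quantization must stay disabled.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="meta-llama/Llama-3.1-8B",  # any 7-8B model, e.g. Qwen2.5
    max_seq_length=2048,
    load_in_4bit=False,
    full_finetuning=True,
)

# Placeholder dataset: any dataset with a "text" column works here.
dataset = load_dataset("json", data_files="train.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    args=SFTConfig(
        output_dir="llama31-full-ft",
        dataset_text_field="text",
        max_seq_length=2048,
        per_device_train_batch_size=2,  # bigger is not always faster here
        gradient_accumulation_steps=8,  # effective batch size of 16
        num_train_epochs=1,
        learning_rate=1e-5,             # full FT uses a lower LR than LoRA
        optim="paged_adamw_8bit",       # paged optimizer states cut peak VRAM
        logging_steps=10,
    ),
)
trainer.train()
```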

I also compare the performance of three GPUs: the NVIDIA L40S, RTX 6000 Ada, and RTX A6000. Using RunPod pricing as a reference, we’ll determine which card delivers the best performance-to-cost ratio. I’ll also provide estimates for the H100.

The fine-tuning code discussed in this article is available in the accompanying notebook:

Get the notebook (#157)

Full Fine-Tuning with Unsloth

Keep reading with a 7-day free trial