Estimate the Memory Consumption of LLMs for Inference and Fine-tuning

A close look at the memory consumption of Command-R+, Mixtral-8x22B, and Llama 3 70B

Benjamin Marie
Apr 25, 2024

With Command-R+, Mixtral-8x22B, and Llama 3 70B all released within a few weeks of each other, we now have LLMs that perform increasingly close to the best GPT-4 models. However, these models are huge. They all have more than 70 billion parameters (a back-of-the-envelope memory estimate follows the list below):

  • Command-R+: A 104B parameter model

  • Mixtral-8x22B: A mixture-of-experts (MoE) model with 141B parameters

  • Llama 3 70B: A model with 70.6B parameters
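
As a quick back-of-the-envelope check (my own sketch, not the exact method used in the notebook linked further down), the memory needed just to store the weights is the parameter count multiplied by the number of bytes per parameter, and full fine-tuning with an optimizer like AdamW adds gradients and optimizer states on top of that:

```python
# Rough estimate of memory from the parameter count alone (illustrative sketch).
# It ignores activations and the KV cache, which add to these numbers.

def weights_gb(params_billions: float, bytes_per_param: float) -> float:
    """Memory (GB) needed to store the weights at a given precision."""
    return params_billions * 1e9 * bytes_per_param / 1024**3

for name, size_b in {"Command-R+": 104, "Mixtral-8x22B": 141, "Llama 3 70B": 70.6}.items():
    fp16 = weights_gb(size_b, 2.0)    # 16-bit weights: 2 bytes per parameter
    int4 = weights_gb(size_b, 0.5)    # 4-bit quantization: ~0.5 byte per parameter
    # Full fine-tuning with AdamW adds roughly 12 extra bytes per parameter
    # (FP32 gradients plus two FP32 optimizer states); mixed-precision setups
    # that also keep FP32 master weights need even more.
    finetune = weights_gb(size_b, 2.0 + 12.0)
    print(f"{name}: ~{fp16:.0f} GB (FP16), ~{int4:.0f} GB (4-bit), "
          f"~{finetune:.0f} GB for full AdamW fine-tuning")
```

Even the smallest of the three, Llama 3 70B, needs around 130 GB just for its FP16 weights, which is why quantization and memory-efficient optimizers matter so much here.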

Can you fine-tune and run these models on your computer?

In this article, I explain and analyze their memory consumption for inference and fine-tuning. The method I present applies to any transformer LLM and estimates its memory consumption without downloading it. We will see that, while the memory consumption of Command-R+, Mixtral-8x22B, and Llama 3 70B is huge, several techniques can significantly reduce it, such as quantization and memory-efficient optimizers.
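
The key observation is that a model's configuration file is enough for this kind of estimate. As a minimal illustration (my own sketch, assuming a standard Hugging Face config with hidden_size, num_attention_heads, num_key_value_heads, and num_hidden_layers fields), the KV cache needed for inference can be computed from the config alone, since AutoConfig.from_pretrained only downloads the small config.json, not the weights:

```python
# Minimal sketch: estimate the KV cache size for inference from the model's
# config alone. AutoConfig.from_pretrained() fetches only config.json, not the
# weights (gated repos such as Llama 3 may require a Hugging Face token).
from transformers import AutoConfig

def kv_cache_gb(model_id: str, num_tokens: int, bytes_per_value: int = 2) -> float:
    """KV cache size (GB) for num_tokens cached tokens, assuming 16-bit keys/values."""
    cfg = AutoConfig.from_pretrained(model_id)
    head_dim = cfg.hidden_size // cfg.num_attention_heads
    kv_heads = getattr(cfg, "num_key_value_heads", cfg.num_attention_heads)
    # 2 tensors (keys and values) per layer, per KV head, per cached token
    total_bytes = 2 * cfg.num_hidden_layers * kv_heads * head_dim * num_tokens * bytes_per_value
    return total_bytes / 1024**3

# Example: an 8,192-token context with Llama 3 70B (~2.5 GB of KV cache in FP16)
print(f"{kv_cache_gb('meta-llama/Meta-Llama-3-70B', 8192):.1f} GB")
```

The same config-driven approach extends to the weights, gradients, optimizer states, and activations, which is essentially what the notebook linked below automates.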

I made a notebook that can automatically estimate the memory consumption of a transformer model for inference and fine-tuning. You can find it here:

Get the notebook (#64)
