The Kaitchup – AI on a Budget

From Llama 3 70B to 120B: How to Self-Augment an LLM?

Experiments with Llama 3 8B: Removing, duplicating, and reordering layers

Benjamin Marie
May 23, 2024

Image: a surprised cartoon llama flying in the sky, generated with DALL-E.

Meta only released two versions of Llama 3: 8B and 70B. Another version, called Llama 3 120B, has also appeared and has drawn a lot of discussion on social networks for its surprising performance on specific tasks such as creative writing.

This “Llama 3 120B” is not a product of Meta but a version made by Maxime Labonne with Mergekit. In a previous article, I presented Mergekit and showed how to use it to create powerful LLMs by simply merging several LLMs into one:

The Mayonnaise: Rank First on the Open LLM Leaderboard with TIES-Merging (January 29, 2024)

However, Llama 3 120B is not a merge of several LLMs. It was made from a single model, Llama 3 70B, by simply duplicating 60 of its 80 layers. Despite its simplicity, this process seems to have worked relatively well.
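To give an idea of what such a self-merge looks like in practice, here is a minimal sketch of a mergekit "passthrough" configuration that stacks overlapping layer ranges of Llama 3 70B. The model name, file names, and exact ranges are assumptions chosen for illustration (not necessarily the recipe behind the published Llama 3 120B): with these ranges, layers 10 to 69 each appear twice, growing 80 layers into 140.

```python
# Illustrative sketch only: build a mergekit "passthrough" config that stacks
# overlapping layer ranges of Llama 3 70B so that layers 10-69 appear twice,
# turning 80 layers into 140. Ranges and names are assumptions, not the exact
# recipe of the published Llama 3 120B.
model = "meta-llama/Meta-Llama-3-70B-Instruct"  # assumed base model

slices = []
for start in range(0, 61, 10):  # produces ranges [0,20], [10,30], ..., [60,80]
    slices.append(
        "  - sources:\n"
        f"      - model: {model}\n"
        f"        layer_range: [{start}, {start + 20}]\n"
    )

config = "slices:\n" + "".join(slices) + "merge_method: passthrough\ndtype: float16\n"

with open("llama-3-120b.yaml", "w") as f:
    f.write(config)

# Then build the merged model with mergekit's CLI, for example:
#   mergekit-yaml llama-3-120b.yaml ./llama-3-self-augmented
```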

In this article, I explain how to reproduce Llama 3 120B. I apply the same process to Llama 3 8B and evaluate the resulting models. Going beyond simple duplication, I also show how to reorder and remove layers with mergekit, e.g., to reduce the size of an LLM. We will see that while most of these operations damage the model’s performance, some of it can be recovered through fine-tuning.
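As a preview, here is a hypothetical sketch of the opposite operation on Llama 3 8B (32 layers): dropping a block of layers with the same passthrough method. The layer ranges and file names below are arbitrary examples, not the exact settings of my experiments; reordering works the same way, by listing the slices in whatever order you want.

```python
# Hypothetical example: shrink Llama 3 8B (32 layers) by dropping layers 22-29
# with a mergekit passthrough merge. The ranges below are illustrative only.
model = "meta-llama/Meta-Llama-3-8B"  # assumed base model

config = f"""slices:
  - sources:
      - model: {model}
        layer_range: [0, 22]
  - sources:
      - model: {model}
        layer_range: [30, 32]
merge_method: passthrough
dtype: bfloat16
"""

with open("llama-3-8b-pruned.yaml", "w") as f:
    f.write(config)

# mergekit-yaml llama-3-8b-pruned.yaml ./llama-3-8b-24-layers
```

In both cases, mergekit writes a standard Hugging Face checkpoint that can then be evaluated or fine-tuned like any other model.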

The following notebook implements layer duplication, reordering, and deletion for Llama 3 models (though it is technically applicable to any transformer model):

Get the notebook (#72)

This post is for paid subscribers
