Since the release of Mixtral-8x7B by Mistral AI, there has been renewed interest in mixture of experts (MoE) models. This architecture relies on expert sub-networks, of which only a few are selected and activated by a router network during inference.
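To make the routing idea concrete, here is a minimal, self-contained sketch of a sparse MoE layer in PyTorch: a linear router scores the experts and only the top-k of them process each token. This is an illustration, not Mixtral's actual code; the class name, layer sizes, and number of experts are arbitrary choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKRouter(nn.Module):
    """Minimal sparse MoE layer: a linear router scores the experts
    and only the top-k experts are run for each token."""
    def __init__(self, hidden_size, num_experts, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(hidden_size, num_experts, bias=False)
        # Toy experts: small feed-forward blocks standing in for full expert FFNs
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(hidden_size, 4 * hidden_size),
                          nn.GELU(),
                          nn.Linear(4 * hidden_size, hidden_size))
            for _ in range(num_experts)
        )

    def forward(self, x):  # x: (tokens, hidden_size)
        scores = self.router(x)                       # (tokens, num_experts)
        weights, indices = scores.topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)          # normalize over the selected experts only
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e          # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

layer = TopKRouter(hidden_size=64, num_experts=8, k=2)
tokens = torch.randn(10, 64)
print(layer(tokens).shape)  # torch.Size([10, 64])
```

In Mixtral-8x7B, each token is routed to 2 of the 8 experts at every MoE layer, which is the pattern the k=2 setting above mimics.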
MoEs are simple and flexible enough that it is easy to make a custom one. On the Hugging Face Hub, we can now find several trending LLMs that are custom MoEs, such as mlabonne/phixtral-4x2_8.
However, most of them are not traditional MoEs trained from scratch; they simply combine already fine-tuned LLMs as experts. Their creation was made easy by mergekit. For instance, the Phixtral LLMs were made with mergekit by combining several Phi-2 models.
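Before we walk through the process, the sketch below shows the general shape of a mergekit MoE merge: a YAML config listing a base model and the expert models with routing prompts, passed to the mergekit-moe command. The model names, prompts, and output directory here are placeholders for illustration, not the actual recipe used for Phixtral or Maixtchup.

```python
# Illustrative sketch of a mergekit MoE merge (placeholder models and prompts,
# not the article's actual Phixtral/Maixtchup configuration).
import pathlib
import subprocess
import textwrap

config = textwrap.dedent("""\
    base_model: mistralai/Mistral-7B-Instruct-v0.2
    gate_mode: hidden            # initialize router weights from prompt hidden states
    dtype: bfloat16
    experts:
      - source_model: mistralai/Mistral-7B-Instruct-v0.2
        positive_prompts: ["general chat", "assistant responses"]
      - source_model: HuggingFaceH4/zephyr-7b-beta
        positive_prompts: ["step-by-step reasoning", "detailed explanations"]
    """)

pathlib.Path("moe_config.yaml").write_text(config)

# mergekit must be installed (pip install mergekit); this writes the merged
# MoE checkpoint to ./my-moe.
subprocess.run(["mergekit-moe", "moe_config.yaml", "./my-moe"], check=True)
```

With gate_mode set to hidden, mergekit uses hidden-state representations of each expert's positive prompts to initialize the router weights, so each expert tends to be activated for the kind of input its prompts describe.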
In this article, we will see how Phixtral was created. We will apply the same process to create our own mixture of experts, Maixtchup, using several Mistral 7B models.
I have implemented a notebook reproducing the creation of Maixtchup. It is available here: