The Mayonnaise: Rank First on the Open LLM Leaderboard with TIES-Merging
Trim and merge LLMs while keeping the same number of parameters
Following the release of Mixtral, a new trend emerged: LLMs built as mixtures of experts (MoEs) that combine several other LLMs. We saw how to make our own MoE in a previous article:
However, while MoEs are an interesting way to combine LLMs while preserving the full knowledge of each constituent model, they are much larger than any single one of them and consequently require a GPU with a lot of memory. Running them on consumer hardware is challenging.
An increasingly popular alternative is to merge LLMs into a single model without increasing the number of parameters. Several simple algorithms can do this.
In this article, I focus on TIES-merging, one of the best-performing merging methods and also one of the simplest to understand. I first explain how it works by briefly reviewing the NeurIPS paper that introduced it. Then, we will create and evaluate different combinations of LLMs with mergekit. We will see how easy this method makes it to rank first on the Open LLM Leaderboard in the 7B class of LLMs: my Mayonnaise is still ranked first as I write this article.
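To build some intuition before reviewing the paper, here is a minimal sketch of the TIES-merging steps applied to a single weight tensor in PyTorch. The function ties_merge and its arguments are illustrative names, not part of mergekit's API, and the trimming step is a simplified per-tensor version of what the paper describes.

```python
import torch

def ties_merge(base, finetuned, density=0.5, lam=1.0):
    """Illustrative sketch of TIES-merging for a single weight tensor.

    base:      pretrained weight tensor
    finetuned: list of fine-tuned weight tensors of the same shape
    density:   fraction of task-vector entries kept after trimming
    lam:       scaling applied to the merged task vector
    """
    # 1) Task vectors: differences between fine-tuned and base weights
    task_vectors = [ft - base for ft in finetuned]

    # 2) Trim: keep only the top-`density` fraction of entries by magnitude
    trimmed = []
    for tv in task_vectors:
        k = max(1, int(density * tv.numel()))
        threshold = tv.abs().flatten().kthvalue(tv.numel() - k + 1).values
        trimmed.append(torch.where(tv.abs() >= threshold, tv, torch.zeros_like(tv)))

    # 3) Elect sign: for each parameter, keep the sign with the largest total magnitude
    stacked = torch.stack(trimmed)                    # (num_models, *shape)
    elected_sign = torch.sign(stacked.sum(dim=0))

    # 4) Disjoint merge: average only the entries that agree with the elected sign
    agree = (torch.sign(stacked) == elected_sign) & (stacked != 0)
    merged = (stacked * agree).sum(dim=0) / agree.sum(dim=0).clamp(min=1)

    # 5) Add the scaled merged task vector back to the base weights
    return base + lam * merged
```

In mergekit, the ties merge method exposes similar knobs: a density parameter controlling how much of each task vector survives trimming, and a per-model weight scaling each model's contribution.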
These LLM combinations can be fully replicated by using this notebook: