Hi Everyone,
In this edition of The Weekly Kaitchup:
Llama 3.1: Longer Context and Multilingual Llama 3
Mistral Large 2: As Good as Llama 3 405B, Really?
Free Fine-tuning for GPT-4o mini
If you are a free subscriber, consider upgrading to paid to access all the notebooks (80+) and more than 100 articles.
If you are looking for custom AI notebooks on request, priority support, or professional LLM services, have a look at The Kaitchup Pro:
AI Notebooks and Articles Published this Week by The Kaitchup
Notebook: #89 Function Calling: Fine-tuning LLMs on xLAM -- Examples with Llama 3 and Qwen2
Llama 3.1: Longer Context and Multilingual Llama 3
Along with Llama 3 405B, Meta also released new versions of Llama 3 8B and 70B. They are named Llama 3.1, and you can find them here:
Meta published a new report describing the models:
The main differences from Llama 3 are:
Longer context: These new versions have been post-trained on very long sequences of 128k tokens. As a result, they handle contexts of up to 128k tokens without an accuracy drop. If you are considering using very long contexts, I suggest quantizing your KV cache; otherwise, handling 128k tokens would consume a large amount of memory:
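To see why the KV cache matters at 128k tokens, here is a back-of-the-envelope estimate using the published configuration of Llama 3.1 8B (32 layers, 8 KV heads thanks to grouped-query attention, head dimension 128). This is a rough sketch of the cache size alone, ignoring weights and activations:

```python
# Rough KV-cache memory estimate for Llama 3.1 8B at a 128k-token context.
# Config values from the model's config.json: 32 layers, 8 KV heads (GQA),
# head dimension 128. The cache stores one key and one value vector per
# token, per layer, per KV head.

def kv_cache_gib(seq_len, n_layers=32, n_kv_heads=8, head_dim=128,
                 bytes_per_elem=2.0):
    elems = 2 * n_layers * n_kv_heads * head_dim * seq_len  # 2 = K and V
    return elems * bytes_per_elem / 2**30

fp16 = kv_cache_gib(128 * 1024, bytes_per_elem=2.0)   # fp16: 2 bytes/element
int4 = kv_cache_gib(128 * 1024, bytes_per_elem=0.5)   # 4-bit: 0.5 bytes/element
print(f"fp16 cache: {fp16:.1f} GiB, 4-bit cache: {int4:.1f} GiB")
# → fp16 cache: 16.0 GiB, 4-bit cache: 4.0 GiB
```

So a single 128k-token sequence adds roughly 16 GiB on top of the model weights in fp16, which 4-bit cache quantization cuts to about 4 GiB.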
Multilingual: Llama 3.1 officially supports German, French, Italian, Portuguese, Hindi, Spanish, and Thai. I don’t know why they chose these languages, especially Thai, which is a very difficult language to model and is considered a low-resource language.
Function calling: Meta also trained the models for function calling. For this purpose, they modified the tokenizer, replacing some of the existing special tokens. You will find examples in this blog post by Hugging Face.
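The mechanics of function calling are the same regardless of the model: the LLM emits a structured call (a function name plus arguments, typically as JSON), and your code executes it and feeds the result back. Here is a minimal sketch of that dispatch step with a hypothetical get_weather tool; the exact prompt and special-token format Llama 3.1 uses is what the Hugging Face post documents, so the model output below is only a stand-in:

```python
import json

# Hypothetical tool the model is allowed to call.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

# Pretend the model produced this structured call. (Llama 3.1's real
# output is wrapped in special tokens; see the Hugging Face blog post.)
model_output = '{"name": "get_weather", "parameters": {"city": "Paris"}}'

call = json.loads(model_output)
result = TOOLS[call["name"]](**call["parameters"])
print(result)  # → Sunny in Paris
```

In a full loop, `result` would be appended to the conversation as a tool message and the model queried again to produce the final answer.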
More good news: the new Llama 3.1 license allows us to use Llama 3.1 to improve other LLMs. For instance, you can now distill Llama 3.1 to train smaller models. The only constraint is that the derived model’s name must start with “Llama”.
According to Meta, these new models are better on public benchmarks:
As usual, take these results with a pinch of salt; we can’t reproduce them. For the other models, such as Gemma and Nemotron, Meta simply copied the results published in other papers. Most of these numbers are not directly comparable since they were obtained with different hyperparameters, prompt templates, few-shot examples, etc.
Meta also plans to release multimodal versions of Llama 3 later. These versions will have vision and audio adapters. However, according to Yann LeCun, they may not be available in Europe.
Mistral Large 2: As Good As Llama 3 405B, Really?
While we knew that Llama 3 405B would be released this week, Mistral AI was quietly preparing its own release: Mistral Large 2.
It is a 123 billion parameter model, i.e., 3.3x smaller than Llama 3 405B. We find more or less the same capabilities in both models: function calling, support for other languages, context length of 128k, etc.
In its blog post, Mistral AI claims that Mistral Large 2 performs on par with Llama 3 405B.
They conducted the first “third-party” evaluation of Llama 3 405B and, interestingly but not surprisingly, obtained different results from those published by Meta.
On average, for code generation tasks, Mistral AI observed a drop of 1 point in accuracy compared to the results published by Meta (“measured” vs. “paper” in the table below). For some languages, the difference reaches 5 points…
Mistral Large 2 seems to perform very well on multilingual benchmarks. In the following results, we can also observe that Llama 3 405B performs well for languages it doesn’t officially support (Dutch, Russian, Japanese, and Chinese).
Given that Mistral Large 2 performs close to Llama 3 405B while being much smaller, I would recommend using it instead of Llama 3 405B, especially if you want to fine-tune it.
However, you can only do so for research purposes: commercial use is forbidden by Mistral AI, and the license is particularly unclear about what we are allowed to do with Mistral Large 2. Since I’m not sure whether I can write tutorials based on it, I won’t publish articles about it.
Free Fine-tuning for GPT-4o mini
GPT-4o mini is fast, capable, and cheap, and you can make it even better by fine-tuning it on your own data.
Fine-tuning OpenAI models is possible but usually very expensive, especially the preliminary experiments required to find good hyperparameters.
OpenAI announced that fine-tuning GPT-4o mini is free until September 23, 2024, for up to 2M training tokens per day:
“Fine-tuning for GPT-4o mini is free up to a daily token limit through September 23, 2024. Each qualifying org gets up to 2M complimentary training tokens daily and any overage will be charged at the normal rate of $3.00/1M tokens.” (source)
You won’t go far with only 2M tokens/day, but in my opinion it’s a very good opportunity to test OpenAI’s fine-tuning API, and its hyperparameters, for free while you can.
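If you want to try it, OpenAI’s fine-tuning API expects training data as a JSONL file with one chat-formatted example per line. Here is a minimal sketch of preparing such a file; the example messages are placeholders, and the commented-out upload/launch calls at the end are the standard `openai` package calls, which require an API key:

```python
import json

# Each training example is a short chat: system/user/assistant messages.
# Replace these placeholder messages with your own data.
examples = [
    {"messages": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "What is the capital of France?"},
        {"role": "assistant", "content": "Paris."},
    ]},
]

with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# Then upload the file and launch the job (requires an OpenAI API key):
#   from openai import OpenAI
#   client = OpenAI()
#   file = client.files.create(file=open("train.jsonl", "rb"),
#                              purpose="fine-tune")
#   client.fine_tuning.jobs.create(training_file=file.id,
#                                  model="gpt-4o-mini-2024-07-18")
```

Token usage is counted over your training file times the number of epochs, so a small, clean dataset goes a long way within the 2M/day budget.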
The Salt
The Salt is my other newsletter that takes a more scientific approach. In The Salt, I primarily feature short reviews of recent papers (for free), detailed analyses of noteworthy publications, and articles centered on LLM evaluation.
This week, I discussed why we can’t use perplexity to compare different LLMs.
I also reviewed:
NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window?
⭐Spectra: A Comprehensive Study of Ternary, Quantized, and FP16 Language Models
Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies
LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference
That’s all for this week.
If you like reading The Kaitchup, consider sharing it with friends and coworkers (there is a 20% discount for group subscriptions):
Have a nice weekend!