Hi Everyone,
In this edition of The Weekly Kaitchup:
Yi-1.5: Better than Llama 3 8B?
Phi-3 “small” and Phi-3 “medium”
Mistral 7B v0.3 with Function Calling
The Kaitchup now has 3,675 subscribers. Thanks a lot for your support!
If you are a free subscriber, consider upgrading to paid to access all the notebooks (70+) and more than 100 articles.
I’m offering a 35% discount for the yearly subscription to The Salt, my other AI newsletter. Available until May 28th.
Yi-1.5: Better than Llama 3 8B?
01.AI released a new version of its Yi LLMs:
Hugging Face collection: Yi-1.5 (2024/05)
Yi-1.5 is Yi but further pre-trained on 500B additional tokens, according to the model card:
it is continuously pre-trained on Yi with a high-quality corpus of 500B tokens and fine-tuned on 3M diverse fine-tuning samples.
I don’t know what is in these 500B tokens, but they significantly boosted Yi’s performance, to the point that Yi-1.5 9B outperforms Llama 3 8B. The 6B version also outperforms Llama 3 8B on some benchmarks:
I can confirm that the models perform very well. However, I’m unsure whether they are better than Llama 3. As usual, it depends on the target tasks.
01.AI plans to publish an updated technical report documenting how Yi-1.5 was trained. Meanwhile, you can check my review of Yi here:
The notebook showing how to quantize, fine-tune, and run Yi LLMs (also works for Yi-1.5) is here:
Phi-3 “small” and Phi-3 “medium”
Almost one month ago, Microsoft released Phi-3 “mini”, a 3.8B parameter LLM. At that time, they also mentioned that larger versions would be released later.
These larger versions, Phi-3 “small” and Phi-3 “medium”, are now available on the Hugging Face Hub (MIT license). They have 7B and 14B parameters, respectively:
According to Microsoft’s evaluation of the models, they perform extremely well.
Phi-3 “medium” outperforms GPT-3.5-Turbo and Command-R+, a 104B parameter model. Moreover, it performs close to Llama 3 70B Instruct.
Assuming that Microsoft conducted all the necessary checks to ensure that Phi-3’s training data were not contaminated by the benchmarks, these results are quite surprising.
In an upcoming article, I’ll review Phi-3 medium, including its memory consumption and the performance of its quantized version. Meanwhile, you can read my review of Phi-3 “mini” here:
Mistral 7B v0.3 with Function Calling
Mistral AI released v0.3 of Mistral 7B:
This new version adds 768 tokens to the vocabulary of the original Mistral 7B (32,000 tokens, extended to 32,768).
Here are some of the added tokens:
The remaining added tokens are “[control_X]” special tokens that are not trained.
All these new tokens were added to support function calling with Mistral 7B. It should now be much easier to use Mistral 7B for calling external tools/APIs.
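As a rough illustration of what this enables, here is a minimal sketch of a tool-dispatch loop. Note the assumptions: the exact prompt format is defined by Mistral’s tokenizer, and the `get_weather` tool and the parsing below are simplified, hypothetical examples, not Mistral AI’s reference implementation.

```python
import json

# Hypothetical tool the model may call; the name and signature are illustrative only.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

def dispatch_tool_calls(model_output: str):
    """Parse a response that starts with Mistral's [TOOL_CALLS] special token
    followed by a JSON list of calls, and run each requested tool."""
    prefix = "[TOOL_CALLS]"
    assert model_output.startswith(prefix), "not a tool-call response"
    calls = json.loads(model_output[len(prefix):])
    results = []
    for call in calls:
        fn = TOOLS[call["name"]]                # look up the requested tool
        results.append(fn(**call["arguments"]))  # run it with the model's arguments
    return results

# A response in the (simplified) shape the model is trained to emit:
response = '[TOOL_CALLS] [{"name": "get_weather", "arguments": {"city": "Paris"}}]'
print(dispatch_tool_calls(response))  # ['Sunny in Paris']
```

In practice, the tool schemas are passed to the model inside the `[AVAILABLE_TOOLS]` section of the prompt, and the tool results are fed back after a `[TOOL_RESULTS]` token so the model can compose its final answer.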
Evergreen Kaitchup
In this section of The Weekly Kaitchup, I mention which of The Kaitchup’s notebooks and articles I have checked and updated, with a brief description of what I have done.
This week, I updated the article and notebook comparing 8-bit, 4-bit, 3-bit, and 2-bit quantizations for Llama 3 8B.
All the plots have been updated. They show that while GPTQ 4-bit doesn’t perform well with Llama 3 8B, bitsandbytes NormalFloat4 (NF4) is much more accurate and keeps the accuracy of 4-bit Llama 3 8B above that of 4-bit Llama 2 7B.
That’s good news for QLoRA users.
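To give an intuition of how NF4 works, here is a minimal NumPy sketch of the scheme described in the QLoRA paper: absmax-scale a block of weights into [-1, 1], snap each value to the nearest of the 16 NF4 levels (a 4-bit code), then dequantize. This is a simplified illustration of the idea, not bitsandbytes’ actual implementation (which also uses block-wise scaling and double quantization).

```python
import numpy as np

# The 16 NF4 levels from the QLoRA paper, chosen to be information-theoretically
# optimal for normally distributed weights.
NF4_LEVELS = np.array([
    -1.0, -0.6961928009986877, -0.5250730514526367, -0.39491748809814453,
    -0.28444138169288635, -0.18477343022823334, -0.09105003625154495, 0.0,
    0.07958029955625534, 0.16093020141124725, 0.24611230194568634,
    0.33791524171829224, 0.44070982933044434, 0.5626170039176941,
    0.7229568362236023, 1.0,
])

def nf4_quantize(weights: np.ndarray):
    """Absmax-scale the block to [-1, 1], then map each value to the
    index of the nearest NF4 level (a 4-bit code)."""
    absmax = np.abs(weights).max()
    scaled = weights / absmax
    codes = np.abs(scaled[:, None] - NF4_LEVELS[None, :]).argmin(axis=1)
    return codes.astype(np.uint8), absmax

def nf4_dequantize(codes: np.ndarray, absmax: float) -> np.ndarray:
    """Recover an approximation of the weights from the codes and the scale."""
    return NF4_LEVELS[codes] * absmax

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=64)   # a block of roughly normal LLM weights
codes, absmax = nf4_quantize(w)      # 4-bit codes + one fp scale per block
w_hat = nf4_dequantize(codes, absmax)
print("max abs error:", np.abs(w - w_hat).max())
```

Because the levels are matched to the roughly normal distribution of LLM weights, the reconstruction error stays small without any calibration data, which is part of why NF4 (and hence QLoRA) holds up so well here.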
The Salt
The Salt is my other newsletter that takes a more scientific approach. In The Salt, I primarily feature short reviews of recent papers (for free), detailed analyses of noteworthy publications, and articles centered on LLM evaluation.
I reviewed DeepSeek-V2, a huge MoE model with numerous experts. I also showed how to fine-tune DeepSeek-V2 Lite:
This week in The Salt, I briefly reviewed:
Piccolo2: General Text Embedding with Multi-task Hybrid Loss Training
⭐Understanding the performance gap between online and offline alignment algorithms
LoRA Learns Less and Forgets Less
⭐RLHF Workflow: From Reward Modeling to Online RLHF
That’s all for this week.
If you like reading The Kaitchup, consider sharing it with friends and coworkers:
Have a nice weekend!