The Kaitchup – AI on a Budget
Yi: Fine-tune and Run One of the Best Bilingual LLMs on Your Computer

How to use the Yi models on a budget

Benjamin Marie
Mar 21, 2024

The first Yi models were released in December 2023. Since their launch, they have been significantly improved and new sizes have been added: the lineup now includes models with 6 billion, 9 billion, and 34 billion parameters, as well as chat variants and versions that can process contexts of up to 200,000 tokens.

The Yi LLMs are open and perform well on a wide range of tasks. Unlike most other open LLMs, they are bilingual: they can work in both English and Chinese.

In this article, I review the Yi models and the technical report describing them to understand how they were trained. Then, I show how to run, quantize, fine-tune, and benchmark the models on consumer hardware. Even the 34B model can run on a single consumer GPU if quantized.
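
To give a concrete idea of what this looks like, here is a minimal sketch, assuming the 01-ai checkpoints on the Hugging Face Hub, of loading a Yi model in 4-bit with bitsandbytes and Transformers. At 4-bit precision, the 34B weights shrink to roughly 18 GB, which is how they fit on a single 24 GB consumer GPU. The model ID and generation settings below are illustrative, not the exact setup from the notebook.

```python
# A minimal sketch: loading a Yi model in 4-bit with bitsandbytes.
# The repo id "01-ai/Yi-34B-Chat" and the generation settings are
# illustrative assumptions, not the exact setup from the notebook.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "01-ai/Yi-34B-Chat"

# NF4 quantization with bfloat16 compute: the 34B weights take ~18 GB.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # place the quantized layers on the available GPU
)

prompt = "Translate to Chinese: The weather is nice today."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The same BitsAndBytesConfig works for the 6B and 9B models, which fit in much less VRAM.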

I made a notebook for the Yi LLMs implementing:

  • Inference with Transformers and vLLM

  • Quantization with bitsandbytes, AWQ, and GPTQ

  • Fine-tuning with QLoRA (a minimal sketch follows this list)

  • Benchmarking for performance and accuracy with the Evaluation Harness and Optimum Benchmark
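
As a preview of the fine-tuning part, here is a minimal QLoRA sketch with peft and bitsandbytes: the base model is loaded in 4-bit and only small LoRA adapters are trained on top of it. The model ID, target modules, and LoRA hyperparameters are illustrative assumptions, not the exact configuration used in the notebook.

```python
# A minimal QLoRA sketch with peft: 4-bit base model, trainable LoRA adapters.
# The repo id, target modules, and hyperparameters are illustrative assumptions.
import torch
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_id = "01-ai/Yi-6B"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# Cast layer norms to fp32 and enable gradient checkpointing for stability.
model = prepare_model_for_kbit_training(model)

# Yi uses a Llama-style architecture, so the attention projections
# are named q_proj, k_proj, v_proj, and o_proj.
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapters are trainable
```

From here, the adapters can be trained with any standard trainer; since only a small fraction of the parameters receive gradients, fine-tuning stays within consumer VRAM budgets.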

Get the notebook (#54)
