Models

All the models are stored in The Kaitchup’s repository on the Hugging Face Hub.


Llama 3

Llama 3 8B quantized to 4-bit with AWQ:
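As a quick sketch of how an AWQ checkpoint like this can be used (the repository ID below is a placeholder for illustration, not the actual Hub path), transformers can load AWQ-quantized models directly when the autoawq package is installed:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repository ID for illustration only; look up the actual
# checkpoint in The Kaitchup's Hugging Face Hub repository.
MODEL_ID = "kaitchup/Llama-3-8B-awq-4bit"

def load_awq_model(model_id: str = MODEL_ID):
    """Load a 4-bit AWQ checkpoint.

    transformers reads the AWQ quantization config stored inside the
    checkpoint, so no extra quantization arguments are needed here
    (running it requires the autoawq package and a CUDA GPU).
    """
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    return model, tokenizer
```

Nothing is downloaded until `load_awq_model()` is actually called.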

Llama 3 8B converted to an embedding model:
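Converting a decoder-only LLM into an embedding model means reducing its per-token hidden states to a single vector per sequence. A common recipe for that step is mean pooling over non-padding tokens; this is shown here as a general illustration, not necessarily the exact method used for the model above:

```python
import torch

def mean_pool(last_hidden_state: torch.Tensor,
              attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token embeddings, ignoring padding positions."""
    mask = attention_mask.unsqueeze(-1).float()      # (batch, seq, 1)
    summed = (last_hidden_state * mask).sum(dim=1)   # (batch, hidden)
    counts = mask.sum(dim=1).clamp(min=1e-9)         # (batch, 1)
    return summed / counts

# Dummy tensors standing in for a model's last hidden states:
hidden = torch.randn(2, 5, 16)                 # batch=2, seq=5, hidden=16
mask = torch.tensor([[1, 1, 1, 0, 0],          # second sequence is unpadded
                     [1, 1, 1, 1, 1]])
embeddings = mean_pool(hidden, mask)
print(embeddings.shape)  # torch.Size([2, 16])
```

In practice, `hidden` would come from the model's forward pass (e.g., `outputs.hidden_states[-1]`), and the pooled vectors would be L2-normalized before computing cosine similarities.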

The Mayonnaise

A collection of four models created with mergekit that rank among the best 7B models on public benchmarks.
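For a rough idea of what a mergekit recipe looks like, here is a minimal linear-merge configuration (the model names and weights are placeholders, not the actual recipe, which is described in the article linked below):

```yaml
# Hypothetical mergekit config: a uniform linear merge of two 7B checkpoints.
# Run with: mergekit-yaml config.yaml ./merged-model
models:
  - model: mistralai/Mistral-7B-Instruct-v0.2
    parameters:
      weight: 0.5
  - model: some-org/another-mistral-7b-finetune   # placeholder
    parameters:
      weight: 0.5
merge_method: linear
dtype: float16
```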

The recipe used to create these models is detailed here:

Maixtchup: A MoE Made with 4xMistral 7B

Llama 2 MT

I fine-tuned Llama 2 to translate several languages into English. More details are in this article:

You can find the translation models here:

Llama 2 Quantized for QA-LoRA

QA-LoRA makes LoRA fine-tuning “quantization-aware”. The current implementation requires LLMs quantized with a specific version of AutoGPTQ. More details in this article:

A Llama 2 model ready for QA-LoRA is available here:

SFT, reward, and RLHF models based on OPT and trained with DeepSpeed Chat

The instructions to train and use these models are given in this article:

Llama 2 7B quantized with GPTQ (compatible with transformers)

To quantize and run GPTQ models, follow the instructions in this tutorial: