Models
All the models are stored in The Kaitchup’s repository on the Hugging Face Hub.
Llama 3
Llama 3 8B quantized to 4-bit with AWQ:
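A minimal loading sketch, assuming autoawq and accelerate are installed (recent versions of transformers can load AWQ checkpoints directly); the repo id below is a placeholder, so use the link above:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id; the actual repository is linked above.
model_id = "kaitchup/Meta-Llama-3-8B-awq-4bit"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# transformers dispatches to the AWQ kernels when autoawq is installed.
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```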
Llama 3 8B converted to an embedding model:
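A hedged sketch of getting sentence embeddings from a decoder-only checkpoint by mean-pooling the last hidden states; the repo id is a placeholder, and the converted model may expect a different pooling strategy (check the linked model card):

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "kaitchup/Llama-3-8B-embedding"  # placeholder repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
if tokenizer.pad_token is None:  # Llama tokenizers ship without a pad token
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModel.from_pretrained(model_id, device_map="auto")

texts = ["Quantization shrinks models.", "LoRA adds small trainable adapters."]
batch = tokenizer(texts, padding=True, return_tensors="pt").to(model.device)
with torch.no_grad():
    hidden = model(**batch).last_hidden_state  # (batch, seq_len, dim)

# Mean-pool over non-padding tokens to get one vector per input text.
mask = batch["attention_mask"].unsqueeze(-1)
embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
print(embeddings.shape)
```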
The Mayonnaise
A collection of four models, made with mergekit, that rank among the best 7B models on public benchmarks.
The recipe used to create these models is detailed here:
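For illustration only, this is the general shape of a mergekit run: a YAML config describing the merge, passed to the mergekit-yaml CLI. The models, method, and parameters below are placeholders, not the actual Mayonnaise recipe (which is in the linked article):

```python
import pathlib
import subprocess

# Generic SLERP merge config; every model name and parameter here is illustrative.
config = """\
slices:
  - sources:
      - model: mistralai/Mistral-7B-Instruct-v0.2
        layer_range: [0, 32]
      - model: teknium/OpenHermes-2.5-Mistral-7B
        layer_range: [0, 32]
merge_method: slerp
base_model: mistralai/Mistral-7B-Instruct-v0.2
parameters:
  t: 0.5  # interpolation factor between the two models
dtype: bfloat16
"""
pathlib.Path("merge.yaml").write_text(config)
# mergekit-yaml is the CLI entry point installed by the mergekit package.
subprocess.run(["mergekit-yaml", "merge.yaml", "merged-model"], check=True)
```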
Maixtchup: A MoE Made with 4xMistral 7B
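If the MoE was assembled with mergekit's MoE mode (an assumption), the checkpoint follows the Mixtral architecture and loads like any causal LM; the repo id below is a placeholder:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kaitchup/Maixtchup-4x7b"  # placeholder repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")
```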
Llama 2 MT
I fine-tuned Llama 2 to translate several languages into English. More details are in this article:
You can find the translation models here:
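A hedged usage sketch: the repo id and the prompt format are assumptions (the article above gives the exact template used at training time):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kaitchup/Llama-2-7b-mt-French-to-English"  # placeholder repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Assumed prompt template; check the linked article for the actual format.
prompt = "Translate this from French to English:\nFrench: Bonjour tout le monde.\nEnglish:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```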
Llama 2 Quantized for QA-LoRA
QA-LoRA makes LoRA fine-tuning “quantization-aware”. The current implementation requires the LLM to be quantized with a specific version of AutoGPTQ. More details in this article:
A Llama 2 model ready for QA-LoRA is available here:
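A minimal sketch of loading such a quantized checkpoint with AutoGPTQ before QA-LoRA fine-tuning; the repo id is a placeholder, and QA-LoRA pins a specific AutoGPTQ version (see the article):

```python
from auto_gptq import AutoGPTQForCausalLM
from transformers import AutoTokenizer

model_id = "kaitchup/Llama-2-7b-gptq-4bit-qalora"  # placeholder repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
# QA-LoRA then attaches quantization-aware LoRA adapters on top of this model.
model = AutoGPTQForCausalLM.from_quantized(model_id, device="cuda:0")
```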
SFT, reward, and RLHF models based on OPT and trained with DeepSpeed Chat
The instructions to train and use these models are given in this article (a minimal loading sketch follows the model list below):
kaitchup/OPT-1.3B-SFT-DSChatLoRA: SFT model based on OPT-1.3B and trained with LoRA
kaitchup/OPT-350M-RM-DSChat: Reward model based on OPT-350M
kaitchup/OPT-1.3B-RLHF-DSChatLoRA: RLHF model based on OPT-1.3B trained with LoRA
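As referenced above, a minimal loading sketch for the SFT model. It assumes the LoRA weights were merged into the checkpoint and uses a DeepSpeed Chat-style prompt; both are assumptions to verify against the article:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kaitchup/OPT-1.3B-SFT-DSChatLoRA"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Assumes merged LoRA weights; if the repo only stores adapters, load the
# OPT-1.3B base model and attach them with peft's PeftModel instead.
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Human: What does RLHF stand for?\nAssistant:"  # assumed prompt style
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=60)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```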
Llama 2 7B quantized with GPTQ (compatible with transformers)
To quantize and run GPTQ models, follow the instructions in this tutorial (a minimal loading sketch follows the model list below):
kaitchup/Llama-2-7b-gptq-4bit: Llama 2 7B quantized to 4-bit
kaitchup/Llama-2-7b-gptq-3bit: Llama 2 7B quantized to 3-bit
kaitchup/Llama-2-7b-gptq-2bit: Llama 2 7B quantized to 2-bit
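As mentioned above, a minimal loading sketch for these checkpoints. The repo id comes from the list above; transformers loads GPTQ models directly when optimum and auto-gptq are installed:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kaitchup/Llama-2-7b-gptq-4bit"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# transformers dispatches to the GPTQ kernels via optimum/auto-gptq.
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("Quantization reduces", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```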