The embedding model plays a key role in many applications, such as Retrieval-Augmented Generation (RAG) for large language models (LLMs). In a RAG system, it encodes both the knowledge base and the user query. I explained the RAG concept in this article:
Using an embedding model trained or fine-tuned on the same domain as the LLM can greatly improve a RAG system. With LLM2Vec, we can extract an embedding model directly from the LLM. On its own, this extracted model is inaccurate, but we can improve it with a two-stage training procedure combining masked next-token prediction (MNTP) and contrastive learning. We saw how to do this in previous articles with Llama 3 8B. However, because Llama 3 8B is a large model, it produces high-dimensional text embeddings, which can be costly to train on and deploy in downstream tasks.
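To make this concrete, here is a minimal sketch of loading an extracted model and encoding text with the llm2vec library. The pooling mode and sequence length are illustrative choices, and the checkpoint is the plain Llama 3.2 1B base model, i.e., the "inaccurate" starting point before the two training stages covered later in this article:

```python
# Minimal sketch: extract a text encoder from Llama 3.2 1B with llm2vec.
# Assumptions: the llm2vec package is installed and you have access to the
# "meta-llama/Llama-3.2-1B" checkpoint on the Hugging Face Hub.
import torch
from llm2vec import LLM2Vec

l2v = LLM2Vec.from_pretrained(
    "meta-llama/Llama-3.2-1B",   # base LLM, not yet trained as an encoder
    device_map="cuda" if torch.cuda.is_available() else "cpu",
    torch_dtype=torch.bfloat16,
    pooling_mode="mean",          # average token vectors into one embedding
    max_length=512,
)

# Encode a small batch of sentences; each row is one embedding vector.
embeddings = l2v.encode([
    "LLM2Vec turns a decoder-only LLM into a text encoder.",
    "Retrieval-Augmented Generation needs good embeddings.",
])
print(embeddings.shape)  # (2, 2048): Llama 3.2 1B has a 2048-dim hidden size
```

Note how the smaller model pays off here: the 2048-dimensional vectors are much cheaper to store and search than the 4096-dimensional embeddings produced by Llama 3 8B.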
In this article, we will see how to make text embeddings from Llama 3.2 1B. We will go through all the steps in detail: masked next-token prediction training, contrastive learning, and evaluation of the resulting embeddings. I used an RTX 3090 from RunPod (currently $0.22/hour) (referral link) for the training steps and the evaluation.
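For the evaluation step, one common option is the MTEB library (Massive Text Embedding Benchmark). The sketch below is a hedged example, not the article's exact setup: the task names, the output folder, and the thin `L2VWrapper` adapter are assumptions for illustration, and `l2v` is the model from the previous snippet:

```python
# Hedged sketch: evaluate the embedding model on a couple of MTEB tasks.
# Assumes the mteb package is installed and `l2v` is an LLM2Vec model.
import mteb

class L2VWrapper:
    """Thin adapter so MTEB can call encode() with its extra kwargs."""
    def __init__(self, model):
        self.model = model

    def encode(self, sentences, **kwargs):
        # LLM2Vec's encode takes the sentence list; extra MTEB kwargs are
        # dropped, and the torch tensor is converted to a NumPy array.
        return self.model.encode(sentences).cpu().numpy()

# Example tasks; any MTEB task list works here.
tasks = mteb.get_tasks(tasks=["STSBenchmark", "Banking77Classification"])
evaluation = mteb.MTEB(tasks=tasks)

# Scores are written as JSON files under the output folder.
results = evaluation.run(L2VWrapper(l2v), output_folder="results/llama-3.2-1b")
```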
The notebook showing how to turn Llama 3.2 into an embedding model is here: