Along with Llama 3 405B, Meta also released new versions of Llama 3 8B and 70B (“Llama 3.1”). You can find them here:
The main differences from Llama 3 include official support for German, French, Italian, Portuguese, Hindi, Spanish, and Thai, along with function calling. These new versions have also been post-trained on very long sequences: they can handle contexts of up to 128k tokens without a noticeable drop in accuracy.
How does fine-tuning differ for this new version? I found a couple of changes that make fine-tuning Llama 3.1 easier and more effective.
In this article, we will fine-tune Llama 3.1 with LoRA and QLoRA and discuss the changes in the code and learning curves compared to Llama 3. We will see that the padding side chosen for fine-tuning has a significant and unexpected impact on the results.
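To make the padding-side question concrete before we get to the training code, here is a minimal sketch of what left vs. right padding does to a batch of variable-length sequences. This uses plain Python lists and a hypothetical pad token id; in practice the Hugging Face tokenizer handles this via `tokenizer.padding_side = "left"` or `"right"`.

```python
PAD = 0  # hypothetical pad token id for illustration

def pad_batch(sequences, side="right"):
    """Pad variable-length token-id sequences to a uniform length.

    side="right" appends pad tokens after the sequence;
    side="left" prepends them before it.
    """
    max_len = max(len(s) for s in sequences)
    padded = []
    for s in sequences:
        pads = [PAD] * (max_len - len(s))
        padded.append(s + pads if side == "right" else pads + s)
    return padded

batch = [[5, 6, 7], [8, 9]]
print(pad_batch(batch, side="right"))  # [[5, 6, 7], [8, 9, 0]]
print(pad_batch(batch, side="left"))   # [[5, 6, 7], [0, 8, 9]]
```

The point of the sketch: with right padding, the real tokens stay aligned at the start of each row, while left padding shifts them toward the end. For a causal language model, which side the pad tokens sit on changes which positions carry real content during training, which is why the choice can affect the learning curves.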
The code for fine-tuning Llama 3.1 with LoRA and QLoRA is implemented in this notebook: