Hugging Face’s TRL library for supervised fine-tuning (SFT) is very useful for training large language models (LLMs) on instruction datasets. It is simple to use, yet it offers many options that make fine-tuning much easier and faster than with the standard Transformers library.
Some options for the SFT Trainer can significantly speed up training, but not without unexpected behaviors that took me some time to understand.
Here is what I have learned so far, along with some other useful tips that make inference with Llama 2 easier and faster.
Note: I’ll publish a complete tutorial on how to fine-tune Llama 2 with TRL. Meanwhile, you can install TRL with “pip install trl” and have a look at the SFT documentation.
Pack Several Examples in the Same Sequence
Packing is one of the most powerful options of the SFTTrainer, and it took me some time to understand.
It significantly reduces training time by “packing” several training examples into one sequence. For instance, let’s say your maximum sequence length is 2048 tokens, but you have four examples of 500 tokens each. Packing would put all four examples into the same sequence, as sketched below.
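To make this concrete, here is a minimal sketch of enabling packing with the SFTTrainer. The model and dataset names are just examples, and the exact argument names depend on your TRL version (more recent versions move packing and max_seq_length into an SFTConfig object):

```python
from datasets import load_dataset
from trl import SFTTrainer

# Example instruction dataset with a single "text" column
# (any dataset exposing a text field works the same way)
dataset = load_dataset("timdettmers/openassistant-guanaco", split="train")

trainer = SFTTrainer(
    model="meta-llama/Llama-2-7b-hf",  # example model; gated, requires access
    train_dataset=dataset,
    dataset_text_field="text",  # column containing the training text
    max_seq_length=2048,        # packed sequences are filled up to this length
    packing=True,               # concatenate several short examples per sequence
)
trainer.train()
```

With packing=True, TRL concatenates and chunks examples into constant-length sequences instead of padding each example individually, which is where the speed-up (and some of the surprising behaviors discussed in this article) comes from.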