LoRA fine-tunes large language models (LLMs) by adding a small adapter on top of the pre-trained model: only the adapter is trainable, while the LLM's original parameters remain frozen. This dramatically reduces the number of trainable parameters and, with it, the size of the optimizer states, so LoRA fine-tuning consumes considerably less memory than standard full fine-tuning.
Nonetheless, depending on its hyperparameters, such as the rank and the number of targeted modules, LoRA can still produce adapters with hundreds of millions of parameters, too large to fine-tune on consumer hardware.
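To make this concrete, here is a minimal sketch (not taken from this article's notebook) of how the rank and the list of targeted modules determine the adapter's size with Hugging Face PEFT. The model name and hyperparameter values are illustrative assumptions, not the configuration used later in the article:

```python
# Illustrative sketch: how LoRA's rank and target modules drive adapter size.
# Model name and hyperparameters are assumptions for demonstration only.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")

# Each targeted weight matrix W (d_out x d_in) gets two trainable low-rank
# factors, B (d_out x r) and A (r x d_in), while W itself stays frozen.
lora_config = LoraConfig(
    r=16,          # the rank: a higher rank means a larger adapter
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],  # more modules, larger adapter
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

peft_model = get_peft_model(model, lora_config)
# Prints the number of trainable (adapter) parameters vs. the frozen base model.
peft_model.print_trainable_parameters()
```

With a configuration like this, targeting all the attention and MLP projections of an 8B model at rank 16 already yields tens of millions of trainable parameters; increasing the rank scales the adapter proportionally.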
Many alternatives have been proposed to reduce the size of adapters.
In this article, I review VeRA: I explain how it works and how it can produce adapters 100x smaller than LoRA's. For demonstration, I fine-tuned Llama 3 with VeRA and compared its performance with LoRA.
The notebook demonstrating VeRA fine-tuning for Llama 3 is available here: