LoRA is the most widely used technique for parameter-efficient fine-tuning (PEFT) of large language models (LLMs). This PEFT method adds a small set of trainable parameters, i.e., an adapter, on top of the LLM, whose own parameters remain frozen. Since only the adapter's parameters are updated during fine-tuning, LoRA is significantly more memory-efficient than full fine-tuning and faster to converge.
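Concretely, in the standard LoRA formulation, the pretrained weight matrix $W_0 \in \mathbb{R}^{d \times k}$ stays frozen and a low-rank update $BA$ is learned, so a layer's output becomes

$$h = W_0 x + \frac{\alpha}{r} BAx, \quad B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k)$$

where only $A$ and $B$ are trained and $\alpha$ is a scaling hyperparameter.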
While LoRA can be nearly as good as standard full fine-tuning, a performance gap often remains. Moreover, finding optimal LoRA hyperparameters, especially the rank (r), is a tedious task.
Many alternatives have been proposed to extend and improve LoRA with various goals, such as fine-tuning quantization-aware adapters (e.g., QA-LoRA, LQ-LoRA), further reducing the number of trainable parameters (e.g., VeRA), and using LoRA to pre-train LLMs from scratch (e.g., ReLoRA, LoRA-the-Explorer).
DoRA (Weight-Decomposed Low-Rank Adaptation) has been proposed to improve LoRA with a better theoretical grounding. DoRA is more robust to hyperparameter changes, learns faster (i.e., it requires fewer training examples), is more parameter-efficient, and further closes the performance gap with full fine-tuning.
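In a nutshell, following the DoRA paper's formulation, DoRA decomposes each pretrained weight matrix into a magnitude and a direction, and applies the LoRA update only to the directional component:

$$W' = m \, \frac{W_0 + BA}{\lVert W_0 + BA \rVert_c}$$

where $m$ is a trainable magnitude vector initialized to $\lVert W_0 \rVert_c$ and $\lVert \cdot \rVert_c$ denotes the column-wise norm. Training the magnitude and direction separately is what the paper argues gives DoRA learning dynamics closer to full fine-tuning.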
DoRA vs. LoRA? Which one is better?
In this article, I explain how DoRA works. Then, I show how to use DoRA to fine-tune Mistral 7B on consumer hardware and compare it with standard LoRA.
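As a preview, here is a minimal sketch of how DoRA can be enabled with Hugging Face PEFT (the use_dora flag requires peft >= 0.9.0). The target modules and hyperparameters below are illustrative, not necessarily the exact settings used later in this article:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)

# Switching between LoRA and DoRA is a single flag in LoraConfig.
config = LoraConfig(
    r=16,                      # rank of the low-rank update (illustrative)
    lora_alpha=16,             # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    use_dora=True,             # set to False for standard LoRA
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()
```

From there, the wrapped model can be passed to any standard training loop or trainer, exactly as with a LoRA adapter.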
I made a notebook demonstrating DoRA fine-tuning for Mistral 7B. You can find it here: