LoRA is the most widely used technique for parameter-efficient fine-tuning (PEFT) of large language models (LLMs). This PEFT method adds a small set of trainable parameters, i.e., an adapter, on top of the LLM, whose own parameters remain frozen. Since only the adapter's parameters are updated during fine-tuning, LoRA is significantly more memory-efficient than full fine-tuning and faster to converge.
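Concretely, in the standard LoRA formulation, the pretrained weight matrix $W_0 \in \mathbb{R}^{d \times k}$ stays frozen and a low-rank update $BA$ is learned, so a layer's output becomes

$$h = W_0 x + \frac{\alpha}{r} BAx, \quad B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k)$$

where only $A$ and $B$ are trained and $\alpha$ is a scaling hyperparameter.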
While LoRA can be nearly as good as standard full fine-tuning, a performance gap often remains. Moreover, finding optimal LoRA hyperparameters, especially the rank (r), is a tedious task.
Many alternatives have been proposed to extend and improve LoRA with various goals, such as fine-tuning quantization-aware adapters (e.g., QA-LoRA, LQ-LoRA), further reducing the number of trainable parameters (e.g., VeRA), and using LoRA to pre-train LLMs from scratch (e.g., ReLoRA, LoRA-the-Explorer).
DoRA (Weight-Decomposed Low-Rank Adaptation) has been proposed to improve LoRA with a better theoretical grounding. DoRA is more robust to hyperparameter changes, learns faster (i.e., it requires fewer training examples), is more parameter-efficient, and further closes the performance gap with full fine-tuning.
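In a nutshell, following the DoRA paper's formulation, DoRA decomposes each pretrained weight matrix into a magnitude and a direction, and applies the LoRA update only to the directional component:

$$W' = m \, \frac{W_0 + BA}{\lVert W_0 + BA \rVert_c}$$

where $m$ is a trainable magnitude vector initialized to $\lVert W_0 \rVert_c$ and $\lVert \cdot \rVert_c$ denotes the column-wise norm. Training the magnitude and direction separately is what the paper argues gives DoRA learning dynamics closer to full fine-tuning.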
DoRA vs. LoRA? Which one is better?
In this article, I explain how DoRA works. Then, I show how to use DoRA to fine-tune Mistral 7B on consumer hardware and compare it with standard LoRA.
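As a preview, here is a minimal sketch of how DoRA can be enabled with Hugging Face PEFT (the use_dora flag requires peft >= 0.9.0). The target modules and hyperparameters below are illustrative, not necessarily the exact settings used later in this article:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)

# Switching between LoRA and DoRA is a single flag in LoraConfig.
config = LoraConfig(
    r=16,                      # rank of the low-rank update (illustrative)
    lora_alpha=16,             # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    use_dora=True,             # set to False for standard LoRA
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()
```

From there, the wrapped model can be passed to any standard training loop or trainer, exactly as with a LoRA adapter.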
I made a notebook demonstrating DoRA fine-tuning for Mistral 7B. You can find it here: