GPU Benchmarking: What Is the Best GPU for LoRA, QLoRA, and Inference?
You probably don't need $10k+ GPUs
Fine-tuning and running large language models (LLMs) can be expensive, and GPUs are the main driver of that cost. However, finding the most cost-effective GPU for a specific task is difficult because extensive, up-to-date benchmarks are hard to come by.
For instance, we usually don’t know which GPU is the most cost-effective for LoRA/QLoRA fine-tuning. The most expensive GPUs are not always the fastest.
Choosing the right GPU is critical to saving both time and money.
In this article, I benchmark 18 NVIDIA GPUs. The results show the training time and cost of LoRA and QLoRA fine-tuning on each GPU. I also benchmarked inference throughput with and without bitsandbytes 4-bit quantization.
To compute these results, I used Llama 3 8B and mainly relied on the GPUs available on RunPod. Note: Here is my referral link if you want to try RunPod while supporting The Kaitchup.
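To give a concrete idea of what is being measured, here is a minimal sketch of an inference-throughput measurement in the spirit of this benchmark: it loads Llama 3 8B with bitsandbytes 4-bit (NF4) quantization and reports generated tokens per second. The checkpoint name, prompt, and generation length are illustrative assumptions, not the exact settings behind the published results.

```python
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Assumed checkpoint name for illustration.
model_id = "meta-llama/Meta-Llama-3-8B"

# bitsandbytes 4-bit (NF4) quantization config; remove `quantization_config`
# below to measure the unquantized baseline instead.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# Arbitrary prompt and generation length, chosen only for this sketch.
prompt = "Benchmarking GPUs for large language models is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

start = time.time()
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=256)
elapsed = time.time() - start

new_tokens = output.shape[1] - inputs["input_ids"].shape[1]
print(f"Generated {new_tokens} tokens in {elapsed:.1f}s -> {new_tokens / elapsed:.1f} tokens/s")
```

Running the same script with and without the `quantization_config` argument gives the two inference configurations compared in the results below.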
I plan to regularly update these results by adding new LLMs, new settings (e.g., different batch sizes), and new GPUs. I’ll notify you in The Weekly Kaitchup when this happens.
If you want to reproduce the benchmark scores or benchmark your own configuration, use this notebook: