QLoRA: Fine-Tune a Large Language Model on Your GPU
Fine-tuning models with billions of parameters on consumer hardware
Most large language models (LLMs) are far too large to fine-tune on consumer hardware. For example, fine-tuning a 70-billion-parameter model typically requires a multi-GPU node, such as one with 8 NVIDIA H100s, an extremely costly setup that can run into hundreds of thousands of dollars. In practice, this means relying on cloud computing, where costs can still e…