Device Map: Avoid Out-of-Memory Errors When Running Large Language Models
A small trick to run LLMs on any computer
Device mapping is a feature implemented in Hugging Face's Accelerate library. It splits a large language model (LLM) into smaller parts that can be loaded individually onto different devices: GPU VRAM, CPU RAM, and the hard disk.
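To make the idea concrete, here is a minimal sketch of the greedy assignment behind a device map. This is not the Accelerate implementation: the layer names, sizes, and device capacities below are illustrative assumptions, and the real library accounts for many more details (tied weights, buffers, dtype sizes).

```python
# Minimal sketch of the greedy idea behind device mapping (not the
# Accelerate implementation): place each layer on the first device
# with enough free memory, in priority order GPU -> CPU -> disk.
# All layer sizes and device capacities below are made-up examples.

def build_device_map(layer_sizes, device_capacities):
    """layer_sizes: dict of layer name -> size in GB.
    device_capacities: dict of device -> free memory in GB,
    ordered from fastest (GPU) to slowest (disk)."""
    free = dict(device_capacities)
    device_map = {}
    for name, size in layer_sizes.items():
        for device, available in free.items():
            if size <= available:
                device_map[name] = device
                free[device] -= size
                break
        else:
            raise MemoryError(f"No device can hold layer {name!r}")
    return device_map

# Hypothetical 10 GB model on a machine with a 6 GB GPU and 4 GB of
# spare CPU RAM: the overflow spills to CPU, then to disk.
layers = {"embed": 2.0, "block.0": 3.0, "block.1": 3.0, "lm_head": 2.0}
devices = {"cuda:0": 6.0, "cpu": 4.0, "disk": 100.0}
print(build_device_map(layers, devices))
```

In practice you rarely build this map by hand: with Transformers, passing `device_map="auto"` to `from_pretrained` asks Accelerate to compute a placement like this automatically.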
In this article, I won’t explain again how it works. I have already written a detailed article about device mapping, which you can read here:
Instead, I will explain why, even with a device map, you may still get out-of-memory (OOM) errors on your GPU.