Device Map: Avoid Out-of-Memory Errors When Running Large Language Models
A small trick to run LLMs on any computer
Device mapping is a feature implemented in the Accelerate library by Hugging Face. It splits a large language model (LLM) into smaller parts that can be loaded individually on different devices: GPU VRAM, CPU RAM, and the hard disk.
In this article, I won’t explain again how it works. I have already written a detailed report about device map that you can read here:
Instead, I will explain why, even with device map, you may still get out-of-memory (OOM) errors on your GPU.
GPUs store more than just the LLM
Device map handles all the steps of loading an LLM: creating empty tensors, dispatching the weights to each device, etc. It maximizes the usage of the VRAM available on the GPUs.
For instance, if you have a GPU with 48 GB of VRAM, device map will try to use nearly all 48 GB for the LLM, leaving only a few GB available for any other operations the GPU may have to perform later.
You also need some space on your GPUs to store CUDA kernels, various other tensors, and the graphical user interface (GUI) of your OS if a screen is plugged into your GPU (this consumes around 2 GB on Ubuntu). Device map can’t know in advance how much memory these other processes will consume.
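Before picking memory limits, it helps to see how much VRAM is actually free once the OS and CUDA context have taken their share. Here is a quick sketch using PyTorch's `torch.cuda.mem_get_info`; `free_vram_mb` is just an illustrative helper name:

```python
import torch

def free_vram_mb():
    """Report free VRAM per GPU, in MB, using torch.cuda.mem_get_info."""
    report = {}
    for i in range(torch.cuda.device_count()):
        free_bytes, _total_bytes = torch.cuda.mem_get_info(i)
        report[i] = free_bytes // (1024 ** 2)
    return report

if torch.cuda.is_available():
    # e.g., {0: 44532, 1: 46890} — GPU 0 has less free because it hosts the GUI
    print(free_vram_mb())
```

The gap between total and free VRAM on your main GPU gives you a first estimate of how much headroom to reserve.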
So we need to keep some of the memory free, just in case, especially on your main system GPU.
Set a maximum memory usage for the LLM on your GPU
In practice, reserving some of the GPUs’ memory is easy to achieve with Accelerate.
Let’s say that you have several GPUs with 48 GB of VRAM. You can set the max memory as follows to avoid OOM errors:
import torch
from transformers import OPTForCausalLM

# Cap each GPU at 46 GB, keeping ~2 GB free for CUDA kernels and other tensors
max_memory = {i: '46000MB' for i in range(torch.cuda.device_count())}
# Keep a larger margin on GPU 0, which also hosts the CUDA context and the OS GUI
max_memory[0] = '30000MB'

model = OPTForCausalLM.from_pretrained("facebook/opt-6.7b", device_map="auto", max_memory=max_memory)
Note: You must have Accelerate installed. I explain how to run device map in the article linked in the introduction.
In this example, I kept 16 GB free on the first GPU and 2 GB on the other GPUs. This is more than necessary in most cases, and the right values largely depend on the LLM you use. I recommend adjusting them manually: increase the limits step by step until your GPU triggers OOM errors, then keep the last values that worked.
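The pattern above can be wrapped in a small helper that takes the headroom you want to reserve on each GPU. This is a sketch; `build_max_memory` is a hypothetical name for illustration, not an Accelerate function:

```python
def build_max_memory(num_gpus, vram_mb, headroom_mb, main_gpu_headroom_mb):
    """Build a max_memory dict (Accelerate's format) that reserves
    `headroom_mb` on every GPU and a larger margin on GPU 0, where the
    OS GUI and CUDA context usually live."""
    max_memory = {i: f"{vram_mb - headroom_mb}MB" for i in range(num_gpus)}
    max_memory[0] = f"{vram_mb - main_gpu_headroom_mb}MB"
    return max_memory

# Two 48 GB GPUs: keep 2 GB free everywhere, but 16 GB free on GPU 0.
print(build_max_memory(2, 48000, 2000, 16000))
# {0: '32000MB', 1: '46000MB'}
```

The resulting dict can be passed directly as the `max_memory` argument of `from_pretrained`, as in the example above.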
If you have only one GPU, let’s say a 24 GB GPU such as an RTX 3090, you could set up max_memory as follows:
max_memory = {0: '18000MB'}
Then, increase the value in steps of 500 MB until you trigger an OOM error, and keep the last value that worked.
Conclusion
Setting up max_memory will help you avoid OOM errors.
Nonetheless, you may sometimes think you have found a value that works well and then suddenly get an OOM error. This can happen for various reasons. On a personal computer, it is often caused by other tasks that consume GPU VRAM: if you fine-tune an LLM and decide at the same time to launch Netflix in 4K, it won’t slow down your training, but it may make it run out of memory.