The Kaitchup – AI on a Budget

High-Speed Inference with llama.cpp and Vicuna on CPU

You don’t need a GPU for fast inference

Benjamin Marie
Jun 14, 2023

A vicuna — Photo by Parsing Eye on Unsplash

For inference with large language models, we may think we need a very big GPU, or that they simply can't run on consumer hardware. This is rarely the case.

Nowadays, we have many tricks and frameworks at our disposal, such as device mapping or QLoRa, that make inference possible at home, even for very large language …
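
To make the device mapping idea concrete, here is a minimal sketch using Hugging Face Transformers with Accelerate. The checkpoint name and generation settings are illustrative assumptions, not the exact setup used later in this article:

# A minimal sketch of device mapping with Hugging Face Transformers.
# Assumes the transformers and accelerate packages are installed; the
# checkpoint and settings below are illustrative placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "lmsys/vicuna-7b-v1.3"  # assumed checkpoint, for illustration

tokenizer = AutoTokenizer.from_pretrained(model_name)

# device_map="auto" lets Accelerate spread the model's layers across
# whatever is available (GPU, CPU RAM, even disk), so the model can be
# loaded without a large GPU.
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

inputs = tokenizer("What is a vicuna?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))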

This post is for paid subscribers
