The Kaitchup – AI on a Budget

vLLM vs Ollama: How They Differ and When To Use Them

With Examples of Offline and Online Inference

Benjamin Marie
Jul 07, 2025
Image generated with ChatGPT

This article takes a close look at two popular open-source tools for LLM inference: vLLM and Ollama. Both are widely used but optimized for very different use cases.

vLLM is built to maximize GPU throughput in server environments, while Ollama focuses on ease of use and local model execution, often on CPU. While they might seem like alternatives at first glance, they serve distinct roles in the LLM ecosystem.

We'll explore how vLLM achieves high performance through low-level memory optimizations like PagedAttention, and how it excels in multi-user or long-context scenarios. Then we'll look at Ollama’s lightweight design, its integration with quantized GGUF models, and its focus on simplicity.


The goal of this article is to help you understand the differences between vLLM and Ollama, so you can choose the right tool for your use case. You don’t need any prior experience with LLM inference or deployment to follow along. The article is meant to be beginner-friendly, with the exception of the short PagedAttention section just ahead.

I prepared a simple notebook containing the main commands to set up and try vLLM and Ollama with Qwen3:

Get the notebook (#175)
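If you just want a quick feel for how differently the two tools are used before reading on, here is a minimal sketch (not the notebook itself) of offline inference with vLLM’s Python API and a chat call through the Ollama Python client. It assumes you have installed the `vllm` and `ollama` packages, that a local Ollama server is running, and that the Qwen3 model names shown are examples you may need to adjust for your setup:

```python
# Minimal sketch: offline inference with vLLM vs. chatting through a local Ollama server.
# Assumptions: `pip install vllm ollama`, an Ollama server running locally ("ollama serve"),
# and a Qwen3 model already pulled with `ollama pull qwen3`.

from vllm import LLM, SamplingParams
import ollama

# --- vLLM: load the model weights in-process (typically on the GPU) and batch-generate ---
llm = LLM(model="Qwen/Qwen3-0.6B")  # example checkpoint; any Qwen3 model you can fit works
sampling = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Explain what PagedAttention does in one sentence."], sampling)
print(outputs[0].outputs[0].text)

# --- Ollama: send a chat request to the local server, which runs a quantized GGUF model ---
response = ollama.chat(
    model="qwen3",  # tag must match the model you pulled with `ollama pull`
    messages=[{"role": "user", "content": "Explain what a GGUF file is in one sentence."}],
)
print(response["message"]["content"])
```

Even in this tiny sketch, the contrast is visible: vLLM loads the model into your own process and batches generation itself, while Ollama behaves like a small local API server that you query.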

Note: What about SGLang? I’m much less familiar with SGLang, but I think it is about as good as vLLM, perhaps with fewer features. It can be used for the same use cases as vLLM.

vLLM vs Ollama: One for the GPU, the Other for the CPU?

vLLM: Leveraging the GPU at Almost Full Capacity
