The Kaitchup – AI on a Budget
Run and Serve Faster VLMs Like Pixtral and Phi-3.5 Vision with vLLM
Understanding how much memory you need to serve a VLM

Benjamin Marie
Sep 19, 2024
[Image: An image prompt encoded by Pixtral]

vLLM is currently one of the fastest inference engines for large language models (LLMs). It supports a wide range of model architectures and quantization methods. We have also seen that it can efficiently serve models equipped with multiple LoRA adapters:

Serve Multiple LoRA Adapters with vLLM (Benjamin Marie, August 1, 2024)

vLLM also supports vision-language models (VLMs) with multimodal inputs containing both images and text prompts. For instance, vLLM can now serve models like Phi-3.5 Vision and Pixtral, which excel at tasks such as image captioning, optical character recognition (OCR), and visual question answering (VQA).
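To make the multimodal input format concrete, here is a minimal sketch of how an image and a text prompt are packaged for vLLM's offline API. The helper name is my own, and the Phi-3.5 Vision chat template shown is an assumption based on vLLM's documented `prompt`/`multi_modal_data` request format; verify both against your vLLM version.

```python
# Sketch: packaging an image + text question into the request dict that
# vLLM's LLM.generate() accepts for multimodal models.
# The helper name and template are illustrative, not from this article.

def build_phi35_vision_request(image, question: str) -> dict:
    """Wrap an image (e.g., a PIL.Image) and a question into a vLLM request."""
    # Phi-3.5 Vision expects an <|image_1|> placeholder in its chat template.
    prompt = f"<|user|>\n<|image_1|>\n{question}<|end|>\n<|assistant|>\n"
    return {"prompt": prompt, "multi_modal_data": {"image": image}}

# On a GPU machine, the request would then be served with something like:
# from vllm import LLM, SamplingParams
# llm = LLM(model="microsoft/Phi-3.5-vision-instruct", trust_remote_code=True)
# outputs = llm.generate(
#     build_phi35_vision_request(img, "Describe this image."),
#     SamplingParams(max_tokens=128),
# )
```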


In this article, I will show you how to use VLMs with vLLM, focusing on key parameters that impact memory consumption. We will see why VLMs consume much more memory than standard LLMs. We’ll use Phi-3.5 Vision and Pixtral as case studies for a multimodal application that processes prompts containing text and images.
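A key reason VLMs need more memory than text-only LLMs is that each image is encoded into hundreds or thousands of extra tokens, and the keys and values for all of those tokens must be held in the KV cache. A back-of-the-envelope estimate can be sketched as follows; the function and the example configuration are my own illustrative assumptions, not figures from this article.

```python
# Rough estimate of serving memory: model weights + KV cache.
# All numbers below are illustrative assumptions, not measured values.

def estimate_serving_memory_gb(
    n_params_b: float,        # parameter count, in billions
    n_layers: int,            # transformer layers
    n_kv_heads: int,          # key/value heads per layer
    head_dim: int,            # dimension per head
    max_seq_len: int,         # text tokens + image tokens per sequence
    max_num_seqs: int,        # concurrent sequences served
    bytes_per_param: int = 2, # FP16/BF16 weights
    bytes_per_kv: int = 2,    # FP16/BF16 KV cache
) -> float:
    weights = n_params_b * 1e9 * bytes_per_param
    # Two tensors (K and V) per layer must be cached for every token.
    kv_per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_kv
    kv_cache = kv_per_token * max_seq_len * max_num_seqs
    return (weights + kv_cache) / 1024**3

# Hypothetical Phi-3.5-Vision-like config: ~4.15B params, 32 layers,
# 32 KV heads, head_dim 96, one 8K-token sequence (image tokens included).
print(f"{estimate_serving_memory_gb(4.15, 32, 32, 96, 8192, 1):.1f} GB")
```

The point of the sketch: the weights term is fixed, but the KV-cache term scales with sequence length, so a prompt whose image alone contributes a few thousand tokens can add gigabytes of cache on top of the model itself.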

The code for running Phi-3.5 Vision and Pixtral with vLLM is provided in this notebook:

Get the notebook (#105)
