The Kaitchup – AI on a Budget

The Kaitchup – AI on a Budget

Serve Multiple LoRA Adapters with vLLM and Custom Chat Templates

Swap adapters per request, reuse your chat template, and run offline or via an OpenAI-compatible server.

Benjamin Marie's avatar
Benjamin Marie
Sep 23, 2025
∙ Paid
6
Share
Image generated with ChatGPT

LoRA adapters let you specialize a base LLM for specific tasks or domains by attaching low-rank weight deltas to selected layers. At inference time, the adapter must be loaded alongside the base model, and many applications benefit from serving several adapters, e.g., one for function calling and others for classification, translation, or general generation.

In a standard setup, switching tasks means unloading one adapter and loading another, which can add seconds of latency. Modern open-source servers avoid this by keeping multiple LoRA adapters resident and selecting the appropriate one per request. For example, vLLM can host several adapters simultaneously and apply them on demand with negligible switch time, subject to GPU memory limits.

The Kaitchup – AI on a Budget is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.

This guide shows how to run vLLM with multiple LoRA adapters and a custom chat template, both offline (Python API) and online (HTTP server). As a running example, it uses two adapters fine-tuned for French and Japanese translation on a Qwen3 base model, and keeps the exact same custom chat template used during fine-tuning.

We’ll load several adapters side by side, route each request to the desired adapter, and register a custom chat template so prompts match training. The accompanying notebook contains the full, runnable code for serving multiple LoRA adapters with vLLM.

Get the notebook (#183)

If you want to know how I fine-tuned the adapters for translation, I used the same code I explained in this article:

Gemma 3 270M: Can Tiny Models Learn New Tasks?

Gemma 3 270M: Can Tiny Models Learn New Tasks?

Benjamin Marie
·
Sep 1
Read full story

Get the notebook (#180)

This post is for paid subscribers

Already a paid subscriber? Sign in
© 2025 The Kaitchup
Privacy ∙ Terms ∙ Collection notice
Start writingGet the app
Substack is the home for great culture