Serve Multiple LoRA Adapters with vLLM and Custom Chat Templates
Swap adapters per request, reuse your chat template, and run offline or via an OpenAI-compatible server.
LoRA adapters let you specialize a base LLM for specific tasks or domains by attaching low-rank weight deltas to selected layers. At inference time, the adapter must be loaded alongside the base model, and many applications benefit from serving several adapters, e.g., one for function calling and others for classification, translation, or general generation.
In a standard setup, switching tasks means unloading one adapter and loading another, which can add seconds of latency. Modern open-source servers avoid this by keeping multiple LoRA adapters resident and selecting the appropriate one per request. For example, vLLM can host several adapters simultaneously and apply them on demand with negligible switch time, subject to GPU memory limits.
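To make that concrete, here is a minimal sketch of the idea using vLLM's offline Python API: LoRA support is enabled when the engine starts, and each call attaches a `LoRARequest` naming the adapter it wants. The base checkpoint, adapter names, paths, and rank below are placeholders, not the exact adapters used later in this guide.

```python
# Minimal sketch: two LoRA adapters resident in one vLLM engine,
# selected per request. Checkpoint, names, and paths are placeholders.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(
    model="Qwen/Qwen3-0.6B",  # base model (placeholder checkpoint)
    enable_lora=True,          # allow LoRA adapters on this engine
    max_loras=2,               # adapters kept resident at once
    max_lora_rank=16,          # must cover the adapters' LoRA rank
)

params = SamplingParams(temperature=0.0, max_tokens=128)

# Each request picks its adapter; no unload/reload between calls.
out_fr = llm.generate(
    "Translate to French: The weather is nice today.",
    params,
    lora_request=LoRARequest("french", 1, "/path/to/lora-french"),
)
out_ja = llm.generate(
    "Translate to Japanese: The weather is nice today.",
    params,
    lora_request=LoRARequest("japanese", 2, "/path/to/lora-japanese"),
)
```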
This guide shows how to run vLLM with multiple LoRA adapters and a custom chat template, both offline (Python API) and online (HTTP server). As a running example, it uses two adapters fine-tuned for French and Japanese translation on a Qwen3 base model, and keeps the exact same custom chat template used during fine-tuning.
We’ll load several adapters side by side, route each request to the desired adapter, and register a custom chat template so prompts match training. The accompanying notebook contains the full, runnable code for serving multiple LoRA adapters with vLLM.
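For the online path, the shape of the setup looks roughly like the sketch below: the OpenAI-compatible server is launched with LoRA enabled, the adapters registered by name, and the chat template passed in, and a client then routes a request to an adapter simply by using its registered name as the model. The checkpoint, adapter names, paths, and template filename are placeholders; the full, working commands are in the notebook.

```python
# Sketch of the online path. The server is started separately, e.g.:
#   vllm serve Qwen/Qwen3-0.6B \
#     --enable-lora \
#     --lora-modules french=/path/to/lora-french japanese=/path/to/lora-japanese \
#     --chat-template ./chat_template.jinja
# Checkpoint, adapter names, paths, and template file are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Selecting an adapter is just a matter of passing its registered name
# as the model for that request.
response = client.chat.completions.create(
    model="french",  # routes this request to the "french" LoRA adapter
    messages=[{"role": "user", "content": "Translate to French: Good morning!"}],
    temperature=0.0,
)
print(response.choices[0].message.content)
```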
If you want to know how I fine-tuned the translation adapters, I used the same code explained in this article: