The Kaitchup – AI on a Budget

Modelfile with LoRA Adapters for Ollama: How to Deploy Your Fine-Tuned LoRA Adapters

Probably the easiest way to run adapters offline and online

Benjamin Marie
Dec 30, 2024

Ollama lets you deploy large language models (LLMs) locally and serve them online. It provides a command-line interface (CLI) to download, manage, and use models like Llama 3.2, Mistral, and Qwen2.5.
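For instance, once Ollama is installed, a few CLI commands are enough to download a model and start chatting with it. A minimal sketch (the llama3.2 tag is the model used in this tutorial; other names from the Ollama library work the same way):

```bash
# Download the model weights from the Ollama library
ollama pull llama3.2

# List the models available locally
ollama list

# Start an interactive chat session in the terminal
ollama run llama3.2

# Serve the local models over an HTTP API (default: http://localhost:11434)
ollama serve
```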

It is a wrapper for llama.cpp, designed to be much more user-friendly. You only need a basic configuration file, called a Modelfile, to run a model or a LoRA adapter. It can also integrate with LangChain to create more complex applications like Retrieval-Augmented Generation (RAG) systems.
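As a preview of what the rest of the article builds, here is a sketch of such a Modelfile that attaches a LoRA adapter to its base model, then registers and runs the result. The adapter path and the custom model name are placeholders for illustration; the base model in FROM must be the one the adapter was fine-tuned from:

```bash
# Write a Modelfile: base model plus a LoRA adapter (paths are placeholders)
cat > Modelfile <<'EOF'
FROM llama3.2
ADAPTER ./lora-adapter
EOF

# Register the combined model under a custom name, then run it
ollama create llama3.2-custom -f Modelfile
ollama run llama3.2-custom
```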


This article explains how to set up Ollama and how to use it. It shows how to launch a custom model with a LoRA adapter, serve it as a chatbot, and interact with a PDF through a RAG system built with Ollama and LangChain. The code examples in this tutorial use Llama 3.2.
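To give an idea of the LangChain side, the sketch below answers a question about a PDF with a small retrieval step in front of a local Ollama model. It assumes the langchain-ollama and langchain-community packages and a FAISS index; the file name, model tag, and chunking parameters are illustrative, not the exact values used in the notebook, and a dedicated embedding model would typically retrieve better than reusing the chat model for embeddings:

```python
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.vectorstores import FAISS
from langchain_ollama import ChatOllama, OllamaEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load the PDF and split it into overlapping chunks for retrieval
docs = PyPDFLoader("document.pdf").load()
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(docs)

# Embed the chunks with the local Ollama model and index them with FAISS
vectorstore = FAISS.from_documents(chunks, OllamaEmbeddings(model="llama3.2"))
retriever = vectorstore.as_retriever()

# Retrieve the most relevant chunks and pass them to the model as context
llm = ChatOllama(model="llama3.2")
question = "What is this document about?"
context = "\n\n".join(doc.page_content for doc in retriever.invoke(question))
answer = llm.invoke(f"Answer using only this context:\n{context}\n\nQuestion: {question}")
print(answer.content)
```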

The following notebook shows how to load and use adapters with Ollama:

Get the notebook (#132)

This post is for paid subscribers
