Deploy Your Fine-Tuned LoRA Adapters with Ollama
Probably the easiest way to run adapters offline and online
Ollama lets you deploy large language models (LLMs) locally and serve them online. It provides a command-line interface (CLI) to download, manage, and use models like Llama 3.2, Mistral, and Qwen2.5.
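As a quick illustration of that workflow, and assuming Ollama is already installed, the basic CLI usage looks roughly like this (the `llama3.2` tag is just an example of a model from the Ollama library):

```bash
# Download a model from the Ollama library
ollama pull llama3.2

# List the models available locally
ollama list

# Start an interactive chat session with the model
ollama run llama3.2

# Expose the local models over an HTTP API (default port 11434)
ollama serve
```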
Ollama is a wrapper around llama.cpp designed to be much more user-friendly: a short configuration file, called a Modelfile, is all you need to run a model or attach a LoRA adapter. It can also integrate with LangChain to build more complex applications such as Retrieval-Augmented Generation (RAG) systems.
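As a minimal sketch, a Modelfile that attaches a LoRA adapter to a base model can be as short as the following; the base model tag and the adapter path are placeholders for your own fine-tuned artifacts:

```
# Modelfile sketch: a base model plus a fine-tuned LoRA adapter
# (the adapter path below is a placeholder for your own files)
FROM llama3.2
ADAPTER ./lora-adapter
```

You would then register and run the resulting model with `ollama create my-model -f Modelfile` followed by `ollama run my-model`.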
This article explains how to set up and use Ollama. It shows how to start a custom model with a LoRA adapter, serve it as a chatbot, and query a PDF through a RAG system built with Ollama and LangChain. The code examples in this tutorial use Llama 3.2.
The following notebook shows how to load and use adapters with Ollama: