Deploy Your Fine-Tuned LoRA Adapters with Ollama
Probably the easiest way to run adapters offline and online
Ollama lets you deploy large language models (LLMs) locally and serve them online. It provides a command-line interface (CLI) to download, manage, and use models like Llama 3.2, Mistral, and Qwen2.5.
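As a quick illustration of that workflow, and assuming Ollama is already installed, the basic CLI usage looks roughly like this (the `llama3.2` tag is just an example of a model from the Ollama library):

```bash
# Download a model from the Ollama library
ollama pull llama3.2

# List the models available locally
ollama list

# Start an interactive chat session with the model
ollama run llama3.2

# Expose the local models over an HTTP API (default port 11434)
ollama serve
```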
Ollama is a wrapper around llama.cpp designed to be much more user-friendly: a short configuration file, called a Modelfile, is all you need to run a model or attach a LoRA adapter. It can also integrate with LangChain to build more complex applications such as Retrieval-Augmented Generation (RAG) systems.
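As a minimal sketch, a Modelfile that attaches a LoRA adapter to a base model can be as short as the following; the base model tag and the adapter path are placeholders for your own fine-tuned artifacts:

```
# Modelfile sketch: a base model plus a fine-tuned LoRA adapter
# (the adapter path below is a placeholder for your own files)
FROM llama3.2
ADAPTER ./lora-adapter
```

You would then register and run the resulting model with `ollama create my-model -f Modelfile` followed by `ollama run my-model`.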
This article explains how to set up and use Ollama. It shows how to start a custom model with a LoRA adapter, serve it as a chatbot, and query a PDF through a RAG system built with Ollama and LangChain. The code examples in this tutorial use Llama 3.2.
The following notebook shows how to load and use adapters with Ollama: