Extract Post-Training Weights as a LoRA Adapter for LLMs
Can we turn DeepSeek-R1 into an adapter?
DeepSeek-R1-Distill-Llama-8B, TULU 3 8B, and Llama 3.1 8B Instruct, among others, share a common foundation: all of them are built on the Llama 3.1 8B base model.
Since they all originate from Llama 3.1 8B, it may be possible to represent each model as a set of weight adjustments, i.e., an adapter. By applying these adapters, we could transform the base Llama 3.1 8B model into any of its fine-tuned variants.
In this article, we will explore exactly that. Specifically, we will see how to decompose a model like DeepSeek-R1-Distill-Llama-8B into Llama 3.1 8B plus a LoRA adapter that approximates the fine-tuned weights. We will also attempt the same process with Qwen2.5 models. This approach lets us store only the base model once, while treating each variant as a lightweight LoRA adapter that can be efficiently loaded and unloaded during inference.
This method is not limited to Llama or Qwen models. It can be applied to any fully fine-tuned or post-trained LLM, as long as the base model it was trained from is available.
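Conceptually, the extraction works one weight matrix at a time: compute the difference between the fine-tuned and base weights, then compress that difference with a truncated SVD into the low-rank pair of matrices that LoRA expects. Below is a minimal sketch of this idea for a single matrix; the function name, the rank, and the toy tensors are illustrative choices, not part of any particular library's API:

```python
import torch

def extract_lora_from_delta(w_base: torch.Tensor, w_tuned: torch.Tensor, rank: int = 16):
    """Approximate (w_tuned - w_base) as a rank-`rank` product b @ a.

    The fine-tuned weight is then recovered approximately as
    w_base + b @ a, which is exactly the LoRA update form.
    """
    delta = (w_tuned - w_base).float()
    # SVD of the weight delta; keep only the top-`rank` singular directions.
    u, s, vh = torch.linalg.svd(delta, full_matrices=False)
    # Split the singular values evenly between the two factors (one common convention).
    sqrt_s = torch.sqrt(s[:rank])
    b = u[:, :rank] * sqrt_s          # shape: (out_features, rank)
    a = sqrt_s[:, None] * vh[:rank]   # shape: (rank, in_features)
    return a, b

# Toy check: a delta that is genuinely low-rank is recovered almost exactly.
torch.manual_seed(0)
base = torch.randn(64, 32)
true_b, true_a = torch.randn(64, 4), torch.randn(4, 32)
tuned = base + true_b @ true_a
a, b = extract_lora_from_delta(base, tuned, rank=4)
rel_err = torch.linalg.norm(base + b @ a - tuned) / torch.linalg.norm(tuned)
```

Real fine-tuning deltas are not exactly low-rank, so the reconstruction is only an approximation; the chosen rank trades adapter size against fidelity to the original fine-tuned model.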
The following notebook provides a step-by-step guide on how to extract adapters from a fully fine-tuned model: