The Kaitchup – AI on a Budget

Qwen3-VL Fine-Tuning on Your Computer

Model review, GPU requirements, and code explained step by step

Benjamin Marie
Oct 20, 2025
Image generated by ChatGPT (prompt: “make an illustration of a capybara describing images and charts. Cartoon style.”)

Qwen released the first Qwen3-VL models in September. They started with Qwen3-VL-235B-A22B, then gradually released models based on Qwen3-30B-A3B, 8B, and 4B. All of them are available on the Hugging Face Hub (Apache 2.0 license):

  • Qwen3-VL (Instruct, Thinking, and FP8 versions)

Even smaller models seemed unlikely at first (update, 21 October: they have since released 2B and 32B variants), but since the 8B and 4B models are already small enough to run on consumer GPUs, this is a good time for a review of the best open-weight vision language models (VLMs) currently available!


In this article, we’ll examine how Qwen3-VL differs from Qwen2.5-VL across architecture, training, and overall performance. We’ll also cover GPU requirements, show how to run the models, and walk through fine-tuning Qwen3-VL with Unsloth. I’ll provide a step-by-step fine-tuning guide:

  1. Load the model (Qwen3-VL 8B with Unsloth)

  2. Mount the adapter (QLoRA/LoRA)

  3. Load & preprocess the dataset (ChartQA → SFT format)

  4. Configure the trainer

  5. Training
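To give a feel for step 3, here is a minimal sketch of what converting a ChartQA record into a chat-style SFT sample might look like. The field names (`image`, `query`, `label`) follow the Hugging Face ChartQA dataset, but the exact message schema is an assumption and ultimately depends on the trainer and data collator you use:

```python
# Hypothetical sketch of the ChartQA -> SFT conversion (step 3).
# Field names (image, query, label) follow the HF ChartQA dataset;
# the message schema below is assumed, not guaranteed by any library.
def chartqa_to_messages(example):
    """Turn one ChartQA row into a conversational sample for vision SFT."""
    return {
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image", "image": example["image"]},
                    {"type": "text", "text": example["query"]},
                ],
            },
            {
                # ChartQA stores answers as a list; take the first one.
                "role": "assistant",
                "content": [{"type": "text", "text": example["label"][0]}],
            },
        ]
    }

# Usage with a dummy record (a path stands in for the PIL image):
sample = {"image": "chart.png", "query": "What is the highest value?", "label": ["42"]}
converted = chartqa_to_messages(sample)
```

In the notebook, a function like this would be mapped over the dataset before handing it to the trainer, so each batch already carries interleaved image and text turns.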

Here is my fine-tuning notebook using Unsloth:

Get the notebook (#185)
