Qwen3-VL Fine-Tuning on Your Computer
Model review, GPU requirements, and code explained step by step

Qwen released the first Qwen3-VL models in September. They started with Qwen3-VL-235B-A22B, and then gradually released Qwen3-VL-30B-A3B, 8B, and 4B models. They are all available on the Hugging Face Hub (Apache 2.0 license):
Qwen3-VL (Instruct, Thinking, and FP8 versions)
Smaller models seemed unlikely to follow (update, October 21: they released 2B and 32B variants), but since the 4B and 8B models are already small enough to run on consumer GPUs, this is a good time for a review of some of the best open-weight vision language models (VLMs) currently available!
In this article, we’ll examine how Qwen3-VL differs from Qwen2.5-VL in architecture, training, and overall performance. We’ll also cover GPU requirements, show how to run the models, and walk through fine-tuning Qwen3-VL with Unsloth. I’ll provide a step-by-step fine-tuning guide (condensed into a code sketch right after this list):
Load the model (Qwen3-VL 8B with Unsloth)
Mount the adapter (QLoRA/LoRA)
Load & preprocess the dataset (ChartQA → SFT format)
Configure the trainer
Train the model
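Before the full walkthrough, here is a minimal sketch of those five steps, following Unsloth’s standard vision fine-tuning recipe (FastVisionModel plus TRL’s SFTTrainer). The Hub id for the Qwen3-VL 8B checkpoint and the hyperparameter values are assumptions for illustration, not the notebook’s exact settings:

```python
from unsloth import FastVisionModel
from unsloth.trainer import UnslothVisionDataCollator
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# 1) Load the model in 4-bit (QLoRA) with Unsloth.
#    The repo id below is an assumption; check Unsloth's Hub collection.
model, tokenizer = FastVisionModel.from_pretrained(
    "unsloth/Qwen3-VL-8B-Instruct",
    load_in_4bit=True,
    use_gradient_checkpointing="unsloth",
)

# 2) Mount the LoRA adapter on both the vision and language towers.
model = FastVisionModel.get_peft_model(
    model,
    finetune_vision_layers=True,
    finetune_language_layers=True,
    finetune_attention_modules=True,
    finetune_mlp_modules=True,
    r=16,
    lora_alpha=16,
    lora_dropout=0.0,
    bias="none",
    random_state=3407,
)

# 3) Load ChartQA and convert each row into the chat/SFT format
#    (image + question from the user, answer from the assistant).
dataset = load_dataset("HuggingFaceM4/ChartQA", split="train")

def to_conversation(sample):
    return {"messages": [
        {"role": "user", "content": [
            {"type": "image", "image": sample["image"]},
            {"type": "text", "text": sample["query"]},
        ]},
        {"role": "assistant", "content": [
            {"type": "text", "text": sample["label"][0]},
        ]},
    ]}

train_data = [to_conversation(s) for s in dataset]

# 4) Configure the trainer: TRL's SFTTrainer with Unsloth's vision collator.
FastVisionModel.for_training(model)
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    data_collator=UnslothVisionDataCollator(model, tokenizer),
    train_dataset=train_data,
    args=SFTConfig(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        learning_rate=2e-4,
        max_steps=100,          # placeholder; tune for your budget
        optim="adamw_8bit",
        output_dir="outputs",
        remove_unused_columns=False,
        dataset_text_field="",
        dataset_kwargs={"skip_prepare_dataset": True},
        max_seq_length=2048,
        report_to="none",
    ),
)

# 5) Train.
trainer.train()
```

The vision collator handles image preprocessing and chat templating, which is why the raw PIL images can be passed straight through in the conversation dicts.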
Here is my fine-tuning notebook using Unsloth:

