The Kaitchup – AI on a Budget

Qwen3-VL Fine-Tuning on Your Computer

Model review, GPU requirements, and code explained step by step

Benjamin Marie
Oct 20, 2025
Image generated by ChatGPT (prompt: “make an illustration of a capybara describing images and charts. Cartoon style.”)

Qwen released the first Qwen3-VL models in September. They started with Qwen3-VL-235B-A22B, then gradually released models based on Qwen3-30B-A3B, 8B, and 4B. All of them are available on the Hugging Face Hub (Apache 2.0 license):

  • Qwen3-VL (Instruct, Thinking, and FP8 versions)

Even smaller models seemed unlikely at first (update, 21 October: they have since released 2B and 32B variants), but since the 8B and 4B models are already small enough to run on consumer GPUs, this is a good time for a review of the best open-weight vision language models (VLMs) currently available!


In this article, we’ll examine how Qwen3-VL differs from Qwen2.5-VL across architecture, training, and overall performance. We’ll also cover GPU requirements, show how to run the models, and walk through fine-tuning Qwen3-VL with Unsloth. I’ll provide a step-by-step fine-tuning guide:

  1. Load the model (Qwen3-VL 8B with Unsloth)

  2. Mount the adapter (QLoRA/LoRA)

  3. Load & preprocess the dataset (ChartQA → SFT format)

  4. Configure the trainer

  5. Training
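To give a feel for step 3, here is a minimal sketch of what converting a ChartQA record into a chat-style SFT sample might look like. The field names (`image`, `query`, `label`) follow the Hugging Face ChartQA dataset, but the exact message schema is an assumption and ultimately depends on the trainer and data collator you use:

```python
# Hypothetical sketch of the ChartQA -> SFT conversion (step 3).
# Field names (image, query, label) follow the HF ChartQA dataset;
# the message schema below is assumed, not guaranteed by any library.
def chartqa_to_messages(example):
    """Turn one ChartQA row into a conversational sample for vision SFT."""
    return {
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image", "image": example["image"]},
                    {"type": "text", "text": example["query"]},
                ],
            },
            {
                # ChartQA stores answers as a list; take the first one.
                "role": "assistant",
                "content": [{"type": "text", "text": example["label"][0]}],
            },
        ]
    }

# Usage with a dummy record (a path stands in for the PIL image):
sample = {"image": "chart.png", "query": "What is the highest value?", "label": ["42"]}
converted = chartqa_to_messages(sample)
```

In the notebook, a function like this would be mapped over the dataset before handing it to the trainer, so each batch already carries interleaved image and text turns.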

Here is my fine-tuning notebook using Unsloth:

Get the notebook (#185)
