The Kaitchup – AI on a Budget

Reasoning with QwQ and QvQ on Your Computer

When "preview" becomes meaningful

Benjamin Marie
Jan 02, 2025

Alibaba's Qwen team has introduced a new multimodal model called QvQ, which incorporates 72 billion parameters. This model was specifically designed to tackle tasks requiring advanced multimodal reasoning capabilities.

Similar to OpenAI's approach with their o1 model, the Qwen team has labeled QvQ a "preview" model. The designation is a caution: the model has notable limitations and struggles with many tasks. Unlike OpenAI's o1, however, QvQ genuinely is a model in its early stages.


In this article, we’ll take a look at QvQ and its language-only variant, QwQ, which has fewer parameters. We’ll start by going over their architecture and the GPU resources you’ll need to run them. I’ve also put together 4-bit and 2-bit quantized versions of the models for easier use. After that, we’ll examine their limitations and discuss when it makes sense to use them.

To get started with the quantized versions of QwQ, check out my notebook that uses vLLM for efficient inference:

Get the notebook (#133)

The notebook also shows how I quantized and evaluated the models.
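As a quick orientation before the notebook, here is a minimal sketch of serving QwQ with vLLM's offline API. It assumes vLLM is installed and a CUDA GPU is available; `Qwen/QwQ-32B-Preview` is the official preview checkpoint on Hugging Face, and you would swap in a 4-bit or 2-bit repo to reduce memory. The system prompt and sampling values are illustrative choices, not the notebook's exact settings.

```python
# Sketch: generating a reasoning trace from QwQ with vLLM.
# The model id and sampling parameters below are assumptions; adjust
# them to match the (quantized) checkpoint you actually deploy.

def build_messages(question: str) -> list[dict]:
    """Chat messages for QwQ: a system prompt nudging step-by-step reasoning."""
    return [
        {"role": "system",
         "content": "You are a helpful assistant. Think step by step."},
        {"role": "user", "content": question},
    ]


def run_qwq(question: str, model_id: str = "Qwen/QwQ-32B-Preview") -> str:
    # Imported lazily: building the vLLM engine requires a CUDA GPU.
    from vllm import LLM, SamplingParams

    llm = LLM(model=model_id, max_model_len=4096)
    params = SamplingParams(temperature=0.7, top_p=0.8, max_tokens=2048)
    # llm.chat() applies the model's own chat template to the messages.
    outputs = llm.chat(build_messages(question), params)
    return outputs[0].outputs[0].text
```

Reasoning models emit long chains of thought, so `max_tokens` should be generous; truncated traces are a common source of apparent "wrong answers" with QwQ.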

The Neural Architecture of QvQ and QwQ

This post is for paid subscribers
