The Weekly Kaitchup #76

GRPO/TRL - SmolVLM - Coconut

Benjamin Marie

Jan 24, 2025

Hi Everyone,

In this edition of The Weekly Kaitchup:

GRPO in TRL
Smaller SmolVLM
Coconut: “Thinking” without Wasting Tokens

GRPO in TRL

TRL is my go-to library to easily post-train LLMs. It implements all the state-of-the-art methods for post-training: supervised fine-tuning (SFT), DPO, SimPO, and ORPO, among many others.

It now also implements GRPO, the reinforcement learning method behind the success of DeepSeek-R1 and Qwen2.5.

DeepSeek-R1: Reinforcement Learning with Simple and Verifiable Rewards (Benjamin Marie, Jan 22)

In an article next week, we will see in detail how it works and check the cost of GRPO. I can already tell you: GRPO is extremely costly!

Meanwhile, here is a code snippet published by Hugging Face:

from datasets import load_dataset
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer

# Load the dataset
dataset = load_dataset("trl-lib/tldr", split="train")

# Training hyperparameters
training_args = GRPOConfig(
    output_dir="Qwen2-0.5B-GRPO",
    learning_rate=1e-5,
    logging_steps=10,
    gradient_accumulation_steps=16,
    max_completion_length=128,  # cap on the number of tokens per sampled completion
)

# The policy is fine-tuned with LoRA and its completions are scored by a reward model
trainer = GRPOTrainer(
    model="Qwen/Qwen2-0.5B-Instruct",
    # Argument name as in this snippet; newer TRL releases appear to call it `reward_funcs`
    reward_model="weqweasdas/RM-Gemma-2B",
    args=training_args,
    train_dataset=dataset,
    peft_config=LoraConfig(task_type="CAUSAL_LM"),
)

trainer.train()

To use it, install TRL from source.

Also check the documentation. It is very well written, with a good explanation of GRPO:

TRL’s grpo_trainer
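
To give a rough intuition of what "group relative" means in GRPO, here is a minimal sketch of the advantage computation: for each prompt, a group of completions is sampled, each one is scored, and every reward is normalized against the mean and standard deviation of its own group. The function name, the use of the population standard deviation, and the `eps` value are assumptions made for illustration; this is not TRL's implementation.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize the rewards of one group of completions sampled for the same prompt.

    Illustrative sketch only: `eps` and the population standard deviation
    are assumptions, not values taken from TRL.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # Completions better than the group average get a positive advantage,
    # worse ones a negative advantage.
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four completions for the same prompt, scored by a reward model or a rule
    rewards = [1.0, 0.0, 0.0, 1.0]
    print(group_relative_advantages(rewards))
```

Since every prompt needs a whole group of sampled completions before a single update can be made, generation is likely a large part of the cost.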

Smaller SmolVLM

Two new small-scale models have joined the SmolVLM lineup, with a 256M parameter version and a 500M parameter version that preserve respectable multimodal capabilities at a fraction of the size of earlier releases.

Hugging Face Collection: SmolVLM 256M & 500M

Both rely on a smaller SigLIP base patch-16/512 vision encoder instead of the heavier SigLIP 400M SO encoder, allowing them to handle images at higher resolution while reducing overall size. A few changes in tokenization methods, such as pixel shuffling and special sub-image tokens, help reduce the number of tokens when describing visual content.
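
To see why pixel shuffling reduces the number of tokens, here is a small arithmetic sketch. It assumes a 512x512 input with 16x16 patches, as the encoder name suggests, and a shuffle ratio of 4; the ratio and the function name are assumptions for illustration, so check the model configuration for the real values.

```python
def visual_token_count(image_size: int, patch_size: int, shuffle_ratio: int = 1) -> int:
    """Number of visual tokens for a square image after pixel shuffling.

    Pixel shuffling (space-to-depth) merges each `shuffle_ratio` x `shuffle_ratio`
    block of patch embeddings into one token with a wider hidden dimension,
    so the token count drops by a factor of `shuffle_ratio ** 2`.
    """
    patches_per_side = image_size // patch_size
    if patches_per_side % shuffle_ratio != 0:
        raise ValueError("patch grid must be divisible by the shuffle ratio")
    tokens_per_side = patches_per_side // shuffle_ratio
    return tokens_per_side ** 2


if __name__ == "__main__":
    print(visual_token_count(512, 16))                   # 1024 patches without shuffling
    print(visual_token_count(512, 16, shuffle_ratio=4))  # 64 tokens with an assumed ratio of 4
```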

Run and Serve Faster VLMs Like Pixtral and Phi-3.5 Vision with vLLM (Benjamin Marie, September 19, 2024)

Data mixes have also been revamped, placing greater focus on document understanding and image captioning while still covering visual reasoning, charts, and general instructions.

Integration is straightforward with support for popular libraries like transformers and MLX, and you can also take advantage of ONNX-compatible checkpoints for wider compatibility, including WebGPU demos.

transformers.js: Run Phi-3.5 & Llama 3.2 in Your Browser (Benjamin Marie, September 26, 2024)

On top of this, a new retrieval model, “ColSmolVLM”, has been released. You can use it to build a multimodal RAG system:

Multimodal RAG with ColPali and Qwen2-VL on Your Computer (Benjamin Marie, September 16, 2024)
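
Assuming ColSmolVLM scores document pages the way ColPali-style retrievers do, the retrieval step of such a RAG system can be sketched as late-interaction (MaxSim) scoring: each query token embedding is matched with its most similar page patch embedding, and the per-token maxima are summed. The embeddings below are made-up toy vectors and the function names are hypothetical. In practice, the model would produce the embeddings.

```python
from typing import Dict, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: List[Vector], page_vecs: List[Vector]) -> float:
    """Late-interaction score: sum over query tokens of the best-matching page patch."""
    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)


def rank_pages(query_vecs: List[Vector], pages: Dict[str, List[Vector]]) -> List[str]:
    """Return page names sorted from most to least relevant."""
    scores = {name: maxsim_score(query_vecs, vecs) for name, vecs in pages.items()}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    query = [[1.0, 0.0], [0.0, 1.0]]  # toy embeddings for two query tokens
    pages = {
        "page_a": [[1.0, 0.0], [0.0, 1.0]],  # matches both query tokens
        "page_b": [[1.0, 0.0], [1.0, 0.0]],  # matches only the first one
    }
    print(rank_pages(query, pages))  # ['page_a', 'page_b']
```

The top-ranked page images would then be passed to a VLM to generate the answer.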

Coconut: “Thinking” without Wasting Tokens

Last month, Meta published their Coconut paper:

Training Large Language Models to Reason in a Continuous Latent Space

This is what I wrote about it in The Salt:

Coconut modifies the reasoning process by using the model's hidden states as continuous inputs for subsequent steps, bypassing the need for explicit language generation. This approach allows for efficient reasoning, where multiple potential solutions are encoded simultaneously, enabling the model to explore and refine reasoning paths in a manner similar to breadth-first search.
Continuous thoughts improve decision-making by progressively narrowing down options, even without explicit training for this capability.
source

In other words, while reasoning LLMs like QwQ and DeepSeek-R1 may need to generate thousands of tokens to “think”, LLMs trained with Coconut don’t need to generate any tokens for the thinking step: the reasoning happens in the model's hidden states, which are fed back as inputs. This makes the thinking step much more efficient.
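
The idea of feeding hidden states back as inputs can be illustrated with a toy loop. This is only a sketch of the control flow described above, not Meta's implementation: `toy_step` is a hypothetical stand-in for a transformer forward pass, and all names are made up for illustration.

```python
from typing import Callable, List

Vector = List[float]


def latent_reasoning(
    step: Callable[[List[Vector]], Vector],
    prompt_embeds: List[Vector],
    num_latent_steps: int,
) -> List[Vector]:
    """Append `num_latent_steps` continuous thoughts without decoding any token."""
    sequence = list(prompt_embeds)
    for _ in range(num_latent_steps):
        hidden = step(sequence)   # last hidden state for the current sequence
        sequence.append(hidden)   # reused directly as the next input embedding
    return sequence


def toy_step(sequence: List[Vector]) -> Vector:
    """Hypothetical stand-in for a model forward pass: average of the inputs."""
    dim = len(sequence[0])
    return [sum(v[i] for v in sequence) / len(sequence) for i in range(dim)]


if __name__ == "__main__":
    prompt = [[1.0, 0.0], [0.0, 1.0]]
    out = latent_reasoning(toy_step, prompt, num_latent_steps=3)
    print(len(out))  # 5: two prompt embeddings plus three continuous thoughts
```

A standard reasoning model would instead sample a token at each step and embed it again, which is where the thousands of "thinking" tokens come from.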

Meta released their implementation last week:

GitHub: facebookresearch/coconut (MIT license)

The current code only demonstrates training configurations for GPT-2, and it’s unclear how much effort would be needed to adapt it for newer models. Still, the Coconut approach shows promise for training more efficient LLMs, and it will be interesting to see whether it gains wider adoption as the field moves forward.

GPU Selection of the Week:

To get the prices of GPUs, I use Amazon.com. If the price of a GPU drops on Amazon, there is a high chance that it will also be lower at your favorite GPU provider. All the links in this section are Amazon affiliate links.

With NVIDIA's announcement of the RTX 50 series, all the RTX 4090/4080/4070 cards have become unaffordable. Since an RTX 5060 wasn’t announced, the RTX 4060 remains at the same price.

RTX 50 and DIGITS: What Does It Mean for Local AI? (Benjamin Marie, Jan 8)

RTX 4090 (24 GB): ASUS ROG Strix GeForce RTX™ 4090 BTF
RTX 4080 SUPER (16 GB): Inno3D GeForce RTX 4080 Super iChill Black
RTX 4070 Ti SUPER (16 GB): Zotac Gaming GEFORCE RTX 4070 Ti Super Solid OC
RTX 4060 Ti (16 GB): Zotac Gaming GeForce RTX 4060Ti AMP 16Go

The Salt

The Salt is my other newsletter that takes a more scientific approach. In The Salt, I primarily feature short reviews of recent papers (for free), detailed analyses of noteworthy publications, and articles centered on LLM evaluation.

This week in The Weekly Salt, I reviewed:

Smaller KV Cache with Tensor Product Attention (Benjamin Marie, Jan 21)

⭐Tensor Product Attention Is All You Need
Evaluating Sample Utility for Data Selection by Mimicking Model Weights
Towards Best Practices for Open Datasets for LLM Training

Support The Kaitchup by becoming a Pro subscriber:

What You'll Get

Priority Support – Fast, dedicated assistance whenever you need it to fine-tune or optimize your LLM/VLM. I answer all your questions!
Lifetime Access to All the AI Toolboxes – Repositories containing Jupyter notebooks optimized for LLMs and providing implementation examples of AI applications.
Full Access to The Salt – Dive deeper into exclusive research content. Already a paid subscriber to The Salt? You’ll be refunded for the unused time!
Early Access to Research – Be the first to access groundbreaking studies and models by The Kaitchup.
30% Discount for Group Subscriptions – Perfect for teams and collaborators.
The Kaitchup’s Book – A comprehensive guide to LLM fine-tuning. Already bought it? You’ll be fully refunded!
All Benefits from Regular Kaitchup Subscriptions – Everything you already love, plus more. Already a paid subscriber? You’ll be refunded for the unused time!

Subscribe to The Kaitchup Pro

That’s all for this week.

If you like reading The Kaitchup, consider sharing it with friends and coworkers. Group subscriptions get a 20% discount (30% for Pro subscribers):

Share The Kaitchup – AI on a Budget

Have a nice weekend!
