Hi Everyone,
In this edition of The Weekly Kaitchup:
Qwen2.5-Omni-7B: A Model that Can Talk and Write at the Same Time!
Best Practices for Post-Training with DPO and PPO
DeepSeek-V3 Update: A Much More Useful Model
Unstable GPTQ Models?
Book Update
Chapter 3, “LLM Quantization,” is dropping this weekend! If you bought LLMs on a Budget via Gumroad, you’ll find the new chapter in the book’s Gumroad space. If you didn’t use Gumroad, the link is the same as before. It is in the email you received when you made the purchase or when you subscribed to The Kaitchup Pro. Can’t find it? Just reach out and I’ll help you out.
Toolbox update
I have updated the Llama 3 toolbox. It officially supports PyTorch 2.6 and the most recent versions of TRL, PEFT, Transformers, and vLLM. I also added a notebook for GRPO training with Unsloth.
Reminder: The toolboxes and the book are all included in The Kaitchup Pro, the highest subscription tier of The Kaitchup.
The Kaitchup – AI on a Budget is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.
I'm currently offering a 30% discount on the annual subscription to The Kaitchup and The Kaitchup Pro!
The discount is available until tomorrow!
Qwen2.5-Omni-7B: A Model that Can Talk and Write at the Same Time!
The Qwen team has been teasing this model since at least December. When they mentioned an “omni” model (which comes from the Latin omnis, meaning “all”), I imagined something supporting all the modalities on both sides, i.e., that could see, listen, and read as input, and also draw, talk, and write as output. Qwen2.5-Omni-7B comes pretty close to this... just missing the ability to draw!
The model is here:
Qwen/Qwen2.5-Omni-7B (Apache 2.0 license)
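If you want to try it, here is a minimal, text-only loading sketch. It assumes the Transformers classes listed on the model card at release (Qwen2_5OmniModel and Qwen2_5OmniProcessor) and a generate() that accepts return_audio; exact class names and arguments may differ in newer Transformers versions, so treat this as an illustration and check the model card for the full audio/video-in, speech-out recipe.

```python
# Minimal sketch: text in, text out, with Qwen2.5-Omni-7B.
# Class names follow the release model card and may differ in newer Transformers versions.
import torch
from transformers import Qwen2_5OmniModel, Qwen2_5OmniProcessor

model_id = "Qwen/Qwen2.5-Omni-7B"
model = Qwen2_5OmniModel.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")
processor = Qwen2_5OmniProcessor.from_pretrained(model_id)

conversation = [
    {"role": "user", "content": [{"type": "text", "text": "In one sentence, what does 'omni' mean for a model?"}]},
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True, tokenize=False)
inputs = processor(text=prompt, return_tensors="pt").to(model.device)

# return_audio=False skips speech synthesis so only text tokens are returned.
text_ids = model.generate(**inputs, max_new_tokens=128, return_audio=False)
print(processor.batch_decode(text_ids, skip_special_tokens=True)[0])
```

Speech output and audio/video inputs require the additional preprocessing utilities described on the model card.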
Qwen2.5-Omni is built on the proposed Thinker-Talker architecture, an end-to-end multimodal design that generates both streaming text and natural speech outputs in real time.
A key technical innovation is the introduction of Time-aligned Multimodal RoPE (TMRoPE), a positional embedding method that synchronizes audio timestamps precisely with video inputs.
The performance claimed by the Qwen team is impressive: the model outperforms Qwen2-Audio on audio tasks and achieves results comparable to Qwen2.5-VL-7B on vision tasks.
Qwen2.5-Omni also delivers highly robust and natural speech generation in a streaming context, surpassing many existing streaming and non-streaming baselines in subjective naturalness and on benchmarks such as Seed-TTS-eval.
They published the technical report here:
I really hope that they will release a 72B version of this model!
Best Practices for Post-Training with DPO and PPO
AI2 and the University of Washington published a very interesting paper comparing the performance of the most popular post-training methods for LLMs: DPO and PPO.
Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback
It confirms (once again) that PPO is significantly better than DPO. Note: the original DPO paper claimed that DPO outperformed PPO-based RLHF, but after plenty of real-world testing, community feedback, and follow-up research, it's pretty clear that this is not the case.

In a nutshell, they found that synthetic, diverse datasets with detailed per-aspect preferences are most effective for training models from preference data, and the quality of preference annotations is more important than the quality of the generated responses themselves. PPO generally performs better than DPO across various datasets. Increasing the reward model’s size for PPO and the amount of training data substantially improves reward model performance in direct evaluations, but this primarily benefits specific tasks, like GSM, rather than overall performance. Additionally, incorporating unlabeled prompts that closely match the test setting helps boost domain-specific performance, such as math tasks, but has limited impact on broader performance measures.
DPO remains significantly more cost-effective to run than PPO, as it eliminates the need for a reward model. Its simpler objective function leads to easier optimization and faster convergence. For general-purpose post-training and AI on a budget, DPO continues to be a strong alternative to more complex reinforcement learning methods such as PPO or GRPO.
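If you want to put that into practice, here is a minimal DPO sketch with TRL's DPOTrainer. The model, dataset, and hyperparameters are illustrative placeholders, not recommendations from the paper; any preference dataset with chosen/rejected pairs will work.

```python
# Minimal DPO sketch with TRL (illustrative model, dataset, and hyperparameters).
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "Qwen/Qwen2.5-0.5B-Instruct"  # small model so the sketch runs on a single GPU
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Example preference dataset with chosen/rejected pairs; swap in your own.
train_dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

training_args = DPOConfig(
    output_dir="qwen-dpo",
    beta=0.1,                        # strength of the implicit KL penalty toward the reference model
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=5e-7,
    num_train_epochs=1,
)

trainer = DPOTrainer(
    model=model,                     # the reference model is created automatically if not provided
    args=training_args,
    train_dataset=train_dataset,
    processing_class=tokenizer,      # older TRL versions use tokenizer= instead
)
trainer.train()
```

No reward model and no online sampling are needed, which is exactly why DPO stays attractive when compute is limited.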
DeepSeek-V3 Update: A Much More Useful Model
DeepSeek-AI has released a new version of their “base” model, DeepSeek-V3:
DeepSeek-V3 is the starting point used to make DeepSeek-R1, their reasoning model. We review it here:
While the previous version exhibited limited reasoning capabilities, the new model shows substantial progress, with notable gains on benchmarks such as MMLU-Pro (+5.3), GPQA (+9.3), AIME (+19.8), and LiveCodeBench (+10.0). DeepSeek-AI also highlights improved front-end web development: the model delivers more executable code and more visually appealing web pages and game interfaces.
The developers also prioritized post-training improvements for Chinese-language tasks, boosting performance in Chinese while also improving results in English.
In my opinion, the most noteworthy change is the updated license; DeepSeek-V3-0324 is now available under the Apache 2.0 license, eliminating previous restrictions associated with the custom DeepSeek license.
Unstable GPTQ Models?
Earlier this month, I published a comparison of various quantization methods:
In the evaluations, Qwen2.5 72B Instruct quantized with AutoRound to 4-bit and 8-bit consistently exhibited poor instruction-following performance, essentially performing at random levels, while the 2-bit version performed acceptably.
Initially, I attributed this poor performance to known instabilities in AutoRound quantization for instruct models, assuming that re-quantization with adjusted hyperparameters might yield better outcomes, especially since I had previously achieved stable 4-bit versions of Qwen2.5 72B Instruct using AutoRound.
However, the AutoRound maintainers were skeptical of these unexpected results, since Qwen2.5 72B Instruct is typically straightforward to quantize effectively, and investigated further. Their analysis revealed that the models perform adequately when specific PyTorch versions and inference frameworks are used. You can follow their investigation in this GitHub issue.
Currently, we're exploring the issue further, but a preliminary recommendation for anyone experiencing similar performance issues with GPTQ models quantized via AutoRound is to test the models using frameworks like vLLM or Transformers, in combination with PyTorch versions 2.5.1 or 2.6. One of these combinations may resolve the performance issues.
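Concretely, that workaround looks like the sketch below with vLLM. The repo ID is a placeholder for your own AutoRound-exported GPTQ checkpoint, and the PyTorch pin reflects the versions mentioned above.

```python
# Sketch: run a GPTQ checkpoint with vLLM on a pinned PyTorch (2.5.1 or 2.6).
from vllm import LLM, SamplingParams

llm = LLM(
    model="your-org/your-autoround-gptq-model",  # placeholder: replace with your own GPTQ model
    quantization="gptq",        # usually auto-detected from the checkpoint config
    # tensor_parallel_size=2,   # needed for large models such as Qwen2.5 72B Instruct
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["List three things to check when a quantized model behaves randomly."], params)
print(outputs[0].outputs[0].text)
```

If the same checkpoint behaves normally here but not in your original setup, the problem is likely the framework/PyTorch combination rather than the quantization itself.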
On a related note, I suspect the root cause might be linked to the deprecated AutoGPTQ library, used for exporting these models into GPTQ format. Its lack of maintenance has led to unpredictable behavior with newer PyTorch and inference framework versions. Additionally, GPTQ models exported with AutoRound appear incompatible with Transformers starting from version 4.49.0, where AutoGPTQ was replaced by GPTQModel.
The Salt
The Salt is my other newsletter that takes a more scientific approach. In The Salt, I primarily feature short reviews of recent papers (for free), detailed analyses of noteworthy publications, and articles centered on LLM evaluation.
This week in The Weekly Salt, I reviewed:
⭐Defeating Prompt Injections by Design
Variance Control via Weight Rescaling in LLM Pre-training
Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models
Support The Kaitchup by becoming a Pro subscriber:
What You'll Get
Priority Support – Fast, dedicated assistance whenever you need it to fine-tune or optimize your LLM/VLM. I answer all your questions!
Lifetime Access to All the AI Toolboxes – Repositories containing Jupyter notebooks optimized for LLMs and providing implementation examples of AI applications.
Full Access to The Salt – Dive deeper into exclusive research content. Already a paid subscriber to The Salt? You’ll be refunded for the unused time!
Early Access to Research – Be the first to access groundbreaking studies and models by The Kaitchup.
30% Discount for Group Subscriptions – Perfect for teams and collaborators.
The Kaitchup’s Book – A comprehensive guide to LLM fine-tuning. Already bought it? You’ll be fully refunded!
All Benefits from Regular Kaitchup Subscriptions – Everything you already love, plus more. Already a paid subscriber? You’ll be refunded for the unused time!
That’s all for this week.
If you like reading The Kaitchup, consider sharing it with friends and coworkers (there is a 20% (or 30% for Pro subscribers) discount for group subscriptions):
Have a nice weekend!