Hi Everyone,
In this edition of The Weekly Kaitchup:
Absolute Zero: Enabling Reasoning for LLMs Without New Data
Parakeet-TDT-0.6B-v2: One-Hour Audio Transcription in a Second
OpenCodeReasoning: Reasoning for Better Code Generation
Absolute Zero: Enabling Reasoning for LLMs Without New Data
RLVR (Reinforcement Learning with Verifiable Rewards) and GRPO are now very popular methods for training reasoning models. With these methods, LLMs improve reasoning using only outcome-level signals rather than supervised reasoning traces. Yet, they still rely on curated datasets made of reasoning tasks with verifiable outcomes.
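To make the idea concrete, here is a minimal sketch of GRPO's group-relative advantage (my own simplification, not any official implementation): sample several answers per prompt, score each with a verifiable reward, and normalize within the group, so no learned value model is needed.

```python
# Minimal sketch of GRPO's group-relative advantage (a simplification,
# not an official implementation): rewards are normalized within the
# group of sampled completions, replacing a learned value model.
import statistics

def group_relative_advantages(rewards: list[float]) -> list[float]:
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0.0:  # every completion scored the same: no learning signal
        return [0.0] * len(rewards)
    return [(r - mean) / std for r in rewards]

# Four sampled answers to one prompt, two verified correct:
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
# [1.0, -1.0, -1.0, 1.0]
```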
Several recent papers suggest that pre-trained LLMs may already possess strong latent reasoning abilities that are surprisingly easy to activate. Just a few hundred well-curated examples and some light supervised fine-tuning could be enough to unlock these capabilities, as we saw in this article:
Do we even need data?
A new paper proposes Absolute Zero: a self-play-based framework where the model generates its own tasks and solves them, learning entirely through interaction with an environment that can validate outputs.
Absolute Zero: Reinforced Self-play Reasoning with Zero Data
It’s conceptually similar to AlphaZero but applied to open-ended reasoning tasks instead of games. No learned reward model is involved; rewards come from the environment itself (in this case, a code executor), which provides stable and hack-resistant feedback.

This work focuses on code reasoning and uses a program interpreter as an environment to validate outputs. It learns via RL using a custom advantage estimator. Critically, it starts from a base model and improves purely through this self-generated, self-evaluated training loop.
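To illustrate the "environment as verifier" idea, here is a minimal sketch (my own illustration, not the authors' code) of how a Python interpreter turns a self-generated task into a verifiable, hack-resistant reward:

```python
# Minimal sketch (not the paper's code): the environment is a Python
# interpreter, and the reward is simply whether the model's predicted
# output matches what the self-generated program actually computes.

def run_program(program: str, x):
    """Execute a self-generated program that must define f; return f(x)."""
    namespace = {}
    exec(program, namespace)
    return namespace["f"](x)

def verifiable_reward(program: str, x, predicted_output) -> float:
    """Outcome-level reward: 1.0 iff the prediction matches execution."""
    try:
        return 1.0 if run_program(program, x) == predicted_output else 0.0
    except Exception:
        return 0.0  # invalid programs earn nothing

# A task the model proposed, plus its own predicted answer:
task = "def f(x):\n    return sorted(x)[::-1]"
print(verifiable_reward(task, [3, 1, 2], [3, 2, 1]))  # 1.0
```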
The authors found that:
Coding priors help general reasoning, and models with code capabilities generalize better to math after this training.
Transfer across domains is much stronger with Absolute Zero compared to standard RLVR fine-tuning.
Cognitive behaviors like planning or trial-and-error emerge without supervision, suggesting the model builds internal procedures to match different reasoning demands.
The training increases token lengths selectively based on task type (e.g., more for abduction due to trial-and-error).
They also released their training code:
GitHub: LeapLabTHU/Absolute-Zero-Reasoner
Parakeet-TDT-0.6B-v2: One-Hour Audio Transcription in a Second
Another area seeing rapid advancement is automatic speech recognition (ASR).
NVIDIA released the latest version of its open-source ASR model, Parakeet-TDT-0.6B-v2, and it's currently leading the Hugging Face Open ASR Leaderboard with a 6.05% word error rate (English only). While not quite at the level of top proprietary models like GPT-4o-transcribe, it’s competitive, especially considering it's available under a permissive CC-BY-4.0 license.
Under the hood, the model combines a FastConformer encoder with a TDT decoder and has 600 million parameters. It’s built for speed. The model can transcribe an hour of audio in just one second. I wonder what kind of applications could fully exploit such a transcription speed.
The training corpus, called Granary, includes around 120,000 hours of English speech data, mixing 10,000 hours of human-transcribed audio with a large amount of pseudo-labeled content from public datasets and web-scale sources. NVIDIA says the dataset will be released following Interspeech 2025.
In terms of practical usage, the model supports punctuation, capitalization, and timestamping, making it suitable for tasks like subtitling, voice assistants, and transcription pipelines.
In the model card, NVIDIA shows how to use the model on your computer:
Hugging Face: nvidia/parakeet-tdt-0.6b-v2
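For reference, usage follows the usual NeMo pattern from NVIDIA's model cards; treat the snippet below as a sketch (it assumes nemo_toolkit is installed and that audio.wav is a local 16 kHz mono file):

```python
# Sketch of the standard NeMo transcription pattern
# (assumes: pip install -U "nemo_toolkit[asr]").
import nemo.collections.asr as nemo_asr

# Download the checkpoint from Hugging Face and load it.
asr_model = nemo_asr.models.ASRModel.from_pretrained(
    model_name="nvidia/parakeet-tdt-0.6b-v2"
)

# Transcribe one or more audio files; recent NeMo versions return
# hypothesis objects carrying the transcript in .text.
output = asr_model.transcribe(["audio.wav"])
print(output[0].text)
```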
OpenCodeReasoning: Reasoning for Better Code Generation
NVIDIA also released a new series of open-weight models for code generation this week, adding to the growing landscape of publicly available coding-capable LLMs.
The models are trained on a synthetic dataset (nvidia/OpenCodeReasoning), generated by DeepSeek R1, of over 736k Python solutions with reasoning traces covering nearly 29k unique competitive programming problems.
Using this dataset, they fine-tuned Qwen2.5 models (7B, 14B, 32B) and achieved strong performance, even surpassing prior distilled models such as R1-Distill-Qwen. Notably, their 32B SFT-only model reached 61.8 pass@1 on LiveCodeBench, outperforming several OpenAI models and narrowing the gap with leading SFT+RL approaches.
Once more, these results suggest that, given a sufficiently large and diverse synthetic dataset with explicit reasoning traces, SFT-only training can remain highly competitive.
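If you want to inspect the data yourself, it can be pulled straight from the Hub. A quick look (the "split_0" config/split names reflect my reading of the dataset card; double-check them before relying on this):

```python
# Quick look at the OpenCodeReasoning SFT data. The "split_0" config and
# split names are my assumption from the dataset card; verify before use.
from datasets import load_dataset

ds = load_dataset("nvidia/OpenCodeReasoning", "split_0")
print(ds)                  # available splits and row counts
example = ds["split_0"][0]
print(example.keys())      # problem statement, R1 reasoning trace, solution, ...
```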
More details in the paper:
OpenCodeReasoning: Advancing Data Distillation for Competitive Coding
The Salt
The Salt is my other newsletter that takes a more scientific approach. In The Salt, I primarily feature short reviews of recent papers (for free), detailed analyses of noteworthy publications, and articles centered on LLM evaluation.
I published a new deep-dive about hybrid models, Nemotron-H:
I also reviewed in The Weekly Salt:
⭐AdaR1: From Long-CoT to Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization
Softpick: No Attention Sink, No Massive Activations with Rectified Softmax
ReplaceMe: Network Simplification via Layer Pruning and Linear Transformations
Support The Kaitchup by becoming a Pro subscriber:
What You'll Get
Priority Support – Fast, dedicated assistance whenever you need it to fine-tune or optimize your LLM/VLM. I answer all your questions!
Lifetime Access to All the AI Toolboxes – Repositories containing Jupyter notebooks optimized for LLMs and providing implementation examples of AI applications.
Full Access to The Salt – Dive deeper into exclusive research content. Already a paid subscriber to The Salt? You’ll be refunded for the unused time!
Early Access to Research – Be the first to access groundbreaking studies and models by The Kaitchup.
30% Discount for Group Subscriptions – Perfect for teams and collaborators.
The Kaitchup’s Book – A comprehensive guide to LLM fine-tuning. Already bought it? You’ll be fully refunded!
All Benefits from Regular Kaitchup Subscriptions – Everything you already love, plus more. Already a paid subscriber? You’ll be refunded for the unused time!
How to Subscribe? Send me a direct message on Substack!
That’s all for this week.
If you like reading The Kaitchup, consider sharing it with friends and coworkers (there is a 20% discount for group subscriptions, or 30% for Pro subscribers):
Have a nice weekend!