Hi Everyone,
In this edition of The Weekly Kaitchup:
The Byte-Latent Transformer
AlphaEvolve
Unsloth’s Recipe for GRPO Applied to Base LLMs
The Byte-Latent Transformer
The AI research team at Meta frequently publishes papers, but the accompanying code and datasets are often delayed and released in batches later on.
One such batch just dropped this week:
Advancing AI systems through progress in perception, localization, and reasoning
Meta released several notable models, including a perception model and one designed for 3D object localization, especially relevant for robotics applications.
One model that stands out to me is the Dynamic Byte Latent Transformer (BLT), originally developed for a paper they published last year. It's particularly interesting for its approach to efficient, byte-level representation learning.
Byte Latent Transformer: Patches Scale Better Than Tokens
In other words, BLT is a tokenizer-free language model architecture that learns directly from raw byte sequences. Unlike standard LLMs, which rely on fixed tokenization schemes that introduce compression biases and language inequities, it dynamically segments input bytes into variable-length patches based on information entropy. These patches are encoded into latent representations and processed by a combination of lightweight local transformers and a global latent transformer, allowing for more efficient and adaptive compute allocation.
It appears to match the performance of token-based architectures at scale, while achieving up to 50% lower inference FLOPs compared to similarly sized models. Meta has released both 1B and 7B parameter versions.
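To illustrate the idea of entropy-based patching, here is a minimal sketch. This is not Meta's implementation: the placeholder entropy estimator below just uses the empirical distribution of the last few bytes, whereas in BLT a small byte-level language model predicts the next byte's distribution, and the resulting patches then feed the local/global transformers. The effect is the same in spirit: predictable stretches become long patches, surprising regions become short ones.

```python
import math
from collections import Counter

def next_byte_entropy(context: bytes, window: int = 8) -> float:
    # Placeholder entropy estimator: empirical distribution of the last
    # `window` bytes. In BLT, a small byte-level LM plays this role.
    recent = context[-window:] if context else b""
    if not recent:
        return 8.0  # maximum uncertainty for an empty context (8 bits/byte)
    counts = Counter(recent)
    total = len(recent)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def entropy_patches(data: bytes, threshold: float = 2.0) -> list[bytes]:
    # Start a new patch whenever the estimated next-byte entropy exceeds the
    # threshold, so compute is allocated adaptively rather than per token.
    patches, current = [], bytearray()
    for i, b in enumerate(data):
        if current and next_byte_entropy(data[:i]) > threshold:
            patches.append(bytes(current))
            current = bytearray()
        current.append(b)
    if current:
        patches.append(bytes(current))
    return patches

# Repeated bytes end up in one long patch; the varied region gets split up.
print(entropy_patches(b"aaaaaaaabcd efghhhhhhhh"))
```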
As is often the case with Meta’s research releases, these models come with non-commercial licenses.
Meta’s research teams consistently publish promising ideas, many of which improve model accuracy or efficiency, but for some reason, these advances rarely make their way into the Llama model family.
AlphaEvolve
Introduced by Google DeepMind, AlphaEvolve is a language-model-driven system for program synthesis and optimization in domains where solutions can be automatically evaluated. It combines evolutionary algorithms with LLM-based code generation, applying this framework to tasks in algorithm discovery, mathematics, and systems optimization.
The core loop operates as follows:
Task Specification: Users define an evaluation function h that maps a program to one or more scalar metrics, implemented as a Python function with a fixed I/O signature. Programs are annotated with # EVOLVE-BLOCK-START and # EVOLVE-BLOCK-END comments to mark the editable regions.
Prompt Construction: A prompt is built from prior high-performing programs sampled from a program database, their rendered evaluation results, and optional human-provided or LLM-generated meta-instructions. Prompts may include stochastic formatting and explicit context (e.g., equations, prior literature).
LLM Code Diff Generation: The prompt is passed to an ensemble of LLMs (Gemini 2.0 Flash for fast throughput, Gemini 2.0 Pro for higher-quality suggestions), which return edits in a SEARCH/REPLACE diff format. These diffs are applied to generate new candidate programs.
Evaluation and Selection: Candidate programs are executed with the user-defined evaluation function. Their results are logged, and the best programs are retained in the database for future prompt sampling.
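To make the loop concrete, here is a minimal, self-contained sketch. The EVOLVE-BLOCK markers, the SEARCH/REPLACE diff format, and the program database come from the paper; the toy task, the function names, and the propose_diff stub that stands in for the LLM ensemble are illustrative assumptions of mine.

```python
import random
import re

# Toy task: evolve the editable region so that candidate(3) gets close to 42.
SEED_PROGRAM = """\
def candidate(x):
    # EVOLVE-BLOCK-START
    return x  # only this region is rewritten by the model
    # EVOLVE-BLOCK-END
"""

def evaluate(program: str) -> float:
    # User-defined evaluation function h: runs the candidate and returns a scalar.
    namespace = {}
    try:
        exec(program, namespace)
        return -abs(namespace["candidate"](3) - 42)
    except Exception:
        return float("-inf")

def propose_diff(parent: str) -> str:
    # Stand-in for the LLM ensemble: returns an edit in SEARCH/REPLACE format.
    # A real system would prompt Gemini Flash/Pro with the parent program,
    # its evaluation results, and meta-instructions.
    new_expr = f"return x * {random.randint(1, 20)}"
    old_line = re.search(r"return .*", parent).group(0)
    return f"<<<<<<< SEARCH\n    {old_line}\n=======\n    {new_expr}\n>>>>>>> REPLACE"

def apply_diff(parent: str, diff: str) -> str:
    # Apply a single SEARCH/REPLACE block to produce a new candidate program.
    search = diff.split("<<<<<<< SEARCH\n")[1].split("\n=======")[0]
    replace = diff.split("=======\n")[1].split("\n>>>>>>> REPLACE")[0]
    return parent.replace(search, replace)

# Evolutionary loop: sample the best parent from the database, mutate it via a
# diff, evaluate the child, and keep the top programs for future prompts.
database = [(evaluate(SEED_PROGRAM), SEED_PROGRAM)]
for _ in range(50):
    _, parent = max(database, key=lambda item: item[0])
    child = apply_diff(parent, propose_diff(parent))
    database.append((evaluate(child), child))
    database = sorted(database, key=lambda item: item[0], reverse=True)[:5]

print(max(database, key=lambda item: item[0])[1])
```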
AlphaEvolve supports multiple abstraction levels: evolving raw strings, constructor functions, search algorithms, or co-evolving algorithms and intermediate solutions. This is a very complex but flexible framework.
Applications demonstrated include improving matrix multiplication algorithms (e.g., beating Strassen's algorithm for multiplying 4×4 complex-valued matrices), solving constructive math problems (e.g., the kissing number problem in 11 dimensions), and optimizing real-world engineering components such as scheduling heuristics and TPU kernels. Evaluation is entirely automatic, which is both a strength (enabling scale) and a limitation (excluding tasks that need manual feedback).
The paper is here:
AlphaEvolve: A coding agent for scientific and algorithmic discovery
Unsloth’s Recipe for GRPO Applied to Base LLMs
Unsloth has released a new tutorial notebook demonstrating how to train a Qwen3 base model using GRPO. It's well-structured, beginner-friendly, and fully runnable on free Google Colab, making it an excellent hands-on resource for anyone looking to explore GRPO fine-tuning in practice.
The notebook particularly highlights the importance of running a short supervised fine-tuning (SFT) phase before applying GRPO. This initial SFT step helps the model internalize the chat template, which in turn makes GRPO training more effective. This aligns with observations from DeepSeek AI during the training of R1, showing that SFT is a crucial step for successful GRPO.
GRPO is effective, but it's worth noting that an increasing number of studies show that even a brief SFT phase can unlock basic reasoning capabilities in models. While the performance won't match that of GRPO, short SFT remains a lightweight and practical alternative for enabling reasoning behavior.
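For readers who want to see the two-stage recipe outside the notebook, here is a minimal sketch using Hugging Face TRL. The model name, datasets, step counts, and the reward function are illustrative placeholders, and the Unsloth notebook wraps the same steps (short SFT, then GRPO) with its own memory and speed optimizations rather than plain TRL.

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer, GRPOConfig, GRPOTrainer

model_name = "Qwen/Qwen3-4B-Base"  # any Qwen3 base checkpoint

# Stage 1: brief SFT on a small chat dataset so the base model internalizes
# the chat template before reinforcement learning. Dataset is a placeholder.
sft_data = load_dataset("trl-lib/Capybara", split="train[:1000]")
sft_trainer = SFTTrainer(
    model=model_name,
    train_dataset=sft_data,
    args=SFTConfig(output_dir="qwen3-sft", max_steps=100,
                   per_device_train_batch_size=2),
)
sft_trainer.train()
sft_trainer.save_model("qwen3-sft")

# Stage 2: GRPO with a toy verifiable reward (completion must contain a digit);
# a real setup would parse the answer and check it against the reference.
def format_reward(completions, **kwargs):
    return [1.0 if any(ch.isdigit() for ch in c) else 0.0 for c in completions]

grpo_data = load_dataset("openai/gsm8k", "main", split="train").rename_column("question", "prompt")
grpo_trainer = GRPOTrainer(
    model="qwen3-sft",  # start GRPO from the SFT checkpoint
    reward_funcs=format_reward,
    train_dataset=grpo_data,
    args=GRPOConfig(output_dir="qwen3-grpo", max_steps=200,
                    num_generations=4, per_device_train_batch_size=4),
)
grpo_trainer.train()
```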
The Salt
The Salt is my other newsletter that takes a more scientific approach. In The Salt, I primarily feature short reviews of recent papers (for free), detailed analyses of noteworthy publications, and articles centered on LLM evaluation.
I published a deep dive about hybrid models, Nemotron-H:
I also reviewed in The Weekly Salt:
⭐Learning Dynamics in Continual Pre-Training for Large Language Models
SIMPLEMIX: Frustratingly Simple Mixing of Off- and On-policy Data in Language Model Preference Learning
DanceGRPO: Unleashing GRPO on Visual Generation
Support The Kaitchup by becoming a Pro subscriber:
What You'll Get
Priority Support – Fast, dedicated assistance whenever you need it to fine-tune or optimize your LLM/VLM. I answer all your questions!
Lifetime Access to All the AI Toolboxes – Repositories containing Jupyter notebooks optimized for LLMs and providing implementation examples of AI applications.
Full Access to The Salt – Dive deeper into exclusive research content. Already a paid subscriber to The Salt? You’ll be refunded for the unused time!
Early Access to Research – Be the first to access groundbreaking studies and models by The Kaitchup.
30% Discount for Group Subscriptions – Perfect for teams and collaborators.
The Kaitchup’s Book – A comprehensive guide to LLM fine-tuning. Already bought it? You’ll be fully refunded!
All Benefits from Regular Kaitchup Subscriptions – Everything you already love, plus more. Already a paid subscriber? You’ll be refunded for the unused time!
How to Subscribe?
That’s all for this week.
If you like reading The Kaitchup, consider sharing it with friends and coworkers (there is a 20% (or 30% for Pro subscribers) discount for group subscriptions):
Have a nice weekend!