Hi Everyone,
In this edition of The Weekly Kaitchup:
Magistral: Mistral AI’s First Open Reasoning Model
Text-to-LoRA: Generate LoRA Adapters Without Additional Training
Magistral: Mistral AI’s First Open Reasoning Model
We’d been waiting for it, and this week, Mistral AI finally delivered:
Magistral is a 24B-parameter reasoning model built on Mistral Small 3.1. It generates detailed reasoning traces followed by concise answers, achieving state-of-the-art performance among open models of this size.
The model was trained using an optimized form of Group Relative Policy Optimization (GRPO), with several efficiency improvements, including removal of the KL penalty and asynchronous generation.
Reinforcement learning is guided by a structured reward system that emphasizes formatting, correctness, and language consistency.
Efficient GRPO training: Removes KL penalty and reference model, uses group-based advantage estimation, and improves sample efficiency.
Structured reward function: Evaluates formatting, correctness (via symbolic math/code tests), output length, and language consistency.
Asynchronous inference pipeline: Enables continuous generation with GPU-to-GPU syncing to reduce idle time and memory overhead.
Curated reasoning dataset: Includes 38k math and 35k code problems, filtered and verified for difficulty, correctness, and formatting.
RL + SFT synergy: RL significantly boosts small model performance even post-distillation, with strong cross-domain and multilingual generalization.
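To illustrate the structured reward idea above, here is a minimal sketch of a composite reward that checks formatting (a reasoning trace in tags), correctness of the final answer, and output length. The tag names, weights, length threshold, and exact-match check are my own simplifying assumptions; the paper uses symbolic math/code verifiers and its own weighting.

```python
import re

def composite_reward(completion: str, reference_answer: str) -> float:
    """Hedged sketch of a Magistral-style structured reward:
    formatting + correctness + length. Illustrative values only."""
    # Formatting: require a reasoning trace followed by a final answer.
    m = re.search(r"<think>(.*?)</think>\s*(.*)", completion, re.DOTALL)
    if m is None:
        return -1.0                      # malformed output: no reasoning trace
    reward = 0.1                         # small formatting bonus

    # Correctness: exact match stands in for symbolic math/code verification.
    answer = m.group(2).strip()
    if answer == reference_answer:
        reward += 1.0

    # Length: mildly penalize runaway generations (threshold is arbitrary).
    if len(completion) > 4000:
        reward -= 0.1
    return reward

good = "<think>2 + 2 is 4</think> 4"
score = composite_reward(good, "4")      # formatting bonus + correct answer
```

A real reward for language consistency would additionally check that the reasoning trace and answer are in the same language as the prompt, which is omitted here.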
GRPO is clearly becoming the new standard. Most recent models adopt it, often with targeted optimizations to improve efficiency and stability.
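To make the group-based advantage estimation concrete, here is a minimal sketch of the core GRPO idea: several completions are sampled for the same prompt, and each completion's reward is normalized against its group's own mean and standard deviation, so no separate value model (critic) is needed. Magistral's variant adds further tweaks (e.g., no KL penalty, its own normalization choices), so treat this as illustrative only.

```python
import statistics

def group_advantages(rewards: list[float]) -> list[float]:
    """Group-relative advantages: normalize each completion's reward
    against the statistics of its own sampling group."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + 1e-8) for r in rewards]

# Example: 4 completions sampled for one prompt, scored by the reward function.
rewards = [1.0, 0.0, 0.5, 0.5]
advs = group_advantages(rewards)  # above-average completions get positive advantage
```

These advantages then weight the policy-gradient update in place of a critic's value estimates, which is what makes GRPO cheaper to run than PPO.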
In The Salt, I published a deep dive exploring how Mistral AI trained this model:
Use Magistral as-is. It’s a post-trained model. Fine-tuning it further may disrupt or undo the benefits of its reinforcement learning and post-training optimizations.
Text-to-LoRA: Generate LoRA Adapters Without Additional Training
T2L (Text-to-LoRA) is a hypernetwork-based method for generating LoRA (Low-Rank Adaptation) adapters for unseen tasks based solely on natural language task descriptions. The goal is to enable zero-shot task adaptation in LLMs without retraining or manually designing adapters for each new task.
I haven’t had the time to try it yet, but this seems very interesting if it really works!
LoRA methods, while parameter-efficient, are inflexible: they require separate fine-tuning for each downstream task and offer limited cross-task transferability. Although recent techniques have explored compressing or combining multiple LoRA modules (e.g., via matrix decomposition or routing), they depend on explicit structural constraints and are not readily generalizable.
I tried combining multiple LoRA adapters a long time ago:
T2L addresses these limitations by treating LoRA generation as a text-conditioned mapping learned by a hypernetwork. It is trained either through adapter reconstruction, i.e., distilling many pre-trained LoRAs, or through supervised fine-tuning (SFT) on a diverse set of tasks from the Super Natural Instructions (SNI) dataset. Once trained, T2L takes a natural language description of a new task and instantly produces a corresponding LoRA adapter with no additional training.
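To make the mechanism concrete, here is a minimal NumPy sketch of a text-conditioned hypernetwork that emits the low-rank LoRA factors for a single target weight matrix. The MLP architecture, all dimensions, and the random (untrained) weights are my own illustrative assumptions; the actual T2L architecture differs and is trained via reconstruction or SFT as described above.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_hypernet(emb_dim: int, hidden: int, d_model: int, rank: int) -> dict:
    """Random weights for a 2-layer MLP hypernetwork (illustrative sizes)."""
    return {
        "W1": rng.standard_normal((emb_dim, hidden)) * 0.02,
        "W2": rng.standard_normal((hidden, 2 * d_model * rank)) * 0.02,
        "d": d_model,
        "r": rank,
    }

def generate_lora(params: dict, task_emb: np.ndarray):
    """Map a task-description embedding to LoRA factors A (r x d) and B (d x r)."""
    h = np.maximum(task_emb @ params["W1"], 0.0)  # ReLU MLP
    flat = h @ params["W2"]
    d, r = params["d"], params["r"]
    A = flat[: d * r].reshape(r, d)
    B = flat[d * r :].reshape(d, r)
    return A, B  # the adapter update is delta_W = B @ A

params = init_hypernet(emb_dim=32, hidden=64, d_model=128, rank=4)
task_emb = rng.standard_normal(32)  # stand-in for an encoded task description
A, B = generate_lora(params, task_emb)
delta_W = B @ A  # rank-4 update added to the frozen base weight: W + alpha * B @ A
```

The key point is the zero-shot path: once the hypernetwork is trained, producing an adapter for a new task is a single forward pass over the task description, with no gradient steps on the target model.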
Empirically, T2L shows strong performance across multiple benchmarks. It outperforms multitask LoRA baselines. The system maintains task performance even at high compression ratios and seems to scale well as more training datasets are used.
The method is described in this paper:
Text-to-LoRA: Instant Transformer Adaption
The code is here:
GitHub: SakanaAI/text-to-lora (Apache 2.0)
That’s definitely something I’m excited to try out!
Note: This is the first time I’ve come across the word “adaption.” I actually wondered whether it was correct English. Grammarly flagged it when I wrote this, and I suspect Overleaf did the same when they wrote the paper. Personally, I’d stick with “adaptation.”
The Salt
The Salt is my other newsletter that takes a more scientific approach. In The Salt, I primarily feature short reviews of recent papers (for free), detailed analyses of noteworthy publications, and articles centered on LLM evaluation.
This week in The Weekly Salt, I reviewed:
⭐OpenThoughts: Data Recipes for Reasoning Models
The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text
Inference-Time Hyper-Scaling with KV Cache Compression
That’s all for this week.
If you like reading The Kaitchup, consider sharing it with friends and coworkers (there is a 20% discount for group subscriptions):
Have a nice weekend!