LoRA but with Only 13 Parameters??
The Weekly Kaitchup #129
Hi everyone,
In this edition of The Weekly Kaitchup, we discuss:
TinyLoRA: Are 13 Parameters Really Enough, or Is It Just Qwen, Again?
GGUFs for Qwen3-Coder-Next: Are They Good?
TinyLoRA: Are 13 Parameters Really Enough, or Is It Just Qwen, Again?
Researchers from Meta’s Fundamental AI Research unit and academic collaborators (John X. Morris is the main author) say they can boost an LLM’s math “reasoning” by updating just 13 parameters, about 26 bytes in bfloat16, rather than retraining billions of weights.
The work, posted on arXiv this week and not peer-reviewed, targets a long-running practical problem: reinforcement-learning “reasoning” runs are expensive, and even parameter-efficient adapters such as LoRA typically still require millions of trainable values on modern 7–8 billion-parameter models.
Learning to Reason in 13 Parameters
The authors say that conventional LoRA hits a floor: because its update matrices scale with the model’s hidden width, even “rank-1” LoRA can still mean millions of parameters on an 8B model. That’s true, but we can still cut the number of trainable parameters further by tuning LoRA’s setup: for example, applying it only to selected modules or a subset of layers, or using more parameter-efficient variants like VeRA.
Starting from the view of a LoRA update as a recombination of a layer’s top singular directions (via a truncated SVD), their method, TinyLoRA, replaces the small trainable matrix with an even smaller “trainable vector” that is projected through a fixed random tensor back into the needed shape. The scheme can also tie that vector across many modules and layers, so the total number of trainable parameters can drop to single digits; in the extreme case, just one.
The idea is very simple and builds on mechanics already seen in some LoRA variants, such as combining fixed random projections with small trainable vectors (as VeRA does).
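To make the parameter counts and the mechanism concrete, here is a minimal PyTorch sketch of how I read the method. The back-of-the-envelope count at the top assumes a Llama-3-8B-style configuration, and the `TinyLoRALinear` class, the rank, and the scaling of the random projection are my own illustrative choices, not code or hyperparameters from the paper.

```python
import torch
import torch.nn as nn

# Back-of-the-envelope: rank-1 LoRA on a Llama-3-8B-style model
# (hidden 4096, intermediate 14336, 32 layers, adapters on every linear projection).
d, ffn, n_layers = 4096, 14336, 32
per_layer = sum(d_in + d_out for d_in, d_out in [
    (d, d), (d, d // 4), (d, d // 4), (d, d),   # q, k, v, o (grouped-query k/v)
    (d, ffn), (d, ffn), (ffn, d),               # gate, up, down
])
print(f"rank-1 LoRA: ~{per_layer * n_layers / 1e6:.1f}M trainable parameters")

class TinyLoRALinear(nn.Module):
    """Sketch of a TinyLoRA-style linear layer (my reading, not the official code).

    The frozen weight's top-r singular directions are precomputed once; training
    only re-weights them. The r mixing coefficients are not trained directly:
    they are produced by pushing a tiny shared vector z (e.g. 13 values) through
    a fixed random projection, so the entries of z are the only trainable values.
    """
    def __init__(self, weight: torch.Tensor, z: nn.Parameter, rank: int = 16):
        super().__init__()
        self.weight = nn.Parameter(weight, requires_grad=False)  # frozen base weight
        U, S, Vh = torch.linalg.svd(weight.float(), full_matrices=False)
        self.register_buffer("U", U[:, :rank])     # top-r left singular vectors
        self.register_buffer("Vh", Vh[:rank, :])   # top-r right singular vectors
        # Fixed (non-trainable) random projection from the tiny vector to r coefficients.
        self.register_buffer("P", torch.randn(rank, z.numel()) / z.numel() ** 0.5)
        self.z = z                                 # the shared trainable vector

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        coeffs = self.P @ self.z                          # r mixing coefficients
        delta = self.U @ torch.diag(coeffs) @ self.Vh     # low-rank weight update
        return x @ (self.weight + delta).T

# A single 13-dimensional vector shared by every adapted module in the model.
z = nn.Parameter(torch.zeros(13))
adapted = TinyLoRALinear(torch.randn(4096, 4096), z)
```

Because the shared vector is the only tensor with gradients enabled, the whole adapted model exposes exactly those 13 trainable values, which is where the 26 bytes in bfloat16 come from.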
Does it work?
The experiments are interesting but incomplete.
If you’ve been following recent RL-for-reasoning work, the takeaway won’t surprise you: doing RL on Qwen for math reliably helps, and this paper is essentially another confirmation of that pattern.
On GSM8K, they report Qwen2.5-7B-Instruct starting around 88% and climbing to ~91% with TinyLoRA while updating only 13 parameters, with accuracy continuing to improve as the adapter budget increases.
A key result of the paper is that at these ultra-tiny update sizes, RL is much more effective than SFT.
Table 2 shows the same trend more broadly: Qwen2.5-7B-Instruct increases its average score across the listed benchmarks:
The authors emphasize that their strongest claims are demonstrated only on math-style reasoning with verifiable rewards, and may not transfer to domains such as science or creative writing where reward signals are noisier.
They also flag a major architecture dependence: Qwen models consistently respond better than Llama at the same tiny parameter budgets. In the Discussion, they state that Qwen-2.5 models often need around 10x fewer updated parameters than Llama-3 to reach comparable performance, and they do not pin down whether this comes from architectural details, pretraining data, or post-training differences. So, we still have no answer to the big mystery of why Qwen models learn math so easily with RL.
Nonetheless, the main point of the paper seems to hold even for Llama 3: the model learns with just a few bytes of trainable parameters. That is not very useful in practice yet, but future work may well be able to leverage the finding.
GGUFs for Qwen3-Coder-Next: Are They Good?
The community loves coding models, and it also loves the GGUF format, so it’s no surprise that people are searching for the best/smallest GGUF builds of Qwen3-Coder-Next.
Between major providers like bartowski (24 variants) and unsloth (26 variants), plus the official GGUF versions made by Qwen (5 variants), there’s a ton of choice.
That’s great, but beyond “it’s smaller” and “it uses X quantization,” it’s still hard to know how good these GGUF variants actually are.
The usual rule of thumb is: Q4 is “good enough,” and once you dip into Q3* or Q2*, accuracy tends to fall off a cliff. That might be true, or it might be overly conservative. Maybe you can push compression further, save memory, and speed things up without paying much in quality. Hard to say without data.
So I decided to test them with real benchmarks: HumanEval (mostly as a sanity check, since this one is small and pretty easy these days) and LiveCodeBench (much tougher, with more realistic coding tasks).
The bigger issue is that almost nobody benchmarks GGUF models, largely because we don’t have a great tooling stack for it. llama.cpp (and most derivatives) isn’t designed for the kind of high-throughput, parallelized inference you want for benchmarking, so evaluations are slow and therefore expensive, often more so than benchmarking an unquantized model in a standard framework. Dequantizing GGUF back into the original format while preserving the quantization behavior is possible, but it’s fiddly and easy to break (e.g., key mismatches or architecture quirks that leave you with a partially unusable checkpoint). I’m not aware of any framework that makes this painless end-to-end.
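For reference, running a GGUF through a benchmark harness can be as simple as the sketch below, using llama-cpp-python. This is not the exact setup behind the numbers that follow: the repo and file names are placeholders for whichever Q4_K_M or Q3_K_M build you want to test, and the generation settings are assumptions. It also makes the cost problem obvious, since requests run one at a time.

```python
# Minimal sketch: generate completions from a GGUF build with llama-cpp-python.
# The repo and file names below are placeholders, not the exact builds I tested.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

model_path = hf_hub_download(
    repo_id="unsloth/Qwen3-Coder-Next-GGUF",       # placeholder repo name
    filename="Qwen3-Coder-Next-Q4_K_M.gguf",       # placeholder file name
)

# Offload all layers to the GPU; the context length is an assumption.
llm = Llama(model_path=model_path, n_ctx=8192, n_gpu_layers=-1, verbose=False)

def complete(prompt: str) -> str:
    """Greedy decoding for a reproducible pass@1-style run."""
    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": prompt}],
        max_tokens=1024,
        temperature=0.0,
    )
    return out["choices"][0]["message"]["content"]

# Feed HumanEval / LiveCodeBench prompts through `complete`, then run each
# benchmark's own checker on the generated code to compute pass rates.
```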
Anyway, I ran the benchmarks, but limited the scope to Q4_K_M and Q3_K_M:
The good news is that these GGUFs are safe to use down to Q4: accuracy didn’t drop sharply. Use the Unsloth version rather than the official one from Qwen.
However, if you go below Q4, expect a sharp drop in accuracy: here, -7 points on LiveCodeBench v6. At that level, you will find smaller and better models than this one.
The Salt
The Salt is my other newsletter that takes a more scientific approach. In The Salt, I primarily feature short reviews of recent papers (for free), detailed analyses of noteworthy publications, and articles centered on LLM evaluation.
This week, we review:
⭐Good SFT Optimizes for SFT, Better SFT Prepares for Reinforcement Learning
On the Limits of Layer Pruning for Generative Reasoning in LLMs
That’s all for this week.
If you like reading The Kaitchup, consider sharing it with friends and coworkers (there is a 20% discount for group subscriptions):
Have a nice weekend!