Any metrics on Ornith-35B MoE? I'm hoping that if the 9B is competitive with Qwen-3.5-35B, the 35B MoE might be competitive with or better than Qwen-3.6. I still see the industry comparing a lot against Qwen-3.5, but Qwen-3.6 is much better in practice. 3.5 feels not quite good enough for agentic coding (just below the bar) and 3.6 feels good enough (above the bar), so 3.5 comparisons feel a lot less useful: we have to infer the delta between 3.5 and 3.6 and then apply it to the new competitor.
They published comparisons of the 35B against Qwen3.6.
https://huggingface.co/deepreinforce-ai/Ornith-1.0-35B
It seems better, but one important metric is missing: token efficiency.
I'm very curious to know whether ornith-1.0 consumes fewer tokens than Qwen3.5/3.6
Great point about LFM2.5-230M's GPQA-Diamond score. We could've specified it in the text!