Discussion about this post

Brian Hostetler

I got excited, thinking you had released a quantized version of Qwen 3.5 397B that would run on vLLM. But alas, I was mistaken.

Nick Jenkins

Quick follow-up question...

If Qwen3.5-397B quantizes beautifully with UD-IQ2_M, did you try Qwen3.5-122B-A10B or any of the mid-sized Qwens (35B-A3B or 27B)? The question is whether the quality remains largely intact, because if so, that could be an amazing opportunity: 2-bit quantization could offset the bandwidth limitations of a DGX Spark or the memory constraints of a 5090.
