As fast as AWQ, but more accurate
Nice piece, as usual!
Am I reading the results correctly? I see that AWQ is only a little bit higher in terms of perplexity.
Yes, that's correct. AWQ is only slightly but consistently behind, except for the 7B 3-bit model, where the gap is larger.
vLLM recently optimized its use of AWQ. I wonder if/when they'll do the same for SqueezeLLM. https://github.com/vllm-project/vllm/pull/2566