QLoRA is (slightly) better, at least partly thanks to the superior quantization data type. In the QLoRA paper, they show that NF4 is more accurate than INT4.
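For anyone curious, NF4 is the data type QLoRA uses via bitsandbytes. A minimal sketch of loading a base model with NF4 quantization through transformers (the model name is just a placeholder):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# NF4 quantization config, as used in the QLoRA paper
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",           # NormalFloat4 ("fp4" is the other 4-bit option)
    bnb_4bit_use_double_quant=True,      # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # placeholder; any causal LM works
    quantization_config=bnb_config,
    device_map="auto",
)
```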
----
QA-LoRA fine-tunes LoRA adapters for already-quantized LLMs; the base LLM can't be fp16. That being said, it should be possible to dequantize the final merged model to fp16 and then apply AWQ to it. It might work. It's an interesting experiment to try.
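If anyone wants to try, here's a rough sketch of the AWQ step using AutoAWQ. It assumes the merged QA-LoRA model has already been dequantized to fp16 and saved to disk; paths are placeholders:

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

merged_path = "path/to/dequantized-merged-model"  # hypothetical fp16 checkpoint
quant_path = "path/to/awq-model"

# Load the fp16 model and its tokenizer
model = AutoAWQForCausalLM.from_pretrained(merged_path)
tokenizer = AutoTokenizer.from_pretrained(merged_path)

# Standard 4-bit AWQ settings
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}
model.quantize(tokenizer, quant_config=quant_config)

# Save the AWQ-quantized model
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```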
Why is QLoRA better than the base model? Because it has more training?
Nice piece btw, I’ve yet to dig into QA-LoRA and am definitely hoping it allows for decent bf16 models that can then be AWQ'd.