I think overall, DeepSpeed Chat is weird regarding its LoRA recommendations... Their default example applies LoRA while keeping the entire base model unfrozen. I don't think I've ever seen that before, and I wonder what the benefit of such a configuration is. I tried with and without freezing the base model, and I can say that keeping it frozen works much better.
Great series.
Btw, that's a huge LoRA rank compared to the LoRA paper (which used r = 4), but I suppose it's fine if DeepSpeed recommends it...?
Yeah, unfreezing the base model completely goes against the LoRA paper... I don't get it either.