3 Comments

Great series.

Btw that’s a huge LoRA rank compared to the LoRA paper (r=4), but I suppose it’s fine if DeepSpeed recommends it...?


I think overall, DeepSpeed Chat is weird regarding LoRA recommendations... Their default example uses LoRA while keeping the entire base model unfrozen. I don't think I've ever seen that before, and I wonder what the benefit of such a configuration is. I tried with and without freezing the base model, and keeping it frozen works much better.
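To make the contrast concrete, here is a minimal numpy sketch of the standard frozen-base LoRA setup. The dimensions are illustrative assumptions (not DeepSpeed Chat's actual config), using the paper's r=4; the point is that with the base frozen, only the two small low-rank matrices are trainable, whereas unfreezing the base makes nearly every parameter trainable and loses most of LoRA's efficiency benefit.

```python
import numpy as np

# Illustrative sizes; r=4 matches the LoRA paper's smallest rank.
d, k, r, alpha = 512, 512, 4, 16

rng = np.random.default_rng(0)
W = rng.standard_normal((d, k))         # base weight (frozen in standard LoRA)
A = rng.standard_normal((r, k)) * 0.01  # LoRA down-projection (trainable)
B = np.zeros((d, r))                    # LoRA up-projection, zero-init (trainable)

def forward(x):
    # Effective weight is the frozen base plus the scaled low-rank update.
    return x @ (W + (alpha / r) * (B @ A)).T

# Trainable-parameter counts for the two configurations in the thread:
lora_params = A.size + B.size   # frozen base: only A and B train
full_params = W.size + A.size + B.size  # unfrozen base: everything trains
print(lora_params, full_params)  # 4096 vs 266240 for this layer
```

With the base frozen, this layer trains about 1.5% as many parameters as the unfrozen configuration, which is the usual argument for LoRA in the first place.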


Yeah, unfreezing the base model completely goes against the LoRA paper... I don't get it.
