I think overall, DeepSpeed Chat is weird regarding its LoRA recommendations... Their default example applies LoRA while keeping the entire base model unfrozen. I don't think I've ever seen that before, and I wonder what the benefit of such a configuration is. I tried with and without freezing the base model, and I can say that keeping it frozen works much better.
Great series.
Btw, that's a huge LoRA rank compared to the LoRA paper (which used r = 4), but I suppose it's fine if DeepSpeed recommends it...?
Yeah, unfreezing the base model completely goes against the LoRA paper... I don't get it either.