Discussion about this post

Matt:

Yes, I'm using SFTTrainer. I have never tried `packing=True` due to the incredibly poorly written documentation on the matter: "Note that if you use a packed dataset and if you pass max_steps in the training arguments you will probably train your models for more than few epochs, depending on the way you have configured the packed dataset and the training protocol. Double check that you know and understand what you are doing."

I just trained a new Qwen1.5 adapter with packing set to True. It did seem to help a little: some predictions now stop generating at the right point, but others still don't. However, I admittedly don't know (or understand) what I am doing with packing.
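
For reference, here's a minimal sketch of what turning packing on looks like on my end (assuming a recent TRL version where `packing` is a field on `SFTConfig`, and a dataset with a `text` column; the model id, dataset path, and sequence length are placeholders, not my exact values):

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Placeholder dataset: a JSONL file with a "text" column.
train_dataset = load_dataset("json", data_files="train.jsonl", split="train")

config = SFTConfig(
    output_dir="qwen1.5-7b-lora-packed",
    packing=True,          # concatenate tokenized examples into fixed-length blocks
    max_seq_length=2048,   # packed block length; the argument name may differ across TRL versions
)

trainer = SFTTrainer(
    model="Qwen/Qwen1.5-7B",      # placeholder hub id; TRL loads the model from the string
    train_dataset=train_dataset,
    args=config,
)
trainer.train()
```

With packing, TRL concatenates examples into blocks of `max_seq_length`, so the number of training sequences no longer matches the number of raw examples, which is what the documentation's warning about `max_steps` and epochs is getting at.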

Matt:

Just like with Phi, when I LoRA-tune Qwen-1.5 7B, it won't stop generating text until max_tokens is reached. Mistral 7B definitely doesn't have this issue. I'm using these arguments with no quantization and 10k training examples (a rough sketch of the full setup follows the list):

learning_rate = 2e-4
lr_scheduler_type = 'linear'
num_train_epochs = 5
warmup_ratio = 0.0
weight_decay = 0.01
optim = 'adamw_torch_fused'
target_modules = 'all-linear'
bf16 = True
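
Put together, the setup looks roughly like this (a sketch rather than my exact script: the LoRA rank/alpha/dropout, model id, output directory, and dataset path are placeholders; only the arguments listed above are the real ones):

```python
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Placeholder dataset standing in for the ~10k training examples.
train_dataset = load_dataset("json", data_files="train.jsonl", split="train")

# LoRA adapter config; only target_modules comes from the list above,
# the rank/alpha/dropout values here are illustrative.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules="all-linear",
    task_type="CAUSAL_LM",
)

# Training arguments mirroring the values listed above.
args = SFTConfig(
    output_dir="qwen1.5-7b-lora",
    learning_rate=2e-4,
    lr_scheduler_type="linear",
    num_train_epochs=5,
    warmup_ratio=0.0,
    weight_decay=0.01,
    optim="adamw_torch_fused",
    bf16=True,
)

trainer = SFTTrainer(
    model="Qwen/Qwen1.5-7B",      # placeholder hub id; plain LoRA, no 4/8-bit quantization
    train_dataset=train_dataset,
    peft_config=peft_config,
    args=args,
)
trainer.train()
```

Since `SFTConfig` subclasses `transformers.TrainingArguments`, everything in the list above passes straight through to the underlying Trainer.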
