Yep, I had already done that, but the problem remains. In your Medium article about Phi-1.5, you mentioned this:
"The problem here is that phi-1.5 was pre-trained without padding and the implementation of MixFormerSequentialForCausalLM released by Microsoft with the model doesn’t support attention masking during training. In other words, we can’t properly fine-tune the model to learn when to stop generating. Pad tokens are interpreted as normal tokens. You would have to modify MixFormerSequentialForCausalLM to add support for the attention mask."
Is the same true with Phi-2?
https://medium.com/@bnjmn_marie/how-to-fine-tune-quantize-and-run-microsoft-phi-1-5-e14a1e22ec12
I haven't tried Phi-2 yet. I would guess this is still true. I'll investigate and reply if I find something.
I'm still stumped. Have you taken a crack at instruction finetuning Phi-2 yet? I do see a finetuned chat model on HF, but I haven't played with it yet: https://huggingface.co/cognitivecomputations/dolphin-2_6-phi-2
I'll write an article on Phi-2 fine-tuning for next week. I'm not sure whether I'll succeed in teaching it when to stop generating, but I have several ideas. I'll let you know here as soon as I have something that works.
This looks interesting. https://huggingface.co/microsoft/phi-1_5/commit/de35f900d3fbba84d3f7c9a72e60488fa2c86221
as does this: https://huggingface.co/microsoft/phi-1_5/commit/3128bb636a3de36f8204901e4310c4449a2c6ddc
I just LoRA-tuned Phi-2, but it refuses to stop generating until `max_new_tokens` is reached. Phi-1.5 suffered from the same problem. Do you know how to correct it?
When you load the tokenizer, do you set `add_eos_token=True`? This adds the EOS token to all the training examples.
Did you try setting `eos_token_id=tokenizer.eos_token_id` when calling `model.generate`?
For me, it works: the model stops generating when it produces the EOS token. Without that, the model generates the EOS token but ignores it and keeps generating.
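For instance, something like this (a minimal sketch; the prompt is a placeholder and the loading details will differ if you use a PEFT adapter or quantization):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", trust_remote_code=True)

inputs = tokenizer("Instruction: say hello.\nOutput:", return_tensors="pt")  # placeholder prompt

outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    eos_token_id=tokenizer.eos_token_id,  # stop as soon as the model emits EOS
)
print(tokenizer.decode(outputs[0]))
```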
The remaining problem is that it tends to never output the EOS token for several of my testing prompts. But maybe that's just because my model hasn't been trained enough to learn when to stop, so I'm fine-tuning it again.
I just tried adding `eos_token = tokenizer.eos_token` and `eos_token_id = tokenizer.eos_token_id` in every possible place (roughly as in the sketch below):
* AutoModelForCausalLM.from_pretrained()
* model.config
* model.generation_config
* TextGenerationPipeline()
None of them worked. :(
My PEFT adapter was trained with over 10,000 examples. :'(
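Roughly, the places listed above correspond to settings like these (a sketch with assumed variable names and a placeholder prompt, not the actual script):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TextGenerationPipeline

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2", trust_remote_code=True)

# At load time (extra kwargs that match config attributes override the config)
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-2",
    trust_remote_code=True,
    eos_token_id=tokenizer.eos_token_id,
)

# On the model config and the generation config
model.config.eos_token_id = tokenizer.eos_token_id
model.generation_config.eos_token_id = tokenizer.eos_token_id

# On the pipeline call (forwarded to generate)
pipe = TextGenerationPipeline(model=model, tokenizer=tokenizer)
result = pipe(
    "Instruction: say hello.\nOutput:",  # placeholder prompt
    max_new_tokens=256,
    eos_token_id=tokenizer.eos_token_id,
)
```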
Does it generate the EOS token and ignore it, or does it never generate the EOS token at all? (To see the EOS token, set `skip_special_tokens=False` when calling `decode`.)
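For example, a quick self-contained way to see the difference (a sketch; the ids here just stand in for real `generate` output):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2", trust_remote_code=True)

# Fake "generation" output: some text followed by the EOS id
ids = tokenizer("The answer is 42.").input_ids + [tokenizer.eos_token_id]

print(tokenizer.decode(ids, skip_special_tokens=True))   # EOS is hidden
print(tokenizer.decode(ids, skip_special_tokens=False))  # '<|endoftext|>' is visible
```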
Currently, 1/4 of my testing prompts generate an EOS token.
Since Phi-2 doesn't seem to use an attention mask, 10k examples might not be enough to teach the model when to generate an EOS token.
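One quick sanity check is to look at whether the model's `forward` even declares an `attention_mask` argument (a sketch; it only inspects the signature and doesn't prove the mask is actually used during training):

```python
import inspect
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", trust_remote_code=True)

params = inspect.signature(model.forward).parameters
print("attention_mask" in params)  # False would explain why pad tokens are treated as normal tokens
```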
How many examples do you think are required?
Difficult question... I would say at least one epoch over 50k examples, for instance.
Oh, duh. I did do that. :)
```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    'microsoft/phi-2',
    trust_remote_code=True,
    add_bos_token=False,
    add_eos_token=True,  # append the EOS token to every tokenized example
    padding_side='right',
)

# If no pad token is defined, reuse the UNK token for padding
if not tokenizer.pad_token:
    tokenizer.pad_token = tokenizer.unk_token
    tokenizer.pad_token_id = tokenizer.unk_token_id
```
No, that didn't work for me. Would it be possible to end all of my training examples with '<|endoftext|>'?
If you set `add_eos_token=True` when you load the tokenizer, it automatically adds '<|endoftext|>' (the EOS token) to all your training examples.
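A quick way to double-check that the EOS token really ends up at the end of the encoded examples (a sketch):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "microsoft/phi-2",
    trust_remote_code=True,
    add_eos_token=True,
)

ids = tokenizer("Some training example").input_ids
print(tokenizer.decode(ids, skip_special_tokens=False))  # if EOS was appended, this ends with '<|endoftext|>'
print(ids[-1] == tokenizer.eos_token_id)                 # ... and this prints True
```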
Woo hoo! Looking forward to it. :)