Thanks for the update. What do you think is the right pad/unk token to use for Llama 3?
I would do:
tokenizer.pad_token = tokenizer.unk_token
However, there is no unk token.
Indeed! For my current experiments with Llama 3, I'm setting the EOS token as the pad token, and it seems to work well.
I don't think there is any cheap alternative.
When LoRA-tuning, I've found that using the eos_token as the pad_token makes an LLM unable to stop generating properly. It'll just keep spewing nonsense once your question has been addressed, until it reaches `max_tokens`. What if we instead used one of the 250 reserved special tokens in the Llama3 tokenizer?
tokenizer.pad_token = '<|reserved_special_token_250|>'
tokenizer.pad_token_id = 128255
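Rather than hardcoding the id, it may be safer to look it up and confirm it differs from the EOS id. A minimal sketch, assuming a Hugging Face style tokenizer that exposes `convert_tokens_to_ids`, `eos_token_id`, and `unk_token_id`; the helper name `set_pad_token` is hypothetical:

```python
def set_pad_token(tokenizer, pad_token="<|reserved_special_token_250|>"):
    """Assign pad_token on a tokenizer-like object and return its id.

    Hypothetical helper: raises if the token is not in the vocabulary
    or if it would collide with the EOS token.
    """
    pad_id = tokenizer.convert_tokens_to_ids(pad_token)

    # A missing token typically resolves to None or to the unk id.
    if pad_id is None or pad_id == getattr(tokenizer, "unk_token_id", None):
        raise ValueError(f"{pad_token!r} is not in the tokenizer vocabulary")

    # Refuse a pad token that is indistinguishable from EOS.
    if pad_id == tokenizer.eos_token_id:
        raise ValueError("pad token must differ from the EOS token")

    # With Hugging Face tokenizers, pad_token_id is derived from pad_token.
    tokenizer.pad_token = pad_token
    return pad_id
```

With the Llama 3 tokenizer, the returned id should match the 128255 used above; if it does not, the token string is probably misspelled.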
This worked for me the first time I tried it on Thursday. I'm trying it again as we speak with a much more complex dataset.
I also tried adding a <|pad|> token to the tokenizer, calling model.resize_token_embeddings(len(tokenizer)), targeting lm_head, and saving the lm_head & embed_tokens modules, but that didn't go so well. I must have done something wrong, this being my first time trying that.
Argh. Using <|reserved_special_token_250|> didn't work on this second attempt. The model is still unable to stop generating properly.
Are you fine-tuning the model for long enough? Using the EOS token should work fine. In my current experiments, inference seems to stop when appropriate.
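Whether reusing EOS as the pad token is harmless may depend on how the training labels are masked. A small pure-Python sketch of two hypothetical masking strategies (the function names and token ids are made up for illustration): masking by token id also hides the real EOS from the loss, while masking by attention mask keeps it.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss by default

def labels_masked_by_id(input_ids, pad_id):
    """Ignore every position whose token equals pad_id."""
    return [IGNORE_INDEX if t == pad_id else t for t in input_ids]

def labels_masked_by_attention(input_ids, attention_mask):
    """Ignore only the positions the attention mask marks as padding."""
    return [t if m == 1 else IGNORE_INDEX for t, m in zip(input_ids, attention_mask)]

EOS = 2                                # hypothetical EOS id, also used as pad here
padded = [11, 12, 13, EOS, EOS, EOS]   # real sequence ends at the first EOS
mask = [1, 1, 1, 1, 0, 0]              # 1 = real token, 0 = padding

by_id = labels_masked_by_id(padded, pad_id=EOS)
by_mask = labels_masked_by_attention(padded, mask)

print(by_id)    # the real EOS is masked too, so the model never learns to emit it
print(by_mask)  # the real EOS stays in the labels
```

If a pipeline masks by id, the model gets no training signal for stopping, which would look like the runaway generation described above. Checking one batch of labels from your own collator is a quick way to see which case applies.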
Nice weekly summary. Just some comments/Qs on your notebook 14:
1. Do you have a reference for the dequanting code? Or, did you have to develop it from scratch?
2. I notice this line in the dequanting function:
```
def dequantize_model(model, to='./dequantized_model', dtype=torch.float16, device="cuda"):
"""
'model': the peftmodel you loaded with qlora.
```
When you say 'model', do you in fact mean the base model OR the peftmodel? It seems to me the dequanting function expects a base model (but maybe it works with a peft model too?)
3. The dequantization and merging cell doesn't specify an adapter as an input (so I assume the adapter has been specified earlier in the code). I wonder if it would be better to explicitly set (or reset) the adapter in that cell, to make things more clear?
Thanks!
1. The reference for the code is in the article. Following your comment, I also added it in the notebook.
2. This is the base model. The comment is misleading here.
3. In that cell, I added the initialisation of the expected variables: base model, adapter, and compute dtype.
many thanks
I'm getting nowhere fast with LoRA-tuning Llama3-8B. I'm going to give it a rest for now—until I see a notebook of yours. In the name of science, I might also try again with a product like Axolotl, Unsloth, or AutoTrain. Hmm, I had these same problems trying to train Phi. Maybe there's something fundamentally wrong with me and my roll-your-own "LoraTuner" class? Fine-tuning Mistral sure does work flawlessly for me, though—for all five of my PEFT adapters.
I set padding_side='right' because SFTTrainer complains if you don't. Not sure why I set add_bos_token=True. Why would you want an eos token without a bos token, though? Heh, I guess if you're conflating eos and pad tokens, then it could very well be a moot point?
Oh, duh. If you're padding on the left side with eos tokens, then the actual eos_token on the right wouldn't be conflated with padding tokens... unless you're working with a right-to-left language like Arabic.
In theory, yes. But in practice, no. Most frameworks completely ignore the pad token, whatever it is. You can try it with HF transformers: if you pad left with eos tokens, in theory it shouldn't generate anything, but it actually works fine.
/home/matt/miniconda3/envs/nlp/lib/python3.11/site-packages/trl/trainer/sft_trainer.py:318: UserWarning: You passed a tokenizer with `padding_side` not equal to `right` to the SFTTrainer. This might lead to some unexpected behaviour due to overflow issues when training a model in half-precision. You might consider adding `tokenizer.padding_side = 'right'` to your code.
I completely ignore this warning. I don't see any problem padding left when using float32 or bfloat16 data types for training.
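One way to read the "frameworks ignore the pad token" point: padded positions are excluded through the attention mask, which is built from sequence lengths rather than from the pad token's value. A minimal pure-Python sketch, assuming the mask is passed to the model along with the ids; `pad_left` and the token ids are hypothetical:

```python
def pad_left(batch, pad_id):
    """Left-pad a batch of token id lists and build the matching attention mask."""
    width = max(len(seq) for seq in batch)
    input_ids = [[pad_id] * (width - len(seq)) + list(seq) for seq in batch]
    attention_mask = [[0] * (width - len(seq)) + [1] * len(seq) for seq in batch]
    return input_ids, attention_mask

batch = [[5, 6, 7], [8]]

# Pad once with a hypothetical EOS id (2) and once with an arbitrary other id.
ids_eos, mask_eos = pad_left(batch, pad_id=2)
ids_other, mask_other = pad_left(batch, pad_id=999)

print(mask_eos == mask_other)  # the mask does not depend on the pad id
```

The ids differ between the two calls, but the mask is identical, so a model that honours the mask sees the same effective input either way.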
I did four epochs with 13,000 examples, which is more than enough when fine-tuning Mistral.
learning_rate = 2e-4
lr_scheduler_type = 'linear'
target_modules = 'all-linear'
How long are your examples and what is your max_seq_len? If it's too short, the EOS token will be truncated.
Also, I assume that you set `add_eos_token=True` when instantiating the tokenizer.
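To illustrate the truncation concern, here is a toy sketch, assuming the EOS is appended first and the sequence is then cut from the right at `max_seq_len`; `encode_with_eos` and the ids are hypothetical stand-ins for a real tokenizer:

```python
def encode_with_eos(token_ids, eos_id, max_seq_len):
    """Append EOS, then truncate from the right to max_seq_len tokens."""
    ids = list(token_ids) + [eos_id]
    return ids[:max_seq_len]

EOS = 2  # hypothetical EOS id

short = encode_with_eos([10, 11, 12], EOS, max_seq_len=8)
long = encode_with_eos(range(10, 20), EOS, max_seq_len=8)

print(short[-1] == EOS)  # EOS survives when the example fits
print(EOS in long)       # EOS is cut off when the example is too long
```

Counting how many training examples still end with the EOS id after tokenization would show whether this affects a given dataset.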
I set max_seq_len = 1_024 because my prompts and their responses are both rather short. Works great with Mistral.
self.tokenizer = AutoTokenizer.from_pretrained(
    self.model_id,
    trust_remote_code = True,
    add_bos_token = True,
    add_eos_token = True,
    padding_side = 'right'
)
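Whether `add_bos_token` and `add_eos_token` take effect can depend on the tokenizer class and library version, so it may be worth checking an encoded sample directly. A minimal sketch, assuming a Hugging Face style tokenizer that is callable and exposes `bos_token_id` and `eos_token_id`; `check_special_tokens` is a hypothetical helper:

```python
def check_special_tokens(tokenizer, text="Hello"):
    """Report whether an encoded sample starts with BOS and ends with EOS."""
    ids = tokenizer(text)["input_ids"]
    return {
        "starts_with_bos": bool(ids) and ids[0] == tokenizer.bos_token_id,
        "ends_with_eos": bool(ids) and ids[-1] == tokenizer.eos_token_id,
    }
```

If `ends_with_eos` comes back `False`, the EOS would have to be appended to each training example by hand, otherwise the model never sees it during fine-tuning.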
I only see two significant differences with my current config. I set padding side to left (to use FlashAttention) and I don't add the bos token. Is there a particular reason for add_bos_token = True?