21 Comments
Apr 19 · Liked by Benjamin Marie

Thanks for the update. What do you think is the right pad/unk token to use for Llama 3?

author

I would do:

tokenizer.pad_token = tokenizer.unk_token

Apr 20 · Liked by Benjamin Marie

However, there is no unk token in the Llama 3 tokenizer.

author

Indeed! For my current experiments with Llama 3, I'm setting the EOS token as the pad token, and it seems to work well.

I don't think there is any cheaper alternative.
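
Concretely, something like this with a Hugging Face tokenizer (checkpoint name assumed for illustration):

```
from transformers import AutoTokenizer

# Assumed checkpoint name, adjust to the model you are loading
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")

# Llama 3 defines neither an UNK nor a PAD token, so reuse EOS for padding
tokenizer.pad_token = tokenizer.eos_token
```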

Apr 21 · edited Apr 21 · Liked by Benjamin Marie

When LoRA-tuning, I've found that using the eos_token as the pad_token makes an LLM unable to stop generating properly. It'll just keep spewing nonsense once your question has been addressed, until it reaches `max_tokens`. What if we instead used one of the 250 reserved special tokens in the Llama3 tokenizer?

tokenizer.pad_token = '<|reserved_special_token_250|>'

tokenizer.pad_token_id = 128255

This worked for me the first time I tried it on Thursday. I'm trying it again as we speak with a much more complex dataset.

I also tried adding a <|pad|> token to the tokenizer, calling model.resize_token_embeddings(len(tokenizer)), targeting lm_head, and saving the lm_head & embed_tokens modules, but that didn't go so well. I must have done something wrong; it was my first time trying that.
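
For reference, the pad-token part of that attempt was roughly this (a reconstructed sketch, so details may be off):

```
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Register a brand-new pad token...
tokenizer.add_special_tokens({"pad_token": "<|pad|>"})
# ...which grows the vocabulary, so the embedding matrix and lm_head must be resized to match
model.resize_token_embeddings(len(tokenizer))
```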

Apr 21 · Liked by Benjamin Marie

Argh. Using <|reserved_special_token_250|> didn't work with this second attempt. It too is unable to stop generating properly.

author

Did you fine-tune the model for long enough? Using the EOS token should work fine. In my current experiments, inference seems to stop when appropriate.


Nice weekly summary. Just some comments/Qs on your notebook 14:

1. Do you have a reference for the dequanting code? Or, did you have to develop it from scratch?

2. I notice these lines in the dequantizing function:

```
def dequantize_model(model, to='./dequantized_model', dtype=torch.float16, device="cuda"):
    """
    'model': the peftmodel you loaded with qlora.
```

When you say 'model', do you in fact mean the base model OR the peftmodel? It seems to me the dequanting function expects a base model (but maybe it works with a peft model too?)

3. The dequantization and merging cell doesn't specify an adapter as an input (so I assume the adapter has been specified earlier in the code). I wonder if it would be better to explicitly set (or reset) the adapter in that cell, to make things more clear?

author

Thanks!

1. The reference for the code is in the article. Following your comment, I also added it in the notebook.

2. This is the base model. The comment is misleading here.

3. I added the initialisation of the expected variables in the cell: base model, adapter, and compute dtype.
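
For anyone reading along without the notebook open, the dequantization helper is conceptually along these lines (a rough sketch assuming bitsandbytes NF4 Linear4bit layers; the actual code follows the reference given in the article and may differ in details):

```
import torch
from torch import nn
import bitsandbytes as bnb
from bitsandbytes.functional import dequantize_4bit


def dequantize_model(model, dtype=torch.float16, device="cuda"):
    """Replace every bitsandbytes Linear4bit layer with a plain nn.Linear holding dequantized weights."""
    model = model.to(device)
    for name, module in list(model.named_modules()):
        if isinstance(module, bnb.nn.Linear4bit):
            # Unpack the 4-bit weights back into a regular floating-point matrix
            weight = dequantize_4bit(module.weight.data, quant_state=module.weight.quant_state).to(dtype)
            new_layer = nn.Linear(module.in_features, module.out_features,
                                  bias=module.bias is not None, dtype=dtype, device=device)
            new_layer.weight = nn.Parameter(weight)
            if module.bias is not None:
                new_layer.bias = nn.Parameter(module.bias.data.to(dtype))
            # Swap the quantized layer for the dequantized one in its parent module
            parent_name, _, child_name = name.rpartition(".")
            parent = model.get_submodule(parent_name) if parent_name else model
            setattr(parent, child_name, new_layer)
    return model
```

Once the base model is back in float16, the adapter can be merged into it with peft's merge_and_unload().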


I'm getting nowhere fast with LoRA-tuning Llama3-8B. I'm going to give it a rest for now—until I see a notebook of yours. In the name of science, I might also try again with a product like Axolotl, Unsloth, or AutoTrain. Hmm, I had these same problems trying to train Phi. Maybe there's something fundamentally wrong with me and my roll-your-own "LoraTuner" class? Fine-tuning Mistral sure does work flawlessly for me, though—for all five of my PEFT adapters.


I set padding_side='right' because SFTTrainer complains if you don't. Not sure why I set add_bos_token=True. Why would you want an eos token without a bos token, though? Heh, I guess if you're conflating eos and pad tokens, then it could very well be a moot point?


Oh, duh. If you're padding on the left side with eos tokens, then the actual eos_token on the right wouldn't be conflated with padding tokens....unless you're working with a right-to-left language like Arabic.

author

In theory, yes. But in practice, no. Most frameworks completely ignore the pad token, whatever it is. You can try it with HF transformers: if you pad left with EOS tokens, in theory the model shouldn't generate anything, but in practice it works fine.
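
A quick way to see this with transformers (checkpoint name assumed):

```
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"

model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# The shorter prompt gets left-padded with EOS tokens
inputs = tokenizer(["Paris is", "The capital of France is"],
                   return_tensors="pt", padding=True).to(model.device)

# The attention mask marks the padding positions, so generation ignores them
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```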

Apr 21 · Liked by Benjamin Marie

/home/matt/miniconda3/envs/nlp/lib/python3.11/site-packages/trl/trainer/sft_trainer.py:318: UserWarning: You passed a tokenizer with `padding_side` not equal to `right` to the SFTTrainer. This might lead to some unexpected behaviour due to overflow issues when training a model in half-precision. You might consider adding `tokenizer.padding_side = 'right'` to your code.

author

I completely ignore this warning. I don't see any problem with padding left when using float32 or bfloat16 data types for training.


I did four epochs with 13,000 examples, which is more than enough when fine-tuning Mistral.

Apr 21 · Liked by Benjamin Marie

```
learning_rate = 2e-4
lr_scheduler_type = 'linear'
target_modules = 'all-linear'
```
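
For context, that corresponds to roughly this kind of peft/transformers setup (the rank, alpha, and other values not listed above are assumptions):

```
from peft import LoraConfig
from transformers import TrainingArguments

peft_config = LoraConfig(
    r=16,                         # assumed rank, not stated above
    lora_alpha=16,                # assumed
    lora_dropout=0.05,            # assumed
    target_modules="all-linear",  # apply LoRA to every linear layer
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="./llama3-lora",     # placeholder path
    learning_rate=2e-4,
    lr_scheduler_type="linear",
    num_train_epochs=4,             # four epochs, as mentioned above
    per_device_train_batch_size=4,  # assumed
)
```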

author

How long are your examples and what is your max_seq_len? If it's too short, the EOS token will be truncated.

Also, I assume that you set add_eos_token=True when instantiating the tokenizer.


I set max_seq_len = 1_024 because my prompts and their responses are both rather short. Works great with Mistral.

```
self.tokenizer = AutoTokenizer.from_pretrained(
    self.model_id,
    trust_remote_code = True,
    add_bos_token = True,
    add_eos_token = True,
    padding_side = 'right'
)
```

author

I only see two significant differences from my current config: I set the padding side to left (to use FlashAttention) and I don't add the BOS token, roughly as in the sketch below. Is there a particular reason for add_bos_token = True?
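
A minimal sketch of that setup (checkpoint name assumed):

```
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",  # assumed checkpoint
    add_eos_token=True,            # append EOS so the model learns when to stop
    padding_side="left",           # left padding, compatible with FlashAttention
)
tokenizer.pad_token = tokenizer.eos_token  # reuse EOS for padding, as discussed above
```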
