Discussion about this post

Sai Harish Pathuri

Hi Benjamin, I am having trouble testing the model after training it for 3 epochs on my custom dataset. I followed the same steps as you did.

During generation at test time, the code throws an error:

ValueError: You are attempting to perform batched generation with padding_side='right' this may lead to unexpected behaviour for Flash Attention version of Mistral. Make sure to call `tokenizer.padding_side = 'left'` before tokenizing the input.

How do I fix this?
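
The error message itself points to the fix: decoder-only models like Mistral must be left-padded for batched generation. A minimal sketch of what the test-time setup could look like; the checkpoint name and prompts below are placeholders, not the article's actual values:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mistralai/Mistral-7B-v0.1"  # placeholder; use your fine-tuned checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

# Flash Attention expects left padding for batched generation with decoder-only models
tokenizer.padding_side = "left"
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # Mistral ships without a pad token

inputs = tokenizer(["Prompt one", "Prompt two"], return_tensors="pt", padding=True).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50, pad_token_id=tokenizer.pad_token_id)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))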

Alex Grishin

Hi Benjamin! I'm new to quantization. Can I ask a very basic and general question? When I follow your instructions from this article and load a quantized model with

model = AutoModelForCausalLM.from_pretrained(
    model_name, quantization_config=bnb_config
)

the loaded model has the expected size of 3.84 GB, but the number of model parameters surprises me:

Trainable parameters: 262410240

Total parameters: 3752071168

How come quantization reduced the total number of parameters from 7.2B to 3.7B? Shouldn't the total number of model parameters stay the same after quantization?
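
A minimal sketch of how counts like these are typically produced, assuming they come from iterating over model.parameters(). bitsandbytes packs each pair of 4-bit weights into one uint8 element, so quantized layers report half their logical parameter count, while the embedding, lm_head, and norm weights stay unquantized; the checkpoint name and config below are assumptions matching the article's 4-bit setup:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(        # assumed to match the article's 4-bit setup
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",        # placeholder checkpoint
    quantization_config=bnb_config,
    device_map="auto",
)

# Each 4-bit weight tensor stores two values per uint8 element, so numel()
# returns half the logical weight count for every quantized Linear layer.
total = sum(p.numel() for p in model.parameters())
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters: {trainable}")   # unquantized embeddings, lm_head, norms
print(f"Total parameters: {total}")           # ~7.2B logical weights reported as ~3.7B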

