Discussion about this post

Mickael BARON:

Hi Benjamin, I suppose there is an error here: ‘If you train TinyLlama for one epoch on openassistant-guanaco which contains 9,846 steps, with a batch size of 8, it yields 1,231 training steps.’ It should be 9,846 examples, not 9,846 steps, shouldn’t it?
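For reference, the correction is consistent with the post's arithmetic: one epoch over 9,846 examples at a batch size of 8 is ⌈9,846 / 8⌉ = 1,231 optimizer steps. A quick check in Python:

```python
import math

num_examples = 9_846  # size of openassistant-guanaco, per the quoted sentence
batch_size = 8

# Steps per epoch: examples divided by batch size, rounded up
# (the last, partial batch still counts as one step).
steps_per_epoch = math.ceil(num_examples / batch_size)
print(steps_per_epoch)  # 1231
```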

Alex Grishin:

What about the learning rate scheduler after the warmup? Should we use rate decay (cosine, linear) or keep the rate constant?
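For anyone comparing the options in this question, HuggingFace transformers ships ready-made schedulers for all three choices. A minimal sketch, assuming an AdamW optimizer, the 1,231 steps computed above, and an arbitrary 3% warmup (both values are illustrative, not the post's recommendation):

```python
import torch
from transformers import get_cosine_schedule_with_warmup

# Toy model and optimizer, just to make the sketch self-contained.
model = torch.nn.Linear(10, 10)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)

num_training_steps = 1_231  # one epoch at batch size 8, as computed above
num_warmup_steps = int(0.03 * num_training_steps)  # 3% warmup (assumption)

# Linear warmup, then cosine decay toward zero. To try the alternatives
# from the question, swap in get_linear_schedule_with_warmup or
# get_constant_schedule_with_warmup from the same module.
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=num_warmup_steps,
    num_training_steps=num_training_steps,
)

for step in range(num_training_steps):
    # ... forward pass, loss.backward() would go here ...
    optimizer.step()
    scheduler.step()  # advance the learning rate once per optimizer step
```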

