Mistral Small 3.1 - TRL - Projects
Ack. Maybe I should downgrade TRL while we wait for them to correct it.
The issue with downgrading is that it breaks a lot of things, including GRPO.
I'm also not sure that TRL's current behavior is unintended...
TRL might just become a post-training framework for instruct models, i.e., move away from its original intent, which was fine-tuning base models.
Interestingly, Nanotron, which is a framework for pre-training, is now implementing SFT...
https://github.com/huggingface/nanotron/pull/295