Discussion about this post

Trelis Research

Crazy how we save so many logits by default when they are basically never needed, except when someone is doing beam search or something similar. I hadn’t thought about that.

John Saunders

Hmm. If Microsoft thought those benchmarks needed decontamination, when will we see other model results using decontamination, and what methods will be used?
