Crazy how we save so many logits by default when they're basically never needed unless someone is doing beam search or something. I hadn’t thought about that.
Hmm. If Microsoft thought those benchmarks needed decontamination, when will we see other model results using decontamination, and what methods will be used?
Last summer, I was using unsloth on a multi-GPU setup without issue… did they disable it completely? It was never explicitly supported, but it ran fine for me until I tried again this past week using the same code and parameters.
Actually, I didn't even know that multi-GPU was possible with the free version. I always thought it was only available for the paid version.
My guess is that they don't want to unlock multi-GPU for the free version since this would remove the value of the paid version.
https://unsloth.ai/pricing
Yeah, could be that earlier they were relying more on transformers, which will default to pipeline parallel.
But maybe parts of that library have since been pulled into unsloth, which doesn’t.