The Weekly Kaitchup #71

Dec 13, 2024

efficient long context - Phi-4 - Molmo recipe

5 Comments

Crazy how we save so many logits by default when they are basically never needed, except if someone is doing beam search or something. I hadn’t thought about that.

John Saunders

Dec 13

Hmm. If Microsoft thought those benchmarks needed decontamination, when will we see other model results using decontamination, and what methods will be used?

Chris Handley

Dec 13

Last summer, I was using unsloth on a multi-GPU setup without issue… did they disable it completely? It was never explicitly supported, but it ran fine for me until I tried again this past week using the same code and parameters.

Benjamin Marie

Dec 13

Actually, I didn't even know that multi-GPU was possible with the free version. I always thought it was only available for the paid version.

My guess is that they don't want to unlock multi-GPU for the free version since this would remove the value of the paid version.

https://unsloth.ai/pricing

Trelis Research

Dec 21

Yeah, it could be that earlier they were relying more on transformers, which will default to pipeline parallel.

But maybe parts of that library have since been pulled into unsloth, which doesn’t.

The Kaitchup – AI on a Budget
