Discussion about this post

Jeremy Blankenship

‘Intuitively, Muon “killing” a sgnificantly numbers of the neurons could explain why it doesn’t work well for LoRA, whose number of trainable parameters is already tiny.’

Spelling / grammar issues.
