2 Comments
Ronan McGovern

It's a pity DeepSeek 16B is weaker than a 13B model.

Benjamin Marie

Not sure what to think about it. During inference it activates about as many parameters per token as Phi-2 has in total (around 2.8B), yet it requires almost 6 times more memory, because all 16B parameters still have to be loaded.
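Here is a back-of-the-envelope sketch of where that gap comes from. Per-token compute tracks the active parameters, but memory tracks the total, since every expert must be resident. It assumes 16-bit weights (2 bytes per parameter) and counts only the weights, ignoring activations, KV cache, and framework overhead. The parameter counts are the ones quoted in this thread, and `weight_memory_gb` is a hypothetical helper written for illustration.

```python
# Rough weight-memory estimate: MoE model vs. a dense model of similar active size.
# Assumptions (illustrative only): 16-bit weights (2 bytes/param);
# activations, KV cache, and framework overhead are ignored.

BYTES_PER_PARAM_FP16 = 2


def weight_memory_gb(num_params_billion: float,
                     bytes_per_param: int = BYTES_PER_PARAM_FP16) -> float:
    """GB needed just to hold the weights (1 GB = 1e9 bytes)."""
    return num_params_billion * bytes_per_param


moe_total = 16.0   # DeepSeek 16B: all experts must be loaded
moe_active = 2.8   # parameters actually used per token
dense = 2.8        # dense model of comparable size, as quoted in the comment

print(f"MoE weights:   {weight_memory_gb(moe_total):.1f} GB")
print(f"Dense weights: {weight_memory_gb(dense):.1f} GB")
print(f"Memory ratio:  {moe_total / dense:.1f}x")
print(f"Active params per token: {moe_active}B (MoE) vs {dense}B (dense)")
```

With these numbers the ratio comes out to roughly 5.7x, which matches the "almost 6 times" above. Quantizing the weights shrinks both sides equally, so the ratio itself does not change.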
