2 Comments
Konrad Wojciechowski:

So does this mean there is still hope for encoder-decoders, and that it lies in the small model size range? But aren't they harder to fine-tune (or at least less convenient)? Or should we treat this more as interesting research?

Benjamin Marie:

For now, it's very interesting research. I don't see why someone would use them in production.

Fine-tuning them for sequence transformation tasks, such as paraphrasing and translation, could be a good use case, but that remains to be demonstrated.
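
[Editor's note: a minimal sketch of what such a fine-tuning could look like, assuming Hugging Face Transformers with a small FLAN-T5 checkpoint as a stand-in encoder-decoder; the checkpoint, toy data, and hyperparameters are illustrative assumptions, not from the article or the comment above.]

```python
# Hedged sketch: one fine-tuning step of a small encoder-decoder on a
# toy paraphrase pair. FLAN-T5-small is an assumed stand-in model.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "google/flan-t5-small"  # assumption, not the article's model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# One toy source/target pair for a sequence-transformation task (paraphrasing).
source = "paraphrase: The meeting was postponed because of the storm."
target = "The storm forced the meeting to be delayed."

inputs = tokenizer(source, return_tensors="pt")
labels = tokenizer(target, return_tensors="pt").input_ids

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Single training step: the encoder reads the source, the decoder is
# trained with teacher forcing to generate the target (via `labels`).
model.train()
loss = model(**inputs, labels=labels).loss
loss.backward()
optimizer.step()
optimizer.zero_grad()

# Quick check of what the model now generates for the same input.
model.eval()
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

In practice one would run this over a full paraphrasing or translation dataset (e.g. with Seq2SeqTrainer and padded batches) rather than a single pair; the sketch only shows the encoder-decoder training signal.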
