So does it mean there is still hope for encoder-decoders, and that it lies in the small model size range? But aren't they harder to fine-tune (or at least less convenient)? Or should we treat this more as interesting research?
For now, it's very interesting research. I don't see why someone would use them in production.
Fine-tuning them for sequence-transformation tasks, such as paraphrasing and translation, could be a good use case, but that remains to be demonstrated.