compare-mt: Because Scoring Your Systems Is Not Enough

Expose what’s behind your scores for a more insightful and credible evaluation.

Aug 29, 2022

Generated with Craiyon with the prompt: “chart under a magnifying glass.”

For natural language generation tasks, it is common to evaluate several models or systems against each other to identify the best one according to some metrics. In research papers, for instance, we often …

The Kaitchup – AI on a Budget
