compare-mt: Because Scoring Your Systems Is Not Enough
Expose what’s behind your scores for a more insightful and credible evaluation.
For natural language generation tasks, it is common to evaluate several models or systems against each other to identify the best one according to some metrics. In research papers, for instance, we often …