The Kaitchup – AI on a Budget

LLM as a Judge: Evaluate Your LLMs with Another LLM

A good evaluation framework for quick feedback and monitoring

Benjamin Marie
Nov 07, 2024
Image generated with Grok

Evaluating large language models (LLMs) can be tricky. Because these models can do so many things, it is hard to define clear, simple criteria for judging their responses. For example, an answer from an LLM might lack context, repeat itself, contain grammar mistakes, be far too long, or even make little sense.

One effective solution is to let LLMs evaluate each other, an approach known as "LLM-as-a-judge." This approach, which is used in popular benchmarks like Chatbot Arena, involves using an LLM to score or rank the responses of other models. Because the judging is automatic, we get feedback quickly and can review and improve models without relying heavily on human reviewers. LLM-as-a-judge is also a good alternative to older public benchmarks such as MMLU, which the models have probably seen during training.
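To make the idea of "using an LLM to rank responses" concrete, here is a minimal sketch of the two pieces a pairwise judge needs: a prompt that shows the judge both answers, and a parser that extracts its verdict. The template wording, the `Verdict: A/B` output format, and the function names `build_judge_prompt` and `parse_verdict` are assumptions made for illustration; they are not taken from any particular benchmark or library.

```python
import re
from typing import Optional

# Hypothetical judging template: the wording and the "Verdict: X" format
# are illustrative assumptions, not a standard.
JUDGE_TEMPLATE = """You are an impartial judge. Given a user question and two answers, decide which answer is better.

Question:
{question}

Answer A:
{answer_a}

Answer B:
{answer_b}

Reply with exactly one line: "Verdict: A" or "Verdict: B"."""


def build_judge_prompt(question: str, answer_a: str, answer_b: str) -> str:
    """Fill the template with the question and the two candidate answers."""
    return JUDGE_TEMPLATE.format(
        question=question, answer_a=answer_a, answer_b=answer_b
    )


def parse_verdict(judge_output: str) -> Optional[str]:
    """Return "A" or "B" if the judge followed the format, else None."""
    match = re.search(r"Verdict:\s*([AB])\b", judge_output, flags=re.IGNORECASE)
    return match.group(1).upper() if match else None


prompt = build_judge_prompt(
    "What is the capital of France?",
    "Paris.",
    "I believe it might be Lyon.",
)
print(prompt)
print(parse_verdict("Verdict: A"))
```

Returning `None` for an unparseable reply keeps malformed judge outputs visible instead of silently counting them as a win for either model.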

In this article, we will see how to use the LLM-as-a-judge framework, with examples. Using vLLM and TRL, we will compare two LLMs, using a third, larger and better LLM as the judge, and compute win rates.
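As a rough illustration of what "computing win rates" means, the sketch below compares the outputs of two models with a pairwise judge and counts how often each one is preferred. It uses neither vLLM nor TRL: the judge is any callable returning "A", "B", or `None`, and `toy_length_judge` is a hypothetical stand-in for a real judge LLM. Judging each pair twice with the answer order swapped, and counting disagreements as ties, is one common way to limit position bias; it is an assumption of this sketch, not necessarily the procedure used in the notebook.

```python
from typing import Callable, Dict, Optional, Sequence

# A judge takes (prompt, answer_a, answer_b) and returns "A", "B", or None.
Judge = Callable[[str, str, str], Optional[str]]


def compute_win_rates(
    prompts: Sequence[str],
    outputs_1: Sequence[str],
    outputs_2: Sequence[str],
    judge: Judge,
) -> Dict[str, float]:
    """Judge every pair in both orders and return win/tie rates."""
    if not (len(prompts) == len(outputs_1) == len(outputs_2)):
        raise ValueError("prompts and outputs must have the same length")

    wins_1 = wins_2 = ties = 0
    for prompt, out_1, out_2 in zip(prompts, outputs_1, outputs_2):
        first = judge(prompt, out_1, out_2)   # model 1 shown as answer A
        second = judge(prompt, out_2, out_1)  # model 1 shown as answer B
        if first == "A" and second == "B":
            wins_1 += 1  # model 1 preferred in both orders
        elif first == "B" and second == "A":
            wins_2 += 1  # model 2 preferred in both orders
        else:
            ties += 1    # inconsistent or unparseable verdicts

    total = len(prompts)
    if total == 0:
        return {"model_1": 0.0, "model_2": 0.0, "tie": 0.0}
    return {
        "model_1": wins_1 / total,
        "model_2": wins_2 / total,
        "tie": ties / total,
    }


def toy_length_judge(prompt: str, answer_a: str, answer_b: str) -> Optional[str]:
    """Stand-in for a judge LLM: prefers the longer answer.

    It picks "A" when lengths are equal, which mimics a crude position bias.
    """
    return "A" if len(answer_a) >= len(answer_b) else "B"


prompts = ["What is vLLM?", "What is TRL?", "What is MMLU?", "Say hi."]
outputs_1 = ["An inference engine for LLMs.", "A library.", "A multitask benchmark.", "Hi"]
outputs_2 = ["A tool.", "A library for post-training LLMs.", "A test.", "Yo"]

rates = compute_win_rates(prompts, outputs_1, outputs_2, toy_length_judge)
print(rates)
```

In a real setup, the `judge` callable would wrap a call to the judge LLM, and `outputs_1` and `outputs_2` would be the generations of the two models being compared on the same prompts.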

I also wrote a notebook showing how to run LLM-as-a-judge, here:

Get the notebook (#119)

LLM-as-a-judge: How Does It Work?
