The Kaitchup – AI on a Budget

vLLM and Zero-Shot for Low-Cost LLM Evaluation

How to reduce the cost of your LLM evaluations

Benjamin Marie
Feb 19, 2025

During development and fine-tuning, regular evaluation is essential to assess model improvements. Once deployed, benchmarking also helps detect performance degradation, such as after model updates or changes to inference frameworks.

Large language models (LLMs) are typically evaluated using public benchmarks such as MMLU, GPQA, Big Bench, and IFEval. These benchmarks provide insights into an LLM’s capabilities in reasoning, natural language understanding, world knowledge, and instruction following.

However, running these evaluations can be costly, especially when using default benchmark parameters and inference frameworks.

In this article, we will analyze the evaluation costs of LLMs. As examples, I will evaluate a 1.5B-parameter model and a 32B-parameter model on RunPod (referral link) and report the total cost in USD. We will then explore strategies that significantly reduce evaluation costs while keeping the results credible. Finally, because even tiny changes to the evaluation hyperparameters can significantly change the results, I will provide short guidelines for reporting results in scientific papers and technical reports.
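
To make "total cost in USD" concrete, here is a minimal sketch of how the cost of one evaluation run can be estimated on a cloud GPU billed per GPU-hour. The function name and the numbers in the usage example are hypothetical, chosen only for illustration; they are not RunPod prices and not results from this article.

```python
def evaluation_cost_usd(gpu_hourly_rate_usd: float,
                        num_gpus: int,
                        runtime_seconds: float) -> float:
    """Estimate the cost of one evaluation run billed per GPU-hour.

    Assumption: billing is linear in time and in the number of GPUs,
    with no storage, network, or minimum-billing charges.
    """
    if gpu_hourly_rate_usd < 0 or runtime_seconds < 0:
        raise ValueError("rate and runtime must be non-negative")
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    runtime_hours = runtime_seconds / 3600.0
    return gpu_hourly_rate_usd * num_gpus * runtime_hours


if __name__ == "__main__":
    # Hypothetical example: one GPU at 2.00 USD/hour, benchmark takes 90 minutes.
    cost = evaluation_cost_usd(gpu_hourly_rate_usd=2.0,
                               num_gpus=1,
                               runtime_seconds=90 * 60)
    print(f"Estimated cost: {cost:.2f} USD")
```

Under this simple model, anything that shortens the run (a faster inference framework, fewer few-shot examples in the prompt) lowers the cost proportionally.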

Note: This article is not yet linked to an AI notebook. I might add one later. I’ll let you know in the Weekly Kaitchup.
