The Kaitchup – AI on a Budget

The Kaitchup – AI on a Budget

Eagle 3 Speculators: When To Use Them?

Easier and faster speculative decoding, if you are in the right settings

Benjamin Marie's avatar
Benjamin Marie
Dec 09, 2025
∙ Paid

EAGLE-3 is a family of draft models (“speculators”) for speculative decoding. As with other speculative setups, a small model proposes several future tokens and a larger target model verifies them in a single pass. When the draft model’s guesses are accurate and cheap enough, this reduces the total number of heavy forward passes, and therefore the overall cost of inference, while keeping the target model’s output unchanged.

The EAGLE-3 speculators are designed to raise the acceptance rate of drafted tokens and make better use of each verification pass. They do this through architectural changes (multi-layer feature fusion) and a training setup that more closely matches how speculative decoding is actually run at inference time. The aim is to shift more of the work onto a small, fast model and let the large model act mainly as a validator.

The Kaitchup – AI on a Budget is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.

In this article, I will look at EAGLE-3 in practice using the released speculators with vLLM. I will experiment with Qwen3 32B on an A100 80 GB GPU and focus on end-to-end behavior: throughput, acceptance length, and wall-clock latency. In particular, I will compare high-concurrency continuous batching, where the GPU is already saturated, with batch size 1, where speculative decoding has more opportunity to lower the effective cost per generated token.

The following notebook shows how to run and evaluate EAGLE-3 speculators:

Get the notebook (#191)

EAGLE-3: High-Accuracy Draft Models for Fast Speculative Decoding

This post is for paid subscribers

Already a paid subscriber? Sign in
© 2026 The Kaitchup · Privacy ∙ Terms ∙ Collection notice
Start your SubstackGet the app
Substack is the home for great culture