Compute College

Tokens per second explained

By ComputeTape Editorial

Learn what tokens per second means, how model throughput affects AI applications, and why throughput matters for AI compute capacity planning.

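Tokens per second is the rate at which a model generates output tokens.
Output throughput largely determines how long users wait for a complete response and how much work a serving stack can handle.
Faster useful output can let the same infrastructure serve more requests.
But other bottlenecks, such as time to first token and batching behavior, still bound real capacity.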

At 50 tokens/sec, a 1,000-token output takes about 20 seconds after generation starts.
That excludes queueing and first-token delay, which users also feel.
Throughput and latency are related but not the same measurement.
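
The arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not a measurement tool: the function name and the queueing and first-token delay values are hypothetical, chosen only to show how the extra delays add to generation time.

```python
def estimate_response_seconds(output_tokens, tokens_per_second,
                              first_token_delay_s=0.0, queue_delay_s=0.0):
    """Rough end-to-end wait: queueing + first-token delay + generation time."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    generation_s = output_tokens / tokens_per_second
    return queue_delay_s + first_token_delay_s + generation_s


# The example from the text: 1,000 output tokens at 50 tokens/sec.
generation_only = estimate_response_seconds(1000, 50)

# Same request with assumed (illustrative) delays the user also feels.
with_delays = estimate_response_seconds(
    1000, 50, first_token_delay_s=0.5, queue_delay_s=1.0
)

print(generation_only)  # 20.0
print(with_delays)      # 21.5
```

The gap between the two results is why throughput alone does not describe what the user experiences.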

Example figures are illustrative calculations, not current quoted market prices.

Higher usable throughput can lower effective serving cost or expand capacity.
Falling throughput under load can reveal demand pressure on serving systems.
Throughput per dollar is a better capacity read than raw tokens per second.
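
One way to make the throughput-per-dollar comparison concrete is to convert sustained tokens per second and an hourly cost into tokens generated per dollar. The sketch below assumes that definition; the option names, throughput figures, and hourly costs are invented for illustration and are not quoted market prices.

```python
def tokens_per_dollar(tokens_per_second, hourly_cost_usd):
    """Tokens generated per dollar at a sustained throughput and hourly cost."""
    if hourly_cost_usd <= 0:
        raise ValueError("hourly_cost_usd must be positive")
    return tokens_per_second * 3600 / hourly_cost_usd


# Hypothetical serving options: (sustained tokens/sec, cost in USD per hour).
options = {
    "option_a": (100, 4.0),  # faster in raw tokens per second
    "option_b": (60, 2.0),   # slower, but cheaper per hour
}

per_dollar = {
    name: tokens_per_dollar(tps, cost) for name, (tps, cost) in options.items()
}
best = max(per_dollar, key=per_dollar.get)

print(per_dollar)  # {'option_a': 90000.0, 'option_b': 108000.0}
print(best)        # option_b
```

In this made-up case the option with lower raw throughput delivers more tokens per dollar, which is the distinction the lesson draws.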

Market read: throughput per dollar, not raw tokens per second, is the capacity signal; a drop under load can flag serving-side demand pressure. Figures here are illustrative unless explicitly sourced and dated; see our methodology.

Combine measured tokens per second with expected output length to estimate response time.
Factor in concurrent demand to size the required serving capacity.
Compare throughput per dollar across candidate models.
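
The sizing steps above can be sketched as a rough calculation. This assumes a simple steady-state model in which required aggregate throughput equals request rate times average output length, and each serving replica sustains a measured aggregate tokens-per-second figure under load. The function name, the utilization headroom parameter, and all numbers are hypothetical.

```python
import math


def replicas_needed(requests_per_second, avg_output_tokens,
                    replica_tokens_per_second, target_utilization=0.8):
    """Estimate replica count from demand and measured per-replica throughput.

    Assumes steady-state demand; ignores burstiness, queueing, and
    first-token delay, which would push the real requirement higher.
    """
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    if replica_tokens_per_second <= 0:
        raise ValueError("replica_tokens_per_second must be positive")
    required_tps = requests_per_second * avg_output_tokens
    usable_tps = replica_tokens_per_second * target_utilization
    return math.ceil(required_tps / usable_tps)


# Illustrative inputs: 2 requests/sec, 500 output tokens each, and a replica
# that sustains 400 tokens/sec in aggregate when measured under load.
replicas = replicas_needed(2, 500, 400)
print(replicas)  # 4
```

Using throughput measured under load, rather than a peak figure, keeps the estimate from understating the capacity required.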

Decision check: have you combined tokens per second with output length and concurrency to size capacity, rather than quoting peak throughput alone?

Compute College track

Model Benchmarks & AI Compute Economics

Step 8 of 23: Tokens per second explained

Tokens per second explained

Plain-English definition

Why it matters

Simple example

How to read the market signal

Common mistake

What you can do with this

Follow model releases as market signals

Related lessons

Model latency explained

Context window explained

What is frontier model serving cost?

What is an agent benchmark?