Previous lesson
Model latency explained
Continue the Model Costs track.
Compute College
Learn what tokens per second means, how model throughput affects AI applications, and why throughput matters for AI compute capacity planning.
One concept connected to AI compute market decisions.
A practical introduction designed to be completed in one sitting.
Useful for developers, founders, procurement teams, and analysts tracking model-serving economics.
Plain-English definition
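Tokens per second is the rate at which a model generates output tokens once response generation has begun. Measured for one request, it describes how fast a single response streams. Summed across all concurrent requests, it describes the throughput of a serving system.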
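As a minimal sketch, assume you record when the first output token arrives and when the last one arrives, for example with Python's time.perf_counter() while reading a streamed response. Single-request tokens per second is then the output token count divided by the time between those two timestamps. The function name and timestamps below are hypothetical. Conventions differ between tools, and this sketch starts the clock at the first token.

```python
def tokens_per_second(output_tokens: int, first_token_time: float, end_time: float) -> float:
    """Output tokens divided by the seconds spent generating them (hypothetical helper).

    The clock starts at the first output token, so queueing and
    first-token delay are excluded from the rate.
    """
    generation_seconds = end_time - first_token_time
    if generation_seconds <= 0:
        raise ValueError("end_time must be later than first_token_time")
    return output_tokens / generation_seconds


# Hypothetical timestamps for one streamed response, in seconds after the request was sent.
first_token_time = 0.5  # first output token arrives
end_time = 20.5         # final output token arrives
print(tokens_per_second(1000, first_token_time, end_time))  # 50.0
```

If you sum output tokens across every request finished in a time window and divide by that window's length, you get the aggregate serving throughput instead.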
Why it matters
Output throughput affects how long users wait and how much demand a serving stack can handle. Faster useful output may let the same infrastructure serve more work, although other bottlenecks, such as prompt processing and memory capacity, still matter.
Simple example
At an illustrative 50 generated tokens per second, an output of 1,000 tokens takes about 20 seconds after generation starts, before accounting for queueing or first-token delay.
Example figures are illustrative calculations, not current quoted market prices.
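The example's arithmetic can be written as a small function. This sketch makes simple assumptions: tokens per second stays constant for the whole response, and the queueing and first-token delays passed in are hypothetical values you would measure yourself.

```python
def estimated_response_seconds(output_tokens: float,
                               tokens_per_second: float,
                               first_token_seconds: float = 0.0,
                               queue_seconds: float = 0.0) -> float:
    """Rough end-to-end time: wait in queue, wait for first token, then generate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    generation_seconds = output_tokens / tokens_per_second
    return queue_seconds + first_token_seconds + generation_seconds


# The illustrative case above: 1,000 tokens at 50 tokens per second.
print(estimated_response_seconds(1000, 50))  # 20.0
# Same output with hypothetical 0.5 s first-token delay and 1.0 s queueing.
print(estimated_response_seconds(1000, 50, first_token_seconds=0.5, queue_seconds=1.0))  # 21.5
```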
Market signal
Higher usable throughput can reduce effective serving cost or expand capacity; falling throughput under load can reveal demand pressure on serving systems.
Market read: this metric becomes an AI compute signal only when it changes serving volume, effective workload cost, or the capacity buyers require.
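One way to see how throughput moves effective serving cost is to divide an hourly infrastructure cost by the tokens produced in that hour. This sketch assumes a hypothetical $3.60 hourly cost for one serving instance, a fully utilized instance, and an aggregate throughput figure summed across all concurrent requests. As with the example above, the figures are illustrative and are not quoted prices.

```python
SECONDS_PER_HOUR = 3600


def cost_per_million_output_tokens(hourly_cost: float, aggregate_tokens_per_second: float) -> float:
    """Effective cost per 1M output tokens for a fully utilized instance (hypothetical helper)."""
    tokens_per_hour = aggregate_tokens_per_second * SECONDS_PER_HOUR
    return hourly_cost * 1_000_000 / tokens_per_hour


hourly_cost = 3.60  # hypothetical hourly cost for one serving instance
for tps in (1000, 500):  # normal throughput vs. throughput halved under load
    cost = cost_per_million_output_tokens(hourly_cost, tps)
    print(f"{tps} tok/s -> ${cost:.2f} per 1M output tokens")
```

With these inputs, the loop prints $1.00 at 1,000 tokens per second and $2.00 at 500. Halving usable throughput doubles the effective cost of each token, even though the hourly bill does not change.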
Common mistake
Do not confuse tokens per second with the full user experience: first-token latency, output length, quality, and batching also matter.
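To see why tokens per second alone can mislead, compare two hypothetical serving configurations. One has faster output but a slow first token. The other has slower output but a fast first token. The figures are invented for illustration, and the sketch ignores queueing and batching effects.

```python
def end_to_end_seconds(output_tokens: float, tokens_per_second: float, first_token_seconds: float) -> float:
    """First-token delay plus the time to stream the rest of the output."""
    return first_token_seconds + output_tokens / tokens_per_second


# Hypothetical configurations: (tokens per second, first-token delay in seconds).
configs = {
    "A (fast output, slow start)": (100.0, 2.0),
    "B (slower output, fast start)": (50.0, 0.2),
}

for output_tokens in (50, 1000):
    for name, (tps, ttft) in configs.items():
        seconds = end_to_end_seconds(output_tokens, tps, ttft)
        print(f"{output_tokens:>5} tokens, config {name}: {seconds:.1f} s")
```

With these numbers, configuration B finishes the 50-token response first (1.2 s vs. 2.5 s). Configuration A finishes the 1,000-token response first (12.0 s vs. 20.2 s). Output length decides which one feels faster.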
Practical takeaway
Use measured tokens per second alongside expected output length and concurrent demand to estimate response time and required serving capacity.
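A rough estimate along the lines of the practical takeaway is sketched below. It assumes steady traffic, a per-request tokens-per-second figure you are willing to offer users, and an aggregate throughput per replica measured under load at that per-request speed. All names and numbers are hypothetical. The in-flight count uses Little's law: average concurrency equals arrival rate times time in the system, counting only generation time here.

```python
import math


def plan_capacity(requests_per_second: float,
                  avg_output_tokens: float,
                  per_request_tps: float,
                  replica_aggregate_tps: float,
                  headroom: float = 1.0) -> dict:
    """Rough serving-capacity estimate from throughput measurements (hypothetical helper)."""
    # Time each response spends generating, ignoring queueing and first-token delay.
    generation_seconds = avg_output_tokens / per_request_tps
    # Little's law: average requests generating at once = arrival rate x generation time.
    concurrent_streams = requests_per_second * generation_seconds
    # Total output tokens per second the fleet must sustain, with an optional safety margin.
    required_tps = requests_per_second * avg_output_tokens * headroom
    # Round up: a fractional replica still needs a whole one.
    replicas = math.ceil(required_tps / replica_aggregate_tps)
    return {
        "generation_seconds": generation_seconds,
        "concurrent_streams": concurrent_streams,
        "required_tps": required_tps,
        "replicas": replicas,
    }


# Hypothetical inputs: 20 requests/s at peak, 500 output tokens each,
# 50 tok/s per user, and 2,500 tok/s measured per replica under load.
print(plan_capacity(20, 500, 50, 2500))
print(plan_capacity(20, 500, 50, 2500, headroom=1.25))
```

With these inputs, the sketch estimates 10 seconds of generation per response, about 200 requests in flight, and 10,000 output tokens per second of demand. That requires 4 replicas, or 5 with 25% headroom.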
Decision check: identify the capability measured, the serving cost driver it affects, and the buyer behavior that would make capacity demand change.
Helpful memory trick
Tokens per second is the model output speedometer.
Compute College
Follow model releases as AI compute market signals in the ComputeTape Morning Brief.