Compute College
Cost per million tokens is how hosted AI APIs price inference — usually with input and output tokens priced separately.
One concept connected to AI compute market decisions.
A practical introduction designed to be completed in one sitting.
Useful for founders, product managers, analysts, and API buyers.
Plain-English definition
Cost per million tokens is the price of running a model expressed per million tokens, the way most hosted AI APIs quote inference. It translates GPU-level cost into the unit API buyers actually see, and it usually prices input (prompt) tokens and output (generated) tokens separately.
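For readers who self-host, the translation from GPU-level cost to a per-million-token figure can be sketched as follows. This is a minimal sketch under stated assumptions: the function name, the $2.00 GPU-hour price, the 1,000 tokens-per-second throughput, and the 50% utilization are all hypothetical, and the calculation treats throughput as a single aggregate number rather than separating input from output tokens.

```python
def gpu_cost_per_million_tokens(gpu_hour_cost, tokens_per_second, utilization=1.0):
    """Convert a GPU-hour price into dollars per million tokens served.

    utilization is the assumed fraction of each hour spent serving tokens.
    """
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return gpu_hour_cost / tokens_per_hour * 1_000_000


# Hypothetical figures, not quoted prices or measured throughput.
estimate = gpu_cost_per_million_tokens(gpu_hour_cost=2.00,
                                       tokens_per_second=1_000,
                                       utilization=0.5)
print(f"${estimate:.2f} per million tokens")
```

Lower utilization or lower throughput raises the per-token figure, which is why the same hardware price can imply very different token costs.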
Why it matters
Most AI products are billed and budgeted in tokens, not GPU-hours. Cost per million tokens connects model usage directly to spend, and because output tokens usually cost more than input tokens, it exposes where the bill actually comes from.
Simple example
Suppose a feature sends 1,000 input tokens and generates 2,000 output tokens per request, at illustrative rates of $3 per million input tokens and $15 per million output tokens. That works out to (1,000 × $3 + 2,000 × $15) ÷ 1,000,000 = $0.033 per request, or $33,000 a month at one million requests per month. Output tokens account for $0.030 of that $0.033.
Example figures are illustrative calculations, not current quoted market prices.
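The same arithmetic can be written as a short Python sketch. The function name and the monthly request volume are chosen here for illustration, and the rates are the illustrative ones from the example, not quoted prices.

```python
def cost_per_request(input_tokens, output_tokens, input_rate_per_m, output_rate_per_m):
    """Dollar cost of one request, given rates in dollars per million tokens."""
    return (input_tokens * input_rate_per_m
            + output_tokens * output_rate_per_m) / 1_000_000


# Illustrative rates from the example: $3 input, $15 output per million tokens.
per_request = cost_per_request(1_000, 2_000, 3.0, 15.0)
monthly = per_request * 1_000_000  # assumed one million requests per month

print(f"${per_request:.3f} per request, ${monthly:,.0f} per month")
```

Keeping input and output as separate arguments makes it easy to see how the total moves when either the token counts or the rates change.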
Market signal
Falling cost per million tokens can signal more efficient models, better hardware, or competition; rising effective cost can signal longer outputs, premium or reasoning routes, or tighter capacity. Watch input-versus-output pricing and how providers price long-output features, since per-token competition is a proxy for inference supply and efficiency gains.
Market read: cost per million tokens is the inference-demand unit that maps model usage to spend. Evidence discipline: record the model, the date, and whether a rate is for input or output before comparing per-token prices, and keep illustrative rates separate from quotes.
Common mistake
Comparing a single blended per-token rate across providers. Input and output are usually priced differently, context length and features change the effective cost, and a low headline rate can hide expensive output pricing.
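To see why a blended rate misleads, the sketch below computes the blended dollars per million tokens for one price sheet under two different token mixes. The function name and the second mix are assumptions made for illustration; the rates are the illustrative $3 and $15 used earlier.

```python
def blended_rate_per_m(input_tokens, output_tokens, input_rate_per_m, output_rate_per_m):
    """Blended dollars per million tokens for a given input/output mix."""
    total_tokens = input_tokens + output_tokens
    weighted = input_tokens * input_rate_per_m + output_tokens * output_rate_per_m
    return weighted / total_tokens


# Same illustrative price sheet, two different workloads.
output_heavy = blended_rate_per_m(1_000, 2_000, 3.0, 15.0)
input_heavy = blended_rate_per_m(2_000, 1_000, 3.0, 15.0)

print(f"output-heavy mix: ${output_heavy:.2f} per million tokens")
print(f"input-heavy mix:  ${input_heavy:.2f} per million tokens")
```

The prices are identical in both cases, yet the blended figure differs, so a single blended number only describes the workload it was computed for.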
Practical takeaway
Budget from expected input and output token volumes at separate rates, then sanity-check against GPU-hour cost if you self-host.
Decision check: compare providers on input and output rates together with your expected token mix, not on a single headline number.
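One way to run that check is sketched below. The two providers, their rates, and the function names are hypothetical, made up only to show that the cheaper option can flip with the token mix.

```python
# Hypothetical (input, output) rates in dollars per million tokens.
PROVIDERS = {
    "provider_a": (1.0, 20.0),  # low headline input rate, expensive output
    "provider_b": (3.0, 15.0),
}


def request_cost(input_tokens, output_tokens, rates):
    """Dollar cost of one request for a given (input_rate, output_rate) pair."""
    input_rate, output_rate = rates
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


def cheapest(input_tokens, output_tokens, providers=PROVIDERS):
    """Name of the provider with the lowest cost for this token mix."""
    return min(providers,
               key=lambda name: request_cost(input_tokens, output_tokens, providers[name]))


print(cheapest(1_000, 2_000))   # output-heavy workload
print(cheapest(10_000, 500))    # input-heavy workload
```

With these made-up rates, the output-heavy workload favors provider_b and the input-heavy workload favors provider_a, which is the reason to compare on your own expected mix.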
Helpful memory trick
Tokens are the meter; cost per million tokens is the price per unit — and output usually spins the meter faster.
Compute College
Use the GPU-Hour Cost Calculator, AI Training Cost Calculator, or Model Serving Cost Calculator.