
Compute College

What is Cost per Million Tokens?

Cost per million tokens is how hosted AI APIs price inference — usually with input and output tokens priced separately.

Learning path: Compute & Pricing Lessons. One concept connected to AI compute market decisions.

Read time: 5-8 minutes. A practical introduction designed to be completed in one sitting.

Tags: Tokens / Inference / Pricing. Useful for founders, product managers, analysts, and API buyers.

Plain-English definition

Cost per million tokens is the price of running a model expressed per million tokens, the way most hosted AI APIs quote inference. It translates GPU-level cost into the unit API buyers actually see, and it usually prices input (prompt) tokens and output (generated) tokens separately.

Why it matters

Most AI products are billed and budgeted in tokens, not GPU-hours. Cost per million tokens connects model usage directly to spend, and because output tokens usually cost more than input tokens, it exposes where the bill actually comes from.

  • It is the unit hosted APIs bill in, so it maps usage straight to spend.
  • Input and output tokens are usually priced differently, and output often dominates the bill.
  • It bridges API buyers, who think in tokens, and infrastructure buyers, who think in GPU-hours.

Simple example

Suppose a feature sends 1,000 input tokens and generates 2,000 output tokens per request, at illustrative rates of $3 per million input tokens and $15 per million output tokens. That is (1,000 × $3 + 2,000 × $15) ÷ 1,000,000 = $0.033 per request, of which $0.030 comes from output tokens. At one million requests a month, that works out to about $33,000 a month.

  • Multiply token counts by per-million rates, keeping input and output separate.
  • Output usually drives the bill, so estimate expected output length carefully.
  • Treat per-token rates as illustrative unless taken from a current provider quote.

Example figures are illustrative calculations, not current quoted market prices.
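
The same arithmetic can be sketched in a few lines of Python. This is a minimal sketch using the illustrative figures above; the function name `cost_per_request` and the one-million-requests-a-month volume are assumptions for the example, not provider values.

```python
def cost_per_request(input_tokens, output_tokens, input_rate_per_m, output_rate_per_m):
    """Dollars per request, given token counts and per-million-token rates."""
    input_cost = input_tokens * input_rate_per_m / 1_000_000
    output_cost = output_tokens * output_rate_per_m / 1_000_000
    return input_cost + output_cost


# Illustrative figures from the example, not current quoted prices.
INPUT_TOKENS, OUTPUT_TOKENS = 1_000, 2_000
INPUT_RATE, OUTPUT_RATE = 3.0, 15.0      # dollars per million tokens
REQUESTS_PER_MONTH = 1_000_000           # assumed volume

per_request = cost_per_request(INPUT_TOKENS, OUTPUT_TOKENS, INPUT_RATE, OUTPUT_RATE)
monthly = per_request * REQUESTS_PER_MONTH
output_share = (OUTPUT_TOKENS * OUTPUT_RATE / 1_000_000) / per_request

print(f"Per request: ${per_request:.3f}")
print(f"Per month:   ${monthly:,.0f}")
print(f"Share of cost from output tokens: {output_share:.0%}")
```

Keeping input and output as separate arguments makes it easy to see how much of the bill comes from output length alone.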

Market signal

How to read the market signal

Falling cost per million tokens can signal more efficient models, better hardware, or stronger competition. Rising effective cost can signal longer outputs, premium or reasoning routes, or tighter capacity. Watch input-versus-output pricing and how providers price long-output features; per-token price competition is a rough proxy for inference supply and efficiency gains.

  • Separate a rate cut from a usage change before calling a cost trend.
  • Output-token pricing is where reasoning and long-answer features show up.
  • Per-token competition is a proxy for inference supply and efficiency gains.
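
One way to separate a rate cut from a usage change is to reprice the old usage at the new rates and treat the remainder as the usage effect. The sketch below assumes two hypothetical snapshots of the same feature; all token counts and rates are made up for illustration, and `request_cost` is a helper name introduced here.

```python
def request_cost(tokens, rates):
    """Dollars per request for (input, output) tokens at (input, output) rates per million."""
    input_tokens, output_tokens = tokens
    input_rate, output_rate = rates
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


# Hypothetical snapshots: rates fell, but outputs got longer.
old_tokens, old_rates = (1_000, 2_000), (3.0, 15.0)
new_tokens, new_rates = (1_000, 4_000), (2.5, 10.0)

old_cost = request_cost(old_tokens, old_rates)
new_cost = request_cost(new_tokens, new_rates)

# Rate effect: same usage as before, repriced at the new rates.
repriced_old_usage = request_cost(old_tokens, new_rates)
rate_effect = repriced_old_usage - old_cost

# Usage effect: whatever change remains once rates are held at the new level.
usage_effect = new_cost - repriced_old_usage

print(f"Old cost per request: ${old_cost:.4f}")
print(f"New cost per request: ${new_cost:.4f}")
print(f"Rate effect:  {rate_effect:+.4f}")
print(f"Usage effect: {usage_effect:+.4f}")
```

In this made-up case the rates fell, yet the effective cost per request rose because outputs doubled in length. The order of the decomposition (rates first, then usage) is a modeling choice; other splits are possible.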

Market read: cost per million tokens is the inference-demand unit that maps model usage to spend. Evidence discipline: record the model, the date, and whether a rate is for input or output before comparing per-token prices, and keep illustrative rates separate from quotes.
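
The evidence discipline above can be made concrete with a small record type. This is only a sketch: the `RateObservation` class, its field names, the `comparable` helper, and the sample model names and dates are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RateObservation:
    model: str               # which model the rate applies to
    observed_on: date        # when the rate was recorded
    direction: str           # "input" or "output"
    usd_per_million: float   # price per million tokens
    source: str              # "quote" or "illustrative"


def comparable(a, b):
    """Only compare like with like: same direction, and both real quotes."""
    return a.direction == b.direction and a.source == "quote" and b.source == "quote"


# Hypothetical records; the models, dates, and prices are placeholders.
quote_a = RateObservation("model-a", date(2025, 1, 15), "output", 15.0, "quote")
quote_b = RateObservation("model-b", date(2025, 1, 15), "output", 12.0, "quote")
sketch_c = RateObservation("model-c", date(2025, 1, 15), "output", 10.0, "illustrative")
input_a = RateObservation("model-a", date(2025, 1, 15), "input", 3.0, "quote")

print(comparable(quote_a, quote_b))   # both are output quotes
print(comparable(quote_a, sketch_c))  # one is illustrative
print(comparable(quote_a, input_a))   # input versus output
```

The date is stored but not checked here; a stricter version might also refuse to compare observations taken far apart in time.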

Common mistake

Comparing a single blended per-token rate across providers. Input and output are usually priced differently, context length and features change the effective cost, and a low headline rate can hide expensive output pricing.
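
To see how a blended rate can mislead, consider two hypothetical providers whose simple average of input and output rates is identical. The provider names, rates, and workloads below are invented for illustration only.

```python
# Hypothetical (input_rate, output_rate) in dollars per million tokens.
PROVIDERS = {
    "Provider A": (5.0, 10.0),
    "Provider B": (1.0, 14.0),
}

# Hypothetical workloads as (input_tokens, output_tokens) per request.
WORKLOADS = {
    "output-heavy": (1_000, 2_000),
    "input-heavy": (4_000, 500),
}


def request_cost(rates, tokens):
    """Dollars per request with input and output priced separately."""
    input_rate, output_rate = rates
    input_tokens, output_tokens = tokens
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


# A naive blended headline: the plain average of the two rates.
blended = {name: (rates[0] + rates[1]) / 2 for name, rates in PROVIDERS.items()}

costs = {
    workload: {name: request_cost(rates, tokens) for name, rates in PROVIDERS.items()}
    for workload, tokens in WORKLOADS.items()
}

for name, rate in blended.items():
    print(f"{name}: blended ${rate:.2f} per million")
for workload, by_provider in costs.items():
    cheapest = min(by_provider, key=by_provider.get)
    print(f"{workload}: cheapest is {cheapest}")
```

Both providers show the same blended headline, but the cheaper one flips depending on whether the workload is dominated by input or by output tokens.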

Practical takeaway

What you can do with this

Budget from expected input and output token volumes at separate rates, then sanity-check against GPU-hour cost if you self-host.

  • Buyers: estimate input and output tokens per request separately and apply each rate.
  • Founders and analysts: track cost per completed task, not just per token, since one task can take many tokens.
  • Compare against a self-hosted GPU-hour estimate to decide between an API and self-hosting.
  • Treat per-token rates as illustrative until taken from a current provider quote.
  • Keep observed rates separate from modeled per-request costs.
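
Cost per completed task can be sketched by summing every model call a task makes and then dividing by the share of attempts that succeed. The call sizes, rates, and the 80% success rate below are assumptions chosen for illustration; failed attempts are assumed to cost the same as successful ones.

```python
INPUT_RATE = 3.0    # illustrative dollars per million input tokens
OUTPUT_RATE = 15.0  # illustrative dollars per million output tokens

# Hypothetical task: three model calls, each as (input_tokens, output_tokens).
calls = [(1_200, 300), (2_500, 800), (4_000, 1_500)]

# Assumed share of attempts that end in a completed task.
SUCCESS_RATE = 0.8


def call_cost(input_tokens, output_tokens):
    """Dollars for one model call at the illustrative rates."""
    return (input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE) / 1_000_000


cost_per_attempt = sum(call_cost(i, o) for i, o in calls)

# Failed attempts still cost money, so spread their cost over the successes.
cost_per_completed_task = cost_per_attempt / SUCCESS_RATE

print(f"Cost per attempt:        ${cost_per_attempt:.4f}")
print(f"Cost per completed task: ${cost_per_completed_task:.4f}")
```

Tracking this number alongside the per-token rate shows whether a cheaper model that needs more calls or more retries actually saves money.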

Decision check: compare providers on input and output rates together with your expected token mix, not on a single headline number.
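
For the self-hosting sanity check, a GPU-hour price can be converted into a rough cost per million tokens. This is a back-of-envelope sketch: the $2.50 GPU-hour price, the 1,500 tokens-per-second throughput, and the 50% utilization are hypothetical, and the result is a single blended figure that ignores engineering, networking, and idle-capacity costs beyond the utilization factor.

```python
def self_hosted_cost_per_million(gpu_hour_usd, tokens_per_second, utilization):
    """Rough dollars per million tokens for a self-hosted GPU.

    utilization is the fraction of each hour spent serving tokens (0 to 1].
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return gpu_hour_usd / tokens_per_hour * 1_000_000


# Hypothetical inputs, not measured or quoted values.
estimate = self_hosted_cost_per_million(
    gpu_hour_usd=2.50,
    tokens_per_second=1_500,
    utilization=0.5,
)
full_load = self_hosted_cost_per_million(2.50, 1_500, 1.0)

print(f"At 50% utilization:  ${estimate:.2f} per million tokens")
print(f"At 100% utilization: ${full_load:.2f} per million tokens")
```

Halving utilization doubles the effective cost per million tokens, which is why a self-hosted estimate is only as good as its throughput and utilization assumptions.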

Helpful memory trick

Tokens are the meter; cost per million tokens is the price per unit — and output usually spins the meter faster.
