Calculator

API vs Self-Hosted Calculator

Compare API serving cost to a self-hosted GPU cluster.

Enter your monthly token volume and API list prices, then size a GPU cluster against them. The calculator returns both monthly estimates, the break-even output volume, and how utilization changes the self-hosted cost per million useful tokens.

Interactive calculator

API vs self-hosted calculator

Total prompt tokens sent in a month, in millions (M).
Total generated tokens in a month, in millions (M). Output usually dominates serving cost.
Vendor list price for input tokens, in dollars per million ($/M).
Vendor list price for output tokens, in dollars per million ($/M).
The accelerator model you are pricing. Selecting one fills in an illustrative starting rate you can edit.
What one GPU costs per hour in dollars ($), from your own quote or provider.
How many GPUs run continuously for serving. Self-hosted pays for the cluster even when idle.
Output tokens generated per second per GPU (tok/s) at your batch size and model. Frontier 70B-class models often land near 100 tok/s on an H100.
Share of provisioned GPU time that actually generates output, as a percentage (%). Lower utilization raises effective cost per token.
Extra cost for storage, networking, orchestration, and platform fees, as a percentage (%) of compute.

API at $1,050/mo vs self-hosted at $6,912/mo → API is cheaper this month.

API monthly estimate: $1,050
Self-hosted monthly estimate: $6,912
Self-hosted effective capacity: 181M output tok/mo
Self-hosted cost per useful 1M output: $38.10
Break-even monthly output volume: 441M output tokens
Utilization sensitivity: 50% → $53.33/M · 75% → $35.56/M · 100% → $26.67/M

Starting values are illustrative defaults you can edit — not live ComputeTape benchmark prices. Replace them with a real quote.

How to read the result

What the numbers mean

API cost scales linearly with token volume. Self-hosted cost is mostly fixed by provisioned GPU count and rate, with utilization determining how much of that cost lands on useful output. The break-even is the monthly output volume at which the two converge at the prices and configuration you entered.

API monthly estimate

Input tokens × input price plus output tokens × output price. List rates, not negotiated.
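As a minimal sketch of that formula in Python: the function name and the input values below are assumptions, not the calculator's actual code or defaults. The inputs are simply one combination that reproduces the $1,050 shown in the result.

```python
def api_monthly_cost(input_m, output_m, input_price, output_price):
    """Monthly API cost in dollars.

    Volumes are in millions of tokens, prices in dollars per million.
    """
    return input_m * input_price + output_m * output_price


# Hypothetical inputs chosen to match the $1,050 example result.
api_cost = api_monthly_cost(
    input_m=100,        # 100M prompt tokens
    output_m=50,        # 50M generated tokens
    input_price=3.00,   # $/M input, list price
    output_price=15.00, # $/M output, list price
)
print(f"${api_cost:,.0f}/mo")  # $1,050/mo
```

Because the cost is linear in both volumes, doubling traffic at the same prices doubles the bill.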

Self-hosted monthly estimate

Provisioned GPUs × hourly rate × 720 hours per month (30 days × 24 hours) × (1 + overhead). You pay for the cluster whether it is busy or not.
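The same formula as a short Python sketch. The function name and the GPU count, hourly rate, and overhead below are illustrative assumptions: they are one combination that yields the $6,912 in the example result, not the calculator's confirmed defaults.

```python
HOURS_PER_MONTH = 720  # 30 days x 24 hours, as in the formula above


def self_hosted_monthly_cost(gpus, hourly_rate, overhead_pct):
    """All-in monthly cost in dollars for a continuously provisioned cluster.

    Utilization does not appear here: idle hours are billed like busy ones.
    """
    return gpus * hourly_rate * HOURS_PER_MONTH * (1 + overhead_pct / 100)


# Hypothetical inputs: 1 GPU at $8.00/hr with 20% overhead.
self_hosted_cost = self_hosted_monthly_cost(gpus=1, hourly_rate=8.00, overhead_pct=20)
print(f"${self_hosted_cost:,.0f}/mo")  # $6,912/mo
```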

Effective capacity

Output tokens the cluster can actually serve in a month at the throughput and utilization you entered. If demand exceeds capacity, self-hosted needs more GPUs before the comparison is honest.
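A rough sketch of the capacity check, assuming capacity is per-GPU throughput × GPU count × seconds in a 720-hour month × utilization. The function names and inputs are hypothetical; the inputs were picked because they reproduce the 181M figure in the example result.

```python
import math

SECONDS_PER_MONTH = 720 * 3600  # 720-hour month


def effective_capacity_m(gpus, tok_per_s, utilization_pct):
    """Output tokens the cluster can serve per month, in millions."""
    tokens = gpus * tok_per_s * SECONDS_PER_MONTH * (utilization_pct / 100)
    return tokens / 1e6


def gpus_needed(demand_m, tok_per_s, utilization_pct):
    """Smallest GPU count whose effective capacity covers the demand."""
    per_gpu_m = effective_capacity_m(1, tok_per_s, utilization_pct)
    return math.ceil(demand_m / per_gpu_m)


# Hypothetical inputs: 1 GPU, 100 tok/s, 70% utilization.
capacity_m = effective_capacity_m(gpus=1, tok_per_s=100, utilization_pct=70)
print(f"{capacity_m:.0f}M output tok/mo")  # 181M output tok/mo
print(gpus_needed(demand_m=50, tok_per_s=100, utilization_pct=70))   # 1
print(gpus_needed(demand_m=441, tok_per_s=100, utilization_pct=70))  # 3
```

If `gpus_needed` returns more than the provisioned count, rerun the cost estimate with the larger cluster before comparing against the API.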

Break-even output volume

Monthly output tokens at which API cost equals self-hosted cost, holding input volume and prices constant. Above the break-even, self-hosted is cheaper, provided the cluster has the capacity to serve that volume; below it, API wins.
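Setting the API formula equal to the fixed self-hosted cost and solving for output volume gives the break-even. The sketch below assumes that derivation; the function name and the input volume and prices are hypothetical values consistent with the $1,050 and $6,912 in the example result.

```python
def break_even_output_m(self_hosted_cost, input_m, input_price, output_price):
    """Monthly output volume (millions of tokens) where API cost equals self-hosted cost.

    Solves: input_m * input_price + x * output_price = self_hosted_cost.
    Clamped at 0 for the case where input cost alone already exceeds self-hosted.
    """
    remaining = self_hosted_cost - input_m * input_price
    return max(0.0, remaining / output_price)


# Hypothetical inputs: $6,912/mo cluster, 100M input tokens at $3/M, output at $15/M.
break_even_m = break_even_output_m(6912, input_m=100, input_price=3.00, output_price=15.00)
print(f"{break_even_m:.0f}M output tokens")  # 441M output tokens
```

In the example result the break-even (441M) is above the cluster's effective capacity (181M), so reaching it would take more GPUs, which in turn raises the self-hosted cost and moves the break-even.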

Why utilization dominates

A half-busy GPU is a fully paid GPU

Self-hosted cost per useful token is the all-in monthly cost divided by the output tokens you actually serve. Because the monthly cost is fixed, cost per useful token is inversely proportional to utilization: drop from 100% to 50% and it doubles, since the idle GPU time is still billed.
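The sensitivity row in the example result can be reproduced with a few lines of Python. This is a sketch under assumptions: the function name is hypothetical, and the 259.2M full-utilization capacity is inferred from the example figures ($6,912 at $26.67/M) rather than stated by the calculator.

```python
def cost_per_useful_m(monthly_cost, capacity_at_full_m, utilization_pct):
    """Dollars per million output tokens actually served."""
    served_m = capacity_at_full_m * utilization_pct / 100
    return monthly_cost / served_m


MONTHLY_COST = 6912         # all-in self-hosted cost from the example result
CAPACITY_AT_FULL_M = 259.2  # assumed capacity at 100% utilization, in millions

for utilization in (50, 70, 75, 100):
    cost = cost_per_useful_m(MONTHLY_COST, CAPACITY_AT_FULL_M, utilization)
    print(f"{utilization}% -> ${cost:.2f}/M")
# 50% -> $53.33/M
# 70% -> $38.10/M
# 75% -> $35.56/M
# 100% -> $26.67/M
```

The monthly cost never changes across the loop; only the denominator does, which is why halving utilization doubles the per-token figure.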

What is GPU utilization?

How utilization is measured and why paid capacity costs more when it sits idle.

Frontier model serving cost

How tokens per second, latency, and batch size translate into recurring spend.