Compare API serving cost to a self-hosted GPU cluster.
Enter your monthly token volume and API list prices, then size a GPU cluster against them. The calculator returns both monthly estimates, the break-even output volume, and how utilization changes the self-hosted cost per million useful tokens.
Interactive calculator
With the starting values, the API costs $1,050/mo and self-hosting costs $6,912/mo, so the API is cheaper this month.
The starting values are illustrative defaults you can edit, not live ComputeTape benchmark prices. Replace them with a real quote.
How to read the result
API cost scales linearly with token volume. Self-hosted cost is mostly fixed by provisioned GPU count and rate, with utilization determining how much of that cost lands on useful output. The break-even is the monthly output volume at which the two converge at the prices and configuration you entered.
API monthly estimate: input tokens × input price plus output tokens × output price. Uses list rates, not negotiated ones.
Self-hosted monthly estimate: provisioned GPUs × hourly rate × 720 hours per month × (1 + overhead). You pay for the cluster whether it is busy or not.
Self-hosted capacity: the output tokens the cluster can actually serve in a month at the throughput and utilization you entered. If demand exceeds capacity, self-hosted needs more GPUs before the comparison is honest.
Break-even output volume: the monthly output tokens at which API cost equals self-hosted cost, holding input volume and prices constant. Above the break-even, self-hosted is cheaper; below it, API wins.
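The four quantities above can be sketched in a few lines of Python. This is a minimal sketch, not the calculator's actual code. It assumes prices are quoted per million tokens and throughput is entered as output tokens per second per GPU. All function and parameter names are hypothetical. The example inputs are made up, chosen only so the totals match the $1,050 and $6,912 figures shown earlier.

```python
HOURS_PER_MONTH = 720        # the page's convention for a billing month
SECONDS_PER_HOUR = 3600


def api_monthly_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """API estimate: tokens x list price, with prices assumed to be per million tokens."""
    return (input_tokens * input_price_per_m + output_tokens * output_price_per_m) / 1_000_000


def self_hosted_monthly_cost(gpus, hourly_rate, overhead):
    """Self-hosted estimate: GPUs x hourly rate x 720 hours x (1 + overhead)."""
    return gpus * hourly_rate * HOURS_PER_MONTH * (1 + overhead)


def monthly_capacity_tokens(gpus, tokens_per_sec_per_gpu, utilization):
    """Output tokens served in a month (assumes throughput is entered per GPU)."""
    return gpus * tokens_per_sec_per_gpu * SECONDS_PER_HOUR * HOURS_PER_MONTH * utilization


def break_even_output_tokens(self_hosted_cost, input_tokens, input_price_per_m, output_price_per_m):
    """Output volume where API cost equals self-hosted cost, input volume held constant."""
    if output_price_per_m <= 0:
        raise ValueError("output price must be positive")
    input_cost = input_tokens * input_price_per_m / 1_000_000
    # If input spend alone already exceeds the cluster cost, the break-even is zero.
    return max(0.0, (self_hosted_cost - input_cost) / output_price_per_m * 1_000_000)


# Hypothetical inputs, not the page's real defaults.
api = api_monthly_cost(300_000_000, 30_000_000, 2.50, 10.00)
hosted = self_hosted_monthly_cost(gpus=4, hourly_rate=2.00, overhead=0.20)
capacity = monthly_capacity_tokens(gpus=4, tokens_per_sec_per_gpu=1000, utilization=0.5)
break_even = break_even_output_tokens(hosted, 300_000_000, 2.50, 10.00)

print(f"API ${api:,.0f}/mo vs self-hosted ${hosted:,.0f}/mo")
print(f"Break-even at {break_even / 1e6:,.1f}M output tokens; capacity {capacity / 1e6:,.0f}M")
```

With these made-up inputs the break-even lands near 616 million output tokens a month, which is within the assumed cluster capacity. If the break-even exceeded capacity, the cluster would need more GPUs and the self-hosted estimate would have to be recomputed.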
Why utilization dominates
Self-hosted cost per useful token is the all-in monthly cost divided by the output tokens you actually serve. Drop utilization from 100% to 50% and the cost per useful token roughly doubles, because the GPU still runs and bills.
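A short sketch of that effect follows. It assumes a simplified model in which useful output is peak monthly capacity times utilization and the monthly cost is fixed. The function name, the $6,912 cost, and the peak capacity figure are hypothetical illustration values.

```python
def cost_per_million_useful_tokens(monthly_cost, peak_tokens_per_month, utilization):
    """All-in monthly cost divided by the output tokens actually served, per million."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    useful_tokens = peak_tokens_per_month * utilization
    return monthly_cost / useful_tokens * 1_000_000


# Hypothetical cluster: fixed $6,912/mo, 10.368B output tokens/mo at full utilization.
MONTHLY_COST = 6912.0
PEAK_TOKENS = 10_368_000_000

for utilization in (1.0, 0.75, 0.5, 0.25):
    cost = cost_per_million_useful_tokens(MONTHLY_COST, PEAK_TOKENS, utilization)
    print(f"{utilization:>4.0%} utilization -> ${cost:.2f} per million useful tokens")
```

In this simplified model the cost per useful token is inversely proportional to utilization, so halving utilization doubles it exactly. A real deployment may deviate somewhat, for example if overhead or throughput changes with load.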