AI compute market signals and learning

Compute College

AI compute glossary

Plain-English definitions of AI compute terms — each linked to a full lesson where one exists.

GPU-hours, neoclouds, HBM, spot pricing, PUE, compute futures, and more, defined in one place. Jump to a letter or open the full lesson behind any term.

Learn the Market

Term of the Day: GPU-hour

A GPU-hour is one GPU (graphics processing unit) made available for one hour, whether or not it is fully busy. It is one of the simplest units for comparing AI compute because it turns a messy market into a measurable rental unit. If a provider charges for one H100 for one hour, that is an H100 GPU-hour. ComputeTape can use GPU-hours to compare on-demand cloud, reserved capacity, neocloud rentals, and spot compute.
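
As a rough illustration of how GPU-hours make quotes comparable, the sketch below prices the same job under three pricing models. The function names and every price are hypothetical placeholders, not real provider quotes.

```python
def gpu_hours(num_gpus: int, hours: float) -> float:
    """Total GPU-hours consumed: number of GPUs times wall-clock hours."""
    return num_gpus * hours


def rental_cost(num_gpus: int, hours: float, price_per_gpu_hour: float) -> float:
    """Cost of a rental billed at a flat rate per GPU-hour."""
    return gpu_hours(num_gpus, hours) * price_per_gpu_hour


# Hypothetical quotes in USD per H100 GPU-hour, for illustration only.
quotes = {"on_demand": 4.00, "reserved": 2.50, "spot": 2.00}

# Price the same job (8 GPUs for 24 hours = 192 GPU-hours) under each quote.
job_costs = {name: rental_cost(8, 24, price) for name, price in quotes.items()}

for name, cost in job_costs.items():
    print(f"{name}: ${cost:,.2f}")
```

Because every quote is reduced to the same unit, the comparison is a single multiplication. It does not capture differences in interruption risk, commitment length, or networking between the offers.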

Reference guide

How to use this glossary

This glossary defines the AI compute terms that show up in GPU pricing, model-serving costs, capacity contracts, power deals, and data-center buildouts. Each definition stands on its own, and most terms link to a full Compute College lesson with examples, market signals, common mistakes, and related concepts.

  • Use GPU and pricing terms to compare cloud capacity, reservations, spot markets, and provider quotes.
  • Use power and data-center terms to understand why electrical capacity, cooling, and interconnection queues limit new AI supply.
  • Use model-cost terms to connect tokens, utilization, and serving workloads back to real compute demand.

A–Z

Every term, defined

A

AI cluster

Many GPUs wired together with fast networking so they act as one machine for training or large-scale inference.

AI compute

The accelerators, memory, networking, power, and cooling that together produce usable AI processing capacity.

API vs self-hosted

The choice between paying per token for a hosted model API and running the model yourself on rented or owned GPUs.
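
One way to frame the choice is to convert a rented GPU's hourly price into a cost per million tokens and compare it with an API quote. This is a minimal sketch: the function names, throughput, utilization, and prices are all assumptions for illustration, and it ignores engineering time, idle capacity planning, and separate input and output token prices.

```python
def self_hosted_cost_per_million_tokens(
    gpu_hour_price: float,
    tokens_per_second: float,
    utilization: float = 1.0,
) -> float:
    """Approximate cost per one million tokens served from one rented GPU.

    utilization is the fraction of each paid hour spent serving real traffic.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    useful_tokens_per_hour = tokens_per_second * 3600 * utilization
    return gpu_hour_price / useful_tokens_per_hour * 1_000_000


def cheaper_option(api_price_per_million: float, self_hosted_per_million: float) -> str:
    """Return which option costs less per million tokens (ties go to the API)."""
    return "api" if api_price_per_million <= self_hosted_per_million else "self_hosted"


# Hypothetical inputs: a $3.60 GPU-hour, 1,000 tokens/s, GPU busy half the time.
self_hosted = self_hosted_cost_per_million_tokens(3.60, 1000, utilization=0.5)
choice = cheaper_option(api_price_per_million=1.50, self_hosted_per_million=self_hosted)
print(f"self-hosted is about ${self_hosted:.2f} per million tokens; cheaper: {choice}")
```

The result is very sensitive to utilization: halving the share of each hour spent on real traffic doubles the self-hosted cost per token.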

B

B200

NVIDIA's Blackwell-generation data-center GPU, the successor to the H100 and H200 for AI training and inference.

Behind-the-meter power

Electricity generated on-site and used directly by a data center without passing through the public grid connection.

C

Compute capacity market

The market where GPU capacity is priced, reserved, and traded as a scarce, time-sensitive resource.

Compute forward contract

An agreement to buy compute capacity for a future period at a price set today.

Compute futures

Forward-looking pricing for compute capacity delivered later, used to plan or hedge future GPU needs.

Compute reservation

Capacity booked in advance so it is guaranteed available when a workload needs it.

Cooling density

How much heat a data center can remove per rack, which limits how many high-power GPUs fit in a space.

Cost per million tokens

How hosted AI APIs price inference, usually pricing input and output tokens separately; output typically drives the bill.
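
The arithmetic behind a per-million-token bill can be sketched as follows. The function name and the prices are hypothetical, chosen only to show how a higher output price can dominate the total even when input volume is larger.

```python
def api_bill(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> dict:
    """Split an API bill into input cost, output cost, and total (same currency as the prices)."""
    input_cost = input_tokens / 1_000_000 * input_price_per_million
    output_cost = output_tokens / 1_000_000 * output_price_per_million
    return {"input": input_cost, "output": output_cost, "total": input_cost + output_cost}


# Hypothetical prices: $1 per million input tokens, $4 per million output tokens.
bill = api_bill(
    input_tokens=2_000_000,
    output_tokens=1_000_000,
    input_price_per_million=1.00,
    output_price_per_million=4.00,
)
output_share = bill["output"] / bill["total"]
print(f"total ${bill['total']:.2f}, output share {output_share:.0%}")
```

In this example the workload sends twice as many input tokens as it receives, yet output still accounts for about two thirds of the bill.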

D

Data center

A facility that houses servers, power, networking, and cooling — the physical home of AI compute.

Data center interconnection

The process and capacity needed to connect a data-center site to the power grid.

Data center interconnection queue

The waiting line of projects seeking a grid connection, a common bottleneck that delays new AI capacity.

F

Forward curve

A view of expected compute price or capacity value across future time periods.

Frontier model serving cost

The estimated recurring cost of running a leading AI model to answer users after it has been trained.

G

GPU cloud capacity

The amount of GPU compute a cloud can actually deliver to customers, set by its hardware, electrical power, and networking.

GPU-hour

One GPU made available for one hour — the baseline unit for pricing accelerator rental time.

GPU rentals

Renting GPU time by the hour or by reservation instead of buying and operating the hardware.

GPU utilization

How much useful work a paid GPU actually does; low utilization raises the real cost per result.
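
A simple way to see the effect is to divide the price paid by the fraction of time the GPU does useful work. The sketch below assumes utilization is measured as that fraction; the function name and prices are illustrative.

```python
def effective_cost_per_useful_hour(price_per_gpu_hour: float, utilization: float) -> float:
    """Price of one hour of useful GPU work, given the fraction of paid time used.

    utilization is a fraction in (0, 1], for example 0.5 for a GPU busy half the time.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return price_per_gpu_hour / utilization


# Hypothetical: a $2.00 GPU-hour at 50% utilization costs $4.00 per useful hour.
print(effective_cost_per_useful_hour(2.00, 0.5))
```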

GPU-backed financing

Borrowing using GPUs or their rental contracts as collateral, so operators can buy large fleets without paying the full cost upfront.

H

H100

NVIDIA's Hopper-generation data-center GPU, a widely used baseline for AI training and inference pricing.

H200

An upgraded Hopper GPU with more and faster memory than the H100 for larger models.

HBM (High-Bandwidth Memory)

Fast memory stacked next to a GPU; its supply and capacity are a key constraint on advanced accelerators.

I

InfiniBand

A high-speed, low-latency network used to link GPUs across a cluster for training.

L

Liquid cooling

Cooling GPUs with circulating liquid instead of air, needed as chip power and rack density rise.

M

Megawatt of AI compute

Using power (megawatts) as a proxy for how much AI compute a site can run.
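
As a back-of-the-envelope sketch, site power can be converted into an approximate GPU count. The function name is hypothetical, and the inputs (site size, PUE, and an all-in power draw per GPU that includes its share of CPUs, networking, and storage) are assumptions rather than measured figures.

```python
def gpus_supported(site_megawatts: float, pue: float, watts_per_gpu_all_in: float) -> int:
    """Rough count of GPUs a site can power.

    site_megawatts: total facility power available.
    pue: Power Usage Effectiveness (total facility power / IT power), at least 1.0.
    watts_per_gpu_all_in: IT power per GPU including its share of server overhead.
    """
    if pue < 1.0:
        raise ValueError("pue cannot be below 1.0")
    if watts_per_gpu_all_in <= 0:
        raise ValueError("watts_per_gpu_all_in must be positive")
    it_watts = site_megawatts * 1_000_000 / pue  # power left for IT equipment
    return int(it_watts // watts_per_gpu_all_in)


# Hypothetical site: 10 MW, PUE of 1.25, 1,000 W all-in per GPU.
print(gpus_supported(10, 1.25, 1000))
```

With these inputs, 8 MW of the 10 MW reaches IT equipment, which supports roughly 8,000 GPUs.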

MFU (Model FLOPs Utilization)

How much of a GPU's theoretical compute a job actually uses for useful work; low MFU means paying for compute you are not using.
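
MFU can be sketched as achieved FLOPs per second divided by the hardware's peak. The example below estimates achieved FLOPs with the common rule of thumb of about 6 FLOPs per parameter per token for dense transformer training; that approximation, the function names, and all the numbers are assumptions for illustration.

```python
def mfu(achieved_flops_per_second: float, peak_flops_per_second: float) -> float:
    """Model FLOPs Utilization: useful FLOPs achieved as a fraction of theoretical peak."""
    if peak_flops_per_second <= 0:
        raise ValueError("peak_flops_per_second must be positive")
    return achieved_flops_per_second / peak_flops_per_second


def training_flops_per_second(num_parameters: float, tokens_per_second: float) -> float:
    """Approximate useful training FLOPs per second.

    Assumption: about 6 FLOPs per parameter per token (forward plus backward pass)
    for a dense transformer, ignoring attention-specific terms.
    """
    return 6 * num_parameters * tokens_per_second


# Hypothetical: a 7-billion-parameter model training at 10,000 tokens/s per GPU
# on an accelerator with a peak of 1e15 FLOPs per second.
achieved = training_flops_per_second(7e9, 1e4)
example_mfu = mfu(achieved, 1e15)
print(f"MFU is about {example_mfu:.0%}")
```

Here the job achieves about 4.2e14 useful FLOPs per second, or roughly 42% of the assumed peak; the remaining capacity is paid for but not used.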

Model training cost

The one-time compute expense of teaching a model, driven by GPU-hours, hardware, and run length.

N

Neocloud

A compute-first cloud operator focused on high-performance AI infrastructure rather than general-purpose services.

Neocloud SLA

The service-level guarantees a neocloud makes on availability, performance, and support for rented capacity.

NVL72

NVIDIA's rack-scale system that links 72 GPUs over NVLink so they act as one large accelerator (scale-up).

NVLink

NVIDIA's high-speed link connecting GPUs inside a server or rack-scale system so they share data quickly.

O

On-demand pricing

Paying for GPU capacity as you use it with no commitment — flexible but usually priced higher per hour.

P

PPA (Power Purchase Agreement)

A long-term contract to buy electricity at agreed terms, used to secure firm power for AI data centers.

Power Usage Effectiveness (PUE)

The ratio of total facility power to IT equipment power; lower is more efficient, and 1.0 is the theoretical ideal, meaning no power is spent on cooling or other overhead.
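
The ratio is simple to compute from two power readings. This sketch uses hypothetical function names and figures: a facility drawing 1,300 kW in total to run 1,000 kW of IT equipment.

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power Usage Effectiveness: total facility power divided by IT equipment power."""
    if it_equipment_kw <= 0:
        raise ValueError("it_equipment_kw must be positive")
    if total_facility_kw < it_equipment_kw:
        raise ValueError("total facility power cannot be less than IT power")
    return total_facility_kw / it_equipment_kw


def overhead_kw(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power spent on cooling, power conversion, and other non-IT loads."""
    return total_facility_kw - it_equipment_kw


# Hypothetical readings: 1,300 kW total, 1,000 kW reaching IT equipment.
print(f"PUE {pue(1300, 1000):.2f}, overhead {overhead_kw(1300, 1000):.0f} kW")
```

A PUE of 1.30 means that for every kilowatt delivered to servers, another 0.3 kW goes to overhead.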

R

Reasoning-model serving cost

The higher serving cost of models that generate long chains of thought: those reasoning tokens count as output, and output tokens drive inference cost.

Reserved capacity

GPU capacity committed for a set term in exchange for guaranteed availability and usually a lower rate.

S

Sovereign AI compute

AI computing capacity a country controls within its own borders and jurisdiction, reducing dependence on foreign infrastructure.

Spot price

The price of interruptible GPU capacity sold from spare supply; cheaper but can be reclaimed at any time.

T

Test-time compute

Extra compute a model spends while answering — reasoning before it replies — to improve the result.

Token cost

The price per token of model input and output; output tokens usually cost more and dominate serving bills.

TPU (Tensor Processing Unit)

Google's custom AI accelerator, built for tensor-heavy machine-learning workloads, trading flexibility for efficiency versus a general-purpose GPU.