Compute College

AI compute glossary

Plain-English definitions of AI compute terms — each linked to a full lesson where one exists.

GPU-hours, neoclouds, HBM, spot pricing, PUE, compute futures, and more, defined in one place. Jump to a letter or open the full lesson behind any term.


Use GPU and pricing terms to compare cloud capacity, reservations, spot markets, and provider quotes.
Use power and data-center terms to understand why electrical capacity, cooling, and interconnection queues limit new AI supply.
Use model-cost terms to connect tokens, utilization, and serving workloads back to real compute demand.

A

AI cluster

Many GPUs wired together with fast networking so they act as one machine for training or large-scale inference.

Full lesson →

AI compute

The accelerators, memory, networking, power, and cooling that together produce usable AI processing capacity.

Full lesson →

API vs self-hosted

The choice between paying per token for a hosted model API and running the model yourself on rented or owned GPUs.
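As a rough illustration of that choice, the sketch below finds the monthly token volume at which a rented GPU costs the same as a hosted API. The function name and all prices are hypothetical, and the comparison ignores whether one GPU can actually serve that volume, as well as engineering and operations costs.

```python
def breakeven_tokens_per_month(gpu_monthly_cost: float,
                               api_price_per_million: float) -> float:
    """Monthly token volume at which the API bill equals the GPU rental bill.

    Simplified: assumes one blended API price per million tokens and
    treats the GPU rental as the only self-hosting cost.
    """
    if api_price_per_million <= 0:
        raise ValueError("api_price_per_million must be positive")
    return gpu_monthly_cost / api_price_per_million * 1_000_000


# Hypothetical numbers: $2.00 per GPU-hour for 730 hours, API at $5 per million tokens.
monthly_gpu_cost = 2.00 * 730
print(breakeven_tokens_per_month(monthly_gpu_cost, 5.00))
```

Below the break-even volume the API is cheaper under these assumptions; above it, self-hosting starts to pay off only if the hardware can sustain the load.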

Full lesson →

B

B200

NVIDIA's Blackwell-generation data-center GPU, the successor to the H100 and H200 for AI training and inference.

Full lesson →

Behind-the-meter power

Electricity generated on-site and used directly by a data center without passing through the public grid connection.

Full lesson →

C

Compute capacity market

The market where GPU capacity is priced, reserved, and traded as a scarce, time-sensitive resource.

Full lesson →

Compute forward contract

An agreement to buy compute capacity for a future period at a price set today.
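A minimal sketch of how a buyer might evaluate such an agreement after the fact, comparing the locked-in rate with the spot rate that prevailed at delivery. The function name and the rates are hypothetical; real contracts add terms (minimums, penalties, delivery windows) that this ignores.

```python
def forward_savings(locked_rate: float,
                    spot_rate_at_delivery: float,
                    gpu_hours: float) -> float:
    """Savings from the forward versus buying at spot on delivery.

    Positive means the forward was cheaper than spot;
    negative means the buyer overpaid relative to spot.
    """
    return (spot_rate_at_delivery - locked_rate) * gpu_hours


# Hypothetical: locked at $2.00/GPU-hour for 1,000 GPU-hours.
print(forward_savings(2.00, 2.50, 1000))  # spot rose: forward saved money
print(forward_savings(2.00, 1.50, 1000))  # spot fell: forward cost more
```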

Full lesson →

Compute futures

Forward-looking pricing for compute capacity delivered later, used to plan or hedge future GPU needs.

Full lesson →

Compute reservation

Capacity booked in advance so it is guaranteed available when a workload needs it.

Full lesson →

Cooling density

How much heat a data center can remove per rack, which limits how many high-power GPUs fit in a space.

Full lesson →

Cost per million tokens

How hosted AI APIs price inference, usually pricing input and output tokens separately; output typically drives the bill.
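The arithmetic behind a per-million-token bill can be sketched as follows. The prices are placeholders, not any provider's actual rates, and the function name is an assumption for illustration.

```python
def api_cost(input_tokens: int, output_tokens: int,
             input_price_per_m: float, output_price_per_m: float) -> float:
    """Total bill when input and output tokens are priced separately per million."""
    input_cost = input_tokens / 1_000_000 * input_price_per_m
    output_cost = output_tokens / 1_000_000 * output_price_per_m
    return input_cost + output_cost


# Hypothetical: 2M input tokens at $3/M, 1M output tokens at $15/M.
total = api_cost(2_000_000, 1_000_000, 3.0, 15.0)
print(total)  # input contributes 6.0, output contributes 15.0
```

With these placeholder prices, output is a third of the tokens but most of the bill, which is the pattern the definition describes.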

Full lesson →

D

Data center

A facility that houses servers, power, networking, and cooling — the physical home of AI compute.

Full lesson →

Data center interconnection

The process and capacity needed to connect a data-center site to the power grid.

Full lesson →

Data center interconnection queue

The waiting line of projects seeking a grid connection, a common bottleneck that delays new AI capacity.

Full lesson →

F

Forward curve

A view of expected compute price or capacity value across future time periods.

Full lesson →

Frontier model serving cost

The estimated recurring cost of running a leading AI model to answer users after it has been trained.

Full lesson →

G

GPU cloud capacity

The amount of GPU power a cloud can actually deliver to customers, set by hardware, power, and networking.

Full lesson →

GPU-hour

One GPU made available for one hour — the baseline unit for pricing accelerator rental time.
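A short sketch of how GPU-hours add up into a rental bill. The hourly rate is hypothetical and the function names are illustrative only.

```python
def gpu_hours(num_gpus: int, hours: float) -> float:
    """GPU-hours consumed: each GPU counts separately for every hour it is held."""
    return num_gpus * hours


def rental_cost(num_gpus: int, hours: float, rate_per_gpu_hour: float) -> float:
    """Bill for holding num_gpus GPUs for the given number of hours."""
    return gpu_hours(num_gpus, hours) * rate_per_gpu_hour


# Hypothetical: 8 GPUs for 24 hours at $2.50 per GPU-hour.
print(gpu_hours(8, 24))          # 192 GPU-hours
print(rental_cost(8, 24, 2.50))  # 480.0
```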

Full lesson →

GPU rentals

Renting GPU time by the hour or by reservation instead of buying and operating the hardware.

Full lesson →

GPU utilization

How much useful work a paid GPU actually does; low utilization raises the real cost per result.
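One way to see the effect is to divide the hourly rate by the fraction of time the GPU does useful work. This is a simplified sketch with hypothetical numbers; it treats utilization as a single fraction between 0 and 1.

```python
def effective_rate(rate_per_gpu_hour: float, utilization: float) -> float:
    """Cost per *useful* GPU-hour when only a fraction of paid time does real work."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return rate_per_gpu_hour / utilization


# Hypothetical: paying $2.00/GPU-hour but keeping the GPU busy half the time.
print(effective_rate(2.00, 0.5))  # 4.0 per useful GPU-hour
```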

Full lesson →

GPU-backed financing

Borrowing using GPUs or their rental contracts as collateral, so operators can buy large fleets without paying the full cost upfront.

Full lesson →

H

H100

NVIDIA's Hopper-generation data-center GPU, a widely used baseline for AI training and inference pricing.

Full lesson →

H200

An upgraded Hopper GPU with more and faster memory than the H100 for larger models.

Full lesson →

HBM (High-Bandwidth Memory)

Fast memory stacked next to a GPU; its supply and capacity are a key constraint on advanced accelerators.

Full lesson →

I

InfiniBand

A high-speed, low-latency network used to link GPUs across a cluster for training.

Full lesson →

L

Liquid cooling

Cooling GPUs with circulating liquid instead of air, needed as chip power and rack density rise.

Full lesson →

M

Megawatt of AI compute

Using power (megawatts) as a proxy for how much AI compute a site can run.
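As a back-of-the-envelope sketch of that proxy, the function below estimates how many GPUs a given site power could support. All inputs are assumptions: the per-GPU figure is meant as an all-in IT load (GPU plus its share of server and networking power), and facility overhead is folded in through PUE. Real sites differ.

```python
import math


def gpus_supported(site_power_kw: float, pue: float, it_kw_per_gpu: float) -> int:
    """Rough GPU count for a site: IT power is site power divided by PUE,
    then divided by the all-in IT load per GPU."""
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    it_power_kw = site_power_kw / pue
    return math.floor(it_power_kw / it_kw_per_gpu)


# Hypothetical: a 1 MW (1,000 kW) site, PUE of 1.25, 1.0 kW of IT load per GPU.
print(gpus_supported(1000, 1.25, 1.0))  # 800
```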

Full lesson →

MFU (Model FLOPs Utilization)

How much of a GPU's theoretical compute a job actually uses for useful work; low MFU means paying for compute you are not using.
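In its simplest form, MFU is the ratio of achieved model FLOPs per second to the hardware's theoretical peak. The sketch below uses made-up throughput figures, not the specification of any real GPU.

```python
def mfu(achieved_flops_per_sec: float, peak_flops_per_sec: float) -> float:
    """Model FLOPs Utilization: useful model FLOPs/s divided by peak FLOPs/s."""
    if peak_flops_per_sec <= 0:
        raise ValueError("peak_flops_per_sec must be positive")
    return achieved_flops_per_sec / peak_flops_per_sec


# Hypothetical: a job sustains 400 TFLOP/s on hardware with a 1,000 TFLOP/s peak.
print(mfu(400e12, 1000e12))  # 0.4, i.e. 40% MFU
```

At 40% MFU in this example, the remaining 60% of the paid-for peak compute is not doing model work.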

Full lesson →

Model training cost

The one-time compute expense of teaching a model, driven by GPU-hours, hardware, and run length.

Full lesson →

N

Neocloud

A compute-first cloud operator focused on high-performance AI infrastructure rather than general-purpose services.

Full lesson →

Neocloud SLA

The service-level guarantees a neocloud makes on availability, performance, and support for rented capacity.

Full lesson →

NVL72

A rack-scale NVIDIA system that links 72 GPUs with NVLink so they act as one large accelerator (scale-up).

Full lesson →

NVLink

NVIDIA's high-speed link connecting GPUs inside a server so they share data quickly.

Full lesson →

O

On-demand pricing

Paying for GPU capacity as you use it with no commitment — flexible but usually priced higher per hour.

Full lesson →

P

PPA (Power Purchase Agreement)

A long-term contract to buy electricity at agreed terms, used to secure firm power for AI data centers.

Full lesson →

Power Usage Effectiveness (PUE)

A ratio of total facility power to IT power; lower is more efficient, with 1.0 the ideal.
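The ratio can be computed directly from two meter readings. The figures below are illustrative, not measurements from a real facility.

```python
def pue(total_facility_kw: float, it_kw: float) -> float:
    """Power Usage Effectiveness: total facility power divided by IT power."""
    if it_kw <= 0:
        raise ValueError("it_kw must be positive")
    return total_facility_kw / it_kw


# Hypothetical: the facility draws 1,200 kW, of which 1,000 kW reaches IT equipment.
print(pue(1200, 1000))  # 1.2, so 0.2 kW of overhead per kW of IT load
```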

Full lesson →

R

Reasoning-model serving cost

The higher serving cost of models that generate long chains of thought, because output tokens drive inference cost.

Full lesson →

Reserved capacity

GPU capacity committed for a set term in exchange for guaranteed availability and usually a lower rate.

Full lesson →

S

Sovereign AI compute

AI computing capacity a country controls within its own borders and jurisdiction, reducing dependence on foreign infrastructure.

Full lesson →

Spot price

The price of interruptible GPU capacity sold from spare supply; cheaper but can be reclaimed at any time.

Full lesson →

T

Test-time compute

Extra compute a model spends while answering — reasoning before it replies — to improve the result.

Full lesson →

Token cost

The price per token of model input and output; output tokens usually cost more and dominate serving bills.

Full lesson →

TPU (Tensor Processing Unit)

Google's custom AI accelerator, built for machine-learning workloads; it trades some flexibility for efficiency compared with a general-purpose GPU.

Full lesson →
