AI cluster
Many GPUs wired together with fast networking so they act as one machine for training or large-scale inference.
Compute College
Plain-English definitions of AI compute terms — each linked to a full lesson where one exists.
GPU-hours, neoclouds, HBM, spot pricing, PUE, compute futures, and more, defined in one place. Jump to a letter or open the full lesson behind any term.
Learn the Market
A GPU-hour is one graphics processor running for one hour. It is one of the simplest units for comparing AI compute because it turns a messy market into a measurable rental unit. If a provider charges for one H100 running for one hour, that is an H100 GPU-hour. ComputeTape can use GPU-hours to compare on-demand cloud, reserved capacity, neocloud rentals, and spot compute.
Reference guide
This glossary defines the AI compute terms that show up in GPU pricing, model-serving costs, capacity contracts, power deals, and data-center buildouts. Each definition stands on its own, and most terms link to a full Compute College lesson with examples, market signals, common mistakes, and related concepts.
A–Z
Many GPUs wired together with fast networking so they act as one machine for training or large-scale inference.
The accelerators, memory, networking, power, and cooling that together produce usable AI processing capacity.
The choice between paying per token for a hosted model API and running the model yourself on rented or owned GPUs.
NVIDIA's Blackwell-generation data-center GPU, the successor to the H100 and H200 for AI training and inference.
Electricity generated on-site and used directly by a data center without passing through the public grid connection.
The market where GPU capacity is priced, reserved, and traded as a scarce, time-sensitive resource.
An agreement to buy compute capacity for a future period at a price set today.
Forward-looking pricing for compute capacity delivered later, used to plan or hedge future GPU needs.
Capacity booked in advance so it is guaranteed available when a workload needs it.
How much heat a data center can remove per rack, which limits how many high-power GPUs fit in a space.
How hosted AI APIs price inference, usually pricing input and output tokens separately; output typically drives the bill.
A facility that houses servers, power, networking, and cooling — the physical home of AI compute.
The process and capacity needed to connect a data-center site to the power grid.
The waiting line of projects seeking a grid connection, a common bottleneck that delays new AI capacity.
A view of expected compute price or capacity value across future time periods.
The estimated recurring cost of running a leading AI model to answer users after it has been trained.
The amount of GPU power a cloud can actually deliver to customers, set by hardware, power, and networking.
One GPU made available for one hour — the baseline unit for pricing accelerator rental time.
Renting GPU time by the hour or by reservation instead of buying and operating the hardware.
How much useful work a paid GPU actually does; low utilization raises the real cost per result.
Borrowing using GPUs or their rental contracts as collateral, so operators can buy large fleets without paying the full cost upfront.
NVIDIA's Hopper-generation data-center GPU, a widely used baseline for AI training and inference pricing.
An upgraded Hopper GPU with more and faster memory than the H100 for larger models.
Fast memory stacked next to a GPU; its supply and capacity are a key constraint on advanced accelerators.
A high-speed, low-latency network used to link GPUs across a cluster for training.
Cooling GPUs with circulating liquid instead of air, needed as chip power and rack density rise.
Using power (megawatts) as a proxy for how much AI compute a site can run.
How much of a GPU's theoretical compute a job actually uses for useful work; low MFU means paying for compute you are not using.
The one-time compute expense of teaching a model, driven by GPU-hours, hardware, and run length.
A compute-first cloud operator focused on high-performance AI infrastructure rather than general-purpose services.
The service-level guarantees a neocloud makes on availability, performance, and support for rented capacity.
A rack-scale system that links many GPUs with high-speed interconnect so they act as one large accelerator (scale-up).
NVIDIA's high-speed link connecting GPUs inside a server so they share data quickly.
Paying for GPU capacity as you use it with no commitment — flexible but usually priced higher per hour.
A long-term contract to buy electricity at agreed terms, used to secure firm power for AI data centers.
A ratio of total facility power to IT power; lower is more efficient, with 1.0 the ideal.
The higher serving cost of models that generate long chains of thought, because output tokens drive inference cost.
GPU capacity committed for a set term in exchange for guaranteed availability and usually a lower rate.
AI computing capacity a country controls within its own borders and jurisdiction, reducing dependence on foreign infrastructure.
The price of interruptible GPU capacity sold from spare supply; cheaper but can be reclaimed at any time.
Extra compute a model spends while answering — reasoning before it replies — to improve the result.
The price per token of model input and output; output tokens usually cost more and dominate serving bills.
A custom AI accelerator (Google's tensor processing unit) built for specific workloads, trading flexibility for efficiency versus a general-purpose GPU.