Compute College
NVL72-style rack systems link many GPUs into one tightly coupled unit. They illustrate scale-up (bigger tightly coupled units) versus scale-out (more networked units).
One concept connected to AI compute market decisions.
A practical introduction designed to be completed in one sitting.
Useful for ML engineers, analysts, and infrastructure buyers.
Plain-English definition
NVL72 refers to a rack-scale system, such as NVIDIA's GB200 NVL72, that links 72 GPUs over NVLink so they behave like one very large accelerator. It illustrates scale-up (making a single tightly connected unit bigger) versus scale-out (adding more separate units connected over a network). Both expand AI compute, but in different ways.
Why it matters
Large models need many GPUs working together. Scale-up tightly couples GPUs so they share memory and data at very high speed, which suits big models and low latency; scale-out adds more nodes over slower networks, which suits throughput and capacity. The balance affects performance, cost, power density, and cooling.
Simple example
Suppose a model is too large for one GPU. Scaling up puts more GPUs in one fast-interconnected rack so they act as a single big accelerator; scaling out spreads the work across many racks over a network. The scale-up approach can be faster for tightly-coupled work but concentrates power and heat in one rack.
Example figures are illustrative calculations, not current quoted market prices.
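The example above can be sketched as a small calculation. This is a minimal illustration, not a sizing tool: the function name `placement`, the memory figures, and the 72-GPU domain size are assumptions chosen for the example. It counts only memory capacity and ignores activations, parallelism strategy, and interconnect bandwidth.

```python
import math


def placement(model_mem_gb: float, gpu_mem_gb: float, gpus_per_domain: int) -> dict:
    """Estimate GPUs and scale-up domains needed to hold a model in memory.

    All inputs are hypothetical; only memory capacity is considered.
    """
    gpus_needed = math.ceil(model_mem_gb / gpu_mem_gb)
    domains_needed = math.ceil(gpus_needed / gpus_per_domain)
    if gpus_needed <= 1:
        mode = "single GPU"
    elif domains_needed == 1:
        mode = "scale-up"  # fits inside one tightly coupled rack
    else:
        mode = "scale-up plus scale-out"  # several racks joined by a network
    return {"gpus": gpus_needed, "domains": domains_needed, "mode": mode}


# Illustrative figures only (not vendor specifications).
small = placement(model_mem_gb=1_000, gpu_mem_gb=100, gpus_per_domain=72)
large = placement(model_mem_gb=10_000, gpu_mem_gb=100, gpus_per_domain=72)
print(small)
print(large)
```

With these assumed figures, the smaller model fits in one scale-up domain, while the larger one spills across two domains and therefore also depends on the network between them.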
Market signal
A shift toward rack-scale, tightly-coupled systems is a signal about model size, interconnect demand, and power density. Read it alongside cooling and power-per-rack trends, since denser scale-up designs push liquid cooling and higher site power.
Market read: the scale-up/scale-out balance signals model size and power density, not just raw GPU count. Evidence discipline: distinguish a system's interconnect design from headline GPU counts, and date any density or performance claim.
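To make the power-density point concrete, here is a rough sketch of per-rack power. Every number is a placeholder assumption (GPU count per rack, watts per GPU, and the overhead factor for CPUs, networking, and fans), not a dated or quoted specification, and `rack_power_kw` is a hypothetical helper.

```python
def rack_power_kw(gpus_per_rack: int, watts_per_gpu: float, overhead_factor: float) -> float:
    """Rough rack power in kW: GPU draw scaled by an assumed overhead factor."""
    return gpus_per_rack * watts_per_gpu * overhead_factor / 1000.0


# Hypothetical inputs: same GPU, same overhead, different density.
dense = rack_power_kw(gpus_per_rack=72, watts_per_gpu=1000.0, overhead_factor=1.3)
sparse = rack_power_kw(gpus_per_rack=8, watts_per_gpu=1000.0, overhead_factor=1.3)
print(f"dense rack: {dense:.1f} kW, sparse rack: {sparse:.1f} kW")
```

Under these assumptions the dense rack draws roughly nine times the power of the sparse one in the same footprint, which is the kind of gap that pushes a site toward liquid cooling and higher power per rack.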
Common mistake
Counting GPUs without asking how they are connected. The same number of GPUs can behave very differently depending on whether they are tightly coupled (scale-up) or loosely networked (scale-out) — interconnect, not count alone, decides what big models can do.
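A back-of-the-envelope sketch shows why the connection matters more than the count. The bandwidth and data-size values below are assumptions picked for illustration, `transfer_seconds` is a hypothetical helper, and the model ignores latency, topology, and any overlap of communication with compute.

```python
def transfer_seconds(data_gb: float, bandwidth_gb_per_s: float) -> float:
    """Time to move data at a sustained bandwidth (ignores latency and overlap)."""
    return data_gb / bandwidth_gb_per_s


# Hypothetical values: the same exchange over two kinds of link.
DATA_GB = 10.0
SCALE_UP_GB_PER_S = 500.0   # assumed in-rack interconnect bandwidth
SCALE_OUT_GB_PER_S = 50.0   # assumed network bandwidth between nodes

t_up = transfer_seconds(DATA_GB, SCALE_UP_GB_PER_S)
t_out = transfer_seconds(DATA_GB, SCALE_OUT_GB_PER_S)
print(f"scale-up: {t_up:.3f} s, scale-out: {t_out:.3f} s, ratio: {t_out / t_up:.1f}x")
```

The GPU count is identical in both cases; only the assumed link speed changes, and the exchange time changes in proportion.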
Practical takeaway
When you see a cluster described, ask how its GPUs are connected and whether the workload needs scale-up or scale-out.
Decision check: choose scale-up when the work is tightly coupled and latency-bound, scale-out when it parallelizes across independent units.
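The decision check can be written as a tiny rule of thumb. This is a sketch of the heuristic stated above, not a procurement method: `choose_scaling` and its two boolean inputs are hypothetical, and mixed cases that the rule does not cover are flagged rather than guessed.

```python
def choose_scaling(tightly_coupled: bool, latency_bound: bool) -> str:
    """Apply the lesson's rule of thumb to a workload description."""
    if tightly_coupled and latency_bound:
        return "scale-up"
    if not tightly_coupled:
        # Work that parallelizes across independent units.
        return "scale-out"
    # Tightly coupled but not latency-bound: the rule does not decide this case.
    return "needs closer analysis"


print(choose_scaling(tightly_coupled=True, latency_bound=True))
print(choose_scaling(tightly_coupled=False, latency_bound=False))
```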
Helpful memory trick
Scale-up builds a bigger engine; scale-out adds more engines. Both add compute capacity, but they solve different problems.
Use the GPU-Hour Cost Calculator, AI Training Cost Calculator, or Model Serving Cost Calculator.