AI compute market signals and learning

Compute College

What is NVL72? Scale-Up vs Scale-Out

NVL72-style rack systems link many GPUs into one tightly-coupled unit. They illustrate scale-up (bigger tightly-coupled units) versus scale-out (more networked units).

Learning path: Infrastructure & Power Lessons

One concept connected to AI compute market decisions.

Read time: 5-8 minutes

A practical introduction designed to be completed in one sitting.

Tags: Interconnect / Scaling / Hardware

Useful for ML engineers, analysts, and infrastructure buyers.

Plain-English definition

NVL72 refers to a rack-scale system that links many GPUs (72 in NVIDIA's GB200 NVL72, which is where the name comes from) with a high-speed interconnect so they behave like one very large accelerator. It illustrates scale-up, which makes a single tightly-connected unit bigger, versus scale-out, which adds more separate units connected over a network. Both expand AI compute, but in different ways.

Why it matters

Large models need many GPUs working together. Scale-up tightly couples GPUs so they share memory and data at very high speed, which suits big models and low latency; scale-out adds more nodes over slower networks, which suits throughput and capacity. The balance affects performance, cost, power density, and cooling.

  • Scale-up makes a single tightly-connected unit larger; scale-out adds more units over a network.
  • Tightly-coupled rack systems help big models train and serve efficiently.
  • More tightly-packed GPUs raise power density and cooling requirements.
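To make the speed difference concrete, here is a minimal sketch that estimates how long it takes a group of GPUs to combine a block of data over a fast in-rack interconnect versus a slower network. It uses the standard bandwidth-only cost of a ring all-reduce, which is one common collective and is chosen here only for illustration. The function name, the payload size, and both bandwidth figures are hypothetical placeholders, not vendor specifications.

```python
def ring_allreduce_seconds(payload_bytes: float, num_gpus: int,
                           link_bytes_per_s: float) -> float:
    """Bandwidth-only estimate for a ring all-reduce.

    Each GPU sends and receives 2 * (N - 1) / N of the payload.
    Latency, congestion, and compute overlap are ignored.
    """
    if num_gpus < 2:
        return 0.0  # a single GPU has nothing to exchange
    return 2 * (num_gpus - 1) / num_gpus * payload_bytes / link_bytes_per_s


# Assumed, illustrative per-GPU bandwidths (NOT measured or quoted figures).
SCALE_UP_BYTES_PER_S = 900e9   # hypothetical in-rack interconnect
SCALE_OUT_BYTES_PER_S = 50e9   # hypothetical network link between nodes

payload = 10e9  # 10 GB exchanged per step, illustrative
t_up = ring_allreduce_seconds(payload, 8, SCALE_UP_BYTES_PER_S)
t_out = ring_allreduce_seconds(payload, 8, SCALE_OUT_BYTES_PER_S)

print(f"scale-up:  {t_up * 1e3:.1f} ms per exchange")
print(f"scale-out: {t_out * 1e3:.1f} ms per exchange")
print(f"scale-out is about {t_out / t_up:.0f}x slower under these assumptions")
```

The ratio depends only on the two assumed bandwidths, so swapping in figures for a specific system changes the result directly. Real workloads overlap communication with computation, so treat this as an upper-level intuition rather than a performance prediction.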

Simple example

Suppose a model is too large for one GPU. Scaling up puts more GPUs in one fast-interconnected rack so they act as a single big accelerator; scaling out spreads the work across many racks over a network. The scale-up approach can be faster for tightly-coupled work but concentrates power and heat in one rack.

  • Scale-up favors tightly-coupled work such as large-model training and low-latency serving.
  • Scale-out favors capacity and throughput across many independent jobs.
  • Tighter coupling concentrates power and heat, raising cooling needs.

Example figures are illustrative calculations, not current quoted market prices.
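The "too large for one GPU" case can be sketched as simple arithmetic: how many GPUs are needed just to hold the model weights, and how many tightly-coupled groups that spans. This is a rough sketch under stated assumptions. The parameter count, bytes per parameter, and per-GPU memory are hypothetical, and the estimate ignores activations, optimizer state, and serving caches, all of which raise the real requirement.

```python
def gpus_for_weights(num_params: int, bytes_per_param: int,
                     gpu_mem_bytes: int) -> int:
    """Minimum GPUs needed to hold the weights alone (ceiling division)."""
    total_bytes = num_params * bytes_per_param
    return -(-total_bytes // gpu_mem_bytes)


def domains_spanned(gpus_needed: int, gpus_per_domain: int) -> int:
    """Number of tightly-coupled groups the model must be split across."""
    return -(-gpus_needed // gpus_per_domain)


# Illustrative inputs only (assumptions, not a real model or product spec).
NUM_PARAMS = 2_000_000_000_000     # 2 trillion parameters
BYTES_PER_PARAM = 2                # 16-bit weights
GPU_MEM_BYTES = 80_000_000_000     # 80 GB of memory per GPU

needed = gpus_for_weights(NUM_PARAMS, BYTES_PER_PARAM, GPU_MEM_BYTES)
print(f"GPUs needed for weights alone: {needed}")

for label, size in [("8-GPU server", 8), ("72-GPU rack", 72)]:
    groups = domains_spanned(needed, size)
    print(f"{label}: model spans {groups} tightly-coupled group(s)")
```

Under these assumptions the weights fit inside one 72-GPU scale-up unit but must be split across several 8-GPU servers, which pushes part of the traffic onto the slower network.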

Market signal

How to read the market signal

A shift toward rack-scale, tightly-coupled systems is a signal about model size, interconnect demand, and power density. Read it alongside cooling and power-per-rack trends, since denser scale-up designs push liquid cooling and higher site power.

  • Rack-scale designs signal larger models and heavier interconnect demand.
  • Higher power density per rack drives liquid cooling and site-power needs.
  • The scale-up and scale-out mix shapes data-center design and cost.

Market read: the scale-up/scale-out balance signals model size and power density, not just raw GPU count.

Evidence discipline: distinguish a system's interconnect design from headline GPU counts, and date any density or performance claim.
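The power-density point can be turned into a back-of-envelope calculation. The sketch below multiplies GPUs per rack by power per GPU and an overhead factor for CPUs, networking, and power conversion. Every number is an assumption for illustration, including the threshold above which liquid cooling is flagged. None of them are quoted specifications.

```python
def rack_power_kw(gpus_per_rack: int, watts_per_gpu: float,
                  overhead_factor: float) -> float:
    """Estimated rack power in kW: GPU power scaled by a non-GPU overhead."""
    return gpus_per_rack * watts_per_gpu * overhead_factor / 1000.0


# All values below are hypothetical and chosen only to show the arithmetic.
OVERHEAD_FACTOR = 1.3               # assumed CPUs, NICs, fans, conversion loss
LIQUID_COOLING_THRESHOLD_KW = 40.0  # assumed rule of thumb, site-dependent

racks = {
    "4 x 8-GPU servers": (32, 700.0),         # (GPUs per rack, watts per GPU)
    "72-GPU rack-scale system": (72, 1000.0),
}

estimates = {}
for name, (gpus, watts) in racks.items():
    kw = rack_power_kw(gpus, watts, OVERHEAD_FACTOR)
    estimates[name] = kw
    if kw > LIQUID_COOLING_THRESHOLD_KW:
        cooling = "liquid cooling likely"
    else:
        cooling = "air cooling may suffice"
    print(f"{name}: {kw:.1f} kW per rack, {cooling}")
```

When reading an announcement, replace the placeholders with dated, sourced figures and note where each one came from, in line with the evidence discipline above.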

Common mistake

Counting GPUs without asking how they are connected. The same number of GPUs can behave very differently depending on whether they are tightly coupled (scale-up) or loosely networked (scale-out). Interconnect, not count alone, decides what big models can do.

Practical takeaway

What you can do with this

When you see a cluster described, ask how its GPUs are connected and whether the workload needs scale-up or scale-out.

  • Buyers: match interconnect (scale-up vs scale-out) to whether your workload is tightly coupled or parallel.
  • Analysts: read rack-scale announcements as signals of model size and power density, not just chip volume.
  • Note that denser scale-up designs raise cooling and site-power requirements.
  • Separate interconnect design from headline GPU counts when comparing systems.
  • Keep vendor performance claims illustrative until tested on your workload.

Decision check: choose scale-up when the work is tightly coupled and latency-bound, scale-out when it parallelizes across independent units.
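The decision check can be written down as a small rule, which makes the two inputs explicit. This is only a sketch of the rule as stated. The class and function names are hypothetical, and the fallback for cases the rule does not cover (tightly coupled but not latency-bound) is an assumption added here, not part of the lesson.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    tightly_coupled: bool  # GPUs must exchange data frequently
    latency_bound: bool    # response or step time is the main constraint


def suggest_scaling(w: Workload) -> str:
    """Apply the lesson's decision check to one workload."""
    if w.tightly_coupled and w.latency_bound:
        return "scale-up"
    if not w.tightly_coupled:
        # Work that splits into independent units parallelizes over a network.
        return "scale-out"
    # Not covered by the stated rule; flagged rather than guessed (assumption).
    return "evaluate both"


print(suggest_scaling(Workload(tightly_coupled=True, latency_bound=True)))
print(suggest_scaling(Workload(tightly_coupled=False, latency_bound=False)))
print(suggest_scaling(Workload(tightly_coupled=True, latency_bound=False)))
```

In practice many deployments combine the two, so the output is a starting question for a vendor or infrastructure team rather than a final answer.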

Helpful memory trick

Scale-up builds a bigger engine; scale-out adds more engines — both add power, but they solve different problems.

Turn the lesson into a number

Use the GPU-Hour Cost Calculator, AI Training Cost Calculator, or Model Serving Cost Calculator.

Compute College track: Power & Data Centers
