Previous lesson
How fast does an H100 depreciate
Continue the Model Costs track.
Compute College
Model FLOPs Utilization (MFU) measures how much of a GPU's theoretical compute a job actually uses; goodput counts useful work delivered.
One concept connected to AI compute market decisions.
A practical introduction designed to be completed in one sitting.
Useful for ml engineers, operators, and analysts.
Plain-English definition
Model FLOPs Utilization (MFU) measures how much of a GPU's theoretical compute a training or inference job actually uses for useful model math. Goodput is the related idea of useful work delivered, as opposed to raw throughput. Both reveal how efficiently expensive accelerators are really being used.
Why it matters
A GPU's headline performance is a ceiling that real jobs rarely reach. Low MFU means you are paying for compute you are not using — overhead, waiting on memory or networking, or pipeline stalls eat into it. Improving MFU or goodput lowers the real cost per result without buying more hardware.
Simple example
Suppose a training run achieves an illustrative 40% MFU. Improving software, batching, or interconnect to reach an illustrative 50% means the same hardware does about 25% more useful work — cutting cost per result without adding a single GPU.
Example figures are illustrative calculations, not current quoted market prices.
Market signal
Rising MFU and goodput across the industry signal that more output is being squeezed from existing GPUs, which can ease effective demand growth. Watch efficiency gains alongside raw capacity: better utilization can lower cost per result even when chip supply is flat.
Market read: MFU and goodput reveal how much of paid compute is actually useful. Evidence discipline: state the workload and setup behind any utilization figure, and keep illustrative percentages separate from measured runs.
Common mistake
Reading a GPU's peak performance as what you will get. Real jobs reach a fraction of theoretical FLOPs; ignoring MFU and goodput overstates effective capacity and understates the true cost per result.
Practical takeaway
Track MFU or goodput, not just GPU count, to find where paid compute is wasted and to lower cost per result.
Decision check: before adding GPUs, check whether higher MFU or goodput on existing hardware would meet the need more cheaply.
Helpful memory trick
Peak FLOPs is the speedometer's top number; MFU is how fast you are actually going.
Compute College
Use the GPU-Hour Cost Calculator, AI Training Cost Calculator, or Model Serving Cost Calculator.
Compute College track
Continue this Compute College lesson path
Previous lesson
Continue the Model Costs track.
Next lesson
Continue the Model Costs track.