Compute College
How NVIDIA accelerator generations compare on workload fit, current GPU-hour pricing, memory, and availability.
Newer chips can change both performance and buyer willingness to pay.
Price gaps across generations reveal what capacity profiles buyers value.
H100, H200, and B200 are distinct NVIDIA accelerators spanning two generations (Hopper and Blackwell), not interchangeable market units. Generation, memory profile, performance, current GPU-hour pricing, availability, and quoted terms together determine whether a higher hourly price still produces cheaper completed work.
Memory trick: A faster delivery vehicle can cost more per hour and still cost less per completed trip.
A GPU that costs more per hour can still be cheaper per completed workload if it finishes the job faster, supports a larger model more efficiently, or reduces the number of chips required.
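The arithmetic behind that claim can be sketched in a few lines of Python. This is a minimal illustration, assuming cost is simply hourly rate times wall-clock hours times GPU count; the function name `workload_cost` and all rates and runtimes below are hypothetical, not quoted prices or benchmark results.

```python
def workload_cost(rate_per_gpu_hour: float, hours: float, gpus: int = 1) -> float:
    """Total cost of one completed workload: rate x wall-clock hours x GPU count."""
    return rate_per_gpu_hour * hours * gpus


# Hypothetical numbers for illustration only (not quoted prices or benchmarks).
slower_chip = workload_cost(rate_per_gpu_hour=3.00, hours=100, gpus=8)
faster_chip = workload_cost(rate_per_gpu_hour=5.00, hours=50, gpus=8)

print(f"slower chip: ${slower_chip:,.2f}")
print(f"faster chip: ${faster_chip:,.2f}")
```

In this made-up case the chip with the higher hourly rate finishes in half the time, so its total bill is lower. With a smaller speedup the result would flip.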
H100 (Hopper)
The baseline high-end accelerator that became a core reference point for AI compute pricing. Key idea: strong general-purpose AI capacity.
H200 (Hopper)
A Hopper-generation step-up with much larger and faster memory for memory-heavy AI workloads. Key idea: better fit for larger models and memory-sensitive workloads.
B200 (Blackwell)
The next-generation Blackwell accelerator, pushing the performance and memory frontier higher again. Key idea: a new generation that can shift workload economics and market expectations.
| Chip | Generation | Memory profile | Price signal (on-demand, USD per GPU-hour) | Best fit | Current prices |
|---|---|---|---|---|---|
| H100 | Hopper baseline | 80 GB HBM3 | $2.40–$12.29 (7 providers) | General training and inference baseline; the most common comparison point. | Current H100 pricing |
| H200 | Hopper memory upgrade | 141 GB HBM3e | $3.50–$10.00 (5 providers) | Memory-heavy inference, larger contexts, and workloads bottlenecked by H100 memory. | Current H200 pricing |
| B200 | Blackwell | 192 GB HBM3e | $5.89–$8.60 (3 providers) | New-generation throughput, FP4-capable workloads, and buyers paying for scarce frontier capacity. | Current B200 pricing |
Price ranges are on-demand bands drawn from provider rows and will change over time; any worked calculations are illustrative, not current quoted market prices.
Specifications
Hardware specifications are from NVIDIA datasheets (SXM variants; B200 figures are the dual-die HGX B200). Dense Tensor Core throughput is shown; NVIDIA headline numbers often quote the higher with-sparsity figure. The GPU-hour band is a live, sourced on-demand range, not a datasheet value.
| Specification | H100 SXM5 | H200 SXM | B200 |
|---|---|---|---|
| Architecture | Hopper | Hopper | Blackwell |
| GPU memory | 80 GB HBM3 | 141 GB HBM3e | 192 GB HBM3e |
| Memory bandwidth | ~3.35 TB/s | ~4.8 TB/s | ~8 TB/s |
| FP8 dense (Tensor Core) | ~1,979 TFLOPS | ~1,979 TFLOPS | ~4,500 TFLOPS |
| FP4 dense (Tensor Core) | — | — | ~9,000 TFLOPS (new) |
| TDP | ~700 W | ~700 W | ~1,000 W |
| On-demand GPU-hour band (USD per GPU-hour) | $2.40–$12.29 (7 providers) | $3.50–$10.00 (5 providers) | $5.89–$8.60 (3 providers) |
GPU-hour band: live on-demand range from rights-vetted provider rows; "Not yet sourced" means no approved row is on file for that chip yet. Datasheet specs describe capability, not a quote — see our methodology.
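One way the memory row feeds into cost is through the number of chips a job needs. The sketch below estimates a lower bound on GPU count from memory alone, using the per-GPU capacities in the table. The 300 GB footprint and the helper name `min_gpus_for_memory` are hypothetical, and the estimate deliberately ignores activation and KV-cache overhead, framework overhead, and parallelism layouts that may force a larger or power-of-two count.

```python
import math

# Per-GPU memory in GB, taken from the specification table above.
GPU_MEMORY_GB = {"H100": 80, "H200": 141, "B200": 192}


def min_gpus_for_memory(footprint_gb: float, gpu_memory_gb: float) -> int:
    """Lower bound on GPU count: the footprint must fit in aggregate GPU memory."""
    return math.ceil(footprint_gb / gpu_memory_gb)


footprint_gb = 300  # hypothetical total memory footprint of a workload
counts = {
    chip: min_gpus_for_memory(footprint_gb, memory)
    for chip, memory in GPU_MEMORY_GB.items()
}
print(counts)
```

Fewer chips at a higher rate can cost less in total than more chips at a lower rate, which is why the memory profile belongs in the price comparison.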
How to read it
H100 to H200 is the same compute generation: the gain is memory, not FLOPS. H200 keeps Hopper-class compute (~1,979 TFLOPS dense FP8) but adds about 76% more memory (141 GB vs 80 GB) and about 43% more bandwidth (4.8 vs 3.35 TB/s), so it wins on memory-bound and larger-context work, not raw throughput. B200 (Blackwell) is the generational jump: roughly 2.3x the dense FP8 throughput of H100 plus a new FP4 mode, at about 40% more power (~1,000 W vs ~700 W). That is why a newer chip can finish a memory-bound or large-model job in fewer hours and end up cheaper per completed workload even at a higher GPU-hour rate.
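The generation-to-generation ratios can be recomputed directly from the specification table. This is a small sketch that treats the approximate datasheet figures above as exact inputs; the dictionary layout and the `ratio` helper are illustrative choices, and peak-spec ratios are not measured workload speedups.

```python
# Approximate datasheet values copied from the specification table above.
SPECS = {
    "H100": {"memory_gb": 80, "bandwidth_tbs": 3.35, "fp8_dense_tflops": 1979, "tdp_w": 700},
    "H200": {"memory_gb": 141, "bandwidth_tbs": 4.8, "fp8_dense_tflops": 1979, "tdp_w": 700},
    "B200": {"memory_gb": 192, "bandwidth_tbs": 8.0, "fp8_dense_tflops": 4500, "tdp_w": 1000},
}


def ratio(chip: str, baseline: str, field: str) -> float:
    """Ratio of one spec field between a chip and a baseline chip."""
    return SPECS[chip][field] / SPECS[baseline][field]


for chip in ("H200", "B200"):
    for field in SPECS[chip]:
        print(f"{chip} vs H100 {field}: {ratio(chip, 'H100', field):.2f}x")
```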
Market signal
Market read: premiums and availability differences across GPU generations show which performance and memory profiles buyers currently value. Figures here are illustrative unless explicitly sourced and dated — see our methodology.
A lower hourly rate does not automatically mean lower compute cost. The right comparison is whether a chip can complete the required workload at the needed speed, scale, and total cost.
Price
What access costs per unit of time.
Performance
How much useful work the chip can complete.
Fit
Whether the chip is well suited to the model and task.
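Price and performance combine into a simple break-even test. Assuming cost per workload is rate times GPU count times hours, a pricier option wins only if its wall-clock speedup exceeds the ratio of the two hourly bills. The function names and all numbers below are hypothetical, and fit is taken as given (the job is assumed to run on both options).

```python
def break_even_speedup(
    baseline_rate: float,
    candidate_rate: float,
    baseline_gpus: int = 1,
    candidate_gpus: int = 1,
) -> float:
    """Speedup the candidate needs to match the baseline's cost per completed workload."""
    return (candidate_rate * candidate_gpus) / (baseline_rate * baseline_gpus)


def is_cheaper(baseline_rate: float, candidate_rate: float, speedup: float) -> bool:
    """True if the candidate costs less per workload at the given speedup (same GPU count)."""
    return speedup > break_even_speedup(baseline_rate, candidate_rate)


# Hypothetical rates for illustration only (not quoted prices).
needed = break_even_speedup(baseline_rate=3.00, candidate_rate=4.50)
print(f"break-even speedup: {needed:.2f}x")
print(is_cheaper(3.00, 4.50, speedup=2.0))  # faster than break-even
print(is_cheaper(3.00, 4.50, speedup=1.2))  # slower than break-even
```

If the candidate also needs fewer GPUs, pass the GPU counts to `break_even_speedup` and the required speedup drops accordingly.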
Practical takeaway
Compare accelerator choices by expected completion cost and workload fit, not by hourly rental price alone. Include memory needs, networking, availability, utilization, and time-to-result.
Decision check: select the accelerator that delivers acceptable useful output per total dollar and deadline, not simply the lowest displayed rate.
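The decision check can be written as a filter followed by a minimum: drop options that miss the deadline, then take the lowest total cost among the rest. This is a minimal sketch; the `Option` class, the `pick_option` function, the chip labels, and every rate and runtime are hypothetical, and real comparisons would also account for availability, networking, and utilization.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Option:
    name: str
    rate_per_gpu_hour: float
    gpus: int
    hours_to_finish: float

    @property
    def total_cost(self) -> float:
        """Total bill for the completed workload."""
        return self.rate_per_gpu_hour * self.gpus * self.hours_to_finish


def pick_option(options: List[Option], deadline_hours: float) -> Optional[Option]:
    """Cheapest option that finishes within the deadline, or None if none qualifies."""
    feasible = [o for o in options if o.hours_to_finish <= deadline_hours]
    return min(feasible, key=lambda o: o.total_cost, default=None)


# Hypothetical options for illustration only (not quoted prices or benchmarks).
options = [
    Option("chip-a", rate_per_gpu_hour=2.50, gpus=8, hours_to_finish=120),
    Option("chip-b", rate_per_gpu_hour=4.00, gpus=8, hours_to_finish=70),
    Option("chip-c", rate_per_gpu_hour=6.00, gpus=8, hours_to_finish=50),
]

choice = pick_option(options, deadline_hours=96)
print(choice.name if choice else "no option meets the deadline")
```

In this made-up example the lowest hourly rate misses the deadline and the fastest chip costs more in total, so the middle option is selected.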
Use the GPU-Hour Cost Calculator, AI Training Cost Calculator, or Model Serving Cost Calculator.
Compute College track
Step 4 of 7: H100 vs H200 vs B200