AI compute market signals and learning
Compute College

H100 vs H200 vs B200: price, memory, and performance

How NVIDIA accelerator generations compare on workload fit, current GPU-hour pricing, memory, and availability.

Generation matters (Comparison)

Newer chips can change both performance and buyer willingness to pay.

Premiums show demand (Market read)

Price gaps across generations reveal what capacity profiles buyers value.

Plain-English definition

H100 vs H200 vs B200 compares three NVIDIA accelerators across two generations (Hopper and Blackwell); they are not interchangeable market units. Generation, memory profile, performance, current GPU-hour pricing, availability, and quoted terms together determine whether a higher hourly price produces cheaper completed work.

Memory trick: A faster delivery vehicle can cost more per hour and still cost less per completed trip.

Why it matters

A GPU that costs more per hour can still be cheaper per completed workload if it finishes the job faster, supports a larger model more efficiently, or reduces the number of chips required.

  • Memory capacity and bandwidth tend to increase with each step up, which matters for larger models and memory-heavy workloads.
  • Architecture advances can improve throughput and efficiency.
  • New generations can change which workloads are practical, how many chips are needed, and what buyers are willing to pay.
  • Availability and supply mix also change as the market moves from one generation to the next.
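The cost-per-completed-workload idea can be sketched as rate × chips × hours. This is a minimal illustration, not a pricing model: the `Option` class, the option names, and every number below are hypothetical inputs chosen to show the arithmetic, not quotes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    """One way to run a job. All values are hypothetical inputs, not quotes."""
    name: str
    rate_per_gpu_hour: float  # USD per GPU-hour
    gpus: int                 # chips needed for the job
    hours: float              # wall-clock hours to finish


def completed_cost(opt: Option) -> float:
    """Total cost to finish the job: rate x chips x hours."""
    return opt.rate_per_gpu_hour * opt.gpus * opt.hours


# Hypothetical scenario: the pricier chip needs half the chips and fewer hours.
older = Option("older chip", rate_per_gpu_hour=3.00, gpus=8, hours=100.0)
newer = Option("newer chip", rate_per_gpu_hour=6.00, gpus=4, hours=60.0)

for opt in (older, newer):
    print(f"{opt.name}: ${completed_cost(opt):,.2f}")
```

In this made-up scenario the chip with double the hourly rate finishes the job for less, because both the chip count and the hours fall.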

Simple example

Hopper

H100

The baseline high-end accelerator that became a core reference point for AI compute pricing. Key idea: strong general-purpose AI capacity.

Hopper

H200

A Hopper-generation step-up with much larger and faster memory for memory-heavy AI workloads. Key idea: better fit for larger models and memory-sensitive workloads.

Blackwell

B200

The next-generation Blackwell accelerator, pushing the performance and memory frontier higher again. Key idea: a new generation that can shift workload economics and market expectations.
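One way to see how memory changes the number of chips required is a rough capacity check. The sketch below uses the per-chip memory figures quoted in this lesson; the `usable_fraction` allowance and the 300 GB footprint are assumptions for illustration, and the calculation ignores activations, cache growth, and parallelism overheads.

```python
import math

# Per-chip memory in GB, from the datasheet figures quoted in this lesson.
MEMORY_GB = {"H100": 80, "H200": 141, "B200": 192}


def min_chips_for_memory(footprint_gb: float, chip: str, usable_fraction: float = 0.9) -> int:
    """Smallest chip count whose combined usable memory holds the footprint.

    usable_fraction is a hypothetical allowance for framework overhead.
    """
    usable_gb = MEMORY_GB[chip] * usable_fraction
    return math.ceil(footprint_gb / usable_gb)


# Hypothetical 300 GB footprint (weights plus working memory).
for chip in MEMORY_GB:
    print(chip, min_chips_for_memory(300.0, chip))
```

Under these assumptions the same footprint needs fewer chips on each step up, which is one route to a lower completed cost.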

H100 vs H200 vs B200 buying comparison
Chip | Generation | Memory profile | Price signal (USD per GPU-hour) | Best fit | Current prices
H100 | Hopper baseline | 80 GB HBM3 | $2.40–$12.29 (7 providers) | General training and inference baseline; the most common comparison point. | Current H100 pricing
H200 | Hopper memory upgrade | 141 GB HBM3e | $3.50–$10 (5 providers) | Memory-heavy inference, larger contexts, and workloads bottlenecked by H100 memory. | Current H200 pricing
B200 | Blackwell | 192 GB HBM3e | $5.89–$8.60 (3 providers) | New-generation throughput, FP4-capable workloads, and buyers paying for scarce frontier capacity. | Current B200 pricing

Figures are illustrative unless explicitly sourced and dated. A price band is a range across providers, not a quote.

Specifications

H100 vs H200 vs B200: side-by-side specs

Hardware specifications are from NVIDIA datasheets (SXM variants; B200 figures are the dual-die HGX B200). Dense Tensor Core throughput is shown; NVIDIA headline numbers often quote the higher with-sparsity figure. The GPU-hour band is a live, sourced on-demand range, not a datasheet value.

NVIDIA data-center accelerator comparison
Specification | H100 SXM5 | H200 SXM | B200
Architecture | Hopper | Hopper | Blackwell
GPU memory | 80 GB HBM3 | 141 GB HBM3e | 192 GB HBM3e
Memory bandwidth | ~3.35 TB/s | ~4.8 TB/s | ~8 TB/s
FP8 dense (Tensor Core) | ~1,979 TFLOPS | ~1,979 TFLOPS | ~4,500 TFLOPS
FP4 dense (Tensor Core) | n/a | n/a | ~9,000 TFLOPS (new)
TDP | ~700 W | ~700 W | ~1,000 W
On-demand GPU-hour band | $2.40–$12.29 (7 providers) | $3.50–$10 (5 providers) | $5.89–$8.60 (3 providers)

GPU-hour band: live on-demand range from rights-vetted provider rows; "Not yet sourced" means no approved row is on file for that chip yet. Datasheet specs describe capability, not a quote — see our methodology.
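The spec rows are easier to compare as ratios against the H100 baseline. The sketch below copies the approximate datasheet figures from the table above; the `SPECS` layout and function names are illustrative, and TFLOPS per watt here is a peak-over-TDP ceiling, not a measured efficiency.

```python
# Approximate datasheet figures as quoted in the spec table (dense FP8).
SPECS = {
    "H100": {"memory_gb": 80, "bandwidth_tbs": 3.35, "fp8_tflops": 1979, "tdp_w": 700},
    "H200": {"memory_gb": 141, "bandwidth_tbs": 4.8, "fp8_tflops": 1979, "tdp_w": 700},
    "B200": {"memory_gb": 192, "bandwidth_tbs": 8.0, "fp8_tflops": 4500, "tdp_w": 1000},
}


def ratio_vs_baseline(chip: str, field: str, baseline: str = "H100") -> float:
    """How many times larger a spec is than the baseline chip's value."""
    return SPECS[chip][field] / SPECS[baseline][field]


def fp8_tflops_per_watt(chip: str) -> float:
    """Peak dense FP8 throughput per watt of TDP (a ceiling, not measured efficiency)."""
    return SPECS[chip]["fp8_tflops"] / SPECS[chip]["tdp_w"]


for chip in SPECS:
    print(
        f"{chip}: memory x{ratio_vs_baseline(chip, 'memory_gb'):.2f}, "
        f"bandwidth x{ratio_vs_baseline(chip, 'bandwidth_tbs'):.2f}, "
        f"FP8 x{ratio_vs_baseline(chip, 'fp8_tflops'):.2f}, "
        f"{fp8_tflops_per_watt(chip):.2f} TFLOPS/W"
    )
```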

How to read it

What the spec gap actually means

H100 to H200 is the same compute generation: the gain is memory, not FLOPS. H200 keeps Hopper-class compute (~1,979 TFLOPS dense FP8) but raises memory capacity by about 76% (141 GB vs 80 GB) and bandwidth by about 43% (4.8 vs 3.35 TB/s), so it wins on memory-bound and larger-context work, not raw throughput. B200 (Blackwell) is the generational jump: roughly 2.3x the dense FP8 throughput of either Hopper chip, plus a new FP4 mode, at about 43% more rated power (~1,000 W vs ~700 W). That is why a newer chip can finish a memory-bound, large-model, or throughput-bound job in fewer hours and end up cheaper per completed workload even at a higher GPU-hour rate.

  • Memory-bound or long-context job: H200 over H100, because capacity and bandwidth, not FLOPS, are the bottleneck.
  • Throughput-bound training or high-volume inference: B200 can cut hours and chip count enough to offset its rate and power.
  • Always compare cost per completed workload, not the headline GPU-hour rate.
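The memory-bound versus throughput-bound distinction can be approximated with a simple roofline estimate. The roofline model is an assumption introduced here for illustration, not a method the lesson prescribes: attainable throughput is taken as the lower of the compute peak and bandwidth × arithmetic intensity. The intensity values are hypothetical, and real workloads rarely reach either ceiling.

```python
# Peak dense FP8 TFLOPS and memory bandwidth in TB/s, from the spec table in this lesson.
PEAK_TFLOPS = {"H100": 1979.0, "H200": 1979.0, "B200": 4500.0}
BANDWIDTH_TBS = {"H100": 3.35, "H200": 4.8, "B200": 8.0}


def attainable_tflops(chip: str, flops_per_byte: float) -> float:
    """Roofline ceiling: the lower of the compute peak and bandwidth x intensity."""
    memory_ceiling = BANDWIDTH_TBS[chip] * flops_per_byte  # TB/s x FLOP/byte = TFLOPS
    return min(PEAK_TFLOPS[chip], memory_ceiling)


def is_memory_bound(chip: str, flops_per_byte: float) -> bool:
    """True when the bandwidth ceiling sits below the compute peak."""
    return BANDWIDTH_TBS[chip] * flops_per_byte < PEAK_TFLOPS[chip]


# Hypothetical intensities: a low value stands in for memory-bound work,
# a high value for large dense matrix multiplies.
for intensity in (50.0, 2000.0):
    for chip in PEAK_TFLOPS:
        print(chip, intensity, attainable_tflops(chip, intensity), is_memory_bound(chip, intensity))
```

At the low intensity the H200 ceiling exceeds the H100 ceiling by the bandwidth ratio; at the high intensity the two Hopper chips tie and only B200 moves ahead.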

Market signal

How to read the market signal

  • Providers price different accelerators differently because they deliver different value.
  • Buyers compare not just hourly rental rates, but workload fit and total cost to complete a job.
  • Supply can shift as newer chips enter the market and older chips remain in service.
  • A market benchmark has to distinguish between chips instead of treating all GPU-hours as identical.

Market read: premiums and availability differences across GPU generations show which performance and memory profiles buyers currently value. Figures here are illustrative unless explicitly sourced and dated — see our methodology.
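A generation premium can be read off the price bands in a few lines. The sketch below copies the on-demand bands from the comparison table in this lesson; comparing band floors is only one hypothetical way to define a premium, and the function names are illustrative.

```python
# On-demand GPU-hour bands (low, high) in USD, copied from the comparison table.
BANDS = {"H100": (2.40, 12.29), "H200": (3.50, 10.00), "B200": (5.89, 8.60)}


def floor_premium(chip: str, baseline: str = "H100") -> float:
    """Fractional premium of a chip's lowest listed rate over the baseline's lowest."""
    return BANDS[chip][0] / BANDS[baseline][0] - 1.0


def band_width(chip: str) -> float:
    """Spread between the highest and lowest listed rate, in USD per GPU-hour."""
    low, high = BANDS[chip]
    return high - low


for chip in BANDS:
    print(f"{chip}: floor premium {floor_premium(chip):+.0%}, band width ${band_width(chip):.2f}")
```

The bands overlap, and the H100 band has the highest ceiling of the three, so a single premium figure hides differences in provider mix and terms.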

Common mistake

A lower hourly rate does not automatically mean lower compute cost. The right comparison is whether a chip can complete the required workload at the needed speed, scale, and total cost.
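The comparison reduces to a break-even check: the pricier chip costs less per job only when it cuts total GPU-hours by more than the ratio of the two rates. The sketch below assumes cost is rate × GPU-hours and nothing else; the rates and the 2.3x reduction are hypothetical.

```python
def break_even_speedup(rate_new: float, rate_old: float) -> float:
    """Factor by which total GPU-hours must shrink for the pricier chip to cost the same."""
    return rate_new / rate_old


def is_cheaper_per_job(rate_new: float, rate_old: float, gpu_hour_reduction: float) -> bool:
    """True when the reduction in GPU-hours more than offsets the higher rate."""
    return gpu_hour_reduction > break_even_speedup(rate_new, rate_old)


# Hypothetical rates: the newer chip costs twice as much per GPU-hour.
needed = break_even_speedup(rate_new=6.00, rate_old=3.00)
print(f"GPU-hours must fall by more than {needed:.2f}x")
print(is_cheaper_per_job(6.00, 3.00, gpu_hour_reduction=2.3))
```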

Price

Hourly price

What access costs per unit of time.

Performance

Performance

How much useful work the chip can complete.

Fit

Workload fit

Whether the chip is well suited to the model and task.

Practical takeaway

What you can do with this

Compare accelerator choices by expected completion cost and workload fit, not by hourly rental price alone. Include memory needs, networking, availability, utilization, and time-to-result.

  • Procurement teams: request quotes on equivalent workload and reliability assumptions.
  • Analysts: watch how demand and supply shift between generations as newer systems deploy.
  • Document whether a quoted premium purchases memory capacity, faster completion, cluster availability, or service assurance rather than assuming every generation is interchangeable.

Decision check: select the accelerator that delivers acceptable useful output per total dollar and deadline, not simply the lowest displayed rate.
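The decision check can be sketched as a filter followed by a minimum: drop options that miss the deadline, then take the lowest total cost. The `Quote` class, the utilization adjustment, and all numbers below are hypothetical; real quotes would also carry networking, reliability, and contract terms.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Quote:
    """A hypothetical quote for one accelerator option; none of these are real prices."""
    chip: str
    rate_per_gpu_hour: float  # USD
    gpus: int
    ideal_hours: float        # hours at 100% utilization
    utilization: float        # fraction of paid time doing useful work, 0 < u <= 1

    @property
    def wall_hours(self) -> float:
        """Paid hours once idle or lost time is accounted for."""
        return self.ideal_hours / self.utilization

    @property
    def total_cost(self) -> float:
        return self.rate_per_gpu_hour * self.gpus * self.wall_hours


def pick(quotes: List[Quote], deadline_hours: float) -> Optional[Quote]:
    """Cheapest quote that finishes within the deadline, or None if none does."""
    feasible = [q for q in quotes if q.wall_hours <= deadline_hours]
    return min(feasible, key=lambda q: q.total_cost, default=None)


# Made-up inputs: the lowest rate is cheapest overall but misses a 96-hour deadline.
quotes = [
    Quote("H100", rate_per_gpu_hour=2.00, gpus=8, ideal_hours=90.0, utilization=0.75),
    Quote("H200", rate_per_gpu_hour=4.00, gpus=8, ideal_hours=56.0, utilization=0.80),
    Quote("B200", rate_per_gpu_hour=7.00, gpus=8, ideal_hours=40.5, utilization=0.90),
]

choice = pick(quotes, deadline_hours=96.0)
print(choice.chip if choice else "no option meets the deadline")
```

With a looser deadline the lowest-rate option wins in this example, which is why the deadline belongs in the comparison alongside the rate.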

Turn the lesson into a number

Use the GPU-Hour Cost Calculator, AI Training Cost Calculator, or Model Serving Cost Calculator.

Compute College track

AI Compute 101

Step 4 of 7: H100 vs H200 vs B200