Compute College

H100 vs H200 vs B200: price, memory, and performance

By ComputeTape Editorial

How NVIDIA accelerator generations compare on workload fit, current GPU-hour pricing, memory, and availability.

Comparison: Generation matters

Newer chips can change both performance and buyer willingness to pay.

Market read: Premiums show demand

Price gaps across generations reveal what capacity profiles buyers value.

Memory capacity and bandwidth increase across these generations, which matters for larger models and memory-heavy workloads.
Architecture advances can improve throughput and efficiency.
New generations can change which workloads are practical, how many chips are needed, and what buyers are willing to pay.
Availability and supply mix also change as the market moves from one generation to the next.
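
To make the chip-count point concrete, here is a rough sketch of how memory capacity alone can change the number of chips a model needs. It counts only the model weights and ignores KV cache, activations, and optimizer state, so real deployments need more headroom. The 70-billion-parameter model and the 2 bytes per parameter are assumptions chosen for illustration; the capacities are the 80 GB, 141 GB, and 192 GB figures from the spec table in this lesson.

```python
import math

# Per-chip memory capacity in GB, from the spec table in this lesson.
MEMORY_GB = {"H100": 80, "H200": 141, "B200": 192}


def weights_gb(params_billion: float, bytes_per_param: float) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bytes_per_param


def min_chips_for_weights(params_billion: float, bytes_per_param: float, chip: str) -> int:
    """Smallest chip count whose combined memory holds the weights alone."""
    return math.ceil(weights_gb(params_billion, bytes_per_param) / MEMORY_GB[chip])


# Hypothetical model: 70 billion parameters stored at 2 bytes each (16-bit).
for chip in MEMORY_GB:
    print(chip, min_chips_for_weights(70, 2, chip))
```

Under these assumptions the 140 GB of weights need two H100s but fit on a single H200 or B200, which is the sense in which a memory upgrade can change how many chips a job requires.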

Hopper

H100

The baseline high-end accelerator that became a core reference point for AI compute pricing. Key idea: strong general-purpose AI capacity.

Hopper

H200

A Hopper-generation step-up with much larger and faster memory for memory-heavy AI workloads. Key idea: better fit for larger models and memory-sensitive workloads.

Blackwell

B200

The next-generation Blackwell accelerator, pushing the performance and memory frontier higher again. Key idea: a new generation that can shift workload economics and market expectations.

H100 vs H200 vs B200 buying comparison
Chip	Generation	Memory profile	Price signal	Best fit	Current prices
H100	Hopper baseline	80 GB HBM3	$2.40–$12.29 (7 providers)	General training and inference baseline; the most common comparison point.	Current H100 pricing
H200	Hopper memory upgrade	141 GB HBM3e	$3.50–$10 (5 providers)	Memory-heavy inference, larger contexts, and workloads bottlenecked by H100 memory.	Current H200 pricing
B200	Blackwell	192 GB HBM3e	$5.89–$8.60 (3 providers)	New-generation throughput, FP4-capable workloads, and buyers paying for scarce frontier capacity.	Current B200 pricing

Any figures shown are illustrative calculations, not current quoted market prices.

NVIDIA data-center accelerator comparison
Specification	H100 SXM5	H200 SXM	B200
Architecture	Hopper	Hopper	Blackwell
GPU memory	80 GB HBM3	141 GB HBM3e	192 GB HBM3e
Memory bandwidth	~3.35 TB/s	~4.8 TB/s	~8 TB/s
FP8 dense (Tensor Core)	~1,979 TFLOPS	~1,979 TFLOPS	~4,500 TFLOPS
FP4 dense (Tensor Core)	—	—	~9,000 TFLOPS (new)
TDP	~700 W	~700 W	~1,000 W
On-demand GPU-hour band	$2.40–$12.29 (7 providers)	$3.50–$10 (5 providers)	$5.89–$8.60 (3 providers)

GPU-hour band: live on-demand range from rights-vetted provider rows; "Not yet sourced" means no approved row is on file for that chip yet. Datasheet specs describe capability, not a quote — see our methodology.

How to read it

What the spec gap actually means

H100 to H200 is the same compute generation: the gain is memory, not FLOPS. H200 keeps Hopper-class compute (~1,979 TFLOPS dense FP8) but adds about 76% more memory (141 GB vs 80 GB) and about 43% more bandwidth (4.8 vs 3.35 TB/s), so it wins on memory-bound and larger-context work, not raw throughput. B200 (Blackwell) is the generational jump: roughly 2.3x the dense FP8 throughput plus a new FP4 mode, at about 40% more power. That is why a newer chip can finish a memory-bound or large-model job in fewer hours and end up cheaper per completed workload even at a higher GPU-hour rate.
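
The ratios in this paragraph can be recomputed directly from the datasheet figures in the spec table. The sketch below is one way to do that; the dictionary layout and the `ratios` helper are illustrative names, not part of any tool referenced on this page.

```python
# Datasheet figures copied from the spec table in this lesson.
SPECS = {
    "H100": {"memory_gb": 80, "bandwidth_tbs": 3.35, "fp8_tflops": 1979, "tdp_w": 700},
    "H200": {"memory_gb": 141, "bandwidth_tbs": 4.8, "fp8_tflops": 1979, "tdp_w": 700},
    "B200": {"memory_gb": 192, "bandwidth_tbs": 8.0, "fp8_tflops": 4500, "tdp_w": 1000},
}


def ratios(chip: str, baseline: str = "H100") -> dict:
    """Return each spec of `chip` divided by the same spec of `baseline`."""
    base = SPECS[baseline]
    return {key: round(value / base[key], 2) for key, value in SPECS[chip].items()}


for chip in ("H200", "B200"):
    print(chip, ratios(chip))
```

With these inputs, H200 comes out at about 1.76x the memory and 1.43x the bandwidth of H100 with identical FP8 throughput, while B200 comes out at about 2.27x the FP8 throughput for about 1.43x the power.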

Memory-bound or long-context job: H200 over H100, because capacity and bandwidth, not FLOPS, are the bottleneck.
Throughput-bound training or high-volume inference: B200 can cut hours and chip count enough to offset its higher GPU-hour rate and power draw.
Always compare cost per completed workload, not the headline GPU-hour rate.
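
As a minimal sketch of the cost-per-completed-workload comparison, the snippet below multiplies rate, chip count, and hours, and also computes the speedup a pricier chip needs to break even at the same chip count. All rates, chip counts, and run times here are made-up inputs for illustration, not quotes.

```python
def cost_per_job(rate_per_gpu_hour: float, gpus: int, hours: float) -> float:
    """Total rental cost in USD to finish one job."""
    return rate_per_gpu_hour * gpus * hours


def break_even_speedup(new_rate: float, old_rate: float) -> float:
    """Speedup the newer chip needs to match the older chip's cost per job,
    assuming the same number of chips on both sides."""
    return new_rate / old_rate


# Hypothetical job: 8 chips for 100 hours on the older chip at $2.50 per GPU-hour,
# versus 8 newer chips at $6.00 per GPU-hour finishing in 40 hours.
old_cost = cost_per_job(2.50, 8, 100)
new_cost = cost_per_job(6.00, 8, 40)
needed = break_even_speedup(6.00, 2.50)
actual = 100 / 40

print(f"older chip: ${old_cost:,.0f}  newer chip: ${new_cost:,.0f}")
print(f"break-even speedup: {needed:.2f}x  actual speedup: {actual:.2f}x")
```

In this made-up case the newer chip's rate is 2.4x higher but it finishes 2.5x faster, so the job costs $1,920 instead of $2,000. If the real speedup on your workload falls below the rate ratio, the cheaper hourly chip wins instead.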

Providers price different accelerators differently because they deliver different value.
Buyers compare not just hourly rental rates, but workload fit and total cost to complete a job.
Supply can shift as newer chips enter the market and older chips remain in service.
A market benchmark has to distinguish between chips instead of treating all GPU-hours as identical.

Market read: premiums and availability differences across GPU generations show which performance and memory profiles buyers currently value. Figures here are illustrative unless explicitly sourced and dated — see our methodology.
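
One rough way to express a generation premium is to compare the ends of the GPU-hour bands in the table above. The sketch below does that; the `premium_pct` helper is a hypothetical name, and because each end of a band can come from a different provider with different terms, the result is a coarse signal and not a like-for-like price comparison.

```python
# On-demand GPU-hour bands (low, high) in USD, copied from the table in this lesson.
BANDS = {"H100": (2.40, 12.29), "H200": (3.50, 10.00), "B200": (5.89, 8.60)}


def premium_pct(chip: str, baseline: str = "H100", end: int = 0) -> float:
    """Percent premium of `chip` over `baseline` at the low (end=0)
    or high (end=1) end of the band."""
    return round((BANDS[chip][end] / BANDS[baseline][end] - 1) * 100, 1)


for chip in ("H200", "B200"):
    print(chip, "low end:", premium_pct(chip), "high end:", premium_pct(chip, end=1))
```

On these figures the premium appears only at the low end of the band: the cheapest H200 and B200 rows sit well above the cheapest H100 row, while the most expensive H100 row is higher than either newer chip's top rate. That spread is one reason to compare equivalent offers and not band edges.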

Price

Hourly price

What access costs per unit of time.

Performance

How much useful work the chip can complete.

Fit

Workload fit

Whether the chip is well suited to the model and task.

Procurement teams: request quotes on equivalent workload and reliability assumptions.
Analysts: watch how demand and supply shift between generations as newer systems deploy.
Document whether a quoted premium purchases memory capacity, faster completion, cluster availability, or service assurance rather than assuming every generation is interchangeable.

Decision check: select the accelerator that delivers acceptable useful output per total dollar and deadline, not simply the lowest displayed rate.
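
The decision check can be sketched as a two-step filter: drop options that miss the deadline, then take the lowest total cost among the rest. The `Option` class, the `pick` function, and every number below are hypothetical; in practice the hours would come from your own benchmarks on each chip.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Option:
    chip: str
    rate_per_gpu_hour: float  # USD
    gpus: int
    hours: float  # estimated wall-clock hours to finish the job

    @property
    def total_cost(self) -> float:
        return self.rate_per_gpu_hour * self.gpus * self.hours


def pick(options: List[Option], deadline_hours: float) -> Optional[Option]:
    """Cheapest option by total cost among those that meet the deadline, or None."""
    feasible = [o for o in options if o.hours <= deadline_hours]
    return min(feasible, key=lambda o: o.total_cost, default=None)


# Hypothetical quotes and run-time estimates for the same job.
OPTIONS = [
    Option("H100", 2.50, 8, 60),  # $1,200 total
    Option("H200", 3.50, 8, 50),  # $1,400 total
    Option("B200", 6.00, 8, 30),  # $1,440 total
]

for deadline in (72, 55, 48):
    choice = pick(OPTIONS, deadline)
    print(deadline, choice.chip if choice else None)
```

With a loose deadline the lowest-cost option wins, but tightening the deadline moves the choice to a chip with a higher displayed rate, which is the point of checking cost and deadline together.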

Use the calculators

Compute College track

AI Compute 101

Step 4 of 7: H100 vs H200 vs B200

H100 vs H200 vs B200: price, memory, and performance

Plain-English definition

Why it matters

Simple example

H100

H200

B200

H100 vs H200 vs B200: side-by-side specs

What the spec gap actually means

How to read the market signal

Common mistake

Hourly price

Performance

Workload fit

What you can do with this

Turn the lesson into a number

AI Compute 101

Related lessons

What is AI compute?

What is a GPU-hour?

GPU Pricing Hub