Understanding AWS Capacity Blocks for GPUs

AWS released its p6 series of GPU instances on May 15.  The p6 instances are built on NVIDIA's current top-of-the-line B200 GPUs, which use the Blackwell architecture.  But if you're looking to rent p6 capacity, you won't be able to run it on demand, apply Compute Savings Plans, or procure it through the spot market, at least for the time being.  Instead you'll need to buy a Capacity Block, a new billing construct AWS has created for GPU clusters.

Reserving compute capacity has long been a foundational activity of FinOps.  Things changed a few years ago when AWS introduced Compute Savings Plans, which largely displaced Reserved Instances for EC2, but the fundamental principle of committing to spend in exchange for savings remained the same.  Capacity Blocks are an extension of that idea, but tailored to the use cases of GPU clusters, where a 1-3 year reservation might be cost prohibitive and, depending on model training requirements, unnecessary.

Capacity Blocks are primarily sold in weekly increments, anywhere from 7 to 182 days, and can be booked up to 8 weeks in advance to align with project planning.  Smaller 24-hour increments are also available for shorter projects.  AWS is strict about timing for all Capacity Block reservations: they all start and end at 11:30 UTC, or 4:30 AM Pacific Daylight Time.  And they shut off at that time regardless of what you're working on, although AWS sends a notification via EventBridge and gives you a 30-minute shutdown period that begins only when the reservation expires.  The rigidity extends to procurement: Capacity Blocks cannot be modified or cancelled once ordered, and the only payment option is all upfront.
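Because the 11:30 UTC boundary is fixed, the exact window of usable time can be computed before the reservation starts. A minimal sketch (the `block_window` helper is hypothetical, not an AWS API):

```python
from datetime import datetime, timedelta, timezone

BOUNDARY_HOUR = 11          # Capacity Blocks start and end at 11:30 UTC
BOUNDARY_MINUTE = 30
GRACE = timedelta(minutes=30)  # shutdown window that begins at expiry

def block_window(start_date: datetime, days: int):
    """Return (start, end) for a Capacity Block of `days` days.
    Instances terminate during the 30 minutes after `end`, so any
    checkpointing should finish before the boundary."""
    start = start_date.replace(hour=BOUNDARY_HOUR, minute=BOUNDARY_MINUTE,
                               second=0, microsecond=0, tzinfo=timezone.utc)
    return start, start + timedelta(days=days)

start, end = block_window(datetime(2025, 6, 2), days=7)
print(start.isoformat())  # 2025-06-02T11:30:00+00:00
print(end.isoformat())    # 2025-06-09T11:30:00+00:00
print((end + GRACE).isoformat())  # hard cutoff: 2025-06-09T12:00:00+00:00
```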

Capacity Blocks are available for p6/Blackwell, p5/Hopper, p4/Ampere, Trainium, and Trainium2 instance types.  Pricing is set dynamically, but you can gauge current rates by comparing AWS's GPU rental prices with those of GPU providers like CoreWeave, WhiteFiber, and Lambda.  Importantly, Capacity Blocks are sold at the node level, so you buy them in batches of 8 accelerators, or 16 in the case of Trainium chips.
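Since other providers typically quote per-GPU rates while Capacity Blocks are priced per node, comparing the two requires normalizing to a per-accelerator rate. A small sketch (the $65/hour figure is illustrative; actual Capacity Block pricing is dynamic):

```python
def per_accelerator_hourly(node_hourly: float, accelerators_per_node: int) -> float:
    """Convert a node-level hourly rate to a per-accelerator rate
    for apples-to-apples comparison with per-GPU provider pricing."""
    return node_hourly / accelerators_per_node

# A p6 node carries 8 B200 GPUs; a Trainium node carries 16 chips.
print(per_accelerator_hourly(65.0, 8))   # 8.125 per GPU-hour
print(per_accelerator_hourly(65.0, 16))  # 4.0625 per chip-hour
```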

Perhaps the most important distinction between procuring Capacity Blocks and other EC2 purchasing is the need for upfront planning.  A typical "run some instances and see if you need to reserve capacity" approach won't work.  FinOps, finance, and engineering all need to be aligned on costs and requirements before making the irrevocable commitment.  Moreover, with the Blackwell-based p6-b200.48xlarge listing at over $65 an hour, a one-month commitment for a single node can top $40,000.
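The arithmetic behind that figure is straightforward, and worth running before any commitment since the payment is all upfront and irrevocable. Using the approximate list rate from above and a four-week block:

```python
HOURLY_LIST = 65.0     # approximate p6-b200.48xlarge list rate, per the text
HOURS_PER_DAY = 24
DAYS = 28              # a four-week Capacity Block

# All-upfront charge for a single node over the full block
upfront = HOURLY_LIST * HOURS_PER_DAY * DAYS
print(f"${upfront:,.0f}")  # $43,680
```

Scaling to a multi-node cluster multiplies this linearly, which is why the alignment between FinOps, finance, and engineering needs to happen before the order is placed.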

As use of AI and GPUs grows, Capacity Blocks are likely to take a greater share of total compute spend.  Having a planning process for procuring them is essential for any company developing large ML models.
