September 3, 2025 - AWS Automates Prompt Caching for Claude on Bedrock
AWS has just announced automated prompt caching for Claude on Bedrock, removing the need for developers to set cache breakpoints manually. While this should cut tokens-per-minute consumption somewhat for developers using code-assistant features, it's basically a press release to counter all the tokens-per-minute quota reductions they've implemented over the last year...without any press releases.
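For reference, a manual cache breakpoint looks roughly like the sketch below, which marks a reusable system prompt with Anthropic's documented cache_control field. The anthropic_version string and exact Bedrock field support are assumptions here, not a verified payload:

```python
import json

def build_cached_request(system_text: str, user_text: str) -> dict:
    """Sketch of a Bedrock invoke-model body with a manual cache breakpoint."""
    return {
        "anthropic_version": "bedrock-2023-05-31",  # assumed version string
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": system_text,
                # Everything up to and including this block is cached and
                # reused on subsequent calls, skipping prefix re-processing.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [
            {"role": "user", "content": [{"type": "text", "text": user_text}]}
        ],
    }

body = build_cached_request("You are a code assistant...", "Refactor this function.")
print(json.dumps(body, indent=2))
```

With automated caching, Bedrock decides where that breakpoint goes, so the cache_control block above is exactly the piece developers no longer have to write.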
August 25, 2025 - Supermicro and Partners Tackle AI Storage Bottlenecks
Storage vendors continue to publicize their efforts to address the storage bottlenecks that are quickly emerging as Gen AI usage grows.
August 19, 2025 - Storage Networking Industry Association Launches Storage.AI Project
The Storage Networking Industry Association (SNIA) has launched Storage.AI to develop open standards for communication and data formatting among storage, GPUs, and memory. Right now, GPUs are vulnerable to idle time while waiting for data to move through various pipelines for processing; this effort aims to build a standards-based approach to optimizing those pipelines through common formats and communication protocols.
August 6, 2025 - Enfabrica touts elastic AI memory fabric for GPU workload efficiency
Enfabrica, which has raised $290 million, is releasing a memory fabric based on the CXL (Compute Express Link) standard that connects external memory to a GPU. This is an alternative to achieving the same thing with SSDs, with both approaches aiming to meet the soaring demand for memory offload in inference applications.
August 5, 2025 - Micron Unveils the 9650 - PCIe Gen6 SSD to Power AI Data Center Workloads
Micron has released a new SSD based on PCIe Gen6 that greatly improves on the performance of the PCIe Gen5 9550. In addition to nearly doubling MB per watt, the 9650 doubles sequential read performance to 28 GB/s, increases random read performance 66% to 5.5 GB/s, and lifts random write performance 40% to 14 GB/s. While Micron is promoting the devices as helpful for both training and inference, inference is likely to dominate deployment, especially for applications like code editors that are straining to keep up with growing output lengths.
August 5, 2025 - Positron Taking on NVIDIA with Inference-Specific GPUs
Positron announced a $51 million funding round last week to develop inference-specific GPUs that it claims can beat NVIDIA in performance (tokens) per watt. Its website features a live counter displaying this metric. Acknowledging that memory bandwidth and capacity are the gating factors in inference performance, it is planning future ASICs with far more memory than HBM (High Bandwidth Memory) is projected to provide over the next few years.
August 4, 2025 - AWS Launches Fractional L4 Instances
In another attempt to address cost disparities with neoclouds, AWS has launched g6f instances, comprising fractional versions of NVIDIA’s lower-end L4 chip. On Demand rates for the g6f.large, representing 1/8th of an L4, come in at 20.2 cents per hour. RunPod, meanwhile, offers a full L4 instance for 43 cents an hour, or about what AWS charges for a quarter of one.
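The comparison works out as follows; a quick back-of-envelope sketch using only the rates quoted above:

```python
# Compare AWS's fractional L4 pricing with a full L4 elsewhere.
G6F_LARGE_RATE = 0.202   # $/hr for 1/8 of an L4 (g6f.large, On Demand)
FRACTION = 1 / 8
RUNPOD_L4_RATE = 0.43    # $/hr for a full L4 on RunPod

# Scale the fractional rate up to a whole-GPU equivalent.
aws_full_l4_equiv = G6F_LARGE_RATE / FRACTION
premium = aws_full_l4_equiv / RUNPOD_L4_RATE

print(f"AWS full-L4 equivalent: ${aws_full_l4_equiv:.3f}/hr ({premium:.1f}x RunPod)")
```

So a whole L4 assembled from g6f.large fractions runs about $1.62/hour at AWS, nearly 4x the RunPod rate.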
July 28, 2025 - Anthropic unveils new rate limits to curb Claude Code power users
One of the biggest challenges with AI coding assistants is maintaining sufficient resources to keep up with requests, and doing so with a decent profit margin. Anysphere caused an uproar a few weeks ago when it began shifting users from fixed to usage-based pricing; now Anthropic is introducing rate limiting for Claude Code, forcing heavy users to purchase additional usage at standard API rates. Underlying all this is the economics of adding inference capacity, which is strained especially by the cost of the memory needed to cache the vectors used to produce output tokens.
July 28, 2025 - Sandisk Assembles Advisory Board to Guide High Bandwidth Flash Strategy
Lots of news recently on the use of Flash to supplement High Bandwidth Memory, given the rapidly growing demands of caching vectors for inference. Sandisk has now announced an advisory board for High Bandwidth Flash, or HBF, to advance GPU architectures that build a caching layer directly into SSD hardware. However, unless and until NVIDIA signs on, it could be limited to the smaller GPU providers.
July 24, 2025 - SK Hynix Revenue Grows with HBM
SK Hynix, one of the leading suppliers of High Bandwidth Memory (HBM) in GPU-powered systems, reported second quarter revenue of $16.2 billion yesterday, up 35% year-over-year. During the earnings call, it mentioned that HBM revenue alone will double in 2025. If the tens of billions of capex for AI you keep reading about actually gets spent, this growth will continue for at least a few more years.
July 23, 2025 - DDN Introduces Infinia, Creating an SSD Caching Layer
AI Storage vendors are quickly adding the capability to use their SSDs as a caching layer. This means the KV (Key-Value) Cache typically stored in memory to accelerate inference can be extended out to disk, avoiding re-computation of the key and value vectors for previously processed tokens. This greatly reduces latency while improving price/performance.
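A minimal sketch of why the KV cache matters: without one, every decode step recomputes key/value projections for all prior tokens; with one, only the newest token's projections are computed. The project() function here is a trivial stand-in for real attention math:

```python
# Toy illustration of KV caching: counters track how many per-token
# key/value projections each decoding strategy performs.
def project(token):
    """Stand-in for the per-token key/value projection of an attention layer."""
    return (token * 2, token * 3)  # mock key, value

def decode_no_cache(tokens):
    """Recompute K/V for every prior token at every decode step (quadratic)."""
    ops = 0
    for step in range(1, len(tokens) + 1):
        _ = [project(t) for t in tokens[:step]]  # recompute the whole prefix
        ops += step
    return ops

def decode_with_cache(tokens):
    """Compute K/V once per token and reuse it from the cache (linear)."""
    ops, cache = 0, []
    for t in tokens:
        cache.append(project(t))  # only the new token's K/V
        ops += 1
    return ops

seq = list(range(8))
print(decode_no_cache(seq), decode_with_cache(seq))  # 36 vs 8
```

Extending the cache to SSD, as these vendors do, keeps once-computed entries available across long contexts and sessions rather than discarding them when GPU memory fills.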
July 22, 2025 - Oracle and OpenAI Announce 4.5 GW Stargate Expansion
Oracle and OpenAI are now saying Stargate will reach 5 GW and 2 million chips of capacity when complete. Not in the release, but a good rule of thumb is about $7 per watt capex for a full AI data center buildout, with about 75% of that on compute infrastructure. So another 4.5 GW should work out to roughly $31.5 billion of total investment. It is not clear where all the additional Stargate facilities will be located, but the largest known one is in Abilene, TX, about 2 hours west of Dallas, where 1.2 GW is planned near a large concentration of wind farms.
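The rule-of-thumb arithmetic:

```python
# Rule-of-thumb capex for the 4.5 GW expansion: ~$7 per watt for a full
# AI data center buildout, with ~75% of that going to compute.
WATTS = 4.5e9            # 4.5 GW
CAPEX_PER_WATT = 7.0     # $/W
COMPUTE_SHARE = 0.75

total = WATTS * CAPEX_PER_WATT
compute = total * COMPUTE_SHARE
print(f"Total: ${total/1e9:.1f}B, of which compute: ${compute/1e9:.1f}B")
```

That puts roughly $23.6 billion of the expansion on chips, networking, and servers, with the rest on the facilities themselves.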
July 17, 2025 - AWS Introduces S3 Vector Buckets
AWS has launched S3 Vector Buckets, which allow you to query and store embeddings directly through S3. That said, pricing is also based on query volume and there’s a 20 cent per GB PUT charge, so determining what data to put there still requires careful cost analysis; it’s not the same as storing standard objects in S3.
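As a worked example of the PUT charge alone (the embedding count and dimension below are illustrative assumptions, and query and storage pricing are not modeled):

```python
# Rough one-time PUT cost for loading embeddings into S3 Vector Buckets
# at the quoted $0.20/GB rate.
NUM_VECTORS = 1_000_000
DIMS = 1536              # a common embedding dimension (assumption)
BYTES_PER_FLOAT = 4      # float32

gb = NUM_VECTORS * DIMS * BYTES_PER_FLOAT / 1e9
put_cost = gb * 0.20
print(f"{gb:.2f} GB -> ${put_cost:.2f} one-time PUT cost")
```

The ingest charge is modest; it's the per-query pricing on top that makes workload analysis worthwhile before committing a large corpus.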
July 16, 2025 - AWS Develops its own Liquid Cooling System to Support GPU racks
Dissatisfied with vendor products, AWS has now developed its own liquid cooling system to support its recently released GB200 NVL72 racks. It cites avoiding the from-scratch facilities that vendor systems would have required, along with lower floor-space requirements, as its key reasons.
July 15, 2025 - Google to Invest $25 billion in Pennsylvania and Mid-Atlantic for AI Infra
Sharing this here, but it's generally better to look at SEC filings than press releases to really assess spend, especially since no one announces small dollar figures when there are politicians around.
July 14, 2025 - AWS Launches P6e-GB200 Servers
AWS has launched its latest NVIDIA server types, the Grace Blackwell GB200 NVL72-based P6e-GB200, available in one-rack (36 GPUs) or two-rack (72 GPUs) configurations. NVIDIA generally doesn’t like to promote the NVL36 config, so that model number wasn’t included in the press release, but the specs match. The 72-accelerator version comes in at $761.91/hour, which I’m pretty sure is a record-high cost for any EC2 instance, and it is available only through the purchase of a Capacity Block, which means no On Demand or Compute Savings Plans for these servers…yet.
July 9, 2025 - Cloudian Delivers Integrated AI Inferencing and Data Storage Solution
Cloudian has updated its Hyperstore platform to include an integration with the open source vector database Milvus. This allows you to manage and store vector embeddings and source data in one place with a single interface. This should be particularly useful for RAG applications that need to store PDFs, articles, and other data, convert them into vector embeddings, and then feed those into LLMs for inference requests.
July 9, 2025 - RunPod Releases S3 Compatible API to Streamline Workflows
Neocloud RunPod has released an S3-compatible API that allows you to manage files, upload datasets, or move data without launching an expensive GPU instance. Now you can use an S3 API key to access storage held within RunPod’s infrastructure while running the commands on your own computer, with no rented cloud CPU/GPU needed.
July 9, 2025 - WEKA Debuts NeuralMesh Axon For Exascale AI Deployments
Parallel file systems manufacturer WEKA has launched a new storage system positioned as a sort of virtual memory for KV caches, which play a large role in Inference performance. After getting through all the “AI Factory” and related hype, the press release listed Cohere as an initial customer.
July 9, 2025 - CoreWeave First to Rollout GB300 NVL72 Systems
NVIDIA now has a $4 trillion market cap, and maintaining that valuation requires not being beholden to a few hyperscalers. CoreWeave has a $70 billion+ market cap, which requires staying ahead of other neoclouds. So no surprise to see CoreWeave as the first to announce availability of the GB300 NVL72, NVIDIA’s rack level system based on its latest Grace CPU + Blackwell Ultra GPU.
June 30, 2025 - Meta Looking to Raise $29 Billion for Data Center Expansion
Meta has $70 billion in the bank, and also produces about $12 billion of free cash flow per quarter. Nonetheless, it has never been afraid to use debt to fund expansion, which is what it will do to cover the additions to its capital budget to grow its data center footprint.
June 26, 2025 - How Much Energy Does Your AI Prompt Use?
The Wall Street Journal visited an Equinix data center in Ashburn to learn more about power consumption. The article quotes Sam Altman as saying the average ChatGPT request takes 0.3 watt-hours. Not mentioned is that this is exactly what Google said the average search took in 2009. Also not mentioned is that advances in memory bandwidth and higher volume will bring this down significantly over the next 12-18 months.
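To put 0.3 Wh per request in fleet terms (the request volume below is a hypothetical round number, not a disclosed figure):

```python
# Translate per-request energy into continuous power draw at scale.
WH_PER_REQUEST = 0.3              # quoted figure
REQUESTS_PER_DAY = 1_000_000_000  # hypothetical volume for illustration

mwh_per_day = WH_PER_REQUEST * REQUESTS_PER_DAY / 1e6  # Wh -> MWh
avg_mw = mwh_per_day / 24
print(f"{mwh_per_day:.0f} MWh/day, about {avg_mw:.1f} MW of continuous draw")
```

At a billion requests a day, that's on the order of a single mid-sized data hall's worth of IT load, which is why the per-request number alone understates the infrastructure story.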
June 19, 2025 - NVIDIA Backs TerraPower, Developing Nuclear Power for AI Data Centers
NVIDIA is investing in TerraPower, whose existing backers include Bill Gates, to advance nuclear power for AI data centers.
June 17, 2025 - New Study Suggests Future AI Chips and Advances in High Bandwidth Memory Could Strain the Latest Advances in Power and Cooling
As not just compute power but memory bandwidth grows, advances in direct-to-chip cooling and related technologies might not be enough to deal with nodes that could draw 15 kW, and racks well over 100 kW. The analysis specifically calls out a future HBM8, which in about ten years is projected to provide 32x the per-stack memory bandwidth of HBM4, which comes out next year on NVIDIA Rubin and AMD MI400 GPUs.
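That 32x-in-ten-years figure implies a compounding bandwidth growth rate of roughly 41% per year:

```python
# Implied annual growth rate for a 32x increase over ten years:
# solve (1 + g)^10 = 32 for g.
growth = 32 ** (1 / 10) - 1
print(f"~{growth:.1%} per year")  # ~41.4% per year
```

Since 32 = 2^5, this works out to exactly sqrt(2) - 1 per year, i.e. bandwidth doubling every two years, a pace the power and cooling analysis argues facilities aren't ready for.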
June 17, 2025 - Waymo Provides Insights on its AI Infra Scaling Laws
Waymo has released an analysis that shows the benefits data enrichment provides to its autonomous vehicle service. An interesting comparison to how it benefits LLMs.
June 13, 2025 - Sam Altman Warns AI is Growing Faster than Infrastructure Can Handle
Speaking at AMD’s Advancing AI event, OpenAI CEO Sam Altman claimed that the only way AI can continue not just its frenetic growth pace, but deliver fancy new capabilities is with tons more GPUs and memory. But I think we knew this.
June 13, 2025 - CoreWeave to Supply GCP Who’ll Supply OpenAI
NVIDIA will sell GPUs to CoreWeave, who’ll in turn host them for Google Cloud, who’ll in turn supply them to OpenAI, which already has an $11.9 billion, 5-year deal with CoreWeave. So why do this? Google wants to compete with Microsoft, while OpenAI and CoreWeave want to make themselves less dependent on Microsoft.
June 13, 2025 - AMD Introduces New GPUs at Advancing AI Event
AMD introduced its Blackwell competitors - the MI350X and MI355X GPUs - at its Advancing AI event this week. They are manufactured by the same fab provider, TSMC, using the same Chip-on-Wafer-on-Substrate (CoWoS) packaging technology as NVIDIA’s Blackwells. AMD is trying to get a leg up on NVIDIA by offering 50% more memory and over 100% more memory bandwidth than its big competitor. Oracle/OCI has already begun deploying the chips, and AMD-funded TensorWave will likely follow.
June 11, 2025 - NVIDIA CEO Jensen Huang Now Says Quantum Computing is Reaching an “Inflection Point”
After saying quantum computing could be 15 years away, Jensen Huang is saying it’s getting closer to solving real problems. Quantum computing is often cited as the number one threat to NVIDIA’s business, and its $3 trillion market cap, so it makes sense that he’d start addressing it head on. There are a growing number of quantum startups, as well as Google’s “Quantum AI” initiative, which includes its own “Willow” chip design, but by the time these things are ready for mass production, the hype about all things agentic will be in the history books and they will be solving other problems.
June 11, 2025 - Amazon to Invest $20 billion in Pennsylvania to expand AI and Cloud Infrastructure
Where there are politicians, there are press releases - lots of them. If you’re looking to track or estimate actual capital investment in AI infrastructure, the more relevant number than this announcement is that AWS is currently spending close to 70% of its revenue on capex.
June 10, 2025 - WEKA and Nebius Announce GPU-Storage Partnership
Parallel file systems are required to provide the high I/O needed for model training. WEKA, a leading provider of such systems, is joining forces with Nebius to provide its hardware to customers renting GPU services.
June 10, 2025 - Oracle Seeking 5 GW of US Data Center Capacity
TD Cowen has put out a note suggesting Oracle is looking to spend approximately $160 billion over the next year and a half on 5 GW of data center capacity and related GPUs. They project just under 60% of this will go to GPUs (and networks), with the balance on facility infrastructure. After languishing for years as a 4th place cloud, Oracle/OCI is taking full advantage of the open competition for GPU clouds.
June 9, 2025 - Nebius Announces B300 Clusters will be available in the UK in Q4
You can expect a lot of B300 announcements in the second half of this year as NVIDIA unveils its latest Blackwell chip that brings 50% more FLOPs, 50% more memory, and 50% more memory bandwidth than the B200. Getting out ahead of other cloud providers who’ll also have B300 news, Nebius will deploy the chips in the UK by the end of 2025.
June 5, 2025 - Broadcom Reports $15 billion of revenue, including $4.4 billion in AI hardware
Broadcom reported 20% revenue growth today for the second quarter of its FY2025. AI revenue came in at $4.4 billion, with the company projecting AI semiconductor revenue of $5.1 billion next quarter “due to hyperscalers”. Much of this revenue comes from its agreement with Google to co-design the cloud provider’s Tensor Processing Units, or TPUs. In particular, the company provides the designs for I/O, SerDes, and other peripheral features where it holds extensive expertise and most cloud providers don’t.
June 5, 2025 - Meta Signs Nuclear Power Deal with Constellation Energy
Nuclear power is making a big comeback due to its ability to power AI data centers. Google has already pledged support for the technology; now Meta is signing a Power Purchase Agreement for over 1 gigawatt to support a plant that was going to close. Note that as a PPA, this agreement supports clean power financially to offset carbon-emitting natural gas and coal elsewhere; it is not a direct purchase for Meta data centers.
June 4, 2025 - Amazon to Invest $10 billion in North Carolina AI Infrastructure
Another day, another multi-billion dollar investment in AI data centers. In this case Amazon will be committing $10 billion to a facility near Charlotte. Like most of these announcements, it gives a broad number without specifying where the money is actually going. That said, a rough rule of thumb is about $15,000 of facility capex per kW of capacity…although in this case they’re really vague and don’t mention how many MW or GW of capacity they’re building out.
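Applying the $15,000-per-kW rule to the full $10 billion gives an upper bound on the implied capacity (an upper bound because only part of the total is facility capex, with the rest going to compute):

```python
# Upper-bound capacity estimate from the facility-capex rule of thumb.
INVESTMENT = 10e9        # $10B announced
CAPEX_PER_KW = 15_000    # $ of facility capex per kW

implied_kw = INVESTMENT / CAPEX_PER_KW
print(f"Upper bound: ~{implied_kw/1e6:.2f} GW")  # ~0.67 GW
```

So even if every dollar went to the facility, this buys well under a gigawatt, which is the kind of sanity check these press releases never provide.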
June 4, 2025 - Vertiv unveils trio of liquid cooling CDUs for AI data centers
CDUs, or Coolant Distribution Units, provide direct-to-chip cooling at either a rack or row level within a data center. Vertiv is adding three models spanning 70 to 600 kW of cooling capacity.
June 2, 2025 - Applied Digital Announces 250MW AI Data Center Lease With CoreWeave in North Dakota
CoreWeave has signed a 15-year deal worth up to $7 billion for 250 MW of capacity with Applied Digital at its HPC data center facility in Ellendale, North Dakota, with an option for 150 MW of additional capacity. Why Ellendale? In addition to government incentives, Applied Digital chose the town for its access to energy resources, particularly its location in the middle of a dense wind power region in the Plains.
June 1, 2025 - New Startup Sygaldry Aims to Rethink AI Infrastructure With Quantum Hardware
Sygaldry is joining the chorus of companies claiming Quantum Computing can deal with AI’s energy consumption, especially for image processing and low latency inferencing. Sygaldry is backed by Y Combinator and led by compute hardware veterans.
May 29, 2025 - Blackwell Now GA on AWS in US-West-2
AWS is now offering Blackwell-based B200 GPUs in its Oregon region, in its “p6” series of instances. At this point they are only available by purchasing Capacity Blocks - reservations from 1 to 26 weeks. The effective hourly cost for the 8-GPU p6-b200.48xl is $65.12.
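That instance price implies the following per-GPU rate, a useful number for comparing against neocloud per-GPU pricing:

```python
# Effective per-GPU rate for the 8-GPU p6-b200.48xl Capacity Block price.
INSTANCE_RATE = 65.12  # $/hr for the whole instance
GPUS = 8

per_gpu = INSTANCE_RATE / GPUS
print(f"${per_gpu:.2f} per B200 GPU-hour")
```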
May 28, 2025 - Atlas Cloud Announces Inference Service to Boost GPU Throughput
GPU provider Atlas Cloud has announced an inference service that promises greater throughput and lower cost through its load balancing and compute-memory segregation technologies. Unlike other neoclouds, the company is targeting traditional business users who need packaged solutions and often have lower budgets than research-intensive tech companies or HPC labs.
May 27, 2025 - Intel Unveils New Xeon 6 CPUs to Maximize GPU-Accelerated AI Performance
Intel announced new Xeon CPUs which will be integrated into NVIDIA’s upcoming DGX B300 systems. While NVIDIA tends to promote its own Arm-based Grace CPUs more heavily, the Xeons are still used by many enterprise users who need x86 compatibility, and can tolerate the lower transfer speeds of PCIe connections between the CPU and GPU.
May 26, 2025 - NVIDIA to Launch Cheaper Blackwell Chip for China to Get Around Export Curbs
NVIDIA will be launching the B20, a far less powerful version of its flagship Blackwell chip that will meet requirements for export to China. A key difference will be the use of GDDR7 memory, a major drop in memory bandwidth from the HBM that has been in all of its data center chips going back to the pre-Tensor Core Pascal generation released in 2016, but which cannot be exported to China under current law. The B20 is expected to sell for nearly 90% less than the B100. NVIDIA is keen to keep a foothold in China to prevent Huawei from completely taking over the market.
May 23, 2025 - CoreWeave and Flexential - Scaling AI with High Density Data Centers
CoreWeave does not own any of its data centers, but is pushing towards 1 GW of total leased capacity across 33 facilities. It recently added 260 MW in the Texas Panhandle with Galaxy, and announced today it’s adding 13 MW with Flexential in Plano. The company announced an $11.2 billion, 5-year deal to supply GPUs and related infrastructure to OpenAI in March, and recently disclosed in an SEC filing that it captured a $4 billion contract, also believed to be from OpenAI.
May 22, 2025 - OpenAI Expanding Stargate to UAE with 1 GW Cluster
As mentioned yesterday, Stargate is OpenAI’s joint venture to invest $500 billion in infrastructure. Each Stargate campus is being built out to approximately 1 GW of capacity, with up to 400,000 NVIDIA chips each. The first facility is currently under construction in Abilene, Texas, and the company has now announced its first international facility in Abu Dhabi as part of its OpenAI for Countries initiative. The Abilene facility is being built out by Crusoe per yesterday’s update.
May 21, 2025 - Crusoe secures $11.6 billion for Texas data center
AI data center builder Crusoe has secured over $11 billion in funding to build out a facility in Abilene, Texas. The company has a contract with Oracle, who in turn provides service to OpenAI, and will use the facility as part of its Stargate joint venture, a $500 billion investment in data centers and AI infrastructure. Crusoe holds patents in energy technologies that are intended to reduce the carbon footprint, and ultimately costs, of supporting GPUs and AI.
May 20, 2025 - NVIDIA provides Omniverse Blueprint for AI factory digital twins
Omniverse is NVIDIA’s platform for building 3D industrial applications and modeling physical environments. Data centers use it for Computational Fluid Dynamics (CFD) analysis to optimize airflow and cooling. Today at Computex, NVIDIA announced new partners for Omniverse Blueprint, its platform for AI digital twins, expanding manufacturers’ ability to create AI models of their products as part of the design and build process.
May 18, 2025 - NVIDIA Unveils NVLink Fusion
NVLink is NVIDIA’s proprietary chip-to-chip interconnect that the company has aggressively marketed as faster than PCIe. Furthering its battle against that standard, NVIDIA has announced NVLink Fusion, opening its interconnect to CPU makers like Qualcomm and Fujitsu. NVLink runs out of the PCIe interface, but uses its own silicon technology.
May 14, 2025 - AI infrastructure firm TensorWave raises $100 million
TensorWave announced it has raised $100 million to build out its GPU infrastructure service. Unlike other providers, TensorWave does not offer any NVIDIA products, focusing instead on AMD hardware, notably the MI300X and MI325X platforms.