This seems like a superficial reading of the blog post. The part you focus on, the $80b figure the MS President announced, was not really much news: $80b is a nice specific citable number, yes, but I don't know of any widespread belief that MS's datacenter capex was going to be much below that, and it's consistent with past reporting like Stargate being $100b+.
The important thing about the post, and why the MS President wrote it and posted it when he did, is the messaging it contains to the incoming Trump administration and to the natsec establishment, lobbying for specific policies around AI regulation and chip exports and how they should be wielded for foreign policy. (See also the Oracle blog post, the new chip export regulation proposal, and Masayoshi Son.) Probably the most important part of the post was the paragraph about G42, and that's the one you and pretty much all commentators appear to have not even read.
Stargate is not a 2025 project, and going from ~$50bn to $80bn is in line with building a $25-40bn training system this year (an unusual expense, distinct from other projects); that's a clue independent of the SemiAnalysis claims.
Stargate was reported in 2024, and that reporting specified that the $100b Stargate phase hadn't started yet because MS was still building the previous phase, with "in excess of $115b" for all the phases, implying a large ramp-up. And since Stargate was intended for OA, while MS of course has its own knitting to tend to, that implies a much larger total datacenter capex. Given how vague the reporting is and how large the numbers are (and that the sooner the better), $80b in FY2025 doesn't clearly tell me that there must be some mystery $25-40bn training system which is a big surprise. You don't build a Stargate overnight, and if it is to be finished and fully operational "as soon as 2028", you're going to need to be spending a lot of money 3 years beforehand.
The $25-40bn figure is an estimate for about 1 GW worth of GB200s. SemiAnalysis expects 1 GW training systems for Google in 2025 and something comparable for Microsoft/OpenAI. Dylan Patel discussed this publicly on the Dwarkesh Podcast, claiming that there is a 300K B200s cluster and 500K-700K B200s in total currently being constructed, possibly networked into a single training system. So if planned Microsoft capex was $60bn, that would've been surprising, too little for this project without cutting something else, but $80bn fits this story, that's my takeaway.
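For a rough sanity check of that per-GW figure, here is a back-of-the-envelope sketch; the per-rack power, per-rack cost, and site overhead multiplier are my own illustrative assumptions, not sourced numbers:

```python
# Back-of-the-envelope check of the ~$25-40bn-per-GW estimate for GB200s.
# rack_power_kw, rack_cost_usd, and site_overhead are illustrative assumptions.

rack_power_kw = 120      # assumed all-in power draw of one GB200 NVL72 rack
rack_cost_usd = 3.0e6    # assumed cost per rack (GPUs, networking, memory)
site_overhead = 1.3      # assumed multiplier for buildings, power, cooling

target_power_w = 1.0e9   # 1 GW target for the training system
racks = target_power_w / (rack_power_kw * 1e3)   # watts / watts-per-rack
capex_usd = racks * rack_cost_usd * site_overhead

print(f"racks needed:  {racks:,.0f}")              # ~8,300 racks
print(f"implied capex: ${capex_usd / 1e9:.1f}bn")  # ~$32.5bn
```

On these assumptions a 1 GW GB200 system lands in the middle of the quoted $25-40bn range; pushing the per-rack cost or the overhead multiplier to the high end gets you to roughly $40bn.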
With Stargate, $100bn is still too much for the training systems of 2024-2025, so it's either not about what's being built in 2024-2025 at all, or a larger project that has the current activities as one part (which wouldn't fit building a big training system out of a specific generation of hardware). Musk's 100K H100s Colossus tells me that building a training system in a year is feasible, even though it normally takes longer. The preliminary steps (land, power, permits, buildings) are much cheaper, but securing power and permits can require starting years in advance. So talking about a $100bn Stargate in 2024 is consistent with building it mostly in late 2026: once there is a plot with 3-5 GW of power and datacenter permits, most of the expense will land in 2026 (probably on Nvidia Rubin).
> So if planned Microsoft capex was $60bn, that would've been surprising, too little for this project without cutting something else, but $80bn fits this story, that's my takeaway.
But why? You don't know what fiscal year that $25-40bn figure is booked for, and if they are going to run a single true production-scale 3-6-month run (for cost-optimality) on that $40b cluster, then isn't a total capex of $80bn for all MS datacenters, if anything, surprisingly small? That a single cluster is going to be half their capex, a figure which also includes 2025 spending for future years like buying land or power or GPUs?
(Also, note that this $80bn figure is intrinsically untrustworthy, because, as I was pointing out, the importance of this is the political signaling going on, and so you would expect this number to be 'technically correct': highly manipulated in some direction which does in fact yield a number starting with '80' but only loosely corresponding to reality. This number is propaganda, and good propaganda is true, but not necessarily the whole truth. My best guess is that it's probably being manipulated to be as high as possible, but I'm not sure, because so many of the dynamics here are opaque, so it could also be manipulated to be low.)
> Musk's 100K H100s Colossus tells me that building a training system in a year is feasible, even though it normally takes longer.
Which implies that they would need to be spending that $40bn on the cluster in 2024 if they want to run it in 2025, and so it shouldn't be part of the 2025 estimate... If you really want to put stress on this, it contradicts your story about why $80bn is evidence for that. Also, note that Musk's success there is dubious: he got there by doing things like hooking up temporary natural-gas generators and diverting GPUs from Tesla, and it's unclear how well it even works, given the rumors of a big training-run failure and the rather precise wording of Musk's tweets about what exactly the datacenter can do.
The point about Colossus is that the expensive part of a cluster can be done within a few months (even if it won't be able to start training models immediately). The unusual thing about a training system is that it both uses the same hardware for the whole thing and needs the newest hardware. The cost of all the other parts of a datacenter is almost a rounding error (land, for example, comes out to below 1%). Altogether, this means that almost all capex for a training system lands in the short phase where you install the hardware, and you want to get all the hardware in a narrow window of time, even as the various preliminary steps can take years. For GB200s, bulk shipments start in Q2-Q3 2025 (which also strongly suggests training only starts in 2026). I'm not sure how far the payment for hardware can be moved from the actual shipments, but all else equal, for Microsoft I expect FY 2025 (July 2024 to June 2025).
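To make the "rounding error" point concrete, here is an illustrative split of a $40bn training-system buildout; the shares are assumptions chosen to be consistent with the claims above (hardware dominating, land below 1%), not sourced data:

```python
# Illustrative capex split for a training-system buildout, showing why almost
# all the spend lands in the short hardware-installation window.
# The shares are assumptions consistent with the prose above, not sourced data.

total_usd = 40e9  # the $40bn cluster under discussion
breakdown = {
    "GPUs + networking (installed in a narrow window)": 0.80,
    "power infrastructure + cooling":                   0.12,
    "buildings + fit-out":                              0.07,
    "land":                                             0.008,  # "below 1%"
}
for item, share in breakdown.items():
    print(f"{item:<50} ${share * total_usd / 1e9:5.1f}bn ({share:.1%})")
```

The exact shares don't matter much; the point is that the hardware line dwarfs everything else, so the capex timing is pinned to when the hardware ships.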
NVL72 GB200s are much better than 8x H100s for inference (much more HBM, much larger scale-up world size) and can remain efficient with long context and larger models (this even weakly suggests that general deployment of larger models trained on 100K H100s will be delayed until late 2025, except for Google). So it's unclear if there need to be a lot of inference B200s compared to H100s before the training system also goes to inference. (Inference compute scales linearly with model size, while training compute scales with model size squared.)
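The parenthetical is the standard back-of-the-envelope: using the usual rough approximations (training cost ≈ 6ND FLOPs for N parameters and D training tokens, inference cost ≈ 2N FLOPs per token, and Chinchilla-style data scaling D ∝ N), we get

$$C_{\text{train}} \approx 6ND \propto N^2, \qquad C_{\text{inference}} \approx 2N \text{ FLOPs per token} \propto N.$$

So a model k times larger costs roughly k^2 as much to train but only k times as much to serve per token, which is why inference capacity can lag training capacity as models scale.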
This blog post by Microsoft's president, Brad Smith, further increases my excitement for what's to come in the AI space over the next few years.
To grasp the scale of an $80 billion capital expenditure, I gathered the following statistics:
- The property, plant, and equipment on Microsoft's balance sheet total approximately $153 billion.
- The combined capital expenditures of the five largest international oil companies (Exxon, Chevron, Total, Shell, and Equinor) over the last twelve months amounted to $88 billion.
- The annual GDP of Azerbaijan for 2023 was $78 billion.
This level of commitment by Microsoft is unprecedented in private enterprise—and this is just one company. We have yet to see what their competitors in the space (Alphabet, Meta, Amazon) plan to commit for FY2025, but their investments will likely be on a similar scale.
This blog confirms that business leaders of the world's largest private enterprises view AI as being as disruptive and transformative as the greatest technological advances in history. I am excited to see what the future holds.