The largest of the present-day models (GPT-4.5, Opus 4) could, in some strange sense, be said to cost about $500M for the final pretraining run. Even with the same strange way of counting costs, much more goes into the research experiments needed to make that final pretraining run a success, and into the subsequent post-training that elicits the model's capabilities in a useful form. (Long reasoning training, i.e. RLVR, is still mostly in elicitation mode, but it threatens to start creating new capabilities at a more substantial training cost within 1-2 years.)
This is not the real cost, in the sense that there is no market where you can pay that amount of money and get the ability to do such a training run; instead you need to build a giant training system yourself. The servers and networking cost about 10x more than 3-4 months of their time priced at the minimum needed to break even, because price-performance of compute advances quickly and the hardware itself doesn't last more than a few years when always in use. (Cloud providers would charge substantially more than that, and won't give you nearly enough compute for a frontier training run.)
Since frontier training systems are currently very different from older ones (larger, and with much higher power and cooling requirements per rack), it's also necessary to pay for the buildings, power, and cooling infrastructure at the same time as you are buying the very expensive compute hardware. This makes the cost about 50% greater than the compute hardware alone. So in total you are paying about 15x more to build a frontier training system than the pretend on-paper "cost of a training run". The $500M training runs of today are done on what are probably $7bn training systems ($4-5bn in compute hardware, $1-3bn in buildings/power/cooling). The company needs to actually raise the $7bn, not the $500M.
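A minimal back-of-the-envelope sketch of this accounting (the $500M figure and the 10x and 1.5x multipliers are the rough estimates above, not precise numbers):

```python
# Back-of-the-envelope version of the accounting above; all inputs are rough estimates.
run_time_cost = 0.5e9    # on-paper "cost of a training run": 3-4 months of system time at break-even pricing
hardware_multiple = 10   # servers + networking cost ~10x that break-even rental figure
infra_multiple = 1.5     # buildings, power, and cooling add ~50% on top of the compute hardware

compute_hardware = run_time_cost * hardware_multiple   # ~$5bn
full_system = compute_hardware * infra_multiple        # ~$7.5bn, i.e. ~15x the on-paper run cost

print(f"compute hardware: ~${compute_hardware / 1e9:.1f}bn")
print(f"full training system: ~${full_system / 1e9:.1f}bn "
      f"({full_system / run_time_cost:.0f}x the on-paper run cost)")
```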
The largest training system currently being built that's somewhat documented is Stargate Abilene, to be completed around summer 2026. It might cost around $40-45bn to build ($15bn through Crusoe on buildings/power/cooling, maybe around $27bn on compute racks and networking through Oracle), and will host 400K chips in GB200 NVL72 racks, which is 10x more FLOP/s for pretraining than probably went into GPT-4.5 or Opus 4, and 150x more than went into GPT-4.
Now the pretend "cost of time" on that ~$40-45bn system for a 3-4 month final pretraining run of a giant model that might come out in 2027 could be said to be "about $3bn", but that's a somewhat meaningless figure: the company still needed to finance the ~$40-45bn buildout to get there, and it will spend more than $3bn on the experiments needed to make that training run work.
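The same multipliers run against the Stargate Abilene estimates above, as a rough consistency check (the dollar figures and FLOP/s ratios are the estimates quoted above; the ~15x GPT-4 to GPT-4.5 scale-up is only implied, by dividing the two stated ratios):

```python
# Same accounting applied to the rough Stargate Abilene estimates above (all approximate).
compute_hardware = 27e9   # compute racks + networking, via Oracle (estimate)
infrastructure = 15e9     # buildings, power, cooling, via Crusoe (estimate)

total_buildout = compute_hardware + infrastructure   # ~$42bn
run_time_cost = compute_hardware / 10                # ~$2.7bn: 3-4 months at break-even pricing

print(f"total buildout: ~${total_buildout / 1e9:.0f}bn")
print(f'on-paper "cost of a training run": ~${run_time_cost / 1e9:.1f}bn')

# Implied compute scale-up from the stated FLOP/s ratios.
vs_gpt45, vs_gpt4 = 10, 150
print(f"implied GPT-4 -> GPT-4.5 pretraining compute scale-up: ~{vs_gpt4 / vs_gpt45:.0f}x")
```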
This year, Amazon is spending $100bn on things like building out its datacenters around the world, and that's a $2-3 trillion market cap company. Even if the giant datacenters are each a 2-year project, we are already close to what a non-AGI AI company might be able to finance, and shortly after that we'd be running into the constraints of industrial capacity. So without AGI, the scaling of giant frontier AI training systems should stop around 2027-2029, at which point it regresses to the pace of Moore's law (of price-performance), which is about 3x slower than the current funding-fueled ramp-up.
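One way to see where an "about 3x slower" figure could come from (the growth rates here are assumed for illustration, not figures from the comment above): if funding-fueled buildouts multiply frontier training compute by roughly 4-5x per year while price-performance improves by roughly 1.6-1.7x per year, the two exponential rates differ by about a factor of 3.

```python
import math

# Illustrative assumed growth rates, not figures from the comment above.
funding_ramp_per_year = 4.5        # assumed growth of frontier training compute while buildouts are financed
price_performance_per_year = 1.65  # assumed Moore's-law-like improvement in FLOP/s per dollar

# Ratio of the exponential rates: how much faster compute grows with funding than from price-performance alone.
ratio = math.log(funding_ramp_per_year) / math.log(price_performance_per_year)
print(f"funding-fueled scaling is ~{ratio:.1f}x faster than price-performance alone")
```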
So if I understand correctly, you're saying it would not be feasible to scale up training compute by 100x in a matter of months, because you'd need to build out the infrastructure first?
I am not concerned about this scenario. It does not matter if this is feasible or not (it might be theoretically feasible, but other things will almost certainly happen first).
The labs are laser-focused on algorithmic improvements, and the rate of algorithmic improvements is very fast (algorithmic improvements contribute more than hardware improvements at the moment).
The AIs are being optimized to do productive software engineering and to productively assist in AI research, and soon to perform productive AI research almost autonomously.
So the scenario I tend to ponder is a software-only intelligence explosion based on non-saturating recursive self-improvement within a fixed hardware configuration (in some sense the dual of the scenario described in this post; although, of course, the labs are all trying to scale hardware as well, because they are in a race and every bit of advantage matters if one wants to reach ASI level before the other labs do; that race situation is also quite unfortunate from the existential safety angle).
Answering my own question:
To me, the idea of "fully human-level capable AI" is a double myth. It works only insofar as we do not try to ascribe concrete capabilities to the model. Anything human-level that can be parallelized is by definition superhuman; that's why it's a myth to me in the first place. Additionally, human-level capabilities just make very little sense to me in a model. Is it a brain simulation that feels boredom and longs for human rights? Or is it "just" a very general problem-solving tool, something akin to an actually accurate vision-language model? That is a categorical difference.
Accurate, general problem-solving tools are far more likely and, in the wrong hands, can probably cause far more harm than a "virtual human" ever could. On the other hand, the simulated brain raises many more ethical concerns, I would say.
To actually answer the question, I'm not concerned about a fast takeoff. There are multiple reasons for this:
Yes, until we set rigorous terms and prove otherwise, there is certainly a possibility. But compared to "mundane" worries like climate change and socioeconomic inequality, this potential existential threat does not even register.
I am imagining a scenario like:
Thus there is no period of recursive self-improvement; you just go from human-level to dead in a single step.
This scenario depends on some assumptions that seem kinda unlikely to me, but not crazy unlikely. I want to hear other people's thoughts.