Since the end of (very weak) training scaling laws
Precisely because the scaling laws are somewhat weak, nothing so far indicates that they are ending (the only sense in which they might end is running out of text data, but models trained on 2024 compute should still have more than enough). The scaling laws have held for many orders of magnitude, and they are going to hold for a bit further. That's plausibly not enough, even with something to serve the role of continual learning (beyond in-context learning on ever larger contexts). But there is still another 100x-400x in compute to go, compared to the best models deployed today. The 100x-400x models will likely be trained in 2029-2031, at which point pre-AGI funding for training systems mostly plateaus. That is (a bit more than) a full step from GPT-2 to GPT-3, or from GPT-3 to the original Mar 2023 GPT-4 (after the original Mar 2023 GPT-4, and with the exception of GPT-4.5, OpenAI's naming convention no longer tracks pretraining compute). And we still haven't seen such a full step beyond the original Mar 2023 GPT-4, only half of a step (10x-25x), out of a total of 3-4 halves-of-a-step (the 2022-2030 training compute ramp, 2000x-10,000x in total: the higher end if the BF16 to NVFP4 transition is included, the lower end if even in 2030 there are no 5 GW training systems and somehow BF16 must still be used for the largest models).
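Here's a back-of-the-envelope check on those step sizes; the multipliers are just the rough ranges quoted above, not precise measurements.

```python
import math

# Back-of-the-envelope check of the step arithmetic above.
# A "full step" (GPT-2 -> GPT-3, or GPT-3 -> Mar 2023 GPT-4) is roughly 100x
# in pretraining FLOPs; a "half-step" is therefore roughly 10x-25x.
full_step = 100
half_steps = (10, 25)

# Estimated total 2022-2030 training compute ramp (lower/higher end from above):
for total_ramp in (2_000, 10_000):
    for h in half_steps:
        n = math.log(total_ramp, h)
        print(f"{total_ramp}x ramp = {n:.1f} half-steps of {h}x each")

# Headroom left relative to the best models deployed today:
for remaining in (100, 400):
    n = math.log(remaining, full_step)
    print(f"{remaining}x remaining = {n:.1f} full steps still to come")
```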
Since the original Mar 2023 GPT-4, models that were allowed to get notably larger and made full use of the other contemporary techniques only appeared in late 2025 (likely Gemini 3 Pro and Opus 4.5). These models are probably sized compute-optimally for 2024 levels of pretraining compute (as in 100K H100s, 10x-25x the FLOPs of the original Mar 2023 GPT-4), might have been pretrained with that amount of compute or a bit more, plus pretraining-scale RLVR. All the other models we've seen so far are either smaller than compute-optimal for even 2024 levels of pretraining compute (Gemini 2.5 Pro, Grok 4, especially GPT-5), or didn't get the full benefit of RLVR relative to their pretraining (Opus 4.0, GPT-4.5), and so in some ways looked underwhelming compared to the other (smaller) models that were more comprehensively trained.
The buildout of GB200/GB300 NVL72 will be complete at flagship-model scale in 2026, making it possible to easily serve models sized compute-optimally for 2024 levels of compute (MoE models with many trillions of total params). More training compute is available now, and will be available in 2026, than there was in 2024, but most of the currently available inference hardware can't efficiently serve models sized compute-optimally for this compute (at tens of trillions of total params); the exceptions are Ironwood TPUs (being built in 2026, for Google and Anthropic) and then Nvidia Rubin Ultra NVL576 (which will only get built in sufficient amounts in 2029, maybe late 2028).
So the next step of scaling will probably come in late 2026 to early 2027 from Google and Anthropic (while OpenAI will only be catching up to the late 2025 models from Google and Anthropic, though of course in 2026 they'll have better methods than Google and Anthropic had in 2025). Training compute will then keep increasing fairly quickly until 2029-2031 (with 5 GW training systems, which is at least $50bn per year in training compute, or $100bn per year in total for each AI company if inference consumes half of the budget). After Rubin Ultra NVL576 (in 2029), and to some extent even Ironwood (in 2026), inference hardware will no longer be a notable constraint on scaling, and once AI companies are working with 10 GW of compute (half for training, half for inference), pretraining compute will no longer grow much faster than the price-performance of hardware, which is much slower than the buildout trend of 2022-2026, and slower even than the likely ramp-off in 2026-2030. I only expect 2 GW training systems in 2028, rather than the 5 GW that the 2022-2026 trend would ask for in 2028. But by 2030 the combination of continuing buildout and somewhat better hardware should still reach the levels that would have been on-trend for 2028, following 2022-2026.
That scenario is not impossible. If we aren't in a bubble, I'd expect something like that to happen.
It's still premised on the idea that more training/inference/resources will result in qualitative improvements.
We've seen model after model getting better and better, without any of them overcoming the fundamental limitations of the genre. Fundamentally, they still break when out of distribution (this is hidden in part by their extensive training, which puts more stuff in distribution without solving the underlying issue).
So your scenario is possible; I had similar expectations a few years ago. But I'm seeing more and more evidence against it, so I'm giving it a lower probability (maybe 20%).
I'm responding to the claim that training scaling laws "have ended", even as the question of "the bubble" might be relevant context. The claim isn't very specific, and useful ways of making it specific seem to make it false, either in itself or in the implication that the observations so far have something to say in support of the claim.
The scaling laws don't depend on how much compute we'll be throwing at training or when; they predict how perplexity depends on the amount of compute. For scaling laws in this sense to become false, we'd need to show that perplexity starts depending on compute in some different way (with more compute). Not having enough compute doesn't show that the scaling laws have broken. Even not having enough data doesn't show this.
For practical purposes, scaling laws could be said to fail once they can no longer be exploited for making models better. As I outlined, there's going to be significantly more compute soon (this remains the case with "a bubble", which might have the power to cut compute by as much as 3x relative to the more optimistic 200x-400x projection for models by 2031, compared to the currently deployed models). Text data is plausibly in some trouble even for training with 2026 compute, and likely in a lot of trouble for training with 2028-2030 compute. But this hasn't happened yet, so the claim of scaling laws "having ended", past tense, would still be false in this sense. Instead, there would be a prediction that the scaling laws will in some practical sense end in a few years, before compute stops scaling even at pre-AGI funding levels. But also, the data efficiency I'm using to predict that text data will be insufficient (even with repetition) is a product of public pre-LLM-secrecy research that almost always took unlimited data for granted, so it's possible that spending a few years explicitly searching for ways to overcome data scarcity will let AI companies sidestep this issue, at least until 2030. So I wouldn't even predict with high confidence that text data will run out by 2030; it's merely my baseline expectation.
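To pin down what's being claimed: the scaling laws here are fits of loss against compute/params/data, not statements about how much compute gets built. A minimal sketch of that, and of where the text-data worry comes from, using the Chinchilla-style parameterization - the constants are the published Hoffmann et al. (2022) fits, and the ~2e25 FLOPs figure for the original GPT-4 is a commonly cited outside estimate, so treat all of it as illustrative rather than as what any lab actually uses:

```python
import math

# Chinchilla-style scaling law: loss (log of perplexity) as a function of
# parameters N and training tokens D. Constants are the published
# Hoffmann et al. (2022) fits; whatever the labs use internally will differ.
E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28

def loss(N, D):
    return E + A / N**alpha + B / D**beta

# Compute-optimal rule of thumb from the same paper: D ~ 20 * N, with C ~ 6 * N * D,
# so for a compute budget C (in FLOPs): D ~ sqrt(C / 0.3).
def optimal_tokens(C):
    return math.sqrt(C / 0.3)

# Illustrative budgets. Assumption: original GPT-4 at ~2e25 FLOPs (a commonly
# cited outside estimate), scaled by the 10x-400x headroom discussed above.
gpt4_flops = 2e25
for mult in (1, 25, 100, 400):
    C = gpt4_flops * mult
    D = optimal_tokens(C)
    N = D / 20
    print(f"{mult:>4}x GPT-4 compute: ~{D/1e12:.0f}T tokens, "
          f"~{N/1e12:.2f}T params, predicted loss {loss(N, D):.2f}")
# At the top end this wants on the order of 100T+ unique-ish tokens,
# which is where the text-data worry above comes from.
```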
It's still premised on the idea that more training/inference/resources will result in qualitative improvements.
I said nothing about qualitative improvements. Sufficiently good inference hardware makes it cheap to make models a lot bigger, so if there is some visible benefit at all, it will arrive at the pace of the buildouts of better inference hardware. But also, conversely, if there isn't enough inference hardware, you physically can't serve something as a frontier model (for a large user base) even if it offers qualitative improvements, unless you restrict demand (with very high prices or rate limits).
So your scenario is possible; I had similar expectations a few years ago. But I'm seeing more and more evidence against it, so I'm giving it a lower probability (maybe 20%).
This is not very specific, similarly to the claim about training scaling laws "having ended". Even with "a bubble" (that bursts before 2031), some AI companies (like Google) might survive in an OK shape. These companies will also have their pick of the wreckage of the other AI companies, including both researchers and the almost-ready datacenter sites, which they can use to make their own efforts stronger. The range of scenarios I outlined only needs 2-4 GW of training compute by 2030 for at least one AI company (in addition to 2-4 GW of inference compute), which revenues of $40-80bn should be sufficient to cover (especially as the quality of inference hardware stops being a bottleneck, so that even older hardware will again become useful for serving current frontier models). Google has been spending this kind of money on datacenter capex as a matter of course for many years now.
OpenAI is projecting about $20bn of revenue in its current state, with the 800M+ free users not being monetized (which is likely to change). These numbers can plausibly grow to give at least $50bn per year to the leading model company by 2030 (even if it's not OpenAI); this seems like a very conservative estimate. It doesn't depend on qualitative improvements in LLMs, or on promises of more than a trillion dollars in datacenter capex. Also, the capex numbers might even scale down gracefully if $50bn per year from one company by 2030 turns out to be all that's actually available.
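As a rough sanity check (the per-user figure below is made up purely for illustration): getting from ~$20bn to ~$50bn by 2030 needs only about 20% annual growth, and even light monetization of the free user base lands in the same ballpark.

```python
# Rough sanity check of the revenue claim above; the numbers are placeholders.

current_revenue = 20e9       # the ~$20bn/year run-rate mentioned above
target_revenue = 50e9        # the $50bn/year-by-2030 figure
years = 5                    # roughly 2025 -> 2030

required_growth = (target_revenue / current_revenue) ** (1 / years) - 1
print(f"required growth: {required_growth:.0%} per year")   # ~20%/year

# Separately: what would light monetization of the free user base add?
free_users = 800e6           # the 800M+ free users mentioned above
assumed_arpu = 2.0           # assumed $2/user/month (made up for illustration)
annual = free_users * assumed_arpu * 12
print(f"free-user monetization at ${assumed_arpu}/month: ${annual / 1e9:.0f}bn/year")
```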
No large surge in new products and software.
Both OpenAI's and Anthropic's revenue has increased massively in one year: roughly 3½-fold for OpenAI and 9-fold for Anthropic. I agree, those are not (largely) new products or software — but they're pretty astonishing revenue growth rates, and a pretty large chunk of these revenues is driven by coding usage.
More generally, if AGI-from-LLMs in 3–5 years does actually happen (which is definitely at the short end of my personal timelines, but roughly what the frontier labs appear to be betting on, judging from their actions rather than their investor-facing rhetoric), that doesn't predict most of the things on your bullet list until near the end of those 3–5 years. While LLM capabilities are still subhuman in most respects, their economic impact will be limited.
As you say, one area where they are already starting to be genuinely useful is some more routine forms of coding. A leading indicator I think you should be looking at is that, according to Google, they've recently reached "50% of code by character count was generated by LLMs". Since Google haven't massively cut their headcount, that suggests they're now producing code at roughly twice the rate of a few years ago (at least by character count). That's not a "large surge in new products and software" yet — but it might show up as a noticeable acceleration in Google product releases next year. Some other areas where we're already seeing signs of usefulness are legal research and routine customer service.
In general, something growing via an exponential or logistic-curve process looks small until shortly before it isn't — and that's even more true when it's competing with an established alternative.
Now, to be clear, my personal median timeline for AGI is something like 10 years, most likely from LLMs+other things bolted on top — which gives plenty of time for a trough of disillusionment from those who were expecting (or were sold) 3–5 or even 2 years. I would also be not-very-surprised by 5 years, or by 20. IMO, there are several remaining hard-looking problems (e.g. continual learning, long-term planning, long-term credit assignment, alignment, reliability/accuracy, good priors, maybe causal world models), some of which don't look obviously amenable to simple scaling, but might turn out to be, or might be amenable to scaling plus a whole lot of research and engineering, or one-or-two might actually need a whole additional paradigm.
In simple economic terms, other than Tesla, the other six of the "magnificent seven" have not (so far) reached the Price/Earnings levels characteristic of bubbles just before they burst — they look more typical of those for a historically-fast-growing company. In past bubbles, initial voices warning that it was a bubble generally predated the actual bursting by a couple of years. So my economic opinion is that we're not in a ready-to-burst bubble YET. But most significant technological revolutions (e.g. railways, the internet) did produce a bubble at some point.
Both OpenAI's and Anthropic's revenue has increased massively in one year: roughly 3½-fold for OpenAI and 9-fold for Anthropic.
Their product is in demand, but they lose money on each customer - so they take in a lot of money to grow their customer base, and lose even more money.
They need to transition to making money. To do so they need something like network effects (social media, Uber/Lyft to some extent), returns to scale, or some massive first mover advantage. I don't see that yet.
As you say, one area where they are already starting to be genuinely useful is some more routine forms of coding. A leading indicator I think you should be looking at is that, according to Google, they've recently reached "50% of code by character count was generated by LLMs".
That's less than I was expecting. And my personal experience of coding with LLMs (and speaking with others who do) is that it takes a lot of work to make it function - the LLM will write most of the code, but it's often a long process from there to a working program - and a much longer process to a working, interpretable program. And much longer to get a working program that fits well into a codebase.
When you code with LLMs, it feels like you're really productive, because you're always doing stuff - but often it actually slows you down. https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/
Now, I feel that the coding models are better than they were at the time of that study, especially for routine tasks.
So my median expectation is that moving 50% of coding to LLMs might increase Google's productivity by 10%. But 25%, or even -5%, are also possible.
In general, something growing via an exponential or logistic-curve process looks small until shortly before it isn't — and that's even more true when it's competing with an established alternative.
Shipping finished code is a process involving a lot of steps, only some of which are automated. So (Amdahl's Law) the time to finished coding will be determined by those parts of the process that aren't easily automated. If time to write code falls to zero but time to review code stays the same or even increases, then we'll only get a mild speedup.
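A toy Amdahl's Law calculation makes the point; the 40% fraction below is invented for illustration, not a measurement.

```python
# Toy Amdahl's Law calculation; the 40% fraction is invented, not measured.
# If only the code-writing fraction p of the end-to-end pipeline gets faster
# by a factor s, the overall speedup is 1 / ((1 - p) + p / s).

def overall_speedup(p, s):
    return 1.0 / ((1.0 - p) + p / s)

p = 0.40  # assumed share of "shipping finished code" spent actually writing code
for s in (2, 5, float("inf")):
    print(f"code-writing {s}x faster -> whole pipeline "
          f"{overall_speedup(p, s):.2f}x faster")
```

Even making code-writing infinitely fast only gets ~1.7x here; everything else is capped by the unautomated steps.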
The other problem is that logistic curves close to their inflection point, logistic curves way before their inflection point, and true exponentials all look the same (see our paper https://arxiv.org/abs/2109.08065). OK, we might be on the verge of great LLM-based improvements - but these have been promised for a few years now. And (this is entirely my personal feeling) they feel further away now than they did in the GPT-3.5 era.
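A tiny synthetic illustration of that identifiability problem (not data from the paper): well before its inflection point, a logistic curve is numerically almost indistinguishable from a pure exponential.

```python
import numpy as np

# Synthetic illustration (not data from the paper): well before its inflection
# point, a logistic K / (1 + exp(-r * (t - t0))) is numerically almost the same
# curve as the exponential K * exp(r * (t - t0)).

K, r, t0 = 1.0, 1.0, 10.0        # logistic with inflection point at t = 10
t = np.linspace(0, 5, 6)         # only observe the early regime, t << t0

logistic = K / (1 + np.exp(-r * (t - t0)))
exponential = K * np.exp(r * (t - t0))

rel_gap = np.abs(exponential - logistic) / logistic
print("max relative gap over the observed range:", rel_gap.max())  # under 1%
```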
In simple economic terms, other than Tesla, the other six of the "magnificent seven" have not (so far) reached the Price/Earnings levels characteristic of bubbles just before they burst — they look more typical of those for a historically-fast-growing company.
The magnificent seven have strong non-AI income streams. I expect them to survive a bubble burst. If OpenAI had stocks, their P/E ratio would be... interesting. Well, actually, it would be quite boring, because it would be negative.
Generally, I agree.
One viewpoint that I haven't seen used much to look at foundation-lab economics is ROI: they spend a ton of money training a model (including compute costs and researcher costs), and they then deploy it. After allowing for inference and other costs of serving it, does the revenue they make on serving that model (before it becomes obsolete) pay for its training costs (plus interest), or not? (Another way to look at this is that a newly trained SOTA model is a form of – rapidly depreciating – capital.) I.e. would they be making a profit or a loss at steady state, if it weren't the case that the next model is far more expensive to train? I think this is actually a fairly reasonable economic model (making movies is rather similar). Note that there's a built-in improvement if progress in AI slows — models then stay SOTA for longer after you train them, and thus depreciate more slowly, so (as long as you are charging more than their inference serving cost) they can make more money; and presumably the speed of increase of model training costs drops too, so your actual balance-sheet profit and loss get closer to the ROI analysis.
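Here is a minimal sketch of that per-model ROI framing; every number is a placeholder chosen for illustration, not an estimate for any particular lab.

```python
# Minimal sketch of the per-model ROI framing; every number is a placeholder.

def per_model_roi(training_cost, monthly_revenue, serving_cost_fraction,
                  months_until_obsolete, annual_interest=0.10):
    """Gross serving margin over the model's SOTA lifetime, divided by the
    training outlay grown at the cost of capital, minus one."""
    gross_margin = (monthly_revenue * (1 - serving_cost_fraction)
                    * months_until_obsolete)
    capital = training_cost * (1 + annual_interest) ** (months_until_obsolete / 12)
    return gross_margin / capital - 1

# A model that cost $1bn to train (including failed runs and research overhead),
# earns $300m/month, spends 60% of that on serving, and stays SOTA for 6 months:
print(f"{per_model_roi(1e9, 3e8, 0.60, 6):+.0%}")    # about -31%
# The same model, if it stayed SOTA for 18 months (i.e. progress slows):
print(f"{per_model_roi(1e9, 3e8, 0.60, 18):+.0%}")   # about +87%
```

In this toy version the SOTA lifetime dominates everything else, which is the "depreciate slower if progress slows" point above.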
FWIW, I asked Claude Opus 4.5 in research mode to attempt to do this per-model-ROI analysis for OpenAI, and then for Anthropic, from what public materials it could locate, and it seemed to think that even in this framework OpenAI's ROI is deeply negative: primarily because a) training run investment includes not only the final successful run but also failed runs (the same issue as in the numbers DeepSeek released) b) revenue earnings are depressed by competition so are not much above serving costs, and c) model depreciation cycles are viciously short, generally less than 6 months.
So, even on a per-model ROI basis, OpenAI are still in a "burning VC money to gain market share and intellectual capital" mode.
Of Anthropic, it seemed to think their per-model ROI was also still negative, but less so for a variety of reasons (fewer failed training runs, slower model obsolescence), and was improving. It found their predictions of profitability by 2028 plausible. (I didn't ask it whether it might be biased.)
However, in an AI slowdown, factor c) automatically improves, and there are fairly obvious levers OpenAI could pull to improve a) and b) — some of which apparently Anthropic are already pulling.
For both companies, it mentioned that users in their highest individual subscription tiers often have usage so high that serving them loses money. So I expect we'll eventually see tighter usage caps and even higher subscription tiers.
We might be in a generative AI bubble. There are many potential signs of this around:
If LLMs were truly on the path to AGI[1], I would be expecting the opposite of many of these - opportunities for LLM usage opening up all over the place, huge disruption in the job markets at the same time as completely novel products enter the economy and change its rate of growth. And I would expect the needed compute investments to be declining due to large efficiency gains, with LLM errors being subtle and beyond the ability of humans to understand.
Thus the world does not look like one where LLM-to-AGI is imminent; it looks a lot more like one where generative AI keeps hitting bottleneck after bottleneck - when, precisely, will the LLMs stop hallucinating? When will image composition work reliably[2]?
Remember when GPT-3.5 came out? It did feel like we were on the cusp of something explosive, with countless opportunities being enthusiastically seized and companies promising transformations in all kinds of domains.
But that didn’t happen. Generative AI has a lot of uses and many good possibilities. But in terms of R&D progress, it now feels like an era of repeated bottlenecks slowly and painfully overcome. LLMs are maturing as a technology, but their cutting-edge performance is improving only slowly - outside of coding, which is showing some definite upswing.
A bubble wouldn't mean that generative AI is useless. It might even be transformative and a huge boost to the economy. It just means that the generative AI companies cannot monetise it to the level required to justify the huge investments being made.
And the investments being made are huge. See arguments like "Big Tech Needs $2 Trillion In AI Revenue By 2030 or They Wasted Their Capex" (if you want a well-researched skeptical take on the economics of LLMs, the whole of Ed Zitron's blog is a good source - stick to the information, not his opinions, and be warned that he is extremely uncharitable towards AI safety).
There are many reasons why generative AI companies might fail at monetising. Since the end of (very weak) training scaling laws, we've been in an "inference" scaling situation, buying and building huge data centers. But that isn't enough for a moat - they need economies of scale, not just a large collection of expensive GPUs.
That's because open source models are a few months, maybe a year, behind the top models. If the top LLM companies really become profitable, it will be worth it for others to buy up a small bunch of GPUs, design a nice front end, and run DeepSeek or a similar model cheaply. Unless they can clearly differentiate themselves, this puts a ceiling on what the top companies can charge.
So it's perfectly possible that generative AI is completely transformational and that we are still in an AI bubble, because LLM companies can't figure out how to capture that value.
If LLMs were a quick path to AGIs, then we'd certainly not be in a bubble. So, if we are in a bubble, they're not AGIs, nor the path to AGIs, nor probably the road to the avenue to the lane to the path to AGIs.
And the big companies like OpenAI and Anthropic, that have been pushing the LLM-to-AGI narrative, will take a huge reputational hit. OpenAI especially has been using the "risk" of AGI as a way to generate excitement and pump up their valuation. A technology so dangerous it could end the world - think of what it could do to your stock values!
And if the bubble bursts, talk of AGI and AGI risk will be seen as puffery, as tools of bullshit artists or naive dupes. It will be difficult to get people to take those ideas seriously.
There will be some positives. The biggest positive is that LLMs would not be proto-AGIs: hence there will be more time to prepare for AGI. Another positive is that LLMs may be available for alignment purposes (I'll present one possible approach in a subsequent paper).
Some of these things are things we should probably be doing anyway; others are conditional on generative AI being a bubble. The list is non-exhaustive and intended to start discussion:
In a subsequent post, I'll discuss how we might improve our AGI predictions - almost any advance in computer science could lead to AGI via recursive self-improvement, but can we identify those that are genuinely likely to do so?
I've had very painful experiences trying to use these tools to generate any image that is a bit unusual. I've used the phrase "Gen AIs still can't count" many a time.
Ed will be the kind of person who will be seen as having "been right all along" if there is an AI bubble.
It's paywalled, but he talks about the AI 2027 paper, concluding:
[...] Everything is entirely theoretical, taped together with charts that have lines that go up and serious, scary language that, when boiled down, mostly means "then the AI became really good at stuff."
I fucking hate the people that wrote this. I think they are craven grifters writing to cause intentional harm, and should have been mocked and shunned rather than given news articles or humoured in any way.
And in many ways they tell the true story of the AI boom — an era that stopped being about what science and technology could actually do, focusing instead on marketing bullshit and endless growth.
This isn't a "scenario for the future." It's propaganda built to scare you and make you believe that OpenAI and Large Language Models are capable of doing impossible things.
It's also a powerful representation of the nebulous title of "AI researcher," which can mean everything from "gifted statistician" to "failed philosophy PHD that hung around with people who can actually write software."
Note that, in general, the quality of his arguments and research is much higher than this vitriol would suggest.