Thanks for these detailed comments! I'll aim to respond to some of the meat of your post within a few days at the latest, but real quick regarding the top portion:
I find the decision to brand the forecast as "AI 2027" very odd. The authors do not in fact believe this; they explicitly give 2028, 2030, or 2033 for their median dates for a superhuman coder.
The point of this project was presumably to warn about a possible outcome; by the authors' own beliefs, their warning will be falsified immediately before it is needed.
Adding some more context: each of the timelines forecast authors' modal superhuman coder year is roughly 2027. The FutureSearch forecasters who have a 2033 median aren't authors on the scenario itself (but neither is Nikola, who has the 2028 median). Of the AI 2027 authors, all have a modal year of roughly 2027 and give at least ~20% to getting it by 2027. Daniel, the lead author, has a median of early 2028.
IMO it seems reasonable to portray 2027 as the arrival year of superhuman coders, given the above. It's not clear whether the median or modal year is better here, conditional on having substantial probability by the modal year (i.e. each of us has >=20% by 2027, Daniel has nearly 50%).
To be transparent though, we originally had it at 2027 because that was Daniel's median year when we started the project. We decided against changing it when he lengthened his median because (a) it would have been a bunch of work and we'd already spent over a year on the project and (b) as I said above, it seemed roughly as justified as 2028 anyway from an epistemic perspective.
Overall though I sympathize with the concern that we will lose a bunch of credibility if we don't get superhuman coders by 2027. Seems plausible that we should have lengthened the story's timeline despite the reasoning above.
When presenting predictions, forecasters always face tradeoffs regarding how much confidence to present. Confident, precise forecasting attracts attention and motivates action; adding many concrete details produces a compelling story, stimulating discussion; it also yields falsifiable predictions. Emphasizing uncertainty avoids losing credibility when some parts of the story inevitably fail, prevents overconfidence, and encourages more robust strategies that can work across a range of outcomes. But I can't think of any reason to give a confident, high precision story that you don't even believe in!
I'd be curious to hear more about what made you perceive our scenario as confident. We included caveats signaling uncertainty in a bunch of places, for example in "Why is it valuable?" and several expandables and footnotes. Interestingly, this popular YouTuber made a quip that it seemed like we were adding tons of caveats everywhere.
I was imprecise (ha ha) with my terminology here: I should have talked only about a precise forecast rather than a confident one; I meant solely the attempt to highlight a single story about a single year. My bad. Edited the post.
I can't think of any reason to give a confident, high precision story that you don't even believe in!
Datapoints generalize: a high precision story holds gears that can be reused in other hypotheticals. I'm not sure what you mean by the story being presented as "confident" (in some sense it's always wrong to call a point prediction "confident" rather than zero probability, even if it's the mode of a distribution, the most probable point). But in any case I think giving high precision stories is a good methodology for communicating a framing, pointing out which considerations seem to be more important in thinking about possibilities, and also which events (that happen to occur in the story) seem more plausible than their alternatives.
Responses to some of your points:
There is no particular reason to endorse the particular set of gaps chosen. The most prominent gap that I've seen discussed, the ability of LLMs to come up with new ideas or paradigms, wasn't included.
This skill doesn't seem that necessary for superhuman coding, but separately I think that AIs can already do this to some extent and it's unclear that it will lag much behind other skills.
"benchmarks-and-gaps" has historically proven to be an unreliable way of forecasting AI development. The problem is that human intuitions about what capabilities are required for specific tasks aren't very good, and so more "gaps" are discovered once the original gaps have been passed.
I think with previous benchmarks it was generally clearer that solving them would be nowhere near what is needed for superhuman coding or AGI. But I agree that we should notice similar skulls with e.g. solving chess being considered AGI-complete.
"AI 2027" uses an implausible forecast of compute/algorithm improvement past 2028. It assumes that each continues exponential progress, but at half the rate (so 2.35x compute/year, and 1.5x algorithmic improvement/year).
Seems plausible; I implemented these as quick guesses, though this wouldn't affect the mode or median forecasts much. I agree that we should have a long tail due to considerations like this, e.g. my 90th percentile is >2050.
If current growth rates can deliver superhuman coding capabilities by 2027, we might actually see it happen. However, if those same capabilities would require until 2028, then on some plausible financial models we wouldn't see AGI until the mid-2030's or later.
I'm very skeptical that 2028 with current growth rates would be pushed all the way back to mid-2030s and that the cliff will be so steep. My intuitions are more continuous here. If AGI is close in 2027 I think that will mean increased revenue and continued investment, even if the rate slows down some.
My intuitions are more continuous here. If AGI is close in 2027 I think that will mean increased revenue and continued investment
Gotcha, I disagree. Lemme zoom in on this part of my reasoning, to explain why I think profitability matters (and growth matters less):
(1) Investors always only terminally value profit; they never terminally value growth. Most of the economy doesn't focus much on growth compared to profitability, even instrumentally. However, one group of investors, VC's, does: software companies generally have high fixed costs and low marginal costs, so sufficient growth will almost always make them profitable. But (a) VC's have never invested anywhere even close to the sums we're talking about, and (b) even if they had, OpenAI continuing to lose money will eventually make them skeptical.
(For normal companies: if they aren't profitable, they run out of money and die. Any R&D spending needs to come out of their profits.)
(2) Another way of phrasing point 1: I very much doubt that OpenAI's investors actually believe in AGI- Satya Nadella explicitly doesn't, and others seem to use it as an empty slogan. What they believe in is getting a return on their money. So I believe that OpenAI making profits would lead to investment, but that OpenAI nearing AGI without profits won't trigger more investment.
(3) Even if VC's were to continue investment, the absolute numbers are nearly impossible. OpenAI's forecasted 2028 R&D budget is 183 billion; that exceeds the total global VC funding for enterprise software in 2024, which was 155 billion. This would be going to purchase a fraction of a company which would be tens of billions in debt, which had burned through 60 billion in equity already, and which had never turned a profit. (OpenAI needing to raise more money also probably means that xAI and Anthropic have run out of money, since they've raised less so far.)
In practice OpenAI won't even be able to raise its current amount of money ever again: (a) it's now piling on debt and burning through more equity, and is at a higher valuation; (b) recent OpenAI investor Masayoshi Son's SoftBank is famously bad at evaluating business models (they invested in WeWork) and is uniquely high-spending- but is now essentially out of money to invest.
So my expectation is that OpenAI cannot raise exponentially more money without turning a profit, which it cannot do.
OpenAI continuing to lose money
They are losing money only if you include all the R&D (where the unusual thing is very expensive training compute for experiments), which is only important while capabilities keep improving. If/when the capabilities stop improving quickly, somewhat cutting research spending won't affect their standing in the market that much. And also after revenue grows some more, essential research (in the slow capability growth mode) will consume a smaller fraction. So it doesn't seem like they are centrally "losing money"; the plausible scenarios still end in profitability (where they don't end the world) if they don't lose the market for normal reasons like failing on products or company culture.
OpenAI cannot raise exponentially more money without turning a profit, which it cannot do
This does seem plausible in some no-slowdown worlds (where they ~can't reduce R&D spending in order to start turning profit), if in fact more investors don't turn up there. On the other hand, if every AI company is forced to reduce R&D spending because they can't raise money to cover it, then they won't be outcompeted by a company that keeps R&D spending flowing, because such a competitor won't exist.
I want to clarify that I'm criticizing "AI 2027"'s projection of R&D spending, i.e. this table. If companies cut R&D spending, that falsifies the "AI 2027" forecast.
In particular, the comment I'm replying to proposed that while the current money would run out in ~2027, companies could raise more to continue expanding R&D spending. Raising money for 2028 R&D would need to occur in 2027, and it would need to occur on the basis of financial statements from at least a quarter before the raise. So in this scenario, to show the improved financials that would justify the raise, they would need to slash R&D spending in 2027- something the "AI 2027" authors definitely don't anticipate.
Furthermore, your claim that "they are losing money only if you include all the R&D" may be false. We lack sufficient breakdown of OpenAI's budget to be certain. My estimate from the post was that most AI companies have 75% cost of revenue; OpenAI specifically has a 20% revenue sharing agreement with Microsoft; and the remaining 5% needs to cover General and Administrative expenses. Depending on the exact percentage of salary and G&A expenses caused by R&D, it's plausible that OpenAI eliminating R&D entirely wouldn't make it profitable today. And in the future OpenAI will also need to pay interest on tens of billions in debt.
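To make that arithmetic concrete, here is a minimal sketch of the cost structure described above; every percentage is my estimate from the post, not a figure from OpenAI's actual financials:

```python
# Sketch of the estimated cost structure; all percentages are estimates
# from the post, not OpenAI's reported financials.
revenue = 100.0                      # normalize revenue to 100 units
cost_of_revenue = 0.75 * revenue     # ~75% cost of revenue (estimated)
microsoft_share = 0.20 * revenue     # 20% revenue-sharing agreement
remainder = revenue - cost_of_revenue - microsoft_share
print(remainder)  # 5.0 -- all that is left to cover G&A and, later,
                  # interest on debt, even with R&D cut to zero
```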
I see what you mean (I did mostly change the topic to the slowdown hypothetical). There is another strange thing about AI companies: I think giving the ~50% cost-of-inference figure too much precision for the foreseeable future is wrong, as it's highly uncertain and malleable in a way that's hard for even the company itself to anticipate.
A ~2x difference in inference cost (or model size) can be hard to even notice when nothing substantial changes in the training recipe (and training cost), and better post-training (which is relatively cheap) can get that kind of advantage or more, but not reliably. Pretraining knowledge distillation might get another ~1.5x at the cost of training a larger teacher model (plausibly GPT-4.1 has this because of the base model for GPT-4.5, but GPT-4o doesn't). And there are all the other compute multipliers that become less fake if the scale stops advancing. The company itself won't be able to plan with any degree of certainty how good its near future models will be relative to their cost, or how much its competitors will be able to cut prices. So the current state of cost of inference doesn't seem like a good anchor for where it might settle in the slowdown timelines.
Thanks for explaining. I now agree that the current cost of inference isn't a very good anchor for future costs in slowdown timelines.
I'm uncertain, but I still think OpenAI is likely to go bankrupt in slowdown timelines. Here are some related thoughts:
Control over many datacenters is useful for coordinating a large training run, but otherwise it doesn't mean you have to find a use for all of that compute all the time, since you could lease/sublease some for use by others (which at the level of datacenter buildings is probably not overly difficult technically; you don't need to suddenly become a cloud provider yourself).
So the question is more about the global AI compute buildout not finding enough demand to pay for itself, rather than what happens with companies that build the datacenters or create the models, and whether these are the same companies. It's not useful to let datacenters stay idle, even if that perfectly extends hardware's lifespan (which seems to be several years), since progress in hardware means the time of current GPUs will be much less valuable in several years, plausibly 5x-10x less valuable. And TCO over a datacenter's lifetime is only 10-20% higher than the initial capex. So in a slowdown timeline prices of GPU-time can drop all the way to maybe 20-30% of what they would need to be to pay for the initial capex, before the datacenters start going idle. This proportionally reduces cost of inference (and also of training).
Project Stargate is planning on spending 100 billion at first, 50 billion of which would be debt.
The Abilene site in 2026 only costs $22-35bn, and they've raised a similar amount for it recently, so the $100bn figure remains about as nebulous as the $500bn figure. For inference (where exclusive use of a giant training system in a single location is not necessary) they might keep using Azure, so there is probably no pressing need to build even more for now.
Though I do think an AI slowdown is unlikely until at least late 2026, and they'll need to plan to build more in 2027-2028, raising money for it in 2026; so it's likely they'll get to try to secure those $100bn even in a timeline where an AI slowdown follows soon after.
You seem to be assuming that there's not significant overhead or delays from negotiating leases, entering bankruptcy, or dealing with specialized hardware, which is very plausibly false.
If nobody is buying new datacenter GPU's, that will cut GPU progress to ~zero or negative (because production is halted and implicit knowledge is lost). (It will also probably damage broader semiconductor progress.)
This proportionally reduces cost of inference (and also of training).
This reduces the cost to rent a GPU-hour, but it doesn't reduce the cost to the owner. (OpenAI, and every frontier lab but Anthropic, will own much or all[1] of their own compute. So this doesn't do much to help OpenAI in particular.)
I think you have a misconception about accounting. GPU depreciation appears on the income statement: it is part of operating expenses, subtracted from gross profit to get net profit. Depreciation due to obsolescence vs. breakdowns isn't treated differently. If OpenAI drops its prices below the level needed to pay for that depreciation, they won't be running a (net) profit. Since they won't be buying new GPU's, they will die in a few years, once their existing stock of GPU's breaks down or becomes obsolete. To phrase it another way, if you reduce GPU-time prices 3-5x, the global AI compute buildout has not in fact paid for itself.
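As a toy illustration of that accounting point (all numbers here are hypothetical, chosen only to show the mechanics):

```python
# Renting out a GPU above cash operating costs but below
# cash costs + depreciation is cash-flow positive yet still a net loss.
gpu_capex = 40_000            # purchase price in $, hypothetical
useful_life_years = 4         # straight-line depreciation period, hypothetical
annual_depreciation = gpu_capex / useful_life_years  # $10,000/year of opex

annual_rental_revenue = 12_000  # post-crash market rate, hypothetical
annual_cash_costs = 4_000       # power, cooling, staff, hypothetical

net_profit = annual_rental_revenue - annual_cash_costs - annual_depreciation
print(net_profit)  # -2000.0: the hardware never pays back its capex
```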
OpenAI has deals with CoreWeave and Azure; they may specify fixed prices; even if not, CoreWeave's independence doesn't matter here, as they also need to make enough money to buy new GPU's/repay debt. (Azure is less predictable.)
The point of the first two paragraphs was to establish relevance, and an estimate of the lowest market price of compute in case of a significant AI slowdown: a level at which some datacenters will still prefer to sell GPU-time rather than stay idle (some owners of datacenters will manage to avoid bankruptcy and will keep selling GPU-time even with no hope of recouping capex, as long as it remains profitable on an opex basis, assuming nobody will be willing to buy out their second hand hardware either). So it's not directly about OpenAI's datacenter situation; rather it's a context in which OpenAI might find itself, which is having access to a lot of cheap compute from others.
I'm using "cost of inference" in a narrow sense of cost of running a model at a market price of the necessary compute, with no implications about costs of unfortunate steps taken in pursuit of securing inference capacity, such as buying too much hardware directly. In case of an AI slowdown, I'm assuming that inference compute will remain abundant, so securing the necessary capacity won't be difficult.
I'm guessing one reason Stargate is an entity separate from OpenAI is to have an option to walk away from it if future finances of OpenAI can't sustain the hardware Stargate is building, in which case OpenAI might need or want to find compute elsewhere, hence relevance of market prices of compute. Right now they are in for $18bn with Stargate specifically out of $30-40bn they've raised (depending on success of converting into a for-profit).
Thanks for the detailed comments! We really appreciate it. Regarding revenue, here's some thoughts:
"AI 2027" forecasts that OpenAI's revenue will reach 140 billion in 2027. This considerably exceeds even OpenAI's own forecast, which surpasses 125 billion in revenue in 2029. I believe that the AI 2027 forecast is implausible.[4]
AI 2027 is not a median forecast but a modal forecast, so a plausible story for the faster side of the capability progression expected by the team. If you condition on the capability progression in the scenario, I actually think $140B in 2027 is potentially on the conservative side. My favourite parts of the FutureSearch report are the examples from the ~$100B/year reference class, e.g., 'Microsoft's Productivity and Business Process segment.' If you take the AI's agentic capabilities and reliability from the scenario seriously, I think it feels intuitively easy to imagine how a similar scale business booms relatively quickly, and I'm glad that FutureSearch was able to give a breakdown as an example of how that could look.
So maybe I should just ask whether you are conditioning on the capabilities progression or not with this disagreement? Do you think $140b in 2027 is implausible even if you condition on the AI 2027 capability progression?
If you just think $140B in 2027 is not a good unconditional median forecast all things considered, then I think we all agree!
Note: "AI 2027" chooses to call the leading lab "OpenBrain", but FutureSearch is explicit that they're talking about OpenAI.
We aren't forecasting OpenAI revenue but OpenBrain revenue, which is different because it's ~MAX(OpenAI, Anthropic, GDM (AI-only), xAI, etc.).[1] In some places FutureSearch indeed seems to have given the 'plausible $100B ARR breakdown' under the assumption that OpenAI is the leading company in 2027, but that doesn't mean the two are supposed to be equal, either in their own revenue forecast or in any of the AI 2027 work.
- FutureSearch's estimate of paid subscribers for April 2025 was 27 million; the actual figure is 20 million. They justify high expected consumer growth with reference to the month-on-month increase in unpaid users from December 2024 -> February 2025. Data from Semrush replicates that increase, but also shows that traffic has since declined rather than continuing to increase.
The exact breakdown FutureSearch use seems relatively unimportant compared to the high level argument that the headline (1) $/month and (2) no. of subscribers, very plausibly reaches the $100B ARR range, given the expected quality of agents that they will be able to offer.
- Looking at market size estimates, FutureSearch seems to implicitly assume that OpenAI will achieve a near-monopoly on Agents, the same way they have for Consumer subscriptions. Enterprise sales are significantly different from consumer signups, and OpenAI doesn't currently have a significant technical advantage.
I don't think a monopoly is necessary, there's a significant OpenBrain lead-time in the scenario, and I think it seems plausible that OpenBrain would convert that into a significant market share.
Not exactly equal since maybe the leading company in AI capabilities (measured by AI R&D prog. multiplier), i.e., OpenBrain, is not the one making the most revenue.
Thanks for the response!
So maybe I should just ask whether you are conditioning on the capabilities progression or not with this disagreement? Do you think $140b in 2027 is implausible even if you condition on the AI 2027 capability progression?
I am conditioning on the capabilities progression.
Based on your later comments, I think you are expecting a much faster/stronger/more direct translation of capabilities into revenue than I am- such that conditioning on faster progress makes more of a difference.
The exact breakdown FutureSearch use seems relatively unimportant compared to the high level argument that the headline (1) $/month and (2) no. of subscribers, very plausibly reaches the $100B ARR range, given the expected quality of agents that they will be able to offer.
Sure, I disagree with that too. I recognize that most of the growth comes from the Agents category rather than the Consumer category, but overstating growth in the only period we can evaluate is evidence that the model or intuition will also overstate growth of other types in other periods.
I don't think a monopoly is necessary, there's a significant OpenBrain lead-time in the scenario, and I think it seems plausible that OpenBrain would convert that into a significant market share.
OpenBrain doesn't actually have a significant lead time by the standards of the "normal" economy. The assumed lead time is "3-9 months"; both from my very limited personal experience (involved very tangentially in 2 such sales attempts) and from checking online, enterprise sales in the 6+ digit range often take longer than that to close anyway.
I'm suspicious that both you and FutureSearch are trying to apply intuitions from free-to-use consumer-focused software companies to massive enterprise SaaS sales. (FutureSearch compares OpenAI with Google, Facebook, and TikTok.) Beyond the length of sales cycles, another difference is that enterprise software is infamously low quality; there are various purported causes, but relevant ones include various principal-agent problems: the people making decisions have trouble evaluating software, won't necessarily be directly using it themselves, and care more about things aside from technical quality: "Nobody ever got fired for buying IBM".
I find the decision to brand the forecast as "AI 2027" very odd. The authors do not in fact believe this; they explicitly give 2028, 2030, or 2033 for their median dates for a superhuman coder.
The point of this project was presumably to warn about a possible outcome; by the authors' own beliefs, their warning will be falsified immediately before it is needed.
When presenting predictions, forecasters always face tradeoffs regarding how much precision to present. Precise forecasting attracts attention and motivates action; adding many concrete details produces a compelling story, stimulating discussion; it also yields falsifiable predictions. Emphasizing uncertainty avoids losing credibility when some parts of the story inevitably fail, prevents overconfidence, and encourages more robust strategies that can work across a range of outcomes. But I can't think of any reason to only consider a single high precision story that you don't think is all that likely.
I think that the excessive precision is pretty important in this case: the current pace of AI R&D spending is unsustainable, so it matters exactly how much more progress is needed for superhuman coders.
***
I don't believe that METR's time horizons forecast is sufficiently strong evidence for a precise timeline:
Regarding task difficulty, METR writes that "If this is the case, we may be underestimating the pace of model improvement." They seem to be viewing this as models becoming capable of solving more difficult problems *in addition* to time horizons increasing exponentially. However, in their dataset, increased task difficulty is correlated with time horizons (see section B.1.1); so increased capabilities might already contribute to increasing measured time horizons by allowing the completion of tasks which were previously too difficult, but not too long, to accomplish.
Two concrete examples: humans can read ~75 tokens per minute, so any task that GPT-2 could do that filled its entire 1024-token context length[1] would be a 15 minute task; meanwhile, a leap of mathematical intuition takes a human only a few seconds, but is arguably still impossible for LLM's. Adding many tasks like these would make the calculated task length improvement slower.
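Spelling out the arithmetic in the first example (using the ~75 tokens per minute reading-speed estimate above):

$$\frac{1024\ \text{tokens}}{75\ \text{tokens/minute}} \approx 13.7\ \text{minutes}, \text{ i.e. roughly a 15 minute task.}$$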
***
I don't believe that the "benchmarks-and-gaps" model is the correct way to forecast future AI development:
I and others have consistently been surprised by progress on easy-to-evaluate, nicely factorable benchmark tasks, while seeing some corresponding real-world impact but less than I would have expected. Perhaps AIs will continue to get better on checkable tasks in substantial part by relying on trying a bunch of stuff and seeing what works, rather than general reasoning which applies to more vague tasks. And perhaps I’m underestimating the importance of work that is hard to even describe as “tasks”.
Narrowly, labs are often optimizing against benchmarks, meaning that progress on non-benchmarked tasks is slower than benchmark results would suggest. More broadly, it's plausible that important tasks or knowledge *can't* be benchmarked, even with a benchmark that maxes out METR's messiness metric. James C. Scott uses the word "legibility" to describe the process of making it possible for outsiders to gain understanding: creating simplified, standardized categories; imposing uniform systems of measurement; and favoring formal, official knowledge over practical, local knowledge. I like that term in this context because it emphasizes that LLM's trained on text and passing objectively measured benchmarks will have trouble with some "illegible" types of tasks that humans can do: think intuitive leaps, context-dependent judgments, or embodied skills.[3]
***
I believe that the "AI 2027" scaling forecasts are implausible.
***
Projected single-company R&D spending, in billions:[6]
| 2024 | 2025 | 2026 | 2027 | 2028 | 2029 | 2030 |
|------|------|------|------|------|------|------|
| 4 | 10 | 27 | 70 | 183 | 311 | 528 |
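A minimal sketch of where these numbers come from, following the growth rates described in footnote [6]:

```python
# Reproduces the table above: ~$4B of 2024 spending, growing 2.6x/year
# through 2028 and 1.7x/year thereafter (the rates given in footnote [6]).
spend = 4.0  # $ billions in 2024
for year in range(2024, 2031):
    print(year, round(spend))
    spend *= 2.6 if year < 2028 else 1.7
# prints: 2024 4, 2025 10, 2026 27, 2027 70, 2028 183, 2029 311, 2030 528
```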
Here are a few relevant points:
One way to summarize this section: any forecast of AGI arrival is *highly* sensitive to the exact point during current scaling at which AGI is achievable- because current scaling cannot continue for long.[12] If current growth rates can deliver superhuman coding capabilities by 2027, we might actually see it happen. However, if those same capabilities would require until 2028, then on some plausible financial models we wouldn't see AGI until the mid-2030's or later. It's therefore *extremely* unfortunate that (a) per the timelines discussion, there is no way to get much confidence in a forecast of the near future, and (b) the AI 2027 team's median timeline is longer than what they shared.
Note that from GPT-2 to today, neglecting Llama 4 Scout, the length of the context window did increase with a doubling time of 6.6 months- very similar to METR's claimed 7 month doubling time.
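For reference, a sketch of that doubling-time arithmetic; the endpoints here are my assumptions (GPT-2's 1,024-token window in early 2019 and a ~2M-token window about six years later), since the exact models used for the 6.6-month figure aren't stated:

```python
# Rough doubling-time calculation for context-window length; the endpoints
# are assumptions, not necessarily the ones behind the 6.6-month figure.
from math import log2

start_tokens, end_tokens = 1024, 2_000_000  # GPT-2 vs. a ~2M-token model (assumed)
months_elapsed = 73                         # roughly Feb 2019 to Mar 2025
doublings = log2(end_tokens / start_tokens) # ~10.9
print(months_elapsed / doublings)           # ~6.7 months per doubling
```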
This is just repeating my first point.
This point is also an objection to using METR's time horizon forecasting, as it's also based on benchmarks.
Note: "AI 2027" chooses to call the leading lab "OpenBrain", but FutureSearch is explicit that they're talking about OpenAI.
These numbers are sourced from Epoch, but their estimates don't add up. 1.35x in computational performance, and 2.6x in training costs, should yield a 3.5x increase in training compute, not the calculated 4.7x. They are using two different datasets for the calculations; presumably one is more correct.
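Spelled out, the discrepancy is:

$$1.35 \times 2.6 \approx 3.5 \neq 4.7$$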
Taking 2024's 4 billion level from OpenAI's spending; multiplying by 2.6 each year through 2028, then by 1.7 each year after that, i.e. using the "AI 2027" numbers. Note that actual spending will be spikier than this: some years will see higher capex spending to build datacenters, while other years may see less datacenter construction and instead amortize that capex plus spending on electricity and salaries.
I haven't even considered that General and Administrative spending might also scale with revenue.
Total spending of 10-40 billion on R&D projects is high but not unprecedented- although spending of 40 billion per year is completely unprecedented. (The Manhattan project cost ~35 billion in today's dollars, over ~3 years; the largest private sector R&D effort ever, the 787, cost ~45 billion in today's dollars, over ~8 years.) The current spending of ~4 billion per year is pretty normal for a major R&D project. It's not even the highest current spending relevant to AI- Nvidia is spending ~13 billion on R&D this year.
An *additional* problem here is that 2028 is approximately when data availability and data movement may become significant issues- this will slow down the future rate at which money buys effective compute, which will in turn dissuade people from funding further development.
There are, to be fair, also some ways in which this forecast is overly negative. An AI Winter would drive down the cost of GPU's; this is only helpful for companies which don't own their own datacenters, i.e. Anthropic. (And if all companies increasingly use their own GPU designs, they may have trouble quickly integrating different GPU's into their setup.) Also, part of companies' current inference spend is on free users: this can be eliminated entirely, at the cost of giving up on public mindshare/raising more money. Also, especially if other AI companies are going bankrupt, prices can be raised.
Even Google would need to either sell significant equity or take on significant debt to fund AI development until 2028- its current cash and profits aren't nearly enough. Note that this halt would probably co-occur with a recession- given that hundreds of billions of dollars of AI capex would have suddenly just vanished. AI companies would first cut new capex spending, and may try to sell their GPU's in this scenario; this also plausibly kills off Nvidia and halts semiconductor improvements.
It's also highly sensitive to exactly how long scaling can continue, of course. Scaling might plausibly continue to 2030 or beyond if national governments take over funding, for example.