In order to cause a catastrophe, an AI system would need to be very competent at agentic tasks[1]. The best available metric of general agentic capabilities is METR’s time horizon. The time horizon measures the length, in skilled-human time, of well-specified software tasks that AI systems can complete, and because it’s grounded in human baselines, AI performance can be closely compared to human performance.
Causing a catastrophe[2] is very difficult. It would likely take many decades, or even centuries, of skilled human labor. Let’s use one year of human labor as a lower bound on how difficult it is. This means that an AI system would need a time horizon of at least one work-year (2000 hours) in order to cause a catastrophe.
Current AIs have a time horizon of about 2 hours, roughly 1000x below the time horizon necessary to cause a catastrophe. That’s a pretty large buffer.
Currently, the time horizon is doubling roughly every half-year. A 1000x increase is about 10 doublings (2^10 ≈ 1000), which would take roughly 5 years at the current rate of progress. So, for AI to reach a time horizon of 1 work-year within the next 6 months, the rate of AI progress would have to increase by 10x, with the time horizon doubling roughly every two and a half weeks instead of every 6 months.
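For concreteness, here’s the back-of-the-envelope arithmetic behind those numbers (a minimal sketch of my own, not METR’s methodology):

```python
import math

# From a 2-hour time horizon to a 2000-hour (one work-year) time horizon.
current_horizon_hours = 2
target_horizon_hours = 2000
doublings_needed = math.log2(target_horizon_hours / current_horizon_hours)

# At the current pace of one doubling per 6 months, vs. squeezing it all into 6 months.
years_at_current_rate = doublings_needed * 0.5
weeks_per_doubling_rushed = 26 / doublings_needed

print(f"doublings needed: {doublings_needed:.1f}")            # ~10.0
print(f"years at current rate: {years_at_current_rate:.1f}")  # ~5.0
print(f"weeks per doubling to do it in 6 months: {weeks_per_doubling_rushed:.1f}")  # ~2.6
```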
It seems like a huge breakthrough is necessary to make 5 years of AI progress happen in less than 6 months. Has a breakthrough of this size ever occurred?
I can think of two main recent examples of AI breakthroughs that might be comparable in size to what’s needed to create dangerous AI in the short term.
One is the invention of the transformer architecture. We don’t have time horizon estimates for any models before GPT-2, as even our easiest tasks are probably too difficult for earlier models. However, we can get a rough sense of how many years of progress the transformer represented through other means.
Epoch AI estimates that transformers represented a compute efficiency gain of 10x-50x. According to Epoch AI, algorithmic advances have historically increased effective compute by around 3x per year, and have accounted for roughly ⅓ of AI progress in LLMs since 2014. Taken together, this means that transformers represent roughly a 9-15 month jump in AI progress[3]. That places them well below the 5-year jump required to get current models to a dangerous time horizon.
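Here’s my reconstruction of that arithmetic (a sketch under the stated assumptions: a ~3x/year algorithmic rate and a ⅓ share of overall progress; this is not Epoch AI’s own calculation):

```python
import math

# Transformers' estimated compute-efficiency gain, per the Epoch AI figures above.
for gain in (10, 50):
    # Years of purely algorithmic progress the gain is worth, at ~3x per year.
    algo_years = math.log(gain) / math.log(3)
    # If algorithmic advances are ~1/3 of overall progress, a jump worth N years
    # of algorithmic progress is roughly N/3 years of overall progress.
    total_months = algo_years * (1 / 3) * 12
    print(f"{gain}x efficiency gain ≈ {total_months:.0f} months of overall progress")
# Prints ~8 and ~14 months, roughly matching the 9-15 month figure.
```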
Also, the algorithmic progress from transformers wasn’t instant. It took about two years to go from the “Attention Is All You Need” paper (June 2017) to large transformers like GPT-2 (February 2019) being released. So the invention of transformers is likely neither large enough nor sudden enough that a similar invention would be very dangerous if it happened tomorrow.
In fact, if we round the impact of transformers to a 1-year jump in AI progress, then we’d need five transformer-sized breakthroughs compressed into a 6-month timespan to reach a 1-year time horizon.
The other example is AlphaFold. While AlphaFold is not a general AI architecture, it’s still useful for establishing an upper bound on how crazy breakthroughs can get.
I haven’t seen a good analysis that tries to answer the question “How many years of protein-folding progress did AlphaFold represent?” But judging from this analysis, it seems like the answer is at least 5 years, and possibly decades.
This means that AlphaFold is an existence proof for at least 5 years of narrow AI progress being made in a short period of time.
If transformer-sized breakthroughs happen every 10 years and are independent events, then the probability of five such breakthroughs happening in the next 6 months is very, very small.
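To put a rough number on “very, very small” (my own sketch, assuming breakthroughs arrive as a Poisson process at the 10-year rate above):

```python
from math import exp, factorial

# Expected number of transformer-sized breakthroughs in a 6-month window,
# given one per 10 years on average and independence between breakthroughs.
lam = 0.5 / 10

def prob_at_least(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam)."""
    return 1 - sum(exp(-lam) * lam**i / factorial(i) for i in range(k))

print(prob_at_least(5, lam))  # ~2.5e-09
```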
But AI breakthroughs aren’t independent events. They share the same inputs, and the occurrence of one breakthrough is evidence that those inputs are already high enough to produce further breakthroughs more quickly. Additionally, AI breakthroughs could feed into each other, as the “AI capabilities” output of AI R&D can feed back into the “intellectual labor” and “compute spend” inputs.
So, if a cluster of transformer-like breakthroughs is unlikely to get AIs to 1-year time horizons in the next 6 months, what about AlphaFold-like breakthroughs? If there could be a narrow AlphaFold-sized breakthrough in coding, then it becomes plausible that AI R&D could be automated in the short term.
For any verifiable domain that AI researchers are trying to “crack”, I’d guess that AlphaFold-sized breakthroughs are less than 5% likely per year. And programming seems harder to “crack” than protein folding, as evidenced by the fact that companies have been throwing lots of money at it for a few years with no AlphaFold-sized breakthroughs to show for it. So, converting to a 6-month window and adjusting slightly downward, the probability that AI R&D is automated in the next 6 months seems less than 3%.
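For the conversion from an annual chance to a 6-month chance (my framing, assuming the risk accrues at a constant rate over the year):

```python
# A <5% annual probability of an AlphaFold-sized breakthrough in a given domain,
# assumed to accrue at a constant rate over the year.
p_year = 0.05
p_six_months = 1 - (1 - p_year) ** 0.5
print(f"{p_six_months:.3f}")  # ~0.025, i.e. roughly 2.5% over 6 months
```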
But even if AI R&D were automated tomorrow, this wouldn’t guarantee that we’d reach a time horizon of 1 year in less than 6 months. It’s more likely that the speed of AI progress would gradually ramp up as people and AIs found better ways to distribute labor and resources between automated and human AI researchers. And training runs can take months, meaning the breakthrough might take more than 6 months to fully take effect.
It also seems more likely than not that if any AGI company were on track to create AIs with dangerously high levels of general capabilities in the next 6 months, they would be able to tell that this was happening, at least early on. They would see the first year of progress before all five years of progress had happened, at least assuming that there aren’t large discontinuities between checkpoints in training.
If a 1000x increase in time horizon routes through AI R&D automation, then the AGI company would at the very least notice that AI R&D has been automated. If it doesn’t route through AI R&D automation, it’s likely we’d notice a 10x increase before the 1000x increase. So in either case, the AGI company is likely to have some sense that “something big might be happening” if they’re heading towards a major improvement in capabilities.
However, it is plausible that past some level of capability, the AIs would figure out how to subvert oversight mechanisms and make themselves look less capable than they actually are. So if capability measurements during training are too sparse, or too easy to subvert, the researchers might not notice a sudden jump that enables oversight to be broadly undermined.
So, judging from the size of breakthroughs needed, and from the sizes of some recent AI breakthroughs, it seems very unlikely (<2%) that AI will reach a 1-year time horizon in the next 6 months. The main pathway I see is a sudden breakthrough in coding, which would lead to automated AI R&D, which would lead to a large number of transformer-sized breakthroughs in quick succession. Accounting for unknown unknowns, I’d increase my probability to around 3%. If this does happen, I think it’s more likely than not that the AGI companies in question would have some awareness that it’s happening, instead of it being a complete overnight surprise.
[1] I’m assuming away the possibility of a catastrophe caused by misuse of AI systems, like bad actors using AIs to create very potent biological weapons. I’ll only consider AI catastrophes caused by autonomous AIs.
[2] By “catastrophe”, I mean an event where 100 million humans die, or something even worse happens.
[3] Although I do feel kind of skeptical of this number. Surely transformers were a bigger deal than that? Without transformers, I’d guess we’d be more than 1 year behind, but that’s probably a different operationalization from the one Epoch AI uses.