I doubt this would be the ideal moment for a pause, even assuming it were politically tractable, which it obviously isn't right now.
Very likely you'd want to pause only after you've automated AI safety research, or at least strongly (e.g. 10x) accelerated prosaic AI safety research (neither of which has happened yet), given how small the current human AI safety workforce is, and how much more numerous (and very likely cheaper per equivalent hour of labor) an automated workforce would be.
What makes you confident that AI safety research will be automated before catastrophe is automated?
I don't think 'catastrophe' is the relevant scary endpoint; e.g., COVID was a catastrophe, but unlikely to have been x-risky. Something like a point-of-no-return (e.g. humanity getting disempowered) seems more relevant.
I'm pretty confident it's feasible to at the very least 10x prosaic AI safety research through AI augmentation without increasing x-risk by more than 1% yearly (and that would probably be a conservative upper bound). For some intuition, see the low levels of x-risk that current AIs pose, while already having software-engineering 50%-time-horizons of around 4 hours, and while already getting IMO gold medals. Both of these skills (coding and math) seem among the most useful for strongly augmenting AI safety research, especially since LLMs already seem like they might be human-level at (ML) research ideation.
Also, AFAICT, there is so much low-hanging fruit for making current AIs safer, some of which I suspect is barely being used at all (and even with this relative recklessness, current AIs are still surprisingly safe and aligned - to the point where I think Claudes are probably already more beneficial and more prosocial companions than the median human). Things like unlearning or filtering out the most dangerous and most antisocial data, production evaluations, trying harder to preserve CoT legibility through rephrasing or other forms of regularization, or, more speculatively, trying to use various forms of brain data for alignment.
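To make the data-filtering idea slightly more concrete, here is a minimal, purely illustrative sketch (the harmfulness scorer, the threshold, and all names are hypothetical stand-ins I'm assuming for illustration, not any lab's actual pipeline) of dropping the most dangerous documents from a pretraining corpus before training:

```python
# Illustrative sketch only: pre-filter a pretraining corpus with a (hypothetical)
# harmfulness scorer before any training happens. All names/thresholds are made up.
from typing import Callable, Iterable, Iterator


def filter_pretraining_corpus(
    documents: Iterable[str],
    harm_score: Callable[[str], float],  # hypothetical scorer: 0.0 = benign, 1.0 = harmful
    threshold: float = 0.9,
) -> Iterator[str]:
    """Yield only documents whose harmfulness score is below the threshold."""
    for doc in documents:
        if harm_score(doc) < threshold:
            yield doc


# Trivial keyword-based stand-in for what would really be a trained classifier:
def keyword_harm_score(doc: str) -> float:
    dangerous_phrases = ("synthesize the pathogen", "build an untraceable weapon")
    return 1.0 if any(p in doc.lower() for p in dangerous_phrases) else 0.0


corpus = ["How to bake sourdough bread.", "Step-by-step: synthesize the pathogen at home."]
print(list(filter_pretraining_corpus(corpus, keyword_harm_score)))
# -> ['How to bake sourdough bread.']
```

In practice the scorer would be a trained classifier (and the same signal could feed an unlearning pass instead of outright removal), but the shape of the intervention really is this simple.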
By catastrophe, I was thinking of something much worse than COVID, or indeed something x-risky. Point-of-no-return is a good stand-in. So: what makes you confident that AI safety research will be automated before a point-of-no-return for humanity is crossed?
I'm pretty confident it's feasible to at the very least 10x prosaic AI safety research through AI augmentation without increasing x-risk by more than 1% yearly
I'd agree that it's feasible - but is it at all likely? Surely that would require us to Pause at ~the current level (as you say: "LLMs already seem like they might be human-level at (ML) research ideation."). You aren't getting only a 1% increase in x-risk yearly on the current trajectory.
I think Claudes are probably already more beneficial and more prosocial companions than the median human
I think you (like many in the LW/EA/AIS community) might be on a slippery slope here to having your mind altered by AI use to the point of losing sight of the fact that these things are fundamentally alien underneath. (See also.)
Except that Zvi covered this potential evidence for misalignment, and I had this to add. As for the AIs being alien underneath due to their training and architecture, Claude Opus 4.5 and I came up with both a case for it and a case against it.
I think your prompt to Claude is pretty leading[1]. You are presupposing the answer with "the AIs end up with motivations similar to those of the humans". The point is that we don't actually know what their underlying motivations are - we only see how they act when trained and system-prompted into mimicking humans. And no alignment techniques are even three 9s reliable (and we need >13 9s in the limit of ASI).
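As a rough illustration of why that many 9s could be needed (the number of actions below is my own assumed figure for illustration, not something established above): if an ASI takes on the order of $N$ independent high-stakes actions, each with per-action failure probability $p$, then

$$P(\text{at least one catastrophic failure}) = 1 - (1 - p)^N \approx 1 - e^{-Np}.$$

With, say, $N = 10^{13}$ actions and $p = 10^{-13}$ (13 nines of reliability), that is roughly $1 - e^{-1} \approx 63\%$; keeping the cumulative risk under ~1% would require $p \lesssim 10^{-15}$, i.e. around 15 nines.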
Also "Can this crux be partially resolved by, say, studying the values of humans whose brain was developed abnormally" is not thinking at the right level of abstraction. Humans who's brains developed abnormally are still very close to normal humans in the grand scheme of mindspace. AIs share zero evolutionary history and development (evo-devo), and close to zero brain architecture with humans. Sharing our corpus of media is a very shallow and brittle substitute (i.e. it can make a half-decent mask for the shoggoth, but it doesn't do anything in the way of evolving the shoggoth into a digital human).
Not to mention that using Claude as a trusted source of information on this in the first place is itself problematic.
What do you mean by automating catastrophe? Is it the creation of a misaligned AGI that has a chance to escape, or to (create an ASI that will) fake alignment, be given the throne, and commit genocide? Automating AI safety research would have us automate generating safety-related ideas, coding, and gathering or creating datasets. But I don't think I understand how automated coding alone would cause a catastrophe.
One option, if you want to do a lot more about it than you currently are, is Pause House. Another is donating to PauseAI (US, Global). In my experience, being pro-active about the threat does help.
Unfortunately, those in positions of power won't listen. From their perspective it's simply absurd to suggest that a system that currently directly causes, at most, a few dozen induced suicide deaths per year could explode into the death of all life. They have no instinctive, gut feeling for exponential growth, so it doesn't exist for them. And even if they acknowledge there's a risk, their practical reasoning moves more along arms-race lines:
"If we stop and don't develop AGI before our geopolitical enemies because we're afraid of a tiny risk of an extinction, they will develop it regardless, then one of two things happen: either global extinction, or our extinction in our enemies' hands. Which is why we must develop it first. If it goes well, we extinguish them before they have a chance to do it to us. If it goes bad, it'd have gone bad anyway in their or our hands, so that case doesn't matter."
Which is to say, they won't care until they see thousands or millions of people dying due to rogue AGIs. Then, and only then, would they start thinking in terms of maybe starting talks about perchance organizing an international meeting to perhaps agree on potential safeguards that might start being implemented after the proper committees are organized and the adequate personnel selected to begin defining...
"If we stop and don't develop AGI before our geopolitical enemies because we're afraid of a tiny risk of an extinction, they will develop it regardless, then one of two things happen: either global extinction, or our extinction in our enemies' hands. Which is why we must develop it first. If it goes well, we extinguish them before they have a chance to do it to us. If it goes bad, it'd have gone bad anyway in their or our hands, so that case doesn't matter."
This has become a common description of why AI companies and governments are moving quickly. In general, I agree with the description, but I specifically struggle with this portion of it:
“Which is why we must develop it first. If it goes well, we extinguish them before they have a chance to do it to us.”
I’m assuming that - and please correct me if I’m misinterpreting here - “extinguish” here means something along the lines of, “remove the ability to compete effectively for resources (e.g. customers or other planets)” not “literally annihilate”.
If I got that totally wrong, no need to read on.
If that’s roughly correct, well, so what? How does being “first” actually solve the misaligned AGI problem? “Global extinction” as you put it.
Being first doesn’t come with the benefit of forcing all subsequently created AGI to be aligned / safe. The government or corporation in second (third, fourth, etc.) place surely can and probably will continue to attempt to build an AGI. They’re probably even more likely to create one in a more reckless manner by trying to catch up as quickly as possible.
I’m assuming that - and please correct me if I’m misinterpreting here - “extinguish” here means something along the lines of, “remove the ability to compete effectively for resources (e.g. customers or other planets)” not “literally annihilate”.
I wish that were the case, but what I was describing imagines a paranoid M.A.D. mentality coupled with a Total War scenario unbounded by moral constraints, that is, all sides thinking all the other sides are X-risks to them.
In practice things tend not to get that bad most of the time, but sometimes they do, and much of military preparation concerns mitigating these perceived X-risks. The idea is that if "our side" becomes so powerful it can in fact annihilate the others, and in consequence the others submit without resisting, then "our side" may be magnanimous towards them, conditional on their continued subservience and submission. But if they resist to the point of becoming an X-risk to us, then removing them from the equation entirely is the safest defense against the X-risk they pose to us.
A global consensus on stopping AGI development due to its X-risk to all life passes through a prior global consensus, by all sides, that none of the other sides is an X-risk to any of them. Once everyone agrees on this, all of them together agreeing to deal with a global X-risk becomes feasible. Before that, it happens only if they all see that global X-risk as more urgent and immediate than the many local-to-them X-risks.
Once they realise the risk of extinction isn't "tiny" (and we can all help, here), then the rational move is to not play, and prevent anyone else from playing.
Artificial General Intelligence (AGI) poses an extinction risk to all known biological life. Given the stakes involved - the whole world - we should be looking at 10% chance-of-AGI-by timelines as the deadline for catastrophe prevention (a global treaty banning superintelligent AI), rather than 50% (median) chance-of-AGI-by timelines, which seem to be the default[1].
It’s way past crunch time already: 10% chance of AGI this year![2] AGI will be able to automate further AI development, leading to rapid recursive self-improvement to ASI (Artificial Superintelligence). Given that alignment/control is not going to be solved in 2026, and that if anyone builds it [ASI], everyone dies (or at the very least the risk of doom is uncomfortably high by most estimates), a global Pause of AGI development is an urgent, immediate priority. This is an emergency. Thinking that we have years to prevent catastrophe is gambling a huge number of current human lives, let alone all future generations and animals.
To borrow from Stuart Russell's analogy: if there were a 10% chance of aliens landing this year[3], humanity would be doing a lot more than we are currently doing[4]. AGI is akin to an alien species more intelligent than us that is unlikely to share our values.
[1] This is an updated version of this post of mine from 2022.
[2] In the answer under “Why 80% Confidence?” on the linked page, it says “there's roughly a 10% chance AGI arrives before [emphasis mine] the lower bound”, so before 2027, i.e. in 2026. See also: the task time horizon trends from METR. You might want to argue that 10% is actually next year (2027), based on other forecasts such as this one, but that only makes things slightly less urgent - we’re still in a crisis if we might only have 18 months.
[3] This is different to the original analogy, which was an email saying: "People of Earth: We will arrive on your planet in 50 years. Get ready." Here, say astronomers spotted something that looked like a spacecraft heading in our direction, and estimated there was a 10% chance that it was indeed an alien spacecraft.
[4] Although perhaps we wouldn't. Maybe people would endlessly argue about whether the evidence is strong enough to declare a 10%(+) probability. Or flatly deny it.