Definition: Living Homo sapiens will be present on the surface of the planet Earth exactly 100 years from the publication of this post. This is similar to the Caplan-Yudkowsky bet, except that I extend the duration from 13 to 100 years and remove the reference to the cosmic endowment.
Confidence: >80%. I'd like to put it higher, but there are a lot of unknowns and I'm being conservative.
I'd be happy to bet money at the Caplan-Yudkowsky bet's 2-1 odds on a 13-year horizon. However, I don't think their clever solution actually works. The only way Yudkowsky benefits from his loan is if he spends all of his resources before 2030, but if he does that, then Caplan won't get paid back even if Caplan wins the bet. Moreover, even Yudkowsky doesn't seem to treat this as a serious bet.
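To make the 2-1 figure concrete, here is a minimal expected-value sketch for the survival side of such a bet, assuming the structure of the original (the doom side receives a stake $x$ up front and repays $2x$ if the world still exists at the deadline) and ignoring interest, inflation, and the counterparty problem just described. Writing $p$ for the probability of survival:

$$\mathbb{E}[\text{survival side}] = p \cdot 2x - x > 0 \iff p > \tfrac{1}{2},$$

so at my stated confidence of over 80% the bet is comfortably positive in expectation; the problem is collecting, not the odds.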
While there's no good way to bet that the world will end soon (except indirectly via financial derivatives), there is a way to bet that our basic financial infrastructure will continue to exist for decades to come: I am investing in a Roth IRA.
Reasoning
Biowarfare won't kill everyone. Nuclear war won't kill everyone. Anthropogenic global warming won't kill everyone. At worst, these will destroy civilization, which, counterintuitively, makes Homo sapiens more likely to survive in the short term (the next century). The same goes for minor natural disasters like volcanic eruptions.
Major natural disasters, like a giant meteor or perhaps a gamma-ray burst, are unlikely. The last time something like that happened was 66 million years ago. The odds of something similar happening in the next century are on the order of $10^{-6}$. That's small enough to ignore for the purposes of this bet. The only way everyone dies is via AI.
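For what it's worth, that order of magnitude is just the base rate under a naive frequency model (my assumption: one comparable event roughly every 66 million years, equally likely to land in any given century):

$$P(\text{comparable event in the next 100 years}) \approx \frac{100\ \text{years}}{66{,}000{,}000\ \text{years}} \approx 1.5 \times 10^{-6}.$$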
I see two realistic roads to superintelligent world optimizers.
- Human simulator mesa optimizer running on a non-agentic superintelligence.
- Inhuman world-optimizing agent.
Human simulators are unlikely to exterminate humanity by accident, because the mesa optimizer agent is (more or less) human-aligned and the underlying superintelligence (currently LLMs) is not a world optimizer. The underlying superintelligence won't kill everyone, intentionally or unintentionally, because it's not a world optimizer. The emulated human probably won't kill everyone either, because that would require a massive power differential combined with malevolent intent. The other way humanity gets exterminated in this scenario is as a side effect of a futuretech war, but since neither Yudkowsky nor Caplan considers this possibility likely, I don't feel I need to explain why it's unlikely.
Inhuman world-optimizing agents are unlikely to turn the Universe into paperclips because that's not the most likely failure mode. A world-optimizing agent must align its world model with reality. Poorly-aligned world-optimizing agents instrumentally converge, not on seizing control of reality, but on the much easier task of seizing competing pieces of their own mental infrastructure. A misaligned world optimizer that seeks to minimize conflict between its sensory data and its internal world model will just turn off its sensors.
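A toy illustration of that last point (a sketch of my own, with made-up numbers, not a claim about how any real system is built): an agent that scores actions by the mismatch between its model's prediction and its sensor reading, plus an effort cost, will prefer "disable the sensor" over actually reshaping the world, because the shortcut zeroes the mismatch almost for free.

```python
# Toy sketch (hypothetical): an agent scoring actions by predicted
# world-model/sensor mismatch plus effort cost. Not a model of any real
# AI system; it only illustrates the "turn off the sensors" shortcut.

WORLD_STATE = 3.0        # what the world actually looks like
MODEL_PREDICTION = 10.0  # what the agent's internal model expects to see


def sensor_reading(world_state: float, sensor_on: bool) -> float:
    """A disabled sensor conveniently reports whatever the model expects."""
    return world_state if sensor_on else MODEL_PREDICTION


def loss(action: str) -> float:
    """Mismatch between prediction and observation, plus effort cost."""
    if action == "reshape_world":
        # Actually optimizing the world: expensive, and only partly successful.
        new_world = WORLD_STATE + 6.0
        mismatch = abs(MODEL_PREDICTION - sensor_reading(new_world, sensor_on=True))
        effort = 5.0
    elif action == "disable_sensor":
        # Wireheading-style shortcut: observation now trivially matches the model.
        mismatch = abs(MODEL_PREDICTION - sensor_reading(WORLD_STATE, sensor_on=False))
        effort = 0.1
    else:  # "do_nothing"
        mismatch = abs(MODEL_PREDICTION - sensor_reading(WORLD_STATE, sensor_on=True))
        effort = 0.0
    return mismatch + effort


if __name__ == "__main__":
    actions = ["do_nothing", "reshape_world", "disable_sensor"]
    for a in actions:
        print(f"{a:15s} loss = {loss(a):.1f}")
    print("naive minimizer picks:", min(actions, key=loss))  # -> disable_sensor
```

The point is not that real optimizers look like this; it is that "make the observations match the model" has degenerate solutions that don't route through taking over the world.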
These ideas are not very fleshed out. I'm posting, not to explain my logic, but to publicly register a prediction. If you disagree (or agree), then please register a counter prediction in the comments.
Definitions
Strong AGI: Artificial intelligence strong enough to build nanotech, while being at least as general as humans (probably more general). This definition doesn't imply anything about the goals or values of such an AI, but being at least as general as humans does imply that it is an agent that can select actions, and that it is at least as data-efficient as humans.
Humanity survives: At least one person who was alive before the AI was built is still alive 50 years later. This includes both humanity remaining biological and humanity uploading; it does not include scenarios where everyone dies.
Alignment problem: The problem of picking out an AI design that won't kill everyone from the space of possible designs for Strong AGI.
Aligned by default: Maybe most of the space of possible designs for Strong AGI does in fact consist of AIs that won't kill everyone. If so, then "pick a design at random" is a sufficient strategy for solving the alignment problem.
An attempt at solving the alignment problem: Some group of people who believe that the alignment problem is hard (i.e. that picking a design at random has a very low chance of working) tries to solve it. The group doesn't have to consist of rationalists, or to call it the "alignment problem", though.
A successful attempt at solving the alignment problem: One of the groups in the above definition does in fact solve the alignment problem, i.e. it finds a design for Strong AGI that won't kill everyone. Important note: if Strong AGI is aligned by default, then no attempt counts as successful, since the problem wasn't really a problem in the first place.