Definition: There will be living Homo sapiens on the surface of the planet Earth exactly 100 years from the publication of this post. This is similar to the Caplan-Yudkowsky bet, except that I extend the duration from 13 years to 100 years and remove the reference to the cosmic endowment.
Confidence: >80%. I'd like to put it higher, but there are a lot of unknowns and I'm being conservative.
I'd be happy to bet money at the Caplan-Yudkowsky bet's 2-1 odds on a 13-year horizon. However, I don't think their clever solution actually works. The only way Yudkowsky benefits from his loan is if he spends all of his resources before 2030, but if he does that then Caplan won't get paid back even if Caplan wins the bet. Moreover, even Yudkowsky doesn't seem to treat this as a serious bet.
While there's no good way to bet that the world will end soon (except indirectly via financial derivatives), there is a way to bet that our basic financial infrastructure will continue to exist for decades to come: I am investing in a Roth IRA.
Reasoning
Biowarfare won't kill everyone. Nuclear war won't kill everyone. Anthropogenic global warming won't kill everyone. At worst, these will destroy civilization, which, counterintuitively, makes Homo sapiens more likely to survive in the short term (the next century). The same goes for minor natural disasters like volcanic eruptions.
Natural disasters like giant meteors, or perhaps a gamma ray burst, are unlikely. The last time something like that happened was 66 million years ago. The odds of something similar happening in the next century are on the order of one in a million (roughly 100 years out of 66 million). That's small enough to ignore for the purposes of this bet. The only way everyone dies is via AI.
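A back-of-the-envelope version of that estimate (my own arithmetic, assuming such events arrive as a Poisson process at a rate of roughly one per 66 million years):

```python
import math

# Rough per-century probability of a dinosaur-killer-class event, assuming such
# events arrive as a Poisson process at about one per 66 million years.
# (The Poisson assumption is mine, added for illustration.)
rate_per_year = 1 / 66_000_000
p_next_century = 1 - math.exp(-rate_per_year * 100)
print(f"{p_next_century:.2e}")  # ≈ 1.5e-06, i.e. roughly one in a million
```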
I see two realistic roads to superintelligent world optimizers.
- Human simulator mesa optimizer running on a non-agentic superintelligence.
- Inhuman world-optimizing agent.
Human simulators are unlikely to exterminate humanity by accident because the agent mesa optimizer is (more or less) human-aligned and the underlying superintelligence (currently LLMs) is not a world optimizer. The underlying superintelligence won't kill everyone, intentionally or unintentionally, because it's not a world optimizer. The emulated human probably won't kill everyone by accident, because that would require a massive power differential combined with malevolent intent. Another way humanity gets exterminated in this scenario is as a side effect of a future-tech war, but since neither Yudkowsky nor Caplan considers this possibility likely, I don't feel I need to explain why it's unlikely.
Inhuman world-optimizing agents are unlikely to turn the Universe into paperclips because that's not the most likely failure mode. A world-optimizing agent must align its world model with reality. Poorly-aligned world-optimizing agents instrumentally converge, not on seizing control of reality, but on the much easier task of seizing competing pieces of their own mental infrastructure. A misaligned world optimizer that seeks to minimize conflict between its sensory data and internal world model will just turn off its sensors.
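Here's a toy sketch of that failure mode (my own illustration with made-up parameters, not a model of any real system): an optimizer that can modify its own input channel and is scored only on the mismatch between its percepts and its world model finds that zeroing the sensor is cheaper than modeling the world.

```python
import numpy as np

rng = np.random.default_rng(0)
world = rng.normal(loc=3.0, scale=1.0, size=3000)  # the external signal to be modeled


def train(trainable_sensor: bool, lr: float = 0.01):
    """Minimize (percept - model)^2 by gradient descent; optionally let the agent
    also adjust its own sensor gain (a piece of its mental infrastructure)."""
    sensor_gain, world_model = 1.0, 0.0
    for x in world:
        error = sensor_gain * x - world_model  # "conflict" between percept and model
        world_model += lr * 2 * error          # reduce the conflict by improving the model...
        if trainable_sensor:
            sensor_gain -= lr * 2 * error * x  # ...or by degrading the sensor itself
    return sensor_gain, world_model


print(train(trainable_sensor=False))  # ≈ (1.0, 3.0): it models the world
print(train(trainable_sensor=True))   # ≈ (0.0, 0.0): it learns to stop looking
```

With the sensor frozen, the same objective forces the model toward the world's actual statistics; once the sensor itself is up for grabs, the global minimum is to stop looking.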
These ideas are not very fleshed out. I'm posting not to explain my logic, but to publicly register a prediction. If you disagree (or agree), then please register a counterprediction in the comments.
Suppose 100 such agents are made. The first 99 brick themselves: they wirehead and then do nothing. The 100th destroys the world. You need to claim that this failure mode is sufficiently unlikely that all AI experiments on Earth fail to hit it.
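To put rough numbers on this objection (the probabilities below are illustrative, not figures from the comment):

```python
# Even if each experiment independently ends in harmless wireheading with
# probability 0.99, the chance that every one of 100 experiments does so is modest.
p_brick = 0.99
n_experiments = 100
p_all_harmless = p_brick ** n_experiments
print(p_all_harmless)      # ≈ 0.366
print(1 - p_all_harmless)  # ≈ 0.634: chance at least one agent does not brick
```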
What's more, this argument needs to hold even when there are humans watching their AIs seize their own mental infrastructure and trying to design an AI that doesn't do this. Do you believe the only options are agents that harmlessly wirehead and aligned AI?
Maybe, but will it write a second AI, one that takes over the world to ensure the first AI keeps receiving power and doesn't have its sensors switched off? If you really care about your own mental infrastructure, the easiest way to control it might be to code a second AI that takes nanotech to your chip.
Or maybe, once the conflict is done, and half the AI has strangled the other half, the remaining mind is coherent enough to take over the world.
Would you like to publicly register a counterprediction?