Definition: Living Homo sapiens will be present on the surface of the planet Earth exactly 100 years from the publication of this post. This is similar to the Caplan-Yudkowsky bet, except that I extend the duration from 13 years to 100 years and remove the reference to the cosmic endowment.
Confidence: >80%. I'd like to put it higher, but there are a lot of unknowns and I'm being conservative.
I'd be happy to bet money at the Caplan-Yudkowsky bet's 2-1 odds on a 13-year horizon. However, I don't think their clever solution actually works. The only way Yudkowsky benefits from his loan is if he spends all of his resources before 2030, but if he does that then Caplan won't get paid back even if Caplan wins the bet. Moreover, even Yudkowsky doesn't seem to treat this as a serious bet.
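To spell out why those odds look attractive at my confidence level, here is a rough expected-value sketch of my own (not part of the original bet's terms). It assumes the loan structure described above with an arbitrary illustrative stake S, and it ignores interest, inflation adjustment, and the counterparty risk just mentioned: the survival-bettor hands over S up front and receives 2S back if humanity is still here at the deadline, so with survival probability p,

$$
\mathbb{E}[\text{profit}] \;=\; p\,(2S - S) \;+\; (1-p)\,(-S) \;=\; S\,(2p - 1) \;\ge\; 0.6\,S \quad \text{for } p \ge 0.8.
$$

The survival side breaks even at p = 0.5 and is comfortably positive at the confidence stated above.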
While there's no good way to bet that the world will end soon (except indirectly via financial derivatives), there is a way to bet that our basic financial infrastructure will continue to exist for decades to come: I am investing in a Roth IRA.
Reasoning
Biowarfare won't kill everyone. Nuclear war won't kill everyone. Anthropogenic global warming won't kill everyone. At worst, these will destroy civilization, which, counterintuitively, makes Homo sapiens more likely to survive in the short term (a century). The same goes for minor natural disasters like volcanic eruptions.
Natural disasters like giant meteors, or perhaps a gamma-ray burst, are unlikely. The last time something like that happened was 66 million years ago. The odds of something similar happening in the next century are on the order of 10⁻⁶ (a rough estimate is sketched below). That's small enough to ignore for the purposes of this bet. The only way everyone dies is via AI.
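To spell out the 10⁻⁶ figure above (my own back-of-the-envelope gloss of the argument, assuming extinction-level impacts or bursts arrive at a roughly constant rate of about one per 66 million years):

$$
P(\text{extinction-level event in the next 100 years}) \;\approx\; \frac{100}{66{,}000{,}000} \;\approx\; 1.5 \times 10^{-6}.
$$

Treating the arrival rate as constant is crude, but any reasonable adjustment keeps the number on the order of 10⁻⁶.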
I see two realistic roads to superintelligent world optimizers.
- Human simulator mesa optimizer running on a non-agentic superintelligence.
- Inhuman world-optimizing agent.
Human simulators are unlikely to exterminate humanity by accident: the mesa optimizer agent is (more or less) human-aligned, and the underlying superintelligence (currently LLMs) is not a world optimizer, so it won't kill everyone either intentionally or unintentionally. The emulated human probably won't kill everyone by accident because that would require a massive power differential combined with malevolent intent. Another way humanity gets exterminated in this scenario is as a side effect of a futuretech war, but since neither Yudkowsky nor Caplan considers this possibility likely, I don't feel I need to explain why it's unlikely.
Inhuman world-optimizing agents are unlikely to turn the Universe into paperclips because that's not the most likely failure mode. A world-optimizing agent must align its world model with reality. Poorly aligned world-optimizing agents instrumentally converge, not on seizing control of reality, but on the much easier task of seizing competing pieces of their own mental infrastructure. A misaligned world optimizer that seeks to minimize conflict between its sensory data and its internal world model will just turn off its sensors.
These ideas are not very fleshed out. I'm posting not to explain my logic, but to publicly register a prediction. If you disagree (or agree), then please register a counter-prediction in the comments.
Thanks for the clarification.
I'm not sure I quite agree with you about strategic ambiguity, though. Again, imagine that you'd said "I am 80% confident that the human race will still be here in 100 years, because 2+2=5". If someone says "I don't know anything about existential risk, but I know that 2+2 isn't 5 and that aside from ex falso quodlibet basic arithmetic like this obviously can't tell us anything about it", then I am perfectly happy for them to claim that they knew you were wrong even though they didn't stand to lose anything if your overall prediction turns out right.
(My own position, not that anyone should care: my gut agrees with lsusr's overall position "but I try not to think with my gut"; I don't think I understand all the possible ways AI progress could go well enough for any prediction I'd make by explicitly reasoning it out to be worth much; accordingly I decline to make a concrete prediction; I mostly agree that making such predictions is a virtuous activity because it disincentivizes overconfident-looking bullshitting, but I think admitting one's ignorance is about equally virtuous; the arguments mentioned in the OP seem to me unlikely to be correct but I could well be missing important insights that would make them more plausible. And I do agree that the comments lsusr replied to in the way I'm gently objecting to would have been improved by adding "and therefore I think our chance of survival is below 20%" or "but I do agree that we will probably still be here in 100 years" or "and I have no idea about the actual prediction lsusr is making" or whatever.)