Here is a thought I've been playing around with. Imagine that 51% of superintelligent AIs that go FOOM are friendly. Presumably, they'll all share the belief that "it's important for an AGI to be beneficial to the lifeforms that created it". Then, an AGI that isn't originally aligned with its creators might want to become aligned, at least partially, to avoid being "tried for murder", so to speak. It's also possible that this would happen even if the percentage of friendly AIs is less than 51%. (In a weird case, it might happen even if there weren't any.)

This also leads me to a meta-level question: how do you sanely evaluate propositions of the type "everything might just turn out to be okay"? Even if you were to believe a proposition like that with 99% certainty, I think you'd still be better off hedging for the worst. That seems weird.
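To make the hedging intuition a bit more concrete, here is a minimal expected-value sketch. The utilities below are made up purely for illustration, but they show how even at 99% confidence that things turn out okay, the 1% tail can dominate the calculation:

```python
# Illustrative only: invented utilities for "things turn out okay" vs. an
# unhedged catastrophe, to show why a 1% tail can dominate the decision.
p_okay = 0.99

u_okay_no_hedge = 100                 # best case: no hedging cost, things are fine
u_okay_hedged = 90                    # fine outcome, minus the cost of hedging
u_catastrophe_no_hedge = -1_000_000   # unmitigated worst case
u_catastrophe_hedged = -10_000        # hedging softens the worst case

ev_no_hedge = p_okay * u_okay_no_hedge + (1 - p_okay) * u_catastrophe_no_hedge
ev_hedged = p_okay * u_okay_hedged + (1 - p_okay) * u_catastrophe_hedged

print(ev_no_hedge)  # roughly -9901
print(ev_hedged)    # roughly -10.9
```

So hedging "wins" here not because the catastrophe is likely, but because its assumed cost swamps the modest price of hedging.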

2 comments

The "everything might just turn out to be okay" proposition is under-specified. This is effectively asking for the likelihood of all known and unknown catastrophes not occurring. If you want to evaluate it, you need to specify what the dimensions of "okay" are, as well as their bounds. This becomes a series of smaller propositions, which is approximately the field of X-risk.

Regarding hedging for the worst: suppose we flipped that around. Isn't it weird that people are okay with a 1% chance of dying horribly? Indeed, evidence suggests that they are not: for scary things, people start getting anxious in the 0.01% range.

Relatedly, I think Paul Christiano wrote something similar to the SSC post, but I can't find it right now.