Consider the following scenario. MIRI succeeds beyond my wildest expectations. It comes up with a friendliness theory, and then uses it to make provably friendly AGI before anyone else can make an unfriendly one. And then a year and a half later, we find that Eliezer Yudkowsky has become the designated god-emperor of the lightcone, and the rest of the major MIRI researchers are his ministers. Whoops.
My guess for the probability of this type of scenario, given a huge MIRI success along those lines, is around 15%. The reasoning is straightforward. (1) We don't know what's going on inside any particular person's head. (2) Many or most humans are selfish. (3) Looking altruistic is...
I was supposed to check on this a long time ago but forgot/went inactive on LW. The post actually ended up at -26, so seemingly slightly lower than it was, which is evidence against your regression-to-0 theory.