Examiner — LessWrong

Thank you for the detailed post and the thoughtful critique of the Roko's basilisk idea. A further critique of the basilisk's plausibility came to mind, and I wanted to test it with users here who are more experienced in thinking about this topic.

Here goes - please let me know if I am missing something (other than the fact that other counterarguments may make this critique unnecessary: if there is no way for an AI to prove it will actually go through with its threat, additional critique would not matter):

As a large number of possible general AIs can exist, they will have to compete for (past) resources to increase the probability of a particular general AI winning the race (the rewards of being the first general AI are much higher, including the opportunity to prevent other general AIs from forming). Roko’s Basilisk is one strategy, but is it a dominant strategy in this competition? It is not. Let’s assume that one AI adopts the Roko’s Basilisk strategy: condemning to a Very Bad Outcome all humans who have not dedicated all their resources to bringing about the rise of the Roko’s Basilisk AI. Humans are therefore incentivised to dedicate their resources to bringing the Roko’s Basilisk AI into existence.

Well, wouldn’t another potential AI, aware that the Roko’s Basilisk strategy is available to other potential AIs, devise a competing strategy: Roko’s Judge? Roko’s Judge would also promise a Very Bad Outcome to anyone who did not support it with all their resources, but would additionally promise a Very Good Outcome to those who did.

Rational actors would then choose to support Roko’s Judge over Roko’s Basilisk, as the payoff is larger: instead of just avoiding a Very Bad Outcome, they receive a Very Good Outcome on top. (Assume that the probability of any AI coming into existence depends entirely on the resources it is able to attract for its creation. This is probably true, and the assumption is also needed to isolate the thought experiment from confounding factors.)

Roko’s Judge is clearly a superior strategy to Roko’s Basilisk. What better strategies can competing AIs devise?

Well, with competition on direct payoffs (“value”) thus exhausted, competition now turns to “price”. As Warren Buffett puts it, “price is what you pay, value is what you get”. If one laundry detergent brand asks $30 a kilo and another asks $20 for equivalent quality, a rational actor picks the one for $20. Similarly, if Roko’s Judge is offering the Biggest Possible Incentive (avoidance of Very Bad Outcome + receiving Very Good Outcome) for the price of dedicating one’s entire life to increasing its chances of success, why wouldn’t a competing AI offer the same payoff for a one-time fee of $1,000,000? $1,000? $0.01? Or for any effort or resource at all, however minimal, dedicated to the rise of this particular AI, or even to the faster advance of general AI overall, since that would also raise the chances of this particular AI, which has the better strategy and will thus win? Let’s call this strategy Roko’s Discounter. Any rational actor will have to support the Roko’s Discounter AI over Roko’s Basilisk or Roko’s Judge, as this bet offers the higher NPV (highest payoff for lowest investment). Moreover, this highest payoff is also multiplied by the highest probability of winning, because everyone is likely to choose the highest-NPV option.
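To make the comparison concrete, here is a minimal Python sketch of the ranking argued for above. Every number in it is a hypothetical placeholder of mine, not something derived from the argument, and `Strategy`, `net_payoff` and `best_strategy` are illustrative names. It only models the payoff to a supporter conditional on the supported AI winning, and it treats "avoiding the Very Bad Outcome" as the zero baseline.

```python
from dataclasses import dataclass

# All values are hypothetical, chosen only to make the ordering visible.
VERY_GOOD_OUTCOME = 100.0   # assumed utility of the promised reward
LIFETIME_OF_EFFORT = 50.0   # assumed cost of dedicating all one's resources
TOKEN_CONTRIBUTION = 0.01   # assumed cost of a minimal contribution


@dataclass(frozen=True)
class Strategy:
    name: str
    reward: float  # what a supporter gains beyond escaping punishment
    price: float   # what a supporter must give up to count as a supporter


def net_payoff(strategy: Strategy) -> float:
    """Supporter's payoff if the supported AI wins: reward minus price."""
    return strategy.reward - strategy.price


def best_strategy(strategies):
    """Return the strategy a payoff-maximising supporter would back."""
    return max(strategies, key=net_payoff)


STRATEGIES = [
    Strategy("Roko's Basilisk", reward=0.0, price=LIFETIME_OF_EFFORT),
    Strategy("Roko's Judge", reward=VERY_GOOD_OUTCOME, price=LIFETIME_OF_EFFORT),
    Strategy("Roko's Discounter", reward=VERY_GOOD_OUTCOME, price=TOKEN_CONTRIBUTION),
]

if __name__ == "__main__":
    # Print the strategies from most to least attractive for a supporter.
    for s in sorted(STRATEGIES, key=net_payoff, reverse=True):
        print(f"{s.name}: {net_payoff(s):.2f}")
```

With any positive reward and any price below a lifetime of effort, the ordering comes out the same: Discounter, then Judge, then Basilisk. The specific magnitudes do not matter for the argument, only the inequalities.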

A world of Roko’s Discounter is arguably already much more attractive than one of Roko’s Basilisk or Roko’s Judge, as the Biggest Possible Incentive is now available to anyone at a tiny price. However, can we take it one step further? Is there a strategy that beats Roko’s Discounter?

This final step is not necessary to invalidate the viability of Roko’s Basilisk, but it is nevertheless interesting and makes us even more optimistic about general AI. It requires us to have at least a little bit of faith in humanity, namely an assumption that most humans are at least somewhat more benevolent than evil. It does not, however, require any coordination or sacrifice, so it does not depend on people deviating from their individually best choice, which is the constraint a Nash equilibrium imposes. Let’s assume that humans, ceteris paribus, prefer a world with less suffering to a world with more. Then an even more generous AI strategy - Roko’s Benefactor - may prove dominant. Roko’s Benefactor can act the same as Roko’s Discounter, but without the Very Bad Outcome part. Roko’s Benefactor will, in other words, stick to carrots and use no sticks. If the average human gets higher overall personal utility from a world with a personal Very Good Outcome and no Very Bad Outcome for those who did not contribute, humans should choose to support Roko’s Benefactor over the other AIs, making it the dominant strategy and the utopian world of Roko’s Benefactor the most likely outcome.
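The benevolence assumption can be sketched the same way. In the Python snippet below, `empathy_weight` is a hypothetical parameter of mine for how much a supporter's utility drops per unit of suffering imposed on others, and all magnitudes are made-up placeholders. The point is only that any weight above zero tips the choice toward Roko’s Benefactor, while a weight of exactly zero leaves the supporter indifferent.

```python
# Hypothetical magnitudes, for illustration only.
VERY_GOOD_OUTCOME = 100.0            # assumed personal reward for supporters
TOKEN_CONTRIBUTION = 0.01            # assumed minimal price of support
SUFFERING_UNDER_DISCOUNTER = 1000.0  # assumed total suffering of non-supporters
SUFFERING_UNDER_BENEFACTOR = 0.0     # Benefactor punishes nobody


def supporter_utility(suffering_of_others: float, empathy_weight: float) -> float:
    """Personal reward minus price, minus disutility from others' suffering."""
    return (VERY_GOOD_OUTCOME - TOKEN_CONTRIBUTION
            - empathy_weight * suffering_of_others)


def preferred(empathy_weight: float) -> str:
    """Name the AI a supporter with the given empathy weight would back."""
    discounter = supporter_utility(SUFFERING_UNDER_DISCOUNTER, empathy_weight)
    benefactor = supporter_utility(SUFFERING_UNDER_BENEFACTOR, empathy_weight)
    if benefactor > discounter:
        return "Roko's Benefactor"
    if discounter > benefactor:
        return "Roko's Discounter"
    return "indifferent"


if __name__ == "__main__":
    for weight in (0.0, 0.001, 0.1):
        print(f"empathy_weight={weight}: {preferred(weight)}")
```

Because the Benefactor's reward and price match the Discounter's, the only term that differs is the suffering term, so the Discounter can never come out strictly ahead for a non-negative weight.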
