I think in a lot of people's models, "10% chance of alignment by default" means "if you make a bunch of AIs, 10% chance that all of them are aligned, 90% chance that none of them are aligned", not "if you make a bunch of AIs, 10% of them will be aligned and 90% of them won't be".

And the 10% estimate just represents our ignorance about the true nature of reality; it's already true either that alignment happens by default or that it doesn't, we just don't know yet.
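To make the gap between the two readings concrete, here is a rough sketch of the chance that at least one aligned AI exists among n AIs under each reading. The function names, and the assumption that the second reading means fully independent draws, are mine for illustration, not something anyone in the thread specified.

```python
def p_any_aligned_correlated(p: float, n: int) -> float:
    """Reading 1: alignment-by-default is a single shared fact about the world,
    so all n AIs are aligned together or not at all (assumed perfect correlation)."""
    return p if n > 0 else 0.0


def p_any_aligned_independent(p: float, n: int) -> float:
    """Reading 2: each AI is a separate draw that comes out aligned with
    probability p (assumed fully independent)."""
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    p = 0.10  # the "10% chance of alignment by default" from the comment
    for n in (1, 10, 50):
        print(n, p_any_aligned_correlated(p, n), round(p_any_aligned_independent(p, n), 3))
```

Under the first reading, building more AIs never moves the number above 10%. Under the second, ten AIs already give roughly a 65% chance that at least one is aligned, which is why the two readings lead to such different strategic conclusions.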


Zac Hatfield-Dodds (3y)

Scenario A has an almost 10% chance of survival; the others ~0%. To quote John Wentworth's post, which I strongly agree with:

What I like about the Godzilla analogy is that it gives a strategic intuition which much better matches the real world. When someone claims that their elaborate clever scheme will allow us to safely summon Godzilla in order to fight Mega-Godzilla, the intuitively-obviously-correct response is “THIS DOES NOT LEAD TO RISING PROPERTY VALUES IN TOKYO”.


avturchin (3y)

The problem is that having both aligned and non-aligned AIs likely means war between them. Such a war may be even worse than a single non-aligned AI, since a lone non-aligned AI may choose to preserve humans for instrumental reasons.

However, if there is a war, the non-aligned AI will have an incentive to blackmail the aligned AI by torturing as many people as possible.



Sufficiently many Godzillas as an alignment strategy
