Suppose that AI capability research is done, but AI safety research is ongoing. Any of the major players can launch an AI at the press of a button to win the cosmos. The longer everyone waits, the lower the chance that the cosmos ends up as paperclips. The default is that someone will press the button once they prefer their current chance at an intact cosmos to the risk of letting the race run on. This unfortunate situation could be helped by the fact that pressing the button need not be visible to the other players. So suppose the winner decides to lay low and smite whoever presses the button thereafter*. Then the other players would have an incentive not to press the button, and that incentive grows over time!

Let the paperclip probability p(t) := e^-t decay exponentially. Let t' be the last time at which the single other player would still hold off on pressing the button. What mixed button-pressing strategy should we employ so that the risk of getting smitten shores up the fading paperclip risk? At any time t >= t', we press the button with probability density -p'(t) = e^-t. The probability that our strategy ever causes paperclips is then the integral from t' to infinity of e^-t * e^-t dt, which comes to .5*e^-2t'.
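As a quick numerical sanity check, here is a minimal Monte Carlo sketch of that calculation (the threshold t' = 1.0 and the sample count are arbitrary illustration choices, not anything from the post): we press at time t >= t' with density e^-t, never press with the leftover probability 1 - e^-t', and count how often a press lands on paperclips.

```python
import math
import random

def simulate(t_prime, n_samples=1_000_000):
    """Estimate P(our press causes paperclips) under the mixed strategy.

    We press at some time t >= t_prime with probability density e^{-t};
    with the remaining probability we never press. If we press at time t,
    paperclips result with probability p(t) = e^{-t}.
    """
    press_prob = math.exp(-t_prime)  # total probability of ever pressing
    paperclips = 0
    for _ in range(n_samples):
        if random.random() < press_prob:
            # Conditional on pressing, the press time has density
            # e^{-(t - t')} on [t', inf), i.e. t' plus an Exponential(1) draw.
            t = t_prime + random.expovariate(1.0)
            if random.random() < math.exp(-t):  # paperclip outcome
                paperclips += 1
    return paperclips / n_samples

t_prime = 1.0
print("simulated:", simulate(t_prime))
print("analytic: ", 0.5 * math.exp(-2 * t_prime))
```

The two printed numbers should agree up to sampling noise, matching the closed form .5*e^-2t'.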

*He could also just figure out what everyone else would do in any situation and reward accordingly as a strategy against one-boxers, or copy the planet ten times over as a strategy against thirders, but this variant should work against your average human. (It turns out a large number of strategies become available once you're omnipotent. Suggest more.)

2 comments

I think I see the point, but I'm not convinced it's actually feasible, because it requires a precommitment guaranteed by everyone. Yet for humans, it seems intuitive that the winner (if she doesn't create paperclips) will use the power right away, which breaks the precommitment and so renders it ineffective.

Does this objection make sense to you, or do you think I am confused by your proposal?

Indeed, players might follow a different strategy than they declare. A player can only verify another player's precommitment after pressing the button (or through old-fashioned espionage of their button setup). But I find it reasonable to expect that a player, seeing the shape of the AI race and what is needed to prevent mutual destruction, would actually design their AGI to use a decision theory that follows through on the precommitment. Humans may not find weird decision theories intuitively compelling, but they can expect someone to write an AGI that uses them. And even a human winner might find giving the other players what they deserve more important than cutting the world as we know it short by a decade.

Compare to Dr. Strangelove's doomsday machine. We expect that a human in the loop would not follow through, but we can't expect that no human would build such a machine.