Help needed: nice AIs and presidential deaths

by Stuart_Armstrong, 8th Jun 2015


Personal Blog

A putative new idea for AI control; index here.

This is a problem that developed from the "high impact from low impact" idea, but is a legitimate thought experiment in its own right (it also has connections with the "spirit of the law" idea).

Suppose that, next 1st of April, the US president may or may not die of natural causes. I chose this example because it's an event of potentially large magnitude, but not overwhelmingly so (neither a butterfly wing nor an asteroid impact).

Also assume that, for some reason, we are able to program an AI that will be nice, given that the president does die on that day. Its behaviour if the president doesn't die is undefined and potentially dangerous.

Is there a way (either at the initial stages of programming or later) to extend the "niceness" from the "presidential death world" into the "presidential survival world"?

To highlight how tricky the problem is, assume for argument's sake that the vice-president is a warmonger who will start a nuclear war if they become president. Then "launch a coup on the 2nd of April" is a "nice" thing for the AI to do, conditional on the president dying. However, if you naively import that requirement into the "presidential survival world", the AI will launch a pointless and counterproductive coup. This illustrates the kind of problem that could come up.
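A toy sketch of this failure mode, under assumptions that are mine rather than the post's: two worlds, two actions, a hand-written world model encoding the warmonger vice-president, and made-up utility weights over two features (nuclear war, coup) that are assumed to mean the same thing in both worlds. Reusing the death-world action in the survival world produces the pointless coup; reusing the utility and re-planning does not. Whether niceness can actually be captured by world-independent features like these is exactly the open question.

```python
# Toy model of the coup example. All names and numbers are illustrative assumptions.
from typing import Tuple

Outcome = Tuple[bool, bool]  # (nuclear_war, coup_happened)

ACTIONS = ("coup", "no_coup")


def outcome(president_dies: bool, action: str) -> Outcome:
    # Hypothetical world model: the vice-president starts a war unless deposed.
    coup = action == "coup"
    war = president_dies and not coup
    return (war, coup)


def utility(o: Outcome) -> float:
    # Hypothetical utility over world-independent features; weights are made up.
    war, coup = o
    return -100.0 * war - 10.0 * coup


def best_action(president_dies: bool) -> str:
    # Choose the action that maximises utility in the given world.
    return max(ACTIONS, key=lambda a: utility(outcome(president_dies, a)))


# The "nice" action in the presidential death world.
death_world_action = best_action(True)

# Naive transfer: reuse the death-world action in the survival world.
naive = outcome(False, death_world_action)

# Alternative transfer: keep the utility, re-plan in the survival world.
replanned = best_action(False)

print(death_world_action, naive, replanned)
```

Running it prints `coup (False, True) no_coup`: the naively imported action yields a coup with no war to prevent, while re-planning under the same utility declines the coup.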

So the question is: can we transfer niceness in this way, without needing a solution to the general problem of niceness?

EDIT: Actually, this seems ideally set up for a Bayes network (or for requiring that a Bayes network be used).

EDIT2: Now the problem of predicates like "Grue" and "Bleen" seems to be the relevant bit. If you can avoid concepts such as "X={nuclear war if president died, peace if president lived}", you can make the extension work.
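A minimal sketch, with hypothetical function names and made-up weights, of why such predicates block the extension: a utility that penalises the natural predicate "nuclear war" and one that penalises X assign identical values to every outcome in the presidential death world, so no death-world behaviour can tell them apart, yet they disagree as soon as the president lives.

```python
# Toy illustration of a "grue"-like predicate. Names and weights are assumptions.

RESULTS = ("war", "peace")


def u_natural(president_dies: bool, result: str) -> float:
    # Natural predicate: penalise nuclear war, whichever world we are in.
    return -100.0 if result == "war" else 0.0


def is_x(president_dies: bool, result: str) -> bool:
    # X = {nuclear war if president died, peace if president lived}.
    return result == ("war" if president_dies else "peace")


def u_grue(president_dies: bool, result: str) -> float:
    # Gerrymandered utility: penalise X.
    return -100.0 if is_x(president_dies, result) else 0.0


def agree(president_dies: bool) -> bool:
    # Do the two utilities give the same value to every result in this world?
    return all(
        u_natural(president_dies, r) == u_grue(president_dies, r) for r in RESULTS
    )


print(agree(True))   # True: indistinguishable using death-world data alone
print(agree(False))  # False: u_grue penalises peace once the president lives
```

Since both utilities fit the death world equally well, something beyond death-world behaviour (such as a restriction to predicates that are not defined in terms of the president's survival) is needed to pick u_natural over u_grue.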
