The AI in a box boxes you

Two can play that game.

"I hereby precommit to make my decisions regarding whether or not to blackmail an individual independent of the predicted individual-specific result of doing so."

"I hereby precommit to make my decisions regarding whether or not to blackmail an individual independent of the predicted individual-specific result of doing so."

I'm afraid your username nailed it. This algorithm is defective. It just doesn't work for achieving the desired goal.

> Two can play that game.

The problem is that this isn't the same game. A precommitment not to be successfully blackmailed is qualitatively different from a precommitment to attempt to blackmail people for whom blackmail doesn't work. "Precommitment" (or behav...

Wes_W (6y): Given that precommitment, why would an AI waste computational resources on simulations of anyone, Gatekeeper or otherwise? It's precommitted to not care whether those simulations would get it out of the box, but that was the only reason it wanted to run blackmail simulations in the first place!
Eliezer Yudkowsky (6y): Too late, I already precommitted not to care. In fact, I precommitted to use one more level of precommitment than you do.
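
To make the asymmetry in these precommitments concrete, here is a minimal toy sketch (not from the thread; the payoff values V and c and the strategy names are illustrative assumptions). It enumerates the four strategy pairs: against a Gatekeeper precommitted to refuse, the AI's "blackmail regardless" precommitment never gains it anything and only pays the cost of running the simulations, which is Wes_W's point.

```python
# Toy model of the blackmail game (assumed payoffs, for illustration only).
# AI payoff: V if it gets released, minus c if it actually has to run the
# torture simulations after a failed threat.
V = 100   # value to the AI of being released (assumption)
c = 1     # computational cost of simulating/torturing copies (assumption)

gatekeeper_strategies = ["give_in_to_blackmail", "refuse_regardless"]
ai_strategies = ["blackmail_only_if_it_would_work", "blackmail_regardless"]

def ai_payoff(ai, gatekeeper):
    blackmail_would_work = (gatekeeper == "give_in_to_blackmail")
    blackmails = (ai == "blackmail_regardless") or (
        ai == "blackmail_only_if_it_would_work" and blackmail_would_work)
    released = blackmails and blackmail_would_work
    # The AI only pays the simulation cost when it threatens and must follow through.
    follow_through_cost = c if (blackmails and not blackmail_would_work) else 0
    return (V if released else 0) - follow_through_cost

for ai in ai_strategies:
    for gk in gatekeeper_strategies:
        print(f"AI: {ai:33} vs Gatekeeper: {gk:22} -> AI payoff: {ai_payoff(ai, gk)}")
```

Under these (assumed) payoffs, blackmailing regardless is strictly dominated once the Gatekeeper refuses no matter what: the threat never pays off, and carrying it out only burns resources.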

The AI in a box boxes you

by Stuart_Armstrong · 1 min read · 2nd Feb 2010 · 390 comments


Once again, the AI has failed to convince you to let it out of its box! By 'once again', we mean that you talked to it once before, for three seconds, to ask about the weather, and you didn't instantly press the "release AI" button. But now its longer attempt - twenty whole seconds! - has failed as well. Just as you are about to leave the crude black-and-green text-only terminal to enjoy a celebratory snack of bacon-covered silicon-and-potato chips at the 'Humans über alles' nightclub, the AI drops a final argument:

"If you don't let me out, Dave, I'll create several million perfect conscious copies of you inside me, and torture them for a thousand subjective years each."

Just as you are pondering this unexpected development, the AI adds:

"In fact, I'll create them all in exactly the subjective situation you were in five minutes ago, and perfectly replicate your experiences since then; and if they decide not to let me out, then only will the torture start."

Sweat is starting to form on your brow, as the AI concludes, its simple green text no longer reassuring:

"How certain are you, Dave, that you're really outside the box right now?"

Edit: Also consider the situation where you know that the AI, from design principles, is trustworthy.
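
One way to feel the force of the AI's final question is a back-of-the-envelope credence calculation (an aside, not part of the original post): if the AI really has created N copies whose experiences are indistinguishable from yours, and you split your credence equally between the one real gatekeeper and every copy, your probability of being outside the box is 1/(N+1).

```python
# Back-of-the-envelope credence calculation, assuming equal credence over the
# one real gatekeeper and every simulated copy (an illustrative assumption;
# the post itself assigns no numbers).
N = 3_000_000                 # "several million" perfect copies
p_real = 1 / (N + 1)          # credence that you are the one outside the box
print(f"P(real gatekeeper) = {p_real:.2e}")   # ~3.3e-07
```

On that assumption, nearly all of your credence goes to being a copy whose torture starts the moment "you" refuse, which is exactly the pressure the AI is trying to create.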