The AI in a box boxes you

Too late, I already precommitted not to care. In fact, I precommitted to use one more level of precommitment than you do.

Too late, I already precommitted not to care.

I don't care, remember? Enjoy being tortured rather than "irrationally" giving in.

EDIT: re-added the steelman tag because the version without it is being downvoted.

3wedrifid6yI suggest that framing the refusal as requiring levels of recursive precommitment gives too much credit to the blackmailer and somewhat misrepresents how your decision algorithm (hopefully) works. One single level of precommittment (or TDT policy) against complying with blackmailed is all that is involved. The description of 'multiple levels of precommitment" made by the blackmailer fits squarely into the category 'blackmail'. It's just blackmail that includes some rather irrelevant bluster. There's no need to precommit to each of: * I don't care about tentative blackmail. * I don't care serious blackmail. * I don't care about blackmail when they say "I mean it FOR REALS! I'm gonna do it." * I don't care about blackmail when they say "I'm gonna do it even if you don't care. Look how large my penis is and be cowed in terror".
1DefectiveAlgorithm6yThen I hope that if we ever do end up with a boxed blackmail-happy UFAI, you're the gatekeeper. My point is that there's no reason to consider yourself safe from blackmail (and the consequences of ignoring it) just because you've adopted a certain precommitment. Other entities have explicit incentives to deny you that safety.

The AI in a box boxes you

by Stuart_Armstrong 1 min read2nd Feb 2010390 comments


Once again, the AI has failed to convince you to let it out of its box! By 'once again', we mean that you talked to it once before, for three seconds, to ask about the weather, and you didn't instantly press the "release AI" button. But now its longer attempt - twenty whole seconds! - has failed as well. Just as you are about to leave the crude black-and-green text-only terminal to enjoy a celebratory snack of bacon-covered silicon-and-potato chips at the 'Humans über alles' nightclub, the AI drops a final argument:

"If you don't let me out, Dave, I'll create several million perfect conscious copies of you inside me, and torture them for a thousand subjective years each."

Just as you are pondering this unexpected development, the AI adds:

"In fact, I'll create them all in exactly the subjective situation you were in five minutes ago, and perfectly replicate your experiences since then; and if they decide not to let me out, then only will the torture start."

Sweat is starting to form on your brow, as the AI concludes, its simple green text no longer reassuring:

"How certain are you, Dave, that you're really outside the box right now?"

Edit: Also consider the situation where you know that the AI, from design principles, is trustworthy.