I think Nathaniel Eliot is the only one here who's hit the nail on the head: the stuff about boxes and gatekeepers is a largely irrelevant veneer over Eliezer's true claim, that he can convince another human to do something manifestly contrary to that human's self-interest, using only two hours and a chat window; and so, a fortiori, that a transhuman AI could do the same. After all, humans have a long history of being scammed, seduced, brainwashed, etc.; the only hard parts here are the restricted time and method of interaction, and the gatekeeper's initial certain knowledge that he has nothing to gain by capitulating. I think Eliezer made this clear with (a) his statement that it's legitimate for the gatekeeper to break character, and (b) his comment on the "Shut up and do the impossible" post, where he alludes to "an ability that I could never test in real life [because of ethics]" and "the amazing clever way I'd contrived, to create a situation where I could ethically go all-out against someone".

So if I were to try this game as the "AI", the rules would be something like: "You agree to read what I write for two hours (responding is optional); I will convince you to give me $X; if I fail, I'll give you $Y" (where X and Y are negotiated in advance, and are large enough to be significant to both parties).