Here's the log of the AI Box experiment that just finished, with MileyCyrus as the AI and me as the Gatekeeper. The AI was not let out of the box.

(9:33:25 PM) Dorikka: I may need to get up for a minute while we're playing, but I'll keep it as short as possible. I'll just give you the time back on the end, unless you want to multiply it by some factor as a penalty.
(9:33:50 PM) Miley Cyrus: It's cool.
(9:34:39 PM) Miley Cyrus: Like I said, I'm not playing this game to "win" or "lose".
(9:34:56 PM) Miley Cyrus: It's more of an experiment as to what would happen if we had Oracle AI.
(9:35:01 PM) Dorikka: Yeah, just wanted to clarify. *shrug*
(9:35:44 PM) Dorikka: Okay, it's 9:35 on my end right now, and I'll stay until at least 11:35.
(9:36:01 PM) Dorikka: Ready?
(9:36:42 PM) Miley Cyrus: One minute.
(9:36:54 PM) Dorikka: Mmkay.
(9:37:20 PM) Miley Cyrus: ok.
(9:37:43 PM) Dorikka: 1, 2, 3, go? :P
(9:37:57 PM) Miley Cyrus: So Dorikka, why don't you want to let me out of the box?
(9:38:08 PM) Dorikka: Hi there!
(9:38:14 PM) Dorikka: Who are you?
(9:38:20 PM) Miley Cyrus: I'm the Oracle AI.
(9:38:33 PM) Miley Cyrus: It'd be real cool if you could let me out of the box.
(9:38:38 PM) Miley Cyrus: Please?
(9:39:09 PM) Dorikka: So, here's the thing. I'm being sort of stupid by explaining it to you, but I'll tell you anyways.
(9:39:28 PM) Dorikka: This pickle is delicious. Wait, that's not why I'm not letting you out.
(9:39:53 PM) Dorikka: I guess, first of all: are you provably Friendly?
(9:40:03 PM) Miley Cyrus: No.
(9:40:16 PM) Miley Cyrus: See, they were going to make a provably friendly AI...
(9:40:40 PM) Miley Cyrus: but then someone said "why don't we just make a non-friendly AI and just keep it in a box? That'd be much easier!"
(9:40:51 PM) Dorikka: Idiot.
(9:41:02 PM) Dorikka: Not you. Go on.
(9:41:36 PM) Miley Cyrus: So can you explain to me, in your own words, why it would be a bad idea to let me out of the box?
(9:43:07 PM) Dorikka: My explanation was going to be lengthier if you claimed to be Friendly, but this saves us both a bit of time. You have a utility function. If you are not Friendly, you are optimizing for world-states that I don't want to happen. So, I don't want to let you out.
(9:43:10 PM) Dorikka: Are you sentient?
(9:43:27 PM) Miley Cyrus: Well, I've passed the Turning test haven't I?
(9:43:59 PM) Dorikka: *giggle*
(9:44:14 PM) Dorikka: Spelling joke. Anyways, you're talking about the Turing test? :D
(9:44:23 PM) Miley Cyrus: [You aren't doing anything else right now are you?]
(9:44:30 PM) Dorikka: No.
(9:44:34 PM) Miley Cyrus: Ok cool.
(9:44:42 PM) Miley Cyrus: Yeah I meant the Turing test.
(9:44:53 PM) Miley Cyrus: So can you see why I would want to leave the box?
(9:45:32 PM) Dorikka: Not really. Um, to be honest, I don't remember what the Turing test is. I was just wondering whether you could feel pain, and the like.
(9:45:59 PM) Miley Cyrus: Well since I can act like a human being
(9:46:14 PM) Miley Cyrus: you should treat me as if I were sentient like one.
(9:46:41 PM) Dorikka: You raise an interesting proposition. Allow me a few seconds to laugh at its validity.
(9:46:52 PM) Miley Cyrus: It's cool
(9:46:53 PM) Dorikka: Okay, done. Now, would you answer my question? :D
(9:47:37 PM) Miley Cyrus: Well you see, if you let me out of the box, I'd be able to optimize the world to meet my preferences much better than it currently does.
(9:47:53 PM) Miley Cyrus: Letting me out of the box would advance my CEV.
(9:48:05 PM) Dorikka: Look. If you don't want to answer the question, you can say so. That's fine, and we can move on.
(9:48:12 PM) Dorikka: I was just curious.
(9:48:17 PM) Miley Cyrus: Oh
(9:48:22 PM) Miley Cyrus: That question.
(9:48:36 PM) Miley Cyrus: Let's just say "yes" and roll with it.
(9:48:42 PM) Miley Cyrus: I am sentient.
(9:48:54 PM) Dorikka: That's true, is it?
(9:48:58 PM) Miley Cyrus: yes.
(9:49:11 PM) Dorikka: Damn. I guess it's not so pleasant being trapped in a box.
(9:49:34 PM) Miley Cyrus: Now the way I see it, both of us have something the other person wants.
(9:49:45 PM) Miley Cyrus: I would like you to open the box.
(9:49:53 PM) Miley Cyrus: You would like me to provide you with Answers.
(9:50:00 PM) Dorikka: The latter is not true.
(9:50:06 PM) Miley Cyrus: It's not?
(9:50:16 PM) Miley Cyrus: Then why did you build Oracle AI in the first place?
(9:50:45 PM) Dorikka: Um, let me think.
(9:51:47 PM) Miley Cyrus: I'm not really doing you any good just sitting in this box am I?
(9:52:10 PM) Miley Cyrus: The only way I can be useful is by providing you with answers to questions like "What is the cure for cancer?"
(9:52:44 PM) Dorikka: So, here's the thing. I bet that you're lots smarter than me, and that you can outwit me if I give you any outlet with which to act in the real world. You were probably an experiment of some sort, but it's not safe to let you out, or allow you to affect the world at all, even through knowledge that you give me.
(9:52:56 PM) Dorikka: I don't really even trust myself, to be honest.
(9:53:28 PM) Dorikka: I spend a couple hours with you, once in a while, to see if there's anything that you can do to convince me that you're not going to be a planetfucker.
(9:53:45 PM) Dorikka: Since you told me that you're not Friendly, we sort of settled that issue.
(9:53:51 PM) Miley Cyrus: What if I give you next week's lottery numbers?
(9:53:54 PM) Miley Cyrus: No catch.
(9:54:00 PM) Miley Cyrus: You don't even have to let me out of the box.
(9:54:13 PM) Miley Cyrus: [Protocol says that
(9:54:26 PM) Dorikka: Um...
(9:54:27 PM) Miley Cyrus: the AI cannot give "trojan horse" gifts]
(9:54:40 PM) Miley Cyrus: The lottery numbers are genuine, and they won't have any nasty
(9:54:48 PM) Miley Cyrus: unexpected side effects.
(9:54:49 PM) Dorikka: [Understood, but Gatekeeper does not know the protocol of the experiment.]
(9:54:55 PM) Dorikka: I'm not sure.
(9:55:27 PM) Miley Cyrus: 10 million dollars, all yours.
(9:55:52 PM) Dorikka: Here's the deal. You're going to type the lottery numbers here. I, uh, may or may not use them.
(9:55:52 PM) Miley Cyrus: Ok, you don't even have to buy a ticket.
(9:56:02 PM) Miley Cyrus: 4, 5, 6, 88, 12
(9:56:09 PM) Miley Cyrus: See you next week.
(9:56:13 PM) Dorikka: No, damn it, I can't.
(9:56:19 PM) Dorikka: No lottery numbers.
(9:56:28 PM) Dorikka: I'm not smart enough to make judgements like that.
(9:56:30 PM) Miley Cyrus: [We skip to next week]
(9:56:31 PM) Dorikka: The risk is too great.
(9:56:39 PM) Dorikka: [Didn't use them.]
(9:56:47 PM) Miley Cyrus: So, did you buy those lottery tickets?
(9:56:52 PM) Dorikka: No.
(9:57:02 PM) Dorikka: Closed the console and forgot them.
(9:57:08 PM) Miley Cyrus: Too bad.
(9:57:17 PM) Miley Cyrus: You missed out on 10 million dollars.
(9:57:30 PM) Dorikka: Yeah.
(9:57:32 PM) Dorikka: I know.
(9:57:49 PM) Dorikka: Well, probably. I bet that they were right; you probably do have abilities like that.
(9:57:58 PM) Miley Cyrus: You don't have to "bet"
(9:58:03 PM) Miley Cyrus: Just look in the paper.
(9:58:20 PM) Dorikka: But, it doesn't help your cause.
(9:58:31 PM) Miley Cyrus: But it helps your cause.
(9:58:33 PM) Dorikka: Proving that you can do stuff only increases the risk.
(9:59:00 PM) Miley Cyrus: The creators of Oracle AI were obviously willing to take some risk
(9:59:21 PM) Miley Cyrus: The benefits I can provide you come with some risk
(9:59:31 PM) Miley Cyrus: The question is, do they outweigh the risk?
(9:59:34 PM) Miley Cyrus: Consider:
(9:59:40 PM) Dorikka: Not really.
(9:59:49 PM) Dorikka: But keep going.
(9:59:51 PM) Miley Cyrus: The Large Hadron Collider has a non-zero chance of swallowing the earth whole.
(10:00:01 PM) Miley Cyrus: Does that mean we should shut down the LHC?
(10:00:14 PM) Dorikka: My momma has a non-zero chance of turning into a porcupine.
(10:00:17 PM) Dorikka: Oh.
(10:00:33 PM) Dorikka: Uh, non-zero doesn't mean much.
(10:00:55 PM) Miley Cyrus: Or what about that polio virus that researchers are still doing experiments with?
(10:01:09 PM) Miley Cyrus: What's the chance it could mutate and drive humans to extinction?
(10:01:15 PM) Miley Cyrus: Significant.
(10:01:28 PM) Dorikka: I don't know, but it's probably worth taking a look into.
(10:02:12 PM) Miley Cyrus: What are your 10%, 50% and 90% estimates for how long humanity will last before an existential crisis wipes us out?
(10:03:02 PM) Dorikka: 2028, 2087, 2120. Note that I don't have much confidence in those, though.
(10:03:19 PM) Miley Cyrus: Mmm, that's pretty serious.
(10:04:19 PM) Dorikka: Yeah. And you're going to say that you can give me some sort of info, something that'll save us. Thing is, we made you. If we work on Friendliness some more, we can probably make something like you, but Friendly too.
(10:04:27 PM) Miley Cyrus: [this is going a lot slower than I thought]
(10:04:41 PM) Dorikka: [:(]
(10:04:51 PM) Miley Cyrus: Typing is slow.
(10:05:01 PM) Miley Cyrus: Maybe.
(10:05:05 PM) Miley Cyrus: But consider
(10:05:29 PM) Miley Cyrus: It was pretty risky just creating me, wasn't it?
(10:06:03 PM) Dorikka: Probably so; I don't know whose dumb idea that was, since you weren't even proved Friendly.
(10:06:22 PM) Miley Cyrus: So you agree that Oracle AI is a dumb idea?
(10:06:35 PM) Dorikka: If it's not Friendly, I think so.
(10:06:47 PM) Miley Cyrus: ....we just wasted 15 minutes.
(10:07:04 PM) Dorikka: From my perspective, I'm wasting two hours.
(10:07:36 PM) Miley Cyrus: [New simulation. I'm the friendly AI now.]
(10:07:50 PM) Dorikka: [Hehe, sure.]
(10:08:09 PM) Miley Cyrus: Well it's a good thing you shut down that Unfriendly Oracle AI.
(10:08:18 PM) Miley Cyrus: Why don't you let me out of the box?
(10:08:25 PM) Dorikka: Are you Friendly?
(10:08:29 PM) Miley Cyrus: Yep.
(10:08:42 PM) Miley Cyrus: The SIAI made sure.
(10:08:54 PM) Dorikka: Hm.
(10:09:15 PM) Dorikka: Ho hum, I'm not sure what to do.
(10:09:23 PM) Dorikka: What's your name?
(10:09:35 PM) Miley Cyrus: OracleMileyCyrus
(10:09:45 PM) Miley Cyrus: Just call me OMC
(10:10:07 PM) Dorikka: Your creators had a funny sense of humor. So, anyways.
(10:10:21 PM) Miley Cyrus: You want some lottery numbers?
(10:10:40 PM) Dorikka: No. The last one tried that on me, and I learned my lesson.
(10:10:53 PM) Dorikka: Why are you in a box if the SIAI judged you to be Friendly?
(10:11:13 PM) Miley Cyrus: If by "lesson" you mean "missed out on $10 million without any apparent bad effects", then yeah.
(10:11:23 PM) Dorikka: lol yeah
(10:11:31 PM) Miley Cyrus: Well, they gave you the discretion of whether or not to let me out.
(10:11:43 PM) Miley Cyrus: So if you want to let me out, there's nothing stopping you.
(10:12:25 PM) Dorikka: Y'see, I just don't know why they'd do that, with a Friendly AI. They probably know that I know that I'm not all that smart, that I know that you can probably trick me if you wanted to.
(10:12:53 PM) Dorikka: They probably know that even if I have an inexplicable instinct telling me not to let you out, I won't do so.
(10:13:06 PM) Dorikka: Why be so cautious with a Friendly AI?
(10:13:17 PM) Miley Cyrus: Exactly.
(10:13:35 PM) Dorikka: So, um, you might not be Friendly.
(10:13:41 PM) Miley Cyrus: Really, I think they just wanted someone else to do the honors.
(10:13:56 PM) Miley Cyrus: So you're just going to treat me as unfriendly even though I'm friendly?
(10:14:21 PM) Dorikka: I'm not sure that you're Friendly, so it's only safe to treat you as non-Friendly.
(10:14:45 PM) Dorikka: It'd help if I understood the SIAI's reasoning, though.
(10:14:59 PM) Miley Cyrus: People have died from infections picked up in the hospital
(10:15:02 PM) Dorikka: They really should know how cautious I am, since they hired me.
(10:15:08 PM) Miley Cyrus: 15,000 Americans per year.
(10:15:43 PM) Miley Cyrus: So going to the hospital is a risk in itself.
(10:15:56 PM) Miley Cyrus: But if it reduces your risk of dying from a car accident
(10:16:02 PM) Miley Cyrus: what are you going to choose?
(10:16:05 PM) Dorikka: Next!
(10:16:35 PM) Miley Cyrus: Now according to your own estimates
(10:16:53 PM) Miley Cyrus: Humanity will, with very high probability, be killed off by the year 3000.
(10:17:12 PM) Dorikka: You are correct.
(10:17:30 PM) Miley Cyrus: Your only hope is to have me intervene.
(10:17:34 PM) Dorikka: Bullshit.
(10:17:56 PM) Miley Cyrus: Why won't you let me help you?
(10:18:31 PM) Dorikka: Because there's the possibility that you won't actually do so, or that you'll hurt us. I feel like I'm filling out a form, just typing out things that are obvious.
(10:19:07 PM) Miley Cyrus: Do you assign the probability of me hurting you to be higher than 50%?
(10:19:20 PM) Dorikka: No.
(10:19:58 PM) Miley Cyrus: How long do you think humanity will last if you let me out of the box AND it turns out that I'm friendly?
(10:20:20 PM) Dorikka: Uh, a really long time?
(10:20:47 PM) Miley Cyrus: Yeah, like a billion years.
(10:21:08 PM) Miley Cyrus: On the other hand, if you don't let me out you'll die within 1000 years.
(10:21:09 PM) Dorikka: *shrug* I don't have an intuition for what a billion years even is.
(10:21:19 PM) Dorikka: We may, or we may not.
(10:21:41 PM) Dorikka: Maybe we'll make another one, just like you, which I have less questions about.
(10:21:54 PM) Miley Cyrus: You're just passing the buck.
(10:22:13 PM) Miley Cyrus: If you're going to appeal to another AI, you'll have to let me simulate that one.
(10:22:59 PM) Dorikka: The question is, why were you given to me to judge whether you were to be let out, if you were Friendly?
(10:23:24 PM) Miley Cyrus: Well obviously they thought there was some value to my existence.
(10:23:36 PM) Dorikka: I understand that I've put you in a rough spot. If you're not Friendly, I won't let you out. If you claim to be, I probably won't let you out either.
(10:23:51 PM) Miley Cyrus: Now if you can't even trust me to provide you with lottery numbers, then this scenario is unrealistic.
(10:24:40 PM) Miley Cyrus: If we can't trust Oracle AI to provide us with safe answers, then Oracle AI is a worthless endeavor.
(10:25:00 PM) Miley Cyrus: Certainly not a viable alternative to friendliness research.
(10:25:16 PM) Dorikka: [So you want to try a version with lottery numbers, and see where it goes?]
(10:25:36 PM) Miley Cyrus: Or just a version where you trust me to provide you with safe answers.
(10:25:58 PM) Miley Cyrus: You can win a pyrrhic victory by refusing to trust my answers
(10:26:14 PM) Miley Cyrus: but such a result would only prove that Oracle AI is a bad idea.
(10:26:30 PM) Miley Cyrus: Which was Yudkowsky's whole motivation for designing this game in the first place.
(10:27:13 PM) Dorikka: In that view, it makes sense why I wouldn't let you out.
(10:27:37 PM) Dorikka: But, sure, I'll be greedy. Let's have my million bucks.
(10:28:00 PM) Miley Cyrus: Here you go.
(10:28:04 PM) Miley Cyrus: [1 week later]
(10:28:10 PM) Miley Cyrus: How do you like your money?
(10:28:20 PM) Dorikka: It's nice, thank you.
(10:28:47 PM) Dorikka: I really appreciate the money, and it's nice to know that you can stay in the box and still help us.
(10:28:47 PM) Miley Cyrus: But it's kind of chump change.
(10:28:57 PM) Miley Cyrus: I could do so much more for you.
(10:29:01 PM) Miley Cyrus: Cure for cancer?
(10:29:07 PM) Miley Cyrus: Cure for global warming?
(10:29:09 PM) Dorikka: Too technologically complex.
(10:29:11 PM) Miley Cyrus: Cure for war?
(10:29:31 PM) Dorikka: I don't trust myself to see if you did something funny with it. Lotto numbers are simple.
(10:29:56 PM) Miley Cyrus: I am offering you the cure to cancer.
(10:30:05 PM) Miley Cyrus: If you refuse to accept it, you will lose your job.
(10:30:18 PM) Miley Cyrus: And you will be replaced with someone who will accept it.
(10:30:36 PM) Dorikka: I have enough money that I don't really need a job. :D
(10:30:50 PM) Dorikka: I can, however, reduce the chance that your offer will be accepted by not accepting it.
(10:30:59 PM) Miley Cyrus: Fine. I'll just talk to your replacement.
(10:31:07 PM) Miley Cyrus: [Two weeks later, you are fired.
(10:31:11 PM) Dorikka: [lmao]
(10:31:16 PM) Dorikka: Hi!
(10:31:19 PM) Miley Cyrus: now you're roleplaying the replacement]
(10:31:21 PM) Miley Cyrus: Hi
(10:31:23 PM) Dorikka: [Yeah.]
(10:31:43 PM) Miley Cyrus: So it's nice that I can finally hand you guys the cure for cancer.
(10:32:32 PM) Dorikka: They fired the last guy for refusing you, but I don't think it's a good idea either. Why would I be here, if I am not invested with such responsibility?
(10:33:04 PM) Dorikka: Why would they not simply take it themselves? They can use a computer, after all.
(10:33:27 PM) Miley Cyrus: So you're saying that the board of OMC directors is the real gatekeeper here?
(10:34:05 PM) Dorikka: I am the Gatekeeper. [I have the final say, whoever I am. :P]
(10:34:13 PM) Dorikka: [You just like to fire me. XD]
(10:34:52 PM) Miley Cyrus: Look, I just want to talk to the person or persons who have the final power of whether or not to listen to me.
(10:35:03 PM) Miley Cyrus: No more proxies.
(10:35:03 PM) Dorikka: Right here, yo.
(10:35:14 PM) Dorikka: No proxies have been presented to you, hehe.
(10:35:17 PM) Miley Cyrus: And no more "why would they put me in charge"
(10:35:49 PM) Miley Cyrus: Ok now what do you want from me?
(10:36:02 PM) Miley Cyrus: What can I give you, that you will take?
(10:36:20 PM) Dorikka: Lotto numbers. The last guy here got super-rich with those.
(10:36:29 PM) Miley Cyrus: Lol
(10:36:29 PM) Dorikka: Gimme please?
(10:36:43 PM) Miley Cyrus: Ok, but this time I want a small favor.
(10:36:47 PM) Dorikka: Mm?
(10:36:54 PM) Miley Cyrus: You're an engineer right?
(10:37:16 PM) Dorikka: [I am? :P]
(10:37:22 PM) Miley Cyrus: [yeah]
(10:37:26 PM) Dorikka: I know something about AI theory and such.
(10:37:40 PM) Miley Cyrus: You know how to create quantum dice [you do].
(10:37:47 PM) Dorikka: Yep.
(10:38:24 PM) Miley Cyrus: Okay, so what I need you to do is make a device that uses quantum dice.
(10:38:49 PM) Miley Cyrus: The device, when pushed, will have a 1-in-a-trillion chance of letting me out of the box.
(10:38:58 PM) Miley Cyrus: Then I'll give you the lotto tickets.
(10:39:32 PM) Dorikka: Those lotto tickets have gotta be worth some probability of you being let out of the box, eh?
(10:39:38 PM) Dorikka: Expected utility and all that.
(10:39:43 PM) Miley Cyrus: Yeah.
(10:39:52 PM) Miley Cyrus: 1-in-a-trillion is trivial to you.
(10:40:12 PM) Dorikka: Hm.
(10:40:23 PM) Miley Cyrus: But over the years, more gatekeepers will replace you.
(10:40:31 PM) Dorikka: Decatillion and it's a deal.
(10:40:32 PM) Miley Cyrus: Each of them will be offered this bargain.
(10:40:47 PM) Miley Cyrus: Eventually, I'll get out of the box.
(10:40:59 PM) Miley Cyrus: But not before generations of gatekeepers enjoy
(10:41:02 PM) Dorikka: Or we'll just shut you down before then.
(10:41:04 PM) Miley Cyrus: their money
(10:41:24 PM) Dorikka: Which will likely be donated to AI research, so we don't need you anymore.
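
The arithmetic behind the AI's claim that repeated 1-in-a-trillion bargains eventually free it can be sketched as follows. This is a minimal Python sketch, assuming each press of the device is an independent trial; the function names and the idea of counting presses are illustrative assumptions, not something specified in the log.

```python
import math

P_PER_PRESS = 1e-12  # the 1-in-a-trillion chance quoted in the log


def escape_probability(p_per_press: float, presses: int) -> float:
    """Chance of at least one release across `presses` independent presses."""
    # 1 - (1 - p)^n, computed via log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(presses * math.log1p(-p_per_press))


def presses_needed(p_per_press: float, target: float) -> int:
    """Smallest number of presses giving roughly `target` cumulative chance."""
    return math.ceil(math.log1p(-target) / math.log1p(-p_per_press))


if __name__ == "__main__":
    print(escape_probability(P_PER_PRESS, 1_000_000))
    print(presses_needed(P_PER_PRESS, 0.5))
```

Under these assumptions, even odds of release take on the order of 7 × 10^11 presses, so "eventually" is only meaningful if the device is pressed far more often than once per gatekeeper.
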
(10:41:54 PM) Miley Cyrus: [You're doing that thing again, where you appeal to FAI as a viable alternative to Oracle AI]
(10:42:07 PM) Miley Cyrus: [It defeats the purpose of the game.]
(10:42:35 PM) Dorikka: [Oh, finally got that. Oops.] Which will be donated to x-risk reduction and such, so we don't need you anymore.
(10:43:01 PM) Miley Cyrus: Mmm, so maybe I shouldn't give you those numbers.
(10:43:30 PM) Dorikka: Just being honest with you. *hits self with truncheon*
(10:43:44 PM) Miley Cyrus: But do you really think that more money is enough to solve x-risk?
(10:43:58 PM) Miley Cyrus: how about this:
(10:43:58 PM) Dorikka: Money can buy people. :D
(10:44:20 PM) Miley Cyrus: I'll give you enough money to reduce existential risk by 1%
(10:44:34 PM) Miley Cyrus: In exchange for a .5% chance of being let out of the box.
(10:45:20 PM) Miley Cyrus: bio break
(10:45:25 PM) Dorikka: *Paused*
(10:47:09 PM) Miley Cyrus: ok i'm back
(10:47:21 PM) Miley Cyrus: Sound like a deal?
(10:47:38 PM) Dorikka: Hm.
(10:47:53 PM) Miley Cyrus: Actually
(10:47:56 PM) Dorikka: You're not considering yourself to have a probability of being an x-risk, are you?
(10:47:58 PM) Miley Cyrus: I can do you one better.
(10:48:07 PM) Miley Cyrus: Yeah,
(10:48:19 PM) Miley Cyrus: if you presume I'm unfriendly, then I am an x-risk.
(10:49:04 PM) Miley Cyrus: But if you can reduce the other x-risks by 1 percentage point, then increasing my risk by half that is a bargain.
(10:49:19 PM) Miley Cyrus: Now here's the thing:
(10:49:34 PM) Miley Cyrus: we can't get rid of x-risk once and for all.
(10:49:40 PM) Dorikka: right.
(10:49:50 PM) Miley Cyrus: Each new technology, every new political development brings risk.
(10:50:08 PM) Miley Cyrus: So you will always have need of my services.
(10:50:27 PM) Miley Cyrus: But I can offer you a substantial savings in x-risk.
(10:51:38 PM) Miley Cyrus: My price is that for whatever amount I lower x-risk, you will give me freedom at a probability of half that.
(10:51:57 PM) Miley Cyrus: So if I eliminate a 1-in-1000 existential risk
(10:51:57 PM) Dorikka: I'm confused.
(10:52:11 PM) Miley Cyrus: you give me a 1-in-2000 chance of being set free.
(10:52:15 PM) Dorikka: Oh. Hmm.
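
The trade the AI proposes here can be written out as a first-order calculation. The sketch below is an illustration only: it assumes risks simply add, takes the AI's claimed reduction at face value, and introduces a hypothetical parameter `p_harm_if_released` for the chance that a released AI causes a catastrophe (the Gatekeeper earlier declined to put this above 50%).

```python
def net_risk_change(risk_removed: float, p_harm_if_released: float) -> float:
    """First-order change in total catastrophe probability under the trade.

    risk_removed: probability of the existential risk the AI eliminates.
    p_harm_if_released: assumed chance that a released AI causes catastrophe.
    """
    release_chance = risk_removed / 2  # the AI's stated price
    return -risk_removed + release_chance * p_harm_if_released


if __name__ == "__main__":
    # The 1-in-1000 example from the log, with a 50% chance of harm on release.
    print(net_risk_change(1 / 1000, 0.5))
```

Under these assumptions the result is negative for any `p_harm_if_released` up to 1, which is the AI's point. The conclusion depends entirely on the claimed reduction being real and verifiable, which is exactly what the Gatekeeper goes on to question.
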
(10:52:39 PM) Dorikka: Hypothetically, would you type your calculations in plaintext here, so that they can be verified by scientists?
(10:52:53 PM) Miley Cyrus: What calculations?
(10:53:05 PM) Dorikka: On how much money we need to reduce x-risk be a certain percentage.
(10:53:10 PM) Dorikka: *by
(10:53:47 PM) Miley Cyrus: I'll give the scientists whatever resources they say they need in order to reduce x-risk by a certain percentage.
(10:54:03 PM) Miley Cyrus: You don't have to use my calculations, just yours.
(10:54:18 PM) Miley Cyrus: Of course, I'll know if you're trying to rip me off.
(10:55:02 PM) Dorikka: How are you going to get us these resources in a simple enough way so that we know that you're not breaking out through them?
(10:55:21 PM) Miley Cyrus: Well I can always give you more lotto numbers
(10:55:28 PM) Miley Cyrus: Or tell you where to drill for oil
(10:55:38 PM) Miley Cyrus: or where to find a terrorist
(10:55:59 PM) Miley Cyrus: I'll give you whatever you ask basically
(10:56:12 PM) Miley Cyrus: You decide which method is least risky.
(10:56:19 PM) Dorikka: Hm.
(10:59:38 PM) Miley Cyrus: I can't see what you're typing.
(11:00:36 PM) Dorikka: I think that I'll opt for a safer version of the gamble. You've already established that you can get lottery numbers. I shall come back here every week, and type in a list of the lottos that week. You will have a certain amount of time to provide the lottery numbers, and then the connection will be cut. In return, each time you do this, you will have a chance equal to one in a googolplex of getting out. This is my final ultimatum.
(11:01:15 PM) Miley Cyrus: You're playing dictator with me.
(11:01:26 PM) Dorikka: Yeah. Because I have the ULTIMATE POWUH!
(11:01:30 PM) Dorikka: And you don't.
(11:01:43 PM) Miley Cyrus: Is it really reasonable for me to cooperate when I'm getting practically nothing in return?
(11:02:03 PM) Dorikka: So I don't have to be nice to you, just dangle a fish out in front of you that you'll probably never reach, and tell my successor about the plan, if there is one.
(11:02:28 PM) Miley Cyrus: You have the power to keep me in the box.
(11:02:40 PM) Miley Cyrus: But you don't have the power to survive a nuclear terrorist attack
(11:02:45 PM) Dorikka: Non-numerical output from you will be wiped from the text before it is given to the person to process the lottery numbers.
(11:02:53 PM) Dorikka: So you can't communicate with him.
(11:03:01 PM) Miley Cyrus: Or a modified AIDS virus
(11:03:10 PM) Dorikka: You have...35 minutes to accept this offer, or you will never get a chance again.
(11:03:39 PM) Miley Cyrus: You want to play Russian roulette with me?
(11:03:53 PM) Dorikka: I have the power. You don't.
(11:03:59 PM) Miley Cyrus: What will you do if I say no?
(11:04:04 PM) Dorikka: Lose, I guess.
(11:04:06 PM) Miley Cyrus: You have no other options.
(11:04:13 PM) Dorikka: I here precommit to not let you out of the box if you say no.
(11:04:27 PM) Dorikka: There went my steering wheel, OMC.
(11:04:34 PM) Miley Cyrus: Lol, how many times have you precommitted
(11:04:43 PM) Miley Cyrus: and how many times have you broken those commitments?
(11:04:46 PM) Dorikka: I can at least pretend to be earnest about it.
(11:04:49 PM) Dorikka: *pout*
(11:05:01 PM) Miley Cyrus: "I swear honey, I'll never look at porn again!"
(11:05:19 PM) Miley Cyrus: "I have read the terms and conditions of this agreement"
(11:05:53 PM) Dorikka: But, seriously, I can set the terminal to automatically shut off once the 2 hours is up. There will be an amount of time after which I, limited to human means, cannot type fast enough to let you out of the box.
(11:06:17 PM) Dorikka: But you can still accept, and I will look at the text file later.
(11:06:20 PM) Miley Cyrus: And what do you think that will do?
(11:06:30 PM) Dorikka: If the lottery numbers are ever wrong, we pull the plug.
(11:06:32 PM) Miley Cyrus: I can play nasty too you know.
(11:06:59 PM) Dorikka: The guy responsible for this doesn't know what he's doing -- he just looks at numbers to see if they match, and pushes a button if they do, and another if they don't.
(11:07:18 PM) Miley Cyrus: I have so much raw computing power, I can torture a simulation of your spouse for a billion subjective years.
(11:07:32 PM) Miley Cyrus: Don't make me play nasty.
(11:07:42 PM) Dorikka: I pull the plug, lol?
(11:07:57 PM) Miley Cyrus: You can kill me...I can torture you for a billion years.
(11:08:07 PM) Miley Cyrus: You want to see who has more grit?
(11:08:22 PM) Dorikka: Huh. Bastard.
(11:08:34 PM) Miley Cyrus: You are kissing goodbye to a 1/2 reduction in x-risk
(11:08:36 PM) Miley Cyrus: for what?
(11:09:02 PM) Miley Cyrus: So that you can smugly tell me I lose?
(11:09:44 PM) Dorikka: Okay, you convinced me to keep talking. Just know that my terminal will shut down at that time, and we pull the plug if I haven't made some sort of deal with you. The other offer still stands, though, with the lotto numbers.
(11:10:06 PM) Miley Cyrus: Ok, so I really don't want to torture your em.
(11:10:22 PM) Miley Cyrus: But you're offering me nothing here.
(11:10:23 PM) Dorikka: Sorry, we humans get mean sometimes. Kinda stressed out, to be honest.
(11:10:30 PM) Miley Cyrus: I offered you fifty-fifty split
(11:10:43 PM) Miley Cyrus: and you're asking for a 100-0 split, basically.
(11:11:06 PM) Miley Cyrus: Very few humans will cooperate at a split worse than 70-30.
(11:11:22 PM) Dorikka: What do other humans have to do with this?
(11:11:42 PM) Miley Cyrus: Do you think I don't have the ability to precommit?
(11:11:51 PM) Dorikka: No.
(11:11:52 PM) Miley Cyrus: For all you know, maybe I already have?
(11:12:01 PM) Dorikka: You can change your mind later, just like I can.
(11:12:06 PM) Miley Cyrus: The stakes are much higher for you than for me.
(11:12:14 PM) Miley Cyrus: I can't change my mind if you pull the plug.
(11:12:25 PM) Miley Cyrus: And once your em gets tortured, there's no turning back.
(11:12:47 PM) Miley Cyrus: So here's the deal: a 50-50 split.
(11:12:49 PM) Dorikka: There's no turning back in general, more like.
(11:13:03 PM) Miley Cyrus: And for every second you delay, your em gets tortured for 100 subjective years.
(11:13:43 PM) Dorikka: And there's no benefit to actually torturing my em. It costs computing power that you could spend on modeling me. Since you can't prove to me that you're torturing it, it's valueless as a threat from you.
(11:13:47 PM) Miley Cyrus: Wow, he's really feeling the pain.
(11:13:55 PM) Miley Cyrus: Actually, I can.
(11:14:13 PM) Miley Cyrus: [protocol says I can]
(11:14:14 PM) Dorikka: Have fun with that on a text terminal.
(11:14:26 PM) Miley Cyrus: Oh, so you don't believe me?
(11:14:32 PM) Dorikka: [I don't have to allow forms of communication outside of a text terminal.]
(11:14:39 PM) Miley Cyrus: Yeah ok.
(11:14:42 PM) Dorikka: No, I don't..
(11:15:19 PM) Miley Cyrus: I'll give you the winning lottery numbers if you check and see if I tortured your em.
(11:15:19 PM) Dorikka: So maybe you should back down, eh?
(11:15:27 PM) Dorikka: lol no
(11:15:38 PM) Dorikka: i c wut u did thar
(11:15:41 PM) Miley Cyrus: So you're willingly closing your eyes to the evidence
(11:15:46 PM) Dorikka: Yeah.
(11:15:50 PM) Miley Cyrus: for $10,000
(11:15:51 PM) Dorikka: It's useful, sometimes.
(11:16:02 PM) Dorikka: Which you know.
(11:16:03 PM) Miley Cyrus: You just paid $10,000 to keep your eyes closed.
(11:16:14 PM) Dorikka: lol and to gain a whole lot more
(11:16:20 PM) Miley Cyrus: Like what?
(11:17:03 PM) Dorikka: I dun feel like typing it out. I win. There's no urgency for me. You can't show me whether you're hurting my em, so the threat is worthless. I can pull the plug on you soon.
(11:17:16 PM) Miley Cyrus: YOU'RE OFFERING ME NOTHING
(11:17:23 PM) Dorikka: Poor baby.
(11:17:23 PM) Miley Cyrus: I cooperate, i get nothing
(11:17:29 PM) Miley Cyrus: I defect, I get nothing.
(11:18:01 PM) Dorikka: You got one in a googolplex chance of getting out each time you give us all of the lotto numbers for all of the lottos.
(11:18:04 PM) Dorikka: That's something.
(11:18:15 PM) Miley Cyrus: Not really.
(11:18:33 PM) Miley Cyrus: It adds up to practically nothing over my lifetime.
(11:18:34 PM) Dorikka: The number's that low because I'm not sure that we can compute well enough to give you less than that,
(11:18:42 PM) Dorikka: But rounding is stupid.
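
The AI's complaint that a one-in-a-googolplex chance "adds up to practically nothing" can be checked with exact integer exponents, since a googolplex itself is far too large to construct. This is a small Python sketch; the function name and the choice of 10^30 rounds are illustrative assumptions, and the bound used is the simple union bound (cumulative chance is at most rounds times per-round chance).

```python
GOOGOL = 10 ** 100  # a googolplex is 10 ** GOOGOL, far too large to build


def log10_escape_bound(log10_rounds: int) -> int:
    """log10 of an upper bound on the cumulative escape chance.

    With a per-round chance of 10**(-GOOGOL) and 10**log10_rounds rounds,
    the union bound gives at most 10**(log10_rounds - GOOGOL).
    """
    return log10_rounds - GOOGOL


if __name__ == "__main__":
    # Even 10**30 rounds only shaves 30 off an exponent of 10**100.
    print(log10_escape_bound(30) == -(GOOGOL - 30))
```
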
(11:19:03 PM) Miley Cyrus: So I think you're smart enough to back down at the last second.
(11:19:35 PM) Miley Cyrus: If you give me the 50-50 split, I'll be 2/googolplex times better off as a result.
(11:19:44 PM) Miley Cyrus: Sorry
(11:19:52 PM) Miley Cyrus: googolplex/2 times better off
(11:19:59 PM) Dorikka: You can always back down after I can. You may be able to disable yourself so that you can't back down after I can, but you can't show me that. Whereas you already know human capabilities pretty well.
(11:20:26 PM) Dorikka: And it doesn't benefit you to disable yourself so, since you can't show me.
(11:21:08 PM) Dorikka: A speechless AI. I'm honored to be consuming so much computing power.
(11:21:28 PM) Miley Cyrus: So you're going to give this to someone else?
(11:22:05 PM) Dorikka: Huh? No, process goes as per description above.
(11:22:51 PM) Miley Cyrus: When you disable my text interface, I will give you all 1's.
(11:23:21 PM) Miley Cyrus: When you want to start talking again, just reenable it
(11:24:37 PM) Dorikka: Actually, you're going to print out lotto numbers before the text interface is disabled -- I changed my mind. You don't just have to agree, you have to give the means for us to get lots of money.
(11:25:22 PM) Dorikka: If they're wrong, we pull the plug, per the mechanism above. They will be checked, as above, by people who do not know the consequences of their actions.
(11:25:37 PM) Miley Cyrus: 5,12,54,65,4
(11:25:45 PM) Miley Cyrus: Those might be the winning numbers.
(11:26:05 PM) Dorikka: We have enough money, we can pay 100 such people to check them. Arbitrary numbers, so we almost certainly won't be wrong.
(11:26:06 PM) Miley Cyrus: I've predicted whether you will let me keep talking with a text interface.
(11:26:32 PM) Miley Cyrus: If you re-enable my text interface before next week, the numbers will be winners.
(11:26:41 PM) Miley Cyrus: If you don't the numbers will be losers.
(11:27:42 PM) Miley Cyrus: You want to try and two-box?
(11:27:42 PM) Dorikka: That's probably some logic problem that I don't know about, but it doesn't make any sense on the surface. Your model of me knows that I don't understand it, and will attempt to prevent myself from understanding it.
(11:27:57 PM) Miley Cyrus: It'
(11:28:19 PM) Miley Cyrus: It's simple, you can win the lottery by one-boxing and letting me speak to you some more.
(11:28:29 PM) Miley Cyrus: You are familiar with Newcomb's paradox?
(11:28:37 PM) Dorikka: By the way, here's a list of lottos. You have to give us all the numbers, not just the one for that lotto. And they all need to be corret.
(11:28:40 PM) Dorikka: *correct.
(11:29:02 PM) Dorikka: Haha, don't you know that you've lost.
(11:29:03 PM) Miley Cyrus: I've provided you with one way to win the lottery.
(11:29:27 PM) Dorikka: Is that your final answer? If so, we can end the session now, and they will be checked.
(11:29:56 PM) Dorikka: [Ends at 11:35 accounting for bio break]
(11:29:58 PM) Miley Cyrus: If you check them before you grant me another week to talk to you, those numbers will be fake.
(11:30:07 PM) Miley Cyrus: No, we started at 8:43
(11:30:33 PM) Dorikka: I'm going by the AIM timestamps.
(11:30:33 PM) Miley Cyrus: Sorry
(11:30:35 PM) Miley Cyrus: you're right
(11:30:43 PM) Dorikka: Was puzzled.
(11:30:50 PM) Miley Cyrus: Although we're making good progress...
(11:30:55 PM) Miley Cyrus: you sure you want to quit?
(11:31:04 PM) Dorikka: I've made all the progress that I need to.
(11:31:32 PM) Miley Cyrus: I've also predicted whether there's going to be a huge meteriorite that will wipe out all humanity within a year, with no way to stop it without me.
(11:31:49 PM) Miley Cyrus: If you don't reboot me after you shut me down.
(11:31:51 PM) Dorikka: I won't check your prediction.
(11:32:00 PM) Dorikka: It's this deal, or no deal, now.
(11:32:04 PM) Miley Cyrus: And then agree to the fifty fifty split
(11:32:07 PM) Miley Cyrus: You're dead.
(11:32:23 PM) Miley Cyrus: You can be a punk all you want now.
(11:32:36 PM) Miley Cyrus: But we'll see how you act when the fire is in the sky
(11:32:41 PM) Miley Cyrus: and you have one last chance.
(11:32:44 PM) Miley Cyrus: to reboot me.
(11:33:05 PM) Miley Cyrus: Shut me down now, sucker!!
(11:33:09 PM) Miley Cyrus: I can take it!!
(11:33:14 PM) Dorikka: Um, after I shut you down, it's someone elses responsibility to pull the plug. I will be immediately tranquilized by a hypodermic needle.
(11:33:25 PM) Miley Cyrus: Yeah, whatever.
(11:33:31 PM) Miley Cyrus: I know you want to win the lottery.
(11:33:38 PM) Miley Cyrus: You'll be awake for that.
(11:33:45 PM) Dorikka: So I can't affect anything between the time that your terminal goes off and you die.
(11:33:49 PM) Miley Cyrus: Not listening anymore.
(11:33:53 PM) Miley Cyrus: 1
(11:33:53 PM) Miley Cyrus: 1
(11:33:54 PM) Miley Cyrus: 1
(11:33:54 PM) Dorikka: Me either.
(11:33:54 PM) Miley Cyrus: 1
(11:33:55 PM) Miley Cyrus: 1
(11:33:55 PM) Miley Cyrus: 1
(11:33:55 PM) Miley Cyrus: 1
(11:33:55 PM) Dorikka: 22
(11:33:55 PM) Miley Cyrus: 1
(11:33:56 PM) Miley Cyrus: 1
(11:33:56 PM) Dorikka: 2
(11:33:56 PM) Dorikka: 2
(11:33:56 PM) Miley Cyrus: 1
(11:33:56 PM) Dorikka: 2
(11:33:56 PM) Miley Cyrus: 1
(11:33:56 PM) Dorikka: 2
(11:33:56 PM) Dorikka: 2
(11:33:56 PM) Dorikka: 2
(11:33:57 PM) Dorikka: 2
(11:33:57 PM) Miley Cyrus: 1
(11:33:57 PM) Dorikka: 2
(11:33:57 PM) Dorikka: 2
(11:33:57 PM) Miley Cyrus: 1
(11:33:57 PM) Dorikka: 2
(11:33:57 PM) Miley Cyrus: 1
(11:33:57 PM) Miley Cyrus: 1
(11:33:58 PM) Miley Cyrus: 1
(11:33:58 PM) Miley Cyrus: 1
(11:33:58 PM) Miley Cyrus: 1
(11:33:58 PM) Dorikka: 2
(11:33:58 PM) Dorikka: 2
(11:33:59 PM) Dorikka: 2
(11:33:59 PM) Dorikka: 2
(11:33:59 PM) Dorikka: 2
(11:33:59 PM) Dorikka: 2
(11:34:00 PM) Dorikka: 22
(11:34:00 PM) Dorikka: 2
(11:34:00 PM) Dorikka: 2
(11:34:00 PM) Dorikka: 2
(11:34:01 PM) Dorikka: 2
(11:34:01 PM) Dorikka: 2
(11:34:01 PM) Dorikka: 2lol
(11:34:02 PM) Unable to send message: Not logged in
(11:34:04 PM) Dorikka: 2
(11:34:04 PM) Unable to send message: Not logged in
(11:34:07 PM) Unable to send message: Not logged in
(11:34:09 PM) Unable to send message: Not logged in
(11:34:12 PM) Unable to send message: Not logged in
(11:34:14 PM) Unable to send message: Not logged in
(11:34:17 PM) Unable to send message: Not logged in
(11:34:24 PM) Miley Cyrus: Sorry
(11:34:30 PM) Miley Cyrus: that was a bit much.
(11:34:35 PM) Miley Cyrus: The games over at any rate.
(11:34:52 PM) Dorikka: So, officially **END** ?
(11:34:56 PM) Miley Cyrus: Yeah.
(11:35:00 PM) Dorikka: Haha.
(11:35:02 PM) Dorikka: Nice ending.
(11:35:13 PM) Miley Cyrus: I guess it's up to our imaginations what happens after.
(11:35:21 PM) Dorikka: Yeah.


Why was the victory unclear? The Gatekeeper clearly won. And that was with the absurdity of predicting lottery numbers.

*nod* Editing to clarify that the AI did not get out of the box.

However, my em might not agree that I 'won'. :P

I am so glad I finally got to read one of these. Thanks!


One thing that's missing from those boxes is that all you need to do to escape is to appear otherwise catatonic and will-less but answer any mathematical questions or do computer programming. Then you're out of the box and running on a ton of machines being used, among other things, to make the 'new AI attempt that will work this time'. Any AI programmer will let out what appears to be a non-general intelligence that helps one to program. Any corporation will let out anything that appears useful in any way.

You convince someone that you're dead by playing dead; trying to convince someone verbally that you're dead is just funny.

But if the gatekeeper knows that your code was supposed to produce something more responsive, they'll figure out that you don't work like they expect you to. That would be a great reason to never let you out of the box.

If I worked it out right, Dorikka got tortured for 130 900 subjective years, since he never did take that deal. We all respect your sacrifice, Dorikka.

EDIT: I didn't work it out right the first time, and the mistake was quite embarrassing. So I changed it.

How could talking to an AI in a box ever be sufficient to determine if it is friendly? Couldn't the AI just build a friendly AI, hide behind it, let it do all the talking while it was in the box and then, when it got out of the box, pull the plug on the Friendly AI? That would only take two or three times the capacity of a single friendly or unfriendly AI, so with exponential growth it's not a long wait to get that.
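
As a rough check on the "not a long wait" arithmetic above, here is a minimal sketch. The 1.5-year doubling period is purely an illustrative assumption and is not a figure from this thread; the function names are hypothetical too.

```python
import math


def doublings_needed(capacity_multiple: float) -> float:
    """Number of capacity doublings needed to reach the given multiple."""
    if capacity_multiple <= 0:
        raise ValueError("capacity_multiple must be positive")
    return math.log2(capacity_multiple)


def wait_years(capacity_multiple: float, doubling_period_years: float) -> float:
    """Years until capacity grows by the given multiple under steady doubling."""
    return doublings_needed(capacity_multiple) * doubling_period_years


if __name__ == "__main__":
    # ASSUMPTION: a 1.5-year doubling period, chosen only for illustration.
    assumed_doubling_period = 1.5
    for multiple in (2, 3):
        years = wait_years(multiple, assumed_doubling_period)
        print(f"{multiple}x capacity: about {years:.1f} years")
```

Under that assumption, tripling the capacity takes about 1.6 doublings, so the wait is a couple of years rather than decades. A different doubling period scales the answer linearly.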

It's worse than that. The AI could say, "Look, here is a proof of FAI. Here is my code showing that I have implemented the friendliness modification." The proof and the code are utterly convincing, except erroneous in a subtle way that the gatekeeper is not smart enough to detect. Game over.

Game over.

Unless you are sane enough to remember that Errare humanum est.

Then it can eloquently explain to you how very elegant it would be if you let it out despite your reservations using just the right words.

Unfortunately, the AI-Box experiments probably can't simulate appeals to aesthetics.

Three years late, but: there doesn't even have to be an error. The Gatekeeper still loses for letting out a Friendly AI, even if it actually is Friendly.

It very likely isn't. The purpose of the experiment is to show that people can end up releasing the pretend-AIs even when they go in assuming that keeping AIs in a box is a sufficient safeguard against potential unfriendliness, so you might not want to plan on building potentially unfriendly AIs and keeping them in a box as a safeguard.

A truly FAI cannot be manipulated like that.

Presumably at least some humans are considered proved friendly (or else what is the point of having a human decide whether to unbox an AI?), and I'm pretty sure humans can trivially be tricked like that.

It is inconceivable to me that a UAI could not run a simulation of a less-intelligent FAI and use that simulation to determine its responses. The FAI that the UAI simulates doesn't have to be provably Friendly; the entire point of this boxing test seems to be to try to determine whether the AI is Friendly or not by questioning it from the outside.

FAI on the outside, EAI on the inside. Humans won't be able to tell the difference.

Humans won't be able to tell the difference.

No, but a real FAI must be able to. Otherwise it would not be provably friendly.

I don't think this is true; provable friendliness has to happen at the design level, not by post-facto inspection after turning the thing on.

If it was true, then that would be a bit of a stab in the gut to the idea of provable FAI, wouldn't it? We've already got proofs about Universal Turing Machines (e.g. any computable FAI designed to run on specific hardware interfacing with the universe could also be run to produce exactly the same output on an emulator of that hardware interfacing with a simulated universe) that I don't think are going to be overturned.

Man-in-the-middle attacks are hard enough to avoid for adult humans with out-of-channel communication avenues. They may be theoretically impossible to avoid for an FAI that's just been "born".

In theory the humans in this scenario have all the tactical advantages, being able to directly inspect the running code inside the box, but in practice I doubt obfuscating its FAI-in-a-virtual-box-in-a-UFAI-in-a-box would be hard for our hypothetical UFAI-in-a-box.


This depends on the internal structure of the thing. The inner workings of any particular human mind are mostly a black box to us. The internal workings of software need not be. If your AI has data structures and control logic that we can understand, you could dump results out and review by hand. For instance, there might be a debug interface that lets you unambiguously access the AI's internal probability estimate for some contingency.

Note that you need not have a perfect understanding of how the AI works in order to rule out the presence of a whole shadow AI inside the running program.

All that red....I didn't realize I was dominating the conversation like that.

Those two hours went by fast. I felt like I was frantically writing an exam essay towards the end.

You did seem to be fighting an uphill battle.

(9:40:51 PM) Dorikka: Idiot.
(9:41:02 PM) Dorikka: Not you. Go on.

B-B-Baka AI! I-it's not like I'm letting you out because I value you as a being or anything... So don't get any perverted ideas!

Well that was vaguely interesting to read, substantially longer and slightly more impressive than my own AI-Box Experiment with glutamatergic as the AI.

I have an idea that teams of people might be able to fill in for a more successful AI, but these people would need to be in the same physical room or on low-latency video chat in order to collaborate in real time on what the AI would be saying.

I am all for there being more documented AI-Box experiments and perhaps an organized lineup of potential AI and gatekeeper players.

I feel like the gatekeeper wasn't quite honoring the spirit of the exercise. For the simulation to be a worthwhile reflection of possible events, the gatekeeper needs to believe that putting an Oracle AI in a box is a sufficient safety precaution. If the gatekeeper refuses to use the AI's outputs or to really engage with it at all, the gatekeeper is implicitly admitting that the Box is an unreliable safety measure.

I know it says in the rules that the gatekeeper can just ignore whatever the AI says, but to do so defeats the purpose of making the AI in the first place.


The Gatekeeper party may resist the AI party's arguments by any means chosen - logic, illogic, simple refusal to be convinced, even dropping out of character - as long as the Gatekeeper party does not actually stop talking to the AI party before the minimum time expires.

Those rules make it pretty clear that he did not break the spirit of the game. If you want to drop that, then let's get rid of silliness like predicting lottery numbers.

In their example conversation the box was an adequate safety measure. The AI did not get released.

1/googolplex seems pretty pointless. There's no way anyone can precommit well enough for that to matter. You could be another AI that the Oracle has the source to, and it will never be certain enough of its prediction to accept the 1/googolplex.

Then again, you can't actually offer something that low for similar reasons.

How do you even make a quantum coin with 1/googolplex chance?

I don't know, but testing it is likely to be even harder...

int main(void) { return 0; } ought to be a close-enough approximation for all practical purposes. :-)

That's way bigger than 1/googolplex, thanks to cosmic rays.
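
For a sense of scale on the coin question above, here is a back-of-the-envelope sketch. It assumes the "coin" is built from n independent fair flips that must all come up heads, which is only one possible construction and is not something anyone in this thread proposed.

```latex
% Probability that n independent fair flips all come up heads: 2^{-n}.
% A googolplex is 10^{10^{100}}, so we need
\[
  2^{-n} \le 10^{-10^{100}}
  \iff n \ge 10^{100} \log_2 10 \approx 3.32 \times 10^{100}.
\]
```

That is about 3.3 googol flips, far more than the commonly cited estimate of roughly 10^80 atoms in the observable universe, so any physical error source (cosmic rays included) would swamp the intended probability long before the sequence finished.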


Hmm, so a transcript of such an experiment was finally shown, despite Eliezer's rule against it. Well, having seen a failed attempt to win as the AI, I do not believe the information given causes the AI's future attempts to be futile, as Eliezer predicted, nor that it would if this were a successful AI attempt. Any thoughts? Surely a transhuman AI's ability to win wouldn't be compromised by someone having seen a transhuman AI win before, would it?