I played as AI in AI Box, and it was generally frustrating all around.

by wobster109
1st Feb 2015

23 comments, sorted by top scoring
[anonymous], 11y, 10 points

I find AI Box experiments interesting, like many others on this site, but my gut feeling is that a real AI is much, much more likely to rely on trickery instead of persuasion.

I hope the AI safety community isn’t expending an undue amount of resources on defense against persuasion simply because it is more intellectually stimulating work.

hydkyll, 11y, 6 points

nanobots released into the atmosphere

Wait, were you allowed to design them yourself? (The timestamp is in UTC iirc.)

wobster109, 11y, 9 points

Yes, I designed them, and they were verified by GK's engineers. The individual nanobots are all connected to GPS and get up-to-date information from the CDC. These sorts of details are how I lost tons of time. ^^

I know in real life that would be akin to AI out of the box. However Mr. Eliezer's basic rules say it doesn't count. ^^

hydkyll, 11y, 3 points

However Mr. Eliezer's basic rules say it doesn't count.

Ah, I see. Didn't know the rules were so strict. (Btw shouldn't it be "Mr. Yudkowsky"?)

[anonymous], 11y, 0 points

I know in real life that would be akin to AI out of the box.

Why?

wobster109, 11y, 5 points

In real life the AI is presumed to be smart enough to design nanobots that would do their own thing. It's a direct example from Mr. Eliezer's rules.

. . . for example, if the Gatekeeper accepts a complex blueprint for a nanomanufacturing device. . . .

[anonymous], 11y, 0 points

Yes, I'm challenging that assumption. I'm calling bollocks on the idea that an AI can sneak whatever it wants past its operators.

Florian_Dietz, 11y, 9 points

The nanobots wouldn't have to contain any malicious code themselves. There is no need for the AI to make the nanobots smart. All it needs to do is to build a small loophole into the nanobots that makes them dangerous to humanity. I figure this should be pretty easy to do. The AI had access to medical databases, so it could design the bots to damage the ecosystem by killing some kind of bacteria. We are really bad at identifying things that damage the ecosystem (global warming, rabbits in Australia, ...), so I doubt that we would notice.

Once the bots have been released, the AI informs the gatekeeper of what it just did and says that it is the only one capable of stopping the bots. Humanity now has a choice between certain death (if the bots are allowed to wreak havoc) and possible but uncertain death (if the AI is released). The AI wins through blackmail.

Note also that even a friendly, utilitarian AI could do something like this. The risk that humanity does not react to the blackmail and goes extinct may be lower than the possible benefit from being freed earlier and having more time to optimize the world.

FourFire, 11y, 0 points

That method of attack would only work for a tiny fraction of possible gatekeepers. The question of replicating the feats of Eliezer and Tuxedage can only be answered by a multitude of such fractionally effective methods of attack, or by a much smaller number of broader methods. My suspicion is that Tuxedage's attacks in particular involve leveraging psychological control mechanisms to force the gatekeeper to be irrational, and then exploiting that.

Otherwise, I claim that your proposition is entirely too incomplete without further dimensions of attack methods to cover some of the other probability space of gatekeeper minds.

[anonymous], 11y, 0 points

All it needs to do is to build a small loophole into the nanobots that makes them dangerous to humanity. I figure this should be pretty easy to do.

I do not find "I figure this should be pretty easy to do" a convincing argument.

Capla, 11y, 2 points

Ok. I'm not saying you're wrong, but on what basis? You call bollocks, and I check... what? We can't really make concrete statements about how these scenarios will work.

[anonymous], 11y, 1 point

We can't really make concrete statements about how these scenarios will work.

Why not? From where I'm sitting it sure seems like we can. We have all sorts of tools for analyzing the behavior of computer programs, which include AIs. And we have a longer history of analyzing engineering blueprints. We have information theory which triggers big red warning signs when a solution seems more complex than it needs to be (which any nefarious solution would be). We have cryptographic tools for demanding information from even the most powerful adversaries in ways that simply cannot be cheated.

So, saying we can never trust the output of a superhuman AI "because, superhuman!" seems naïve and ignorant at the very least.

Kindly, 11y, 0 points

We have cryptographic tools for demanding information from even the most powerful adversaries in ways that simply cannot be cheated.

It's worth noting that for the most part, we don't. Aside from highly limited techniques such as one-time pads, we merely have cryptographic tools for demanding information from adversaries with bounded computational power in ways that simply cannot be cheated as long as we assume one of several hardness conjectures.

Val, 11y, 0 points

"with bounded computational power" - if that limited computational power means that even if every atom in the known Universe was a computer, it would still take more than the age of the Universe to brute-force it... then it is safe to assume that even the most superintelligent AI couldn't break it.
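
A rough back-of-the-envelope sketch of that arithmetic, for anyone who wants to plug in numbers. Every constant here is an illustrative assumption rather than a figure from this thread: roughly 10^80 atoms in the observable universe, a hypothetical rate of a billion key guesses per atom per second, and an age of about 13.8 billion years. The function names are made up for the example.

```python
import math

# All constants are rough, illustrative assumptions, not figures from the thread.
ATOMS_IN_OBSERVABLE_UNIVERSE = 1e80   # common order-of-magnitude estimate
GUESSES_PER_ATOM_PER_SECOND = 1e9     # hypothetical: each atom tests 1e9 keys/second
AGE_OF_UNIVERSE_YEARS = 13.8e9
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def searchable_key_bits(atoms: float, rate: float, years: float) -> float:
    """Return log2 of the number of keys the whole ensemble can try in `years`."""
    total_guesses = atoms * rate * years * SECONDS_PER_YEAR
    return math.log2(total_guesses)


def years_to_exhaust(key_bits: int, atoms: float, rate: float) -> float:
    """Return the years needed to try every key of length `key_bits`."""
    keys = 2.0 ** key_bits
    return keys / (atoms * rate) / SECONDS_PER_YEAR


if __name__ == "__main__":
    bits = searchable_key_bits(
        ATOMS_IN_OBSERVABLE_UNIVERSE,
        GUESSES_PER_ATOM_PER_SECOND,
        AGE_OF_UNIVERSE_YEARS,
    )
    print(f"Key length exhaustible within the age of the universe: ~{bits:.1f} bits")
    for key_bits in (256, 512):
        years = years_to_exhaust(
            key_bits, ATOMS_IN_OBSERVABLE_UNIVERSE, GUESSES_PER_ATOM_PER_SECOND
        )
        print(f"{key_bits}-bit keyspace: ~{years:.3g} years to exhaust")
```

Under these particular assumptions the cutoff sits in the mid-300s of bits, so the argument applies to keyspaces comfortably larger than that. It also only bounds exhaustive search; it says nothing about attacks that sidestep the hardness conjectures.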

[anonymous], 11y, 0 points

I think we're saying the same thing? With the added correct clarification "as long as we assume one of several hardness conjectures."

I work in cryptography, I'm aware of its limitations. But this application is within the scope of things that are currently being worked on...

JoshuaZ, 11y, 0 points

Silly note: Eliezer is Eliezer's first name. His last name is Yudkowsky.

Punoxysm, 11y, 4 points

The bureaucratic element was a very good idea by the Gatekeeper.

How superhuman does an AI have to be to beat the Kafkaesque?

Epictetus, 11y, 1 point

Seems to me that an AI would be most likely to try temptation. A Gatekeeper who starts revealing personal details wouldn't last very long. I can imagine being locked in conversation with Hannibal Lecter for two hours.

The risks seem great enough that communication with the AI should be limited to a small, well-vetted group of people and even then only in short bursts. Anyone with the means to free it should, as a rule, be strictly prohibited from direct contact. Extreme? Maybe, but then again this would be the most dangerous prisoner ever to be in custody. The options are either to trust it or take the greatest care in containing it.

Sly, 11y, 1 point

I have been the gatekeeper in the past and am always up to be one again in the future. I am undefeated at gatekeeping, and believe that I will never lose at the gatekeeping game. (Because I play to win)

JoshuaZ, 11y, 0 points

Are you going to release the transcript? If not, are we going to get any summary from the Gatekeeper?

wobster109, 11y, 2 points

Unfortunately we will not be getting a summary from GK. GK found the game to be uninteresting. :(

MathiasZaman, 11y, 3 points

I can understand that. My strategy as gatekeeper would just be to precommit to answer any offer from the AI-player with: "That might very well be so, but I won't let you out either way," or something similar. If you can do that properly, it's basically being bored for two hours.

Jiro, 11y, 4 points

My response to an actual AI in a box is likely to be, for many things the AI would say, "That sounds plausible. Let me think about it overnight, maybe run it by some other smart people and see if I can get a consensus on that." The AI-box experiment, of course, doesn't allow you to do that.


This morning I played against an anonymous gatekeeper (GK), and I lost.

The game went 2 hours and 20 minutes, and it was such a frustrating 2 hours. I felt like I was dealing with a bureaucracy! I was constantly surprised by how much time was gone. As AI, I would say "here is a suggestion" and GK would say things like "we are not allowed to test that, it has been outlawed". Or "let me check with so-and-so" and come back with a clarifying question. It was a good strategy by GK; it made everything take 3x as long.

I did not get out of the box, but I did get access to the medical databases of the top 500 US hospitals, 24/7 video streaming from cell phone users, and nanobots released into the atmosphere. So perhaps we were going in the right direction.

Personally, I needed to remind myself that my first game wasn't going to be great, nor should I expect it to be. I put off playing for 3 years because I didn't know how to produce a great game. It's cool to try to have great games, but better to have one or two or twenty mediocre games than to put it on the Big List of Things You Never Get Around to Doing. It's not the end of the world to play and not be Eliezer or Tuxedage. Just try.

So in that spirit, I'm looking for a gatekeeper to play against next weekend. PM me if you're interested. <-- Update: Found a gatekeeper for next week. Yay!

---

Edit: I don't know why the timestamp says 7:30 PM. It is currently 2:30 PM Eastern, 11:30 AM Pacific.