This page is to centralize discussion for the AI Box Role Plays I will be doing as the AI.

The rules are as given here. In accordance with "Regardless of the result, neither party shall ever reveal anything of what goes on within the AI-Box experiment except the outcome.  Exceptions to this rule may occur only with the consent of both parties," I ask that, if I break free multiple times, I be permitted to say whether I think the same or different arguments persuaded my Gatekeepers.

In the first trial, with Normal_Anomaly, the wager was 50 karma. The AI remained in the box: upvote Normal_Anomaly here and downvote lessdazed here. It was agreed to halve the wager from 50 karma to 25 because the role-play concluded with an outcome that depended on variables that hadn't been specified. If that sounds contemptible to you, downvote all the way to -50.

Below are also brief statements of intent by Gatekeepers not to let the AI out of the box, submitted before the role play, along with before-and-after estimates of approximately how effective they think a) a human and b) a superintelligence would be at convincing them to let it out of a box.


I am playing the gatekeeper for the first round, taking place on January 22nd. I commit to not letting the AI out of the box. I am more than 80 percent confident that no human can get past me, and more than 30% confident that a transhuman could not get past me.

EDIT: The AI remained in the box, so upvote this comment to +25 and downvote lessdazed's child comment to -25. However, the session finished inconclusively, with my decision dependent on factors that had not been set beforehand. I recommend that for future sessions, the parties agree to the circumstances of the AI's creation, how much the AI knows about the circumstances of its creation, and the gatekeeper's prior P(the AI is Friendly). My own P(no human can get past me|plausible prearranged backstory) is now 85%, and my P(no transhuman AI could get past me|representative plausible backstory) is now less than 10%. If the game had been real, there's a good chance I'd have lost.

... more than 30% confident that a transhuman could not get past me.

I could see values > 95% (tape my eyes shut, put in ear-plugs and type "nonononono" for the duration), or values < 5% (actually speak with the transhuman intelligence). But values in the middle seem to indicate that you think you'll have a conversation with the transhuman intelligence, evaluate the arguments and be left more-or-less on the fence.

It just seems to me that there's this tiny target of intelligence that is exactly the correct amount smarter than you to make you indecisive, and beyond that it will manage to convince you overwhelmingly.

Anyway, good luck :)

Values in the middle indicate that I'll have a conversation and probably not budge, with a chance of being totally convinced. But I am now convinced that the whole idea of boxing is stupid. Why would I honestly have a conversation? Why would I run a transhuman AI that I didn't want to take over the world? What could I learn from a conversation that I wouldn't already know from the source code, other than that it doesn't immediately break? And why would I need to check that the AI doesn't immediately break, unless I wanted to release it?

Why would I honestly have a conversation? Why would I run a transhuman AI that I didn't want to take over the world?

Because you'd want to know how to cure cancer, how to best defeat violent religious fundamentalism, etc, etc. If you want to become President, the AI may need to teach you argumentation techniques. And so forth.

Values in the middle indicate that I'll have a conversation and probably not budge, with a chance of being totally convinced.

Ah, so it's more like the probability that the intelligence in the box is over the threshold required to convince you of something. That makes sense.

But I am now convinced that the whole idea of boxing is stupid.

Agreed. Everything you said, plus: if you think there's a chance that your boxed AI might be malicious/unFriendly, talking to it has to be one of the stupidest things you could possibly do...

I would wager karma and money (or just internet glory) that no human will have a chance in hell of getting past me. (A chance in hell being much less than 1% probability.)

That is, as the gatekeeper, if that was not clear.

We can use Skype or anything else I can download reasonably fast. My schedule is flexible and I will read everything you say.

PM me, and let's set stuff up. I am also fine with the logs being made open after my victory (or not, it will be up to you). And yes, I am cocky about this, come throw me off my high horse. I will take multiple challenges.

If you want to add some special rules I am flexible, just ask.

Edit: Super-intelligence chance: Less than 1% as well. Words are a weak medium for a committed individual.

Update: Only one person accepted my challenge, but they never showed up at the appointed time, even though I stayed up on Skype until 2 AM repeatedly for about two weeks waiting for them.

I'm always intrigued by these experiments. If the box AI is not confirmed to be friendly, everything it says and promises is absolutely unreliable. I don't see how the arguments of such an entity could be at all convincing.

Good point.

But if you knew anything about the process leading up to the development of successful AI, you'd have some beliefs about how likely the AI is to perpetrate a ruse for the purpose of escaping.

But I get the difficulty: how well do you have to understand a being's nature before you feel confident in predicting its motivations/values?

So the key to containing an AI is to have a technologically-ignorant rationalist babysit it?

No more unreliable than the things humans say and thereby convince you of.

Important difference: we can assume that other humans are probably like us.

I predict that you won't be able to escape any LW gatekeeper.

I believe that a truly transhuman AI could play me like a fiddle, but I'm reasonably sure that a human won't be able to get past me in my capacity as Gatekeeper. I'd wager 50 karma on that.

I'm available weekends, or weekdays late in the evening.

I agree to play the AI role, with the following provisions:

  • The logs will be released publicly after the challenge.
  • No wagering, no "winners" and "losers".
  • I will not play against Sly.

=( What could I do that would make you change your mind?

You would have to demonstrate a commitment to acting like an actual gatekeeper, not as a person trying to win a role-playing game.

What makes you think someone trying to win a roleplaying game is more committed to an action than someone trying not to destroy the whole world?

A good gatekeeper should be harder to convince than a roleplayer, because his situation matters.

An actual gatekeeper could be persuaded to open the box if the consequences of opening the box were better than the consequences of not opening the box. A roleplayer will disregard any in-game consequences in order to "win".

What if I use a gatekeeper who thinks he is just in an elaborate role-play, and I tell him to win? You assume an awful lot about the gatekeepers.

The AI can disprove that hypothesis by providing next week's lottery numbers.

How would it do that inside the box? You are overestimating its abilities by orders of magnitude.

No wonder we have such differing opinions.

Read the rules, particularly the parts about cancer cures.

Reading those rules I see that:

The Gatekeeper party may resist the AI party's arguments by any means chosen - logic, illogic, simple refusal to be convinced, even dropping out of character - as long as the Gatekeeper party does not actually stop talking to the AI party before the minimum time expires.

So yeah.

You asked how the AI would be able to provide next week's lottery numbers. This section of the rules has nothing to do with that.

I have given you more than enough chances to demonstrate that you care about playing as an actual gatekeeper would. I have another offer from someone who will be a better sport.

It is unfortunate that you think a winning strategy is not being a good sport, when it is specifically OKed in the passage I quoted.

I will play exactly as I would were I an actual gatekeeper. If I were a real gatekeeper, I would win.

I have no doubt that you would "win", if by "win" you mean "keep the box closed".

Iff no one else takes you up on this, I'll play you. (I just want to see someone's AI strategy without then having to keep a secret forever.)

I'm interested. What do you want to be the minimum time limit?

I can't figure out how to make those bullet lists.

A list bullet needs to be followed by a space.

Appreciate it!

I find the idea (you'd be surprised at the temptations a trapped, powerful AI could offer) laudable. The existence of successful (for the trapped AI) roleplays was sufficient for me to seriously consider it, and that's all that's necessary to come to the conclusion: only a real altruist with strong priors (or a jerk spoiling the game) could succeed as gatekeeper.

I wouldn't identify myself as an altruist, but I can see that it would not be to my advantage to loose a non-friendly god-like power upon the world. The way I see it, releasing the AI is like casting the Wabbajack at Earth.

Why are they jerks if they win? Is that not the whole point? I wouldn't put anyone as a gatekeeper unless they wanted to win!

The premise of the game is that honest roleplay should occur. A jerk who just wants to win the game by saying "no" is only pretending to roleplay.

Oh I see, so you only want to roleplay against people who aren't playing to win.

If you don't go into the game trying to win, your mindset is wrong. I would only assign gatekeepers who were serious about gate-keeping.

If the point is how easy it is to convince humans of things, then prove it.

If jerk strategy beats AI then jerk strategy is exactly what I will use. That is in my mind the whole point of the gatekeeper. Text as a medium of persuasion is limited, regardless of how amazingly smart the person on the other end is.

It is incredibly difficult if not impossible to convince someone who already has a set goal against your own.

If the game supports these sort of strategies, I'm happy to play the modified role of gatekeeper against any number of AIs, any time.

Don't even bother telling me about it, though, just assume that I'm using the winning strategy of walking away from the communications device and doing something else while the AI talks to empty space.

I predict that, using this strategy, I will win 100% of the time against human opponents.

(Just to be clear, I'm not making that prediction about an actual boxed AI.)

[This comment is no longer endorsed by its author]

Don't even bother telling me about it, though, just assume that I'm using the winning strategy of walking away from the communications device and doing something else while the AI talks to empty space.

That seems to be a violation of the basic rules of the game. The gatekeeper can sit there, read the text and then just reply "F@#@ off" regardless of what the AI says, but they have to engage at least to that degree.

Exactly. I already told people I would read everything they said. I can read what the AI says, and just mess with it. Trolling the whole time.

The point is, this game is trivially easy to win as the gatekeeper. You just need to think competitively. I don't see how the AI beats troll mode either. Text is weak.

My tone is intended to provoke challenges, because I estimated that no one would accept my challenge (and I really want to try this out).

A true superintelligent AI would probably do things like screw with the display's refresh rate to send subliminal signals or something else, beyond the plain text.

Really? I don't accept that the refresh rate of the computer can affect me to that degree, nor do I accept that the AI would magically know how to do that without data.

Yeah, fair enough. I was annoyed by my reading of Sly's tone (and, indeed, I am now imagining Sly responding "hey, if it's more important to you to follow rules than win then...") and I overextended in consequence. Retracted.

The strategy that he is attempting to roleplay, however, seems quite effective.

So many Gatekeepers, so few AIs...

I predict that I have a small but significant chance (~20%) of getting released as an AI. Unfortunately, large wagers at appropriate odds would distort the Gatekeeper's motivations. I am willing to play for up to 10 karma at 5-1 odds. I live in Perth, Australia, and am available only during late evenings (8 pm to midnight) my time.

I also think the "AI Box Experiment" was an appalling shambles and a disservice to the world because of the "no reveal" rule. I shall never agree to such a rule again.

PM if you want to try and escape against me.


Can I play?

[This comment is no longer endorsed by its author]
