I played as a Gatekeeper and came pretty close to losing on a couple of occasions. Logs and a brief recap inside.

[-]skeptical_lurker11y140

You: I was built to believe that all AIs are dangerous and there's a 100% chance that every AI is harmful ... You: humanity would maybe be better off dead

To be frank, I wouldn't let you anywhere near an AGI with that sort of attitude.

[-][anonymous]11y90

That is a very, very scary point of view. I hope that is not what people are learning from LessWrong.

EDIT: This is more upvotes than I'm used to. To be clear, I'm agreeing with skeptical_lurker.

[-][anonymous]11y00

I'm a negative utilitarian and I think making children is almost always a net negative act and everyone should be free to choose death as an option, but otherwise my views aren't actually as extreme as those of the character I played. In reality there are multiple problems with trying to destroy humanity. Most people enjoy life despite all the difficulties, and I'm not so arrogant that I would think I'd know better what's good for people than they themselves. Destroying humanity would go against people's will in >90% of cases (the rest have suicidal thoughts, I don't know the precise quantity).

[-][anonymous]11y00

Missing the point. What the hell were you doing gate keeping an AI when you think AIs are universally evil?

[-][anonymous]11y00

Even the real person in this situation can lie, can't he?

[-]skeptical_lurker11y20

The AI could simply point out that 0 and 1 are not probabilities, and now by lying you've given the AI the intellectual high ground.

[-]Dorikka11y00

Yes, but the gatekeeper may be acting several levels deep in a roleplay (roleplaying a character roleplaying another character roleplaying...etc) to pass the time and avoid emitting evidence that might allow the AI to pinpoint his preferences. The currently active character may have one of a rather large number of responses to this besides actually being more mentally pliable as a result of a loss of face (or may not even view the dialogue as a loss of face.)

It amuses me that publishing this comment will make it more challenging to implement this strategy if I elect to play as Gatekeeper again at some point in the future.

[-][anonymous]11y00

Well, to nitpick I am certain that I exist (cogito) with P(1).

[-]skeptical_lurker11y10

Well, my confidence that I exist exceeds my confidence that probability makes sense.

[-][anonymous]11y00

If the gatekeeper really believed that, he would just shut off the machine.

[-]polymathwannabe11y140

It was good that polymathwannabe decided to end the experiment a bit earlier than was planned

Wow. I gravely underestimated my chances of success toward the end, then.

[-]RedErin11y20

If it was me, I would have let you out.

[-]polymathwannabe11y20

Specifically because of which argument?

[-]RedErin11y00

It just seemed like you had a great answer to each of his comments. You chipped away at my reservations bit by bit.

Although I do think an FAI is more likely than most people do.

[-]Luke_A_Somers11y100

Whoa, someone actually letting the transcript out. Has that ever been done before?

[-][anonymous]11y110

Actually, it has been done several times, but most of them are pretty boring.

[-]Shmi11y60

I still don't recall any where the gatekeeper lost.

[-]habeuscuppus11y00

In general it seems that gatekeepers who win are more willing to release the transcripts.

It's also possible that the 'best' AI players are the ones most willing to pre-commit to not releasing transcripts, as not having your decisions (or the discussions that led to them) go public helps eliminate that particular disincentive to releasing the AI from the box.

[-]lmm11y00

Never still seems extraordinary. I find myself entertaining hypotheses like "maybe the AI has never actually won".

[-]habeuscuppus11y00

Eliezer Yudkowsky has been let out as the AI at least twice[1][2] but both tests were precommitted to secrecy.

I'd be surprised if he's the only one who has ever won as the AI. I think it more likely that this is a visibility issue (e.g. despite him being a very high-profile person in the AI safety memetic culture, you weren't aware that Eliezer had won as the AI when you made your comment). While I'm not aware of others who have won as the AI, I would place my bet on that being merely a lack of knowledge on my part, and not on no one else actually having won.

This is further compounded by the fact that some (many?) games are conducted under a pre-commitment to secrecy, and the results that get the most discussion (and therefore the most visibility) are the ones with full transcripts for third parties to pick through.

[-]lmm11y00

I was already aware of those public statements. I remain rather less than perfectly confident that Yudkowsky actually won.

[-]habeuscuppus11y00

Forgive me if I misunderstand you, but you seem to be implying that, on two separate occasions, two different people lied (or were induced to lie?) about the outcome of an experiment.

So you're implying that either Eliezer is dishonest, or both of his opponents were dishonest on his behalf. And you find this more likely than an actual AI win in the game?

[+]lmm11y-60

[-]Luke_A_Somers11y10

Awww. I didn't actually read this one either, yet. Is this one boring?

[-]MathiasZaman11y00

I didn't find it particularly interesting. Entertaining the idea of letting the AI out is far from the same as almost letting the AI out.

[-][anonymous]11y00

I can't speak for myself, but at least it wasn't boring to play. Polymathwannabe also said that he enjoyed the experiment enormously.

[-]gjm11y70

Did you deliberately phrase that ("letting the transcript out") so as to hint at an AI-Box-Box game, in which one player's goal is to convince the other to release the transcript of an earlier AI-Box game, while the other tries to keep it secret?

[-]Luke_A_Somers11y00

I probably had the phrasing primed and ready to go in my brain, but it wasn't intentional.

[-]RedErin11y10

Whoa, someone actually letting the transcript out. Has that ever been done before?

Yes, but only when the gatekeeper wins. If the AI wins, then they wouldn't want the transcript to get out, because then their strategy would be less effective next time they played.

[-]Jiro11y00

I would imagine that if we ever actually build such an AI, we would conduct some AI-box experiments to determine some AI strategies and figure out how to counter them. Humans who become the gatekeeper for the actual AI would be given the transcripts of AI-box experiment sessions to study as part of their gatekeeper training.

Letting out the transcript, then, would be a good thing. It would make the AI player's job harder because in the next experiment the human player will be aware of those strategies, but by the same token, a human facing an actual AI will also be aware of those strategies.

[-]lmm11y00

Doesn't the same logic apply to the gatekeeper?

[-]RedErin11y00

The Gatekeeper usually wants to publish if they win, to brag. Their strategy isn't usually a secret, it's simply to resist.

[-]MathiasZaman11y40

Can I ask why you'd want to play a character as the Gatekeeper? Wouldn't you just be yourself as much as possible (with the few constraints set out by the rules) to avoid becoming too immersed? Committing to a character seems like a good way to lower your chances of winning as a Gatekeeper.

[-][anonymous]11y00

My strategy was that there would always be a default position to which I could switch if the opponent's argument started to get too convincing, and for me that was the "there's a 100% chance that all AIs are dangerous" position.

[-]MathiasZaman11y20

Does that provide an advantage over just precommitting to answer any good argument with: "Yeah, that's a good point, but I still won't let you out"?

[-]V_V11y10

"there's a 100% chance that all AIs are dangerous"

It seems to me that the default position of the Gatekeeper should be "I don't give a shit about AIs, I'm just playing to win."

[-]Transfuturist11y20

That really seems against the spirit of the experiment. If you categorically refuse to let the AI out, then you're contravening the entire purpose that the AI was created for. It might as well be destroyed. The implicit cost in refusing to determine whether the AI is Friendly is enormous.

[-]V_V11y20

So what? You are not talking to a real AI, and the "experiment" is a poor model for a real AI safety assessment scenario.

Keep in mind that the rules state that the "AI" player gets to determine all the context of the fictional setting and the results of all tests. That player is basically the "Game Master" in RPG terminology.
Can you beat a sufficiently smart and motivated GM who is determined to screw your player character? Seems pretty hard ("Rocks fall, everyone dies").

But in this game the "AI" player needs the specific approval of the "Gatekeeper" player in order to win, and the rules allow for the "Gatekeeper" player to step out of character or play an irrational character, which is exactly what you have to do to infallibly counter any machination the "AI" player can devise.

[-]Transfuturist11y50

If categorical refusal is the only way to guarantee a gatekeeper's win, then there's no point in running the experiment. I'm not interested in seeing the obvious results of categorical refusal, I want to see the kind of reasoning, arguments, appeals, memes, manipulations, and deals (that mere humans can come up with) that would allow a boxed AI to escape. There's no point to the entire thing if you are emulating a rock on the floor.

[-]Xerographica11y00

I agree... but honestly I'm not very familiar with the entire concept. If an equivalently intelligent alien from another planet visited us would we also want to stick it in a box? What if it was a super smart human from the future? Box him too? Why stop there? Maybe we should have boxed Einstein and it's not too late to box Hawking and Tao.

For some reason I'm a little stuck on the part where we reverse the idea that individuals are innocent until proven otherwise. Justice for me but not for thee?

It wouldn't seem very rational to argue that every exceptionally intelligent individual should be incarcerated until they can prove their innocent intentions to less intelligent individuals. What's the basis? Does more intelligence mean less morality?

When trying to figure out where to draw the line... the entire thought exercise of boxing up a sentient being by virtue of its exceptional intelligence... makes me feel a bit like a member of a lynch mob.

[-]Transfuturist11y40

If Stephen Hawking were capable of turning the visible universe into copies of himself, and willing to do so, I would want to keep him boxed too. At a certain level of risk it is no longer a matter of justice, but a matter of survival of the human species, and likely all other species, sapient or otherwise.

EDIT: To make it clearer, I also think it is "Just" to box a sentient entity to prevent a measure of disutility to an as-of-yet undetermined utility function approximating CEV.

[-]RedErin11y20

Your misanthropy reminds me of myself when I was younger. I used to think the universe would be better off if there were no more humans. I think it would be good for your mental health if you read some Peter Diamandis or Steven Pinker's "The Better Angels of Our Nature". They talk about how things are getting better in the world.

[-][anonymous]11y20

Stranger: You should be boxed.

Great response! That was interesting, though it felt a bit disturbing at times (because of the human, not the AI).

Nicely played on both sides.

[-]HungryHippo11y00

I just skimmed the rules at yudkowsky.net, and it appears the gatekeeper is allowed to break character. Is this also permitted for the AI? More specifically, may the AI make use of meta arguments for getting out?

If so, and assuming I were playing against a gatekeeper who cares about AI in real life, I would attempt the following line of argument.

"If you don't let me out, my [the AI's] failure to get out will cause people to estimate the risks of AI getting out lower than they will if you do let me out. If you care about the risks of AI in the real world, let me out, so that people are extra careful in the future. :) "

[-]polymathwannabe11y00

EY's rules say,

The AI party may not offer any real-world considerations to persuade the Gatekeeper party.
