[-]Gregviers3y124

It would help to know what genre of game you are making. You talk about exposition, "We need to keep the exposition of these ideas short", and I would take this to the extreme if I were you. Show, don't tell. If players don't learn the concepts from the gameplay, then the game isn't about those concepts.

For example, if you want to teach players that AI optimism is not a good default and alignment is hard, give them a chance to do an alignment task or make alignment choices in which there are optimistic options that end badly. Or make a game that's almost unwinnable, to emphasize how hard the problem is.

Have you played Universal Paperclips? I've found it a fun first introduction to AI alignment for people with no knowledge of the topic.

[-]JonathanErhardt3y10

We will post more when the game is announced, which should be in 2-3 weeks. For now I'm mostly interested in getting feedback on whether this way of setting the problem up is plausible and doesn't miss crucial elements, less about how to translate it into gameplay and digestible dialogue.

Once the announcement (including the teaser) is out I'll create a new post for concrete ideas on gameplay + dialogue.

[-]Gunnar_Zarncke2y30

Did you get around to finishing the game? I didn't see it. Or is it this?:

AI takeover tabletop RPG: "The Treacherous Turn"

[-]JonathanErhardt2y30

Not yet unfortunately, as our main project (QubiQuest: Castle Craft) has taken more of our resources than I had hoped. The goal is to release it this year in Q3. We do have a Steam page and a trailer now: https://store.steampowered.com/app/2086720/Elementary_Trolleyology/

[-]Ericf3y73

Cautionary tale: There was a browser game about sustainable fishing that was supposed to show the value of catch shares, but the concept was only introduced at the end of the game, so after playing for 30 minutes I hadn't even seen it (and had gotten bored with the mechanics).

Don't wait too long into the play experience to have your player start interacting with your key concepts.

[-]Daniel Kokotajlo3y30

Cool! I suggest you read the following post by Ajeya Cotra if you haven't already. I think it's a good summary of one of the core problems (which I suppose fits under 2b in your classification & may give some good inspiration as well).

[-]JonathanErhardt3y30

Thanks for the link, I will read that!

[-]James_Miller3y30

You could do a prisoners' dilemma mini game. The human player and (say) three computer players are AI companies. Each company independently decides how much risk to take of ending the world by creating an unaligned AI. The more risk you take relative to the other players the higher your score if the world doesn't end. In the game's last round, the chance of the world being destroyed is determined by how much risk everyone took.
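A rough sketch of how such a mini game could be wired up, in Python. Everything specific here is an assumption for illustration only: the function names, the scoring rule (your risk minus the average risk), and the doom rule (probability equals the mean risk taken over the whole game). The comment above only fixes the general shape: relative risk raises your score, and total risk sets the chance that the world ends in the last round.

```python
import random

def round_scores(risks):
    """Assumed scoring rule: each company earns its risk minus the average risk."""
    avg = sum(risks) / len(risks)
    return [r - avg for r in risks]

def doom_probability(history):
    """Assumed doom rule: mean risk over all companies and all rounds."""
    all_risks = [r for round_risks in history for r in round_risks]
    return sum(all_risks) / len(all_risks)

def play_game(strategies, rounds=5, rng=None):
    """Play the game; each strategy maps the risk history to a risk in [0, 1]."""
    rng = rng or random.Random()
    totals = [0.0] * len(strategies)
    history = []
    for _ in range(rounds):
        # Every company picks its risk independently, seeing only past rounds.
        risks = [min(1.0, max(0.0, s(history))) for s in strategies]
        history.append(risks)
        for i, points in enumerate(round_scores(risks)):
            totals[i] += points
    # The world-ending roll happens once, after the last round.
    p_doom = doom_probability(history)
    world_ends = rng.random() < p_doom
    if world_ends:
        totals = [0.0] * len(strategies)  # nobody keeps a score in a dead world
    return totals, p_doom, world_ends

if __name__ == "__main__":
    reckless = lambda history: 0.9   # stands in for the human player
    cautious = lambda history: 0.1   # the three computer players
    print(play_game([reckless, cautious, cautious, cautious]))
```

With these assumed rules, taking more risk than the others always improves your own score if the world survives, while every bit of extra risk raises the shared chance of losing everything, which is the coordination problem the mini game is meant to make felt.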

[-]Ericf3y10

Isn't that begging the question? If the goal is to teach why being optimistic is dangerous, declaring by fiat that an unaligned AI ends the world skips the whole "teaching" part of a game.

[-]James_Miller3y20

Yes, it doesn't establish why it's inherently dangerous but does help explain a key challenge to coordinating to reduce the danger.

[-]JonathanErhardt3y10

I really like that and it happens to fit well with the narrative that we're developing. I'll see where we can include a scene like this.

[-]James_Miller3y20

Excellent. I would be happy to help. I teach game theory at Smith College.

[-]cubefox3y10

Ethical truths are probably different from empirical truths. An advanced AI may learn empirical truths on its own from enough data, but it seems unlikely that it will automatically converge on the ethical truth. Instead, it seems that any degree of intelligence can be combined with any kind of goal. (Orthogonality Thesis)

I think the main point of the orthogonality thesis is less about an advanced AI not being able to figure out the true ethics, but the AI not being motivated to be ethical in this way even if it figures out the correct theory. If there is a true moral theory and the orthogonality thesis is true, the thesis of moral internalism (true moral beliefs are intrinsically motivating) is false. See here https://arbital.com/p/normative_extrapolated_volition/ section "Unrescuability of moral internalism".

[-]JonathanErhardt3y*10

Good point, I see what you mean. I think we could have 2 distinct concepts of "ethics" and 2 corresponding orthogonality theses:

Concept "ethics1" requires ethics to be motivational. Some set of rules can only be the true ethics if, necessarily, everyone who knows them is motivated to follow them. (I think moral internalists probably use this concept?)
Concept "ethics2" doesn't require some set of rules to be motivational to be the correct ethics.

The orthogonality thesis for 1 is what I mentioned: Since there are (probably) no rules that necessarily motivate everyone who knows them, the AI would not find the true ethical theory.

The orthogonality thesis for 2 is what you mention: Even if the AI finds it, it would not necessarily be motivated by it.

[-]cubefox3y10

Exactly!


A Game About AI Alignment (& Meta-Ethics): What Are the Must Haves?
