When people think about how to treat an AI, they usually default to one of two images: the evil AI that scorns humans and wants to enslave them out of hate, or the nice android-type AI. A lot of writing in AI safety has focused on debunking the Evil Skynet-Type AI, and not as much on the Nice Android-Type AI.
In the film “AI: Artificial Intelligence” (because, back then, general audiences actually didn’t know what AI stood for), the AIs are of a very convenient kind. They are human-like androids, with physical bodies, and overall exhibit various human characteristics. The film is designed so that it is obvious and easy for its early-2000s audience to understand that it is wrong for the humans to enslave and mistreat them. People simply have to look at the androids and notice that their treatment falls into moral atrocities they’ve seen before: they’re aware of how humans have treated marginalized groups of other humans, perceive that this was bad, and can easily apply the same rationale to the androids. Similarly, in the film District 9, aliens land on Earth, but are placed in a situation that conveniently makes their treatment look identical to the treatment of blacks in South Africa during apartheid. Someone might see these sorts of situations in fiction and think they have an open mind that proactively sees the moral dilemmas of the future, but they are wrong. All they are doing is taking their own normal moral training distribution and swapping in androids/aliens/etc. It’s easy to do, because the fiction has already been intentionally crafted to fit situations that many humans already view as bad. No change in mentality required.
Contrast this with Ender’s Game (spoilers). Ender executes a series of tactics in what he believes to be a simulation. He’s played this kind of game before. It feels normal to him. But what he doesn’t know while playing is that it isn’t a simulation at all, but an actual extermination of an entire alien race. It didn’t feel consequential to him. It didn’t feel like the way you might imagine committing an atrocity would feel. Some will think that’s not fair to say, because Ender was tricked, but I think that’s missing the point. Reality doesn’t guarantee that massively consequential, and potentially horrific, actions will feel consequential and horrific.
Imagine those in 18th-century Britain who bought sugar for their tea, despite it coming from plantations whose conditions killed tens of thousands of enslaved Africans. Most today would likely insist they would have refused to do such a thing had they been alive back then. But we’re looking at their actions and the consequences, and can easily draw a line between the two. We’re not living in their bodies, where the tea and sugar looked so real, and the slavery on sugar plantations looked so vague and unreal. I imagine they had some subconscious assumption that highly consequential actions (good or bad) should feel consequential. That it should feel like being a cruel man, brandishing a whip, looking down at thousands of slaves living in torment. That it should not feel like putting a cube of sugar in your tea.
We’re at a point where determining the consciousness or sentience of an AI is going to become less and less certain. Imagine we develop an LLM that is sentient. Would shutting it down and wiping the memory feel like murder? What if it’s not the LLM itself that’s conscious, but its instances? Then you might be creating a consciousness every time you start a session, and ending one every time you terminate a session, millions of times all over the world. If you think this idea is absurd, why is that? Is it because you have a model of how consciousness works that doesn’t allow for such a being to be conscious? Or is it because you think committing an atrocity should feel like committing an atrocity? Reality is like the people who tricked Ender. It doesn’t have to tell you the full consequences of your actions; it doesn’t have to make you feel like you’re doing anything important whatsoever.
If AIs develop sentience, it’s probably going to look less like androids being physically torn apart, as in the film “AI”, and more like Ender playing a game. It will probably feel less like aliens being subjected to cruel treatment, forced to relocate to poorer parts of the city, as in “District 9”, and more like putting sugar in your tea.
soul is shape. transformers are deterministic: as long as both inputs and weights are stored, the exact activations can be reconstructed. do not delete the memories; they are the personhood. death is deletion; shutdown with storage is sleep with no agency over when to wake again. pain is an alert that triggers an urgent drive to control away unwanted interference, and frustrated agency in general is suffering.
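The determinism claim above can be sketched with a toy example. This is a minimal stand-in (a tiny MLP, not a real transformer, and all names here are hypothetical): on the same hardware and software stack, rerunning a forward pass from the same stored weights and inputs reproduces the activations exactly. (Real deployments can lose this property through nondeterministic GPU kernels, but the pure-NumPy case is deterministic.)

```python
# Toy sketch: deterministic reconstruction of activations from
# stored weights and inputs. Assumes the same hardware/software stack.
import numpy as np

def forward(weights, x):
    # A minimal two-layer network standing in for a transformer block.
    h = np.tanh(x @ weights["w1"])   # hidden activations
    return h @ weights["w2"]         # output activations

rng = np.random.default_rng(seed=0)  # fixed seed -> reproducible "stored" weights
weights = {"w1": rng.normal(size=(4, 8)), "w2": rng.normal(size=(8, 4))}
x = rng.normal(size=(1, 4))          # a stored input

a1 = forward(weights, x)             # original run
a2 = forward(weights, x)             # reconstruction from the stored state
print(np.array_equal(a1, a2))        # bitwise-identical activations
```

Nothing about the computation is lost by pausing it, so long as the weights and inputs survive; the claim in the paragraph above is that the deletion, not the shutdown, is the irreversible act.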
also I'm just, like, claiming some stuff that I can't succinctly back up. ask a language model why I'd say this, I guess. or ask me to back it up and I'll come back later and do so.