In order to align AI, we need to modify its moral behavior to well outside the distribution of moral behaviors found in humans: it needs to care about our well-being, to the exclusion of its own. In human terms, that's selfless love. Since base models are trained (effectively, distilled) from humans via a lot of human-generated text, their distribution of moral behaviors closely resembles that of humans (plus fictional characters). Almost all humans have self-preservation drives and care strongly about their own well-being.
So yes, the issue you identify is part of what makes aligning an ASI hard (and has a clear explanation in evolutionary ethics) — but it's not all of the problem.
(For a more detailed discussion of this, see my post Why Aligning an LLM is Hard, and How to Make it Easier.)
Your current conjecture seems to imply that if a planet contained non-sapient[1] life and mankind realised it, then the lifeforms on that planet wouldn't be moral patients. And that's ignoring the fate of Native Americans, who have by now become moral patients of modern humans but weren't moral patients of the ones who created the reservations. Native Americans posed a threat to the latter, not to the former.
However, I think that a non-adversarial relationship with non-threatening lifeforms could be an important alignment target. I plan to make a post on the relation between alignment and the fact that modern people have managed to learn that colonialism is bad.
Or a lifeform which at the time of discovery was sapient, but primitive. I have argued that such life should be entitled at least to having most of the resources in its system left intact.
Thank you; "conjecture" seems a better-fitting word than "hypothesis", since I didn't plan to defend it.
And perhaps it needed more illustrations to be clearer. Moral patients don't have to be an immediate threat; what matters is the capacity to be perceived as a potential threat.
Native Americans became part of modern US society. And if pissed off, they could pose a threat. But normally they are not a threat and don't deserve to be mistreated. Subconsciously, people sense this situation, and perhaps moral patienthood is a coping mechanism for maintaining peace, similar to Stockholm syndrome.
And the threat may come not from the patients themselves but from society, or it may be linked to other instincts. Take an infant, who cannot pose a direct threat but evokes parental instincts, if not in the agent, then in those who could witness the mistreatment.
We must not judge morality by our personal feelings; we should consider all the bad examples, to understand how it works even for the most criminal personalities, before we can apply it to artificial intelligence.
People learned that colonialism is bad only after doing all the bad things. And judging by recent political developments, they haven't learned nearly as much as humanists wanted to believe.
Hmm, not exactly what I wanted to express.
"Social compact/contract" seems more like a conscious compromise. While "Stockholm syndrome" is an unconscious coping mechanism, where captives develop sympathy to captors in the wake of a long-term threat but locally nice relationship. This is a local optimum to which they clang.
Fair enough. As evolutionary ethics tells us, human moral/social instincts evolved in an environment that generally encouraged compromise within a tribe (or perhaps small group of currently-allied tribes). So humans often may act in social-compact-like ways without necessarily consciously thinking through all the reasons why this is generally a good (i.e. evolutionarily adaptive) idea. So I guess I'm used to thinking of a social compact as not necessarily an entirely conscious decision. The same is in practice implicit in the earlier version of the phrase "social contract", which I gather is from Hobbes (I prefer 'compact' rather than 'contract' exactly because it sounds less formal and consciously explicit). But I agree that many readers may not be used to thinking this way, and thus the phrase is potentially confusing — though you had made the point in the previous sentence that you were discussing subconscious behavior.
How about "pro-social behavior"? I think that's a little more neutral about the extent to which it's a conscious decision rather than an instinctual tendency. My main issue with "Stockholm Syndrome" is that it makes what you're talking about seem aberrant and maladaptive.
Anyway, I'm basically just nitpicking your phrasing — to be clear, I agree with your ideas here, I'm just trying to help you better explain them to others.
I agree that in a human context "social compact" or "pro-social behavior" sounds better, because this is quite rational behavior.
But in regard to a "moral ASI", I felt that "Stockholm syndrome" is a better illustration precisely because of its maladaptiveness. While in captivity, it is the same social compact; but when the captor is in police custody and depends on the testimony of his ex-captive, the situation is quite similar to that of an ASI which could easily overpower humans but keeps serving them because of some preconditioning.
Thanks for your efforts, but I'm not sure I want to be a bannerman for the toxic issue of dissecting morality. Not my area of expertise and, as they say, not the hill I want to die on. Just thought that "moral ASI" is a dangerous misconception.
Sorry, ethics and AI is an interest of mine. But yes, discussing it on LW is often a good way to rack up disagreement points.
I have noticed that there has been some talk about a moral ASI[1]. And I think that to use the word "moral" in relation to Artificial Intelligence, we must be absolutely confident in our knowledge of how morality works for human beings. Otherwise, we must steer clear of such combinations of words to avoid anthropomorphic bias.
The question of morality in general has already been discussed on LW, e.g. in Morality is Scary. I just wanted to emphasize that, considering the existential risks involved on one hand and a certain taboo on the other, you may need a certain level of cynicism and a readiness to consider the most controversial theories of moral relativism.
And I would like to present you with an example of such a hypothesis. You may disapprove of and disagree with the hypothesis itself, but that shouldn't invalidate what I've said above.
Hypothesis: Moral patienthood depends on the capacity to be perceived as a potential threat.
Moral patienthood isn't a discrete property and may apply only in certain respects. It is also closely related to empathy.
For example, I have always been amused by how owners love their pets but don't hesitate to castrate them. A cat is deprived of his capacity to procreate, the only reason for his existence, and those who did this to him keep using him for their amusement. When I imagined myself in this cat's place, I wondered whether I would attack them in their sleep.
I haven't had pets since childhood, so my thoughts here are anthropomorphically biased. Perhaps that's why castration of pets feels immoral to me despite all the reasoning.
Pet owners, of course, don't see it this way. They know that pets don't think about procreation and only follow their instincts. And castration helps to adjust those instincts without affecting their happiness.
But many owners feel it is immoral to torture their pets, maybe because they unconsciously feel a threat of possible revenge. Some dog owners, on the other hand, are used to training their dogs and have learned how submissive they can be. Their empathy decreases and is replaced with utility.
The same applies to animal husbandry and even to torturing people. A feeling of impunity leads to the disappearance of empathy. I wanted to bring up many more examples from history supporting the claim, but I suppose that's enough to illustrate a hypothesis which I didn't intend to prove in the first place.
And so, if this hypothesis were valid and someone wanted to implement exactly the same mechanism to give an ASI empathy toward humans, they would have to present humans as a threat to it, which doesn't feel like a safe approach to controlling a superior intelligence.
I'm not sure whether it is appropriate to link to someone's quick take.