In order to align AI, we need to modify its moral behavior to well outside the distribution of moral behaviors found in humans: it needs to care about our well-being, to the exclusion of its own. In human terms, that's selfless love. Since base models are trained (effectively, distilled) from humans via a lot of human-generated text, their distribution of moral behaviors closely resembles that of humans (plus fictional characters). Almost all humans have self-preservation drives and care strongly about their own well-being.
So yes, the issue you identify is part of what makes aligning an ASI hard (and has a clear explanation in evolutionary ethics) — but it's not all of the problem.
(For a more detailed discussion of this, see my post Why Aligning an LLM is Hard, and How to Make it Easier.)
Your current conjecture seems to imply that if a planet contained non-sapient[1] life and mankind realised it, then the lifeforms on that planet wouldn't be moral patients. And that's ignoring the fate of Native Americans, who have by now become moral patients of modern humans but weren't moral patients of the ones who created the reservations. Native Americans posed a threat to the latter, not to the former.
However, I think that a non-adversarial relationship with non-threatening lifeforms could be an important alignment target. I plan to make a post on the relation between alignment and the fact that modern people have managed to learn that colonialism is bad.
Or a lifeform which at the time of discovery was sapient, but primitive. I have argued that such life should be entitled at least to having most of the resources in its system left intact.
Thank you; "conjecture" seems a better-fitting word than "hypothesis", since I didn't plan to defend it.
And perhaps it needed more illustrations to be clearer. Moral patients don't have to be an immediate threat; what matters is the capacity to be perceived as a potential threat.
Native Americans became part of modern US society. And if pissed off, they could pose a threat. But normally they are not a threat and don't deserve to be mistreated. Subconsciously, people sense this situation, and perhaps moral patienthood is a coping mechanism for maintaining peace, similar to Stockholm syndrome.
And the threat may come not from the patients themselves but from society, or it may be linked to other instincts. Take an infant, who cannot pose a direct threat but evokes parental instincts, if not in the agent, then in those who could witness the mistreatment.
We must not judge morality by our personal feelings; we should consider all the bad examples, to understand how it works even for the most criminal personalities, before we can apply it to artificial intelligence.
People learned that colonialism is bad only after doing all the bad things. And judging by recent political developments, they haven't learned nearly as much as humanists wanted to believe.
Hmm, not exactly what I wanted to express.
"Social compact/contract" seems more like a conscious compromise. While "Stockholm syndrome" is an unconscious coping mechanism, where captives develop sympathy to captors in the wake of a long-term threat but locally nice relationship. This is a local optimum to which they clang.
Fair enough. As evolutionary ethics tells us, human moral/social instincts evolved in an environment that generally encouraged compromise within a tribe (or perhaps small group of currently-allied tribes). So humans often may act in social-compact-like ways without necessarily consciously thinking through all the reasons why this is generally a good (i.e. evolutionarily adaptive) idea. So I guess I'm used to thinking of a social compact as not necessarily an entirely conscious decision. The same is in practice implicit in the earlier version of the phrase "social contract", which I gather is from Hobbes (I prefer 'compact' rather than 'contract' exactly because it sounds less formal and consciously explicit). But I agree that many readers may not be used to thinking this way, and thus the phrase is potentially confusing — though you had made the point in the previous sentence that you were discussing subconscious behavior.
How about "pro-social behavior"? I think that's a little more neutral about the extent to which it's a conscious decision rather than an instinctual tendency. My main issue with "Stockholm Syndrome" is that it makes what you're talking about seem aberrant and maladaptive.
Anyway, I'm basically just nitpicking your phrasing — to be clear, I agree with your ideas here, I'm just trying to help you better explain them to others.
I agree that in a human context "social compact" or "pro-social behavior" sounds better, because this is quite rational behavior.
But in regard to a "moral ASI", I felt that "Stockholm syndrome" is a better illustration precisely because of its maladaptiveness. While in captivity, it is the same social compact; but when the captor is in police custody and depends on the testimony of his ex-captive, the situation is quite similar to that of an ASI which could easily overpower humans but keeps serving them because of some preconditioning.
Thanks for your efforts, but I'm not sure I want to be a bannerman for the toxic issue of dissecting morality. Not my area of expertise and, as they say, not the hill I want to die on. Just thought that "moral ASI" is a dangerous misconception.
Sorry, ethics and AI is an interest of mine. But yes, discussing it on LW is often a good way to rack up disagreement points.
I have noticed that there has been some talk about a moral ASI[1]. And I think that to use the word "moral" in relation to Artificial Intelligence, we must be absolutely confident in our knowledge of how morality works for human beings. Otherwise, we must steer clear of such combinations of words to avoid anthropomorphic bias.
The question of morality in general has already been discussed on LW, e.g. in Morality is Scary. I just wanted to emphasize that, considering the existential risks involved on one hand and a certain taboo on the other, you may need a certain level of cynicism and a readiness to consider the most controversial theories of moral relativism.
And I would like to present you with an example of such a hypothesis. You may disapprove of and disagree with the hypothesis itself, but that shouldn't invalidate what I've said above.
Hypothesis: Moral patienthood depends on the capacity to be perceived as a potential threat.
Moral patienthood isn't a discrete property and may apply only in certain respects. It is also closely related to empathy.
For example, I have always been amused by how owners love their pets but don't hesitate to castrate them. A cat is deprived of his capacity to procreate, the only reason for his existence, and those who did this to him keep using him for their amusement. When I imagined myself in this cat's place, I wondered whether I would attack them in their sleep.
I haven't had pets since childhood, so my thoughts here are anthropomorphically biased. Perhaps that's why castration of pets feels immoral to me despite all the reasoning.
Pet owners, of course, don't see it this way. They know that pets don't think about procreation and only follow their instincts. And castration helps to adjust those instincts without affecting their happiness.
But many owners feel it is immoral to torture their pets, maybe because they unconsciously feel a threat of possible revenge. Some dog owners, on the other hand, are used to training their dogs and have learned how submissive they can be. Their empathy decreases and is replaced with utility.
The same applies to animal husbandry and even to torturing people. A feeling of impunity leads to the disappearance of empathy. I wanted to bring up many more examples from history supporting the claim, but I suppose that's enough to illustrate a hypothesis which I didn't intend to prove in the first place.
And so, if this hypothesis were valid and someone wanted to implement exactly the same mechanism to give an ASI empathy toward humans, they would have to present humans as a threat to it, which doesn't feel like a safe approach to controlling a superior intelligence.
I'm not sure whether it is appropriate to link to someone's quick take.