How evolution succeeds and fails at value alignment

Agreed... But I think I disagree with some of the connotations.

How "natural" do you think "natural abstractions" are?

The notion of love is a pretty great abstraction (I say as a human), and every person in my culture has pretty reliably hit on it, or at least close enough that we can all talk to each other. So it's at least somewhat natural - we can talk about the same thing because love is an abstraction that is particularly well-suited for explaining both of our environments.

But do all human cultures have the same concept of love? They probably have something similar, at least. But how different can it get? People in different cultures live in different environments, and interact with the world in different ways than I do, which means that they might find different abstractions best for thinking about their environment.

This variability is what I mean be "how natural are natural abstractions?" Everyone (human or AI) is interacting with the same universe, but we have different local environments, and different ways of interacting with those environments. A lot of the abstractions I care about are also found by nearby fellow humans, but is that because they're so "natural" that they're also found by humans in very different parts of the world, and in AIs that interact with the world in a very different way than me? Or are there a lot of things I care about that aren't that "natural," and might be learned differently be different humans/nonhumans?

^{^}

Children caring for their elderly parents might be a more accurate but less optimistic analogy

[-]Charlie Steiner3y31

[-]deepthoughtlife3y32

This is well written, easy to understand, and I largely agree that instilling a value like love for humans in general (as individuals) could deal with an awful lot of failure modes. It does so amongst humans already (though far from perfectly).

When there is a dispute, rather than optimizing over smiles in a lifetime (a proxy of long-term happiness), preferable is obviously something more difficult like, if the versions of the person in both the worlds where it did and did not happen would end up agreeing that it is better to have happened, and that it would have been better to force the issue, then it might make sense to override the human's current preference. Since the future is not known, such determinations are obviously probabilistic, but the thresholds should be set quite high. The vast majority of adults agree that as a child, they should have gotten vaccinations for many diseases, so the probability that the child would later agree is quite high.

Smiles in a lifetime is a great proxy for what an aligned intelligence, artificial or not, should value getting for those it loves when multiple actions are within acceptable bounds, either because of the above or because of the current preferences of that person and their approval in the world where it happens.

Two out of three versions of the person approving is only complicated in worlds where it is the one where it happens that is the one that would disapprove.

LESSWRONG
LW

LESSWRONG
LW

21

How evolution succeeds and fails at value alignment

21

21