[ Question ]

References that treat human values as units of selection?


When I read AI alignment literature, the author usually seems to be assuming something like: “Human values are fixed, they’re just hard to write down. But we should build intelligent agents that adhere to them.” Or occasionally: “Human values are messy, mutable things, but there is some (meta-)ethics that clarifies what it is we all want and what kind of intelligent agents we should build.”

This makes it hard for me to engage with the AI alignment discussion, because the assumption I bring to it is: “Values are units of selection, like zebra stripes or a belief in vaccinating your children. You can’t talk sensibly about what values are right, or what we ‘should’ build into intelligent agents. You can only talk about what values win (i.e. persist over more time and space in the future).”

I’ve tried to find references whose authors address this assumption, or write about human values from this relativist/evolutionary perspective, but I’ve come up short. Can anyone point me toward some? Or disabuse me of my assumption?


1 Answer

Carl Shulman's “Spreading happiness to the stars seems little harder than just spreading” addresses this topic. I also wrote a comment in 2009 that talks about this, although I'm not sure I fully endorse that comment now.