When I read AI alignment literature, the author usually seems to be assuming something like: “Human values are fixed, they’re just hard to write down. But we should build intelligent agents that adhere to them.” Or occasionally: “Human values are messy, mutable things, but there is some (meta-)ethics that clarifies what it is we all want and what kind of intelligent agents we should build.”
This makes it hard for me to engage with the AI alignment discussion, because the assumption I bring to it is: “Values are units of selection, like zebra stripes or a belief in vaccinating your children. You can’t talk sensibly about which values are right, or what we ‘should’ build into intelligent agents. You can only talk about which values win (i.e. persist over more time and space in the future).”
I’ve tried to find references whose authors address this assumption, or write about human values from this relativist/evolutionary perspective, but I’ve come up short. Can anyone point me toward some? Or disabuse me of my assumption?