References that treat human values as units of selection?

by Milan Cvitkovic, 9th Jun 2019



When I read AI alignment literature, the author usually seems to be assuming something like: “Human values are fixed, they’re just hard to write down. But we should build intelligent agents that adhere to them.” Or occasionally: “Human values are messy, mutable things, but there is some (meta-)ethics that clarifies what it is we all want and what kind of intelligent agents we should build.”

This makes it hard for me to engage with the AI alignment discussion, because the assumption I bring to it is: “Values are units of selection, like zebra stripes or a belief in vaccinating your children. You can’t talk sensibly about what values are right, or what we ‘should’ build into intelligent agents. You can only talk about what values win (i.e. persist over more time and space in the future).”

I’ve tried to find references whose authors address this assumption, or write about human values from this relativist/evolutionary perspective, but I’ve come up short. Can anyone point me toward some? Or disabuse me of my assumption?

1 Answer

Carl Shulman's "Spreading happiness to the stars seems little harder than just spreading" addresses this topic. I also wrote a comment in 2009 that talks about this, although I'm not sure I fully endorse that comment now.

Thank you! Carl Shulman's post still seems written from the some-values-are-just-better-than-others perspective that's troubling me, but your 2009 comment is very relevant. (Despite future-you having issues with it.)

The question you raised in that comment, "which values, if any, we should try to preserve", is, I think, the crux of my issue. I'm having trouble thinking about it, and questions like it, given my "you can't talk about what's right, only what wins" assumption. I can (try to) think about whether a pa...

jefallbright: What could it possibly mean to say that something is "better", except from some perspective, within some context? What could it possibly mean to say that something is "right" (in principle), other than from some larger perspective, within a larger context? It's always perspectival. The illusion of objectivity arises because you share your values, fine-grained and deeply hierarchical, owing to your place as a twig on a branch of a tree rooted in the mists of a common physics and a common evolutionary trajectory. Of course you share values with your neighboring twigs, and you can find moral agreement by traversing the tree of evolutionarily instilled values back toward the trunk until you reach a branch that supports both you and your neighboring agents. But from what god-like point of view could those values ever be "objective"?

6 comments
“You can’t talk sensibly about what values are right, or what we ‘should’ build into intelligent agents.”

I agree that in our usual use of the word, it doesn't make sense to talk about what (terminal) values are right.

But you agree that (within a certain level of abstraction and implied context) you can talk as if you should take certain actions? For example, "you should try this dessert" is a sensible English sentence. So what about actions that affect intelligent agents?

Like, suppose there was a pill you could take that would make you want to kill your family. Should you take it? No, probably not. But now we've just expressed a preference about the values of an intelligent agent (yourself).

Modifying yourself to want bad things is wrong in the same sense that the bad things are wrong in the first place: they are wrong with respect to your current values, which are a thing we model you as having within a certain level of abstraction.

Thanks, Charlie!

“Modifying yourself to want bad things is wrong in the same sense that the bad things are wrong in the first place...”

I definitely agree with this, and have even written about it previously. Maybe my problem is that I feel like "find the best values to pursue" is itself a human value, and then the right-or-wrong-value question becomes the what-values-win question.

I argued repeatedly and at length on the Extropian and Transhumanist discussion lists, from 2004 to about 2010, for a metaethics based on the following idea: actions are assessed as increasingly "moral" (right in principle) to the extent that they are assessed as promoting (1) values, hierarchical and fine-grained, that are increasingly coherent over an increasing context of meaning-making, via (2) instrumental methods that are increasingly effective in principle over an increasing scope of consequences. Lather, rinse, repeat, with consequences tending to select for values, and methods for their promotion, that "work" (meaning "persist").

The instrumental-methods half of this, the growth in scope of our model of science and technology, is generally well accepted.

The values half of this, the growth in context of our model of meaning-making, not so much, for a handful of understandable reasons rooted in our developmental history.

Together, these orthogonal aspects tend to support and reinforce meaningful growth.

The Arrow of Morality points in no particular direction but outward, toward increasing coherence over increasing context, and suggests we would do well to act intentionally to promote the growth of our models in these two orthogonal dimensions.

Conceptual roadblocks include difficulties with evolutionary dynamics (including multi-level selection), synergistic (anti-entropic) expansion in both dimensions mentioned above (from the point of view of any agent), agency as inherently perspectival (subjective, but not arbitrary), and unwillingness to accept an ever-broadening identification of "self".

Due (in my opinion) to these difficult and culturally pervasive conceptual roadblocks, I never gained much traction in my attempts to convey and test this thinking, and I eventually decided to stop beating a horse that was not so much dead as never really alive. I believe we'll make progress on this, two steps forward, one step back, to the extent that we live and learn and become more ready. [Which is by no means guaranteed...]

I have not found any substantial literature supporting this thinking, but I can point you in the direction of bits and pieces, and we might discuss further (work and family permitting) if you would like to contact me privately.

- Jef

Oh, and a short, possibly more direct response:

Values (within context) lead to preferences; preferences (within context) lead to actions; and actions (within context) lead to consequences.

Lather, rinse, repeat, updating your models of what matters and what works as you go.

Until I saw this discussion, I don't think I had ever (consciously) thought about values (outside the value = price economic view) in the light this discussion seems to cast on them. Both the idea of values as units of choice (like goods) and the idea that values (I think value systems was implied) are fragile seem to put me on that line of thought.

When we think about economic crises (boom-bust cycles, depressions, and various fluctuations in patterns of global trade), I wonder whether the same is true of value systems. Both are built up from unit-level decisions. The units stand in various types of relationships to one another (tight vs. loose, complements vs. substitutes, near-term vs. far-term). When anything changes, there are ripple effects. Some will stay isolated; some will cascade.

In the economic sphere, no one really chooses the overall pattern of the economy or the structure of production; it's largely an outcome. The approach of treating values as units, and of considering how fragile the future is given whatever set of values is in place, seems very similar.

That would suggest that an AI with a different set of values (and a different prioritization or valuation of the values in that set) could potentially have a large impact. But it also suggests that it might not be able to impose the future it wants over the one humans want. That is perhaps hopeful.

I think you can talk about what values* are consistent.

*You used the word values to refer to sets of values.