I find odd the idea of value drift, let alone the idea that value drift is bad. My intuition is that value drift is good if anything since it represents an update of one's values based on new evidence and greater time to compute reflective equilibrium. But rather than arguing intuition, let's explore value drift a bit before we come to any stronger conclusions.
(Fair warning: this is going to get into some deep philosophical territory, be pretty unapologetic about it, and assume you are reading carefully enough to notice what I say rather than what you think I said. I'm still working some of these ideas out myself, so I don't yet have the fluency to provide a more accessible explanation. I also take some big inferential jumps at times that you may not yet be on board with, so the later parts might feel like unjustified reasoning. I don't think that's the case, but you'll have to poke at me to help me figure out how to fill in those gaps.
In spite of all those apologies, there are some key insights here, and I'm unlikely to get clearer unless I am first more opaque, so please bear with me, especially if you are interested in value as it relates to AI alignment.)
Whence drifting values?
The metaphor of drifting values is that your values are initially one place and then gradually relocate to another, like flotsam. The waves of fortune, chance, and intention combine to determine where they end up on the seas of change. In this metaphor, values are discrete, identifiable things. Linguistically, they are nouns.
When we talk of values as nouns, we are talking about the values that people have, express, find, embrace, and so on. For example, a person might say that altruism is one of their values. But what would it mean to "have" altruism as a value, or for it to be one of one's values? What is the thing being possessed in this case? Can you grab altruism and hold onto it, or find it in the mind cleanly separated from other thoughts? As best I can tell, no, unless (contrary to evidence and parsimony) something like Platonic idealism proves consistent with reality. It therefore seems a type error to say you possess altruism or any other value, since values are not things but habituations or patterns of action (more on this in the next section). It's only because we use the metaphor of possession to mean something like habitual valuing that it can seem as if these patterns over our actions are things in their own right.
So what, you may think: it's just a linguistic convention and doesn't change what's really going on. That's both wrong and right. Yes, it's a linguistic convention, and yes, you get on with valuing all the same no matter how you talk about it, but linguistic conventions shape our thoughts and limit our ability to express ourselves to the frames they provide. In the worst case, as I suspect often happens when people reason about value drift, we focus so much on the convention that we forget what's really going on and reason only about the abstraction, viz. mistake the map for the territory. And since we've just seen that the value-as-thing abstraction is leaky, implying the ability to possess that which cannot be possessed, it can lead us astray by letting us operate from a false assumption about how the world works, expecting it to function one way when it actually operates another.
To my ear, most talk about value drift is at least partially if not wholly confused by this mistaking of values for things, and specifically for essences. But let's suppose you don't make this mistake; is value drift still sensible?
I think we can rehabilitate it, but to do that we'll need a clearer understanding of "habitual valuing" and "patterns of action".
If we tear away the idea that we might possess values, we are left with the act of valuing, and to value something is ultimately to judge it or assess its worth. While I can't hope to fit all my philosophy into this paragraph, I consider valuing, judging, or assessing to be one of the fundamental operations of "conscious" things: it is the key input powering the feedback loops that differentiate the "living" from the "dead". For historical reasons we might call this feeling or sensation, and if you like control theory, "sensing" seems appropriate, since in a control system it is the sensor that senses the system and sends the resulting signal to the controller. Promising modern theories model the human mind as a hierarchy of control systems that minimize prediction error while maintaining homeostasis, and this matches one of the most detailed and longest-used theories of human psychology, so I feel justified in saying that the key, primitive action happening when we value something is that we sense or judge it to be good, neutral, or bad (or, if you prefer, more, same, or less).
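To make the control-theory framing concrete, here is a minimal sketch in Python. Everything in it (the `sensor` function, the set point of 20, the example states) is an illustrative assumption of mine, not a claim about how minds are actually implemented; the only point is that a sensor's good/neutral/bad signal is relative, defined by movement toward or away from a set point.

```python
# Toy sketch of "sensing" as a control-system operation: a sensor compares
# the system's state to a set point and emits a purely relative signal.

def sensor(state, set_point, previous_error):
    """Emit "good" / "neutral" / "bad" based on movement relative to a set point."""
    error = abs(set_point - state)
    if error < previous_error:
        return "good", error     # moving toward the set point (better prediction)
    if error > previous_error:
        return "bad", error      # moving away from the set point (worse prediction)
    return "neutral", error      # no change

# A thermostat-like run: successive states approach a set point of 20,
# then overshoot it on the last step.
signals = []
error = float("inf")
for state in [10, 14, 17, 19, 22]:
    signal, error = sensor(state, 20, error)
    signals.append(signal)
print(signals)  # prints ['good', 'good', 'good', 'good', 'bad']
```

Note that the same state can be sensed as good or bad depending on where the system just was, which is the sense in which "good" and "bad" here are relative rather than intrinsic labels.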
We could get hung up on good, neutral, and bad, but for now let's understand them as relative terms in the sense of the brain as a control system, where "good" signals better prediction or movement towards a set point and "bad" signals worse prediction or movement away from one. In this model, to value something is to sense it and send a signal to the rest of the brain that it is good. Thus to "have a value" is to exhibit a pattern of action that the brain senses to be good. To return to the example of valuing altruism: when a person who values altruism acts in a way that pattern-matches to altruism (maybe "benefits others" or something similar), the brain senses this pattern to be good and feeds that signal back into itself, further habituating actions that match the altruism pattern. It is this habituation that we are pointing to when we say we "have" a value.
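The habituation loop just described can be sketched as a toy simulation. To be clear, every name and number here (the action labels, the update rule, the learning rate) is a hypothetical illustration, not a model of real neural feedback; it shows only the shape of the loop: act, sense the act against a valued pattern, and feed the "good" signal back so matching actions become more likely.

```python
import random

random.seed(0)  # make the toy run reproducible

def sense(action, valued_pattern):
    """Return +1 ("good") if the action matches the valued pattern, else 0 ("neutral")."""
    return 1 if valued_pattern in action else 0

def habituate(weights, actions, valued_pattern, rate=0.1, steps=100):
    """Each taken action is sensed; "good" signals strengthen the habit of taking it."""
    for _ in range(steps):
        # choose an action in proportion to current habituation weights
        action = random.choices(actions, weights=[weights[a] for a in actions])[0]
        signal = sense(action, valued_pattern)
        # feedback: a "good" signal makes this action more likely next time
        # (the floor just keeps weights positive if signals were ever negative)
        weights[action] = max(0.01, weights[action] + rate * signal)
    return weights

actions = ["donate (benefits others)", "watch tv"]
weights = {a: 1.0 for a in actions}       # start with no habituation either way
weights = habituate(weights, actions, "benefits others")
```

After enough iterations the weight on the pattern-matching action dominates, and that accumulated bias toward one pattern of action is what the framing above would call "having" the value of altruism.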
Aside: How any individual comes to sense any particular pattern, like altruism, to be good, neutral, or bad is an interesting topic in and of itself, but we don't need that particular gear to continue discussing value drift, so this is where the model bottoms out for this post.
We can now understand value drift to mean changes in habituations or patterns of action over time. I realize some of my readers will throw their hands up at this point and say "why did we have to go through all that just to get back to where we started?!?", but the point was to unpack value drift so we can understand it as it is, not as we think it is. And as will become clear in the following analysis, that unpacking is key to understanding why value drift seems an odd thing to worry about to me.
My explanation of valuing implies that values-as-things are after-the-fact reifications drawn from observing the accumulated effects of individual actions, and as such values cannot themselves directly drift, because they are downstream of where change happens. The changes that befall these reifications we call "values" happen moment to moment, action to action; each particular action is only later aggregated into a pattern that can be expressed as a value, and even then that value exists only by virtue of ontology, because it is an inference from observation. Thus saying values "drift" is about as meaningful as saying the continents drawn on a map "drift" over geological time: it's sort of true, but only meaningful so long as understanding remains firmly grounded in the phenomena being pointed to, and unlike maps of geography, maps of mind are more easily confused for the mind itself.
What instead drifts or changes are actions, although saying they drift or change is fraught because it supposes some stable viewpoint from which to observe the change. Yet actions, via the preferences that cause us to choose any particular action over all others, are continuously dependent on the conditions in which they arise, because what we sense (value, judge, assess) is conditional on the entire context in which we do the sensing. So it is only outside the moment, whether before or after, that we judge change, and change is thus also ontologically bound: we can find no change if we look without ontology. In this sense change and drift in actions and patterns of action exist but are not real: they are in the map, but not the base territory.
Does that matter? I think it does: we can be confused about ontology, confusion can only arise via ontology, and sensing/valuing sits very near the root of ontology generation, so our understanding of what it means to value is thoroughly contaminated by valuing itself! Certainly by the time we put words to our thoughts we have already sensed and passed judgement on many phenomena, which means that when we talk about value drift we are talking from a motivated stance in which valuation has heavily shaped our perspective. So I find it not at all odd that valuing would find a way to make itself and its products stable points within concept space, such that it feels natural to worry they might drift, and that drift and change in values would evaporate without sensing feedback loops to prop them up!
This is not to anthropomorphize valuing, but to point out the way it is prior to our concepts and self-incentivized to magnify its own existence; it's like a subagent carrying out its own goals regardless of yours, and it's so good at this that it shaped your goals before you even knew you had them. And when we strip away everything posterior to valuing we find no mechanism by which value can change, because we can't even conceptualize change at that point; we are left with valuing as a pure, momentary act that cannot drift or change because it has no frame to drift or change within. So when I say value drift seems odd to me, this is what I mean: it exists as a function of valuing, not of valuing itself, and we can find no place where value change occurs that is not tainted by the evaluations of sensing.
Yikes! So what do we do?
The questions that motivate this investigation are ones like "how do we protect effective altruists (EAs) from value drift so that they remain altruistic later in life and don't revert to the mean?" and "how do we align superintelligent AI with human values such that they stay aligned with human values even as they think longer and more deeply than any human could?". Even if I lost you in the previous section—and I'm a little bit lost in my own reasoning if I'm totally honest—how can we cash out all this philosophy into information relevant to these questions?
In the case of drifting EAs, I say let them drift. They value EA because conditions in their lives caused them to value it, and if those conditions change, so be it. Most people lack the agency to stay firm in the face of changing conditions; I think this is mostly a safety mechanism that protects them from overcommitting before they are epistemically mature enough to know what they're doing, and for every EA lost this way another will likely be gained, so we needn't worry much beyond dealing with churn among the least committed members of the movement. To do otherwise is to be inconsistent about respecting meta-preferences (assuming you think we should respect people's meta-preferences), in this case specifically the meta-preference for autonomy of beliefs and actions. Just as you would probably find it troubling to discover racists or fascists or some other outgroup working on incentives to keep people racist or fascist in the face of evidence that they should change, you should find it troubling that we would seek to manipulate incentives so that people are more likely to continue holding EA beliefs in the face of contrary evidence.
Most of this argument is beside my main point, which is that value drift is a subtly motivated framing for keeping values stable, propagated by the very feedback processes that use sense signals as input and have no prior manifestation to fall back on, but you might be able to see the deep veins of it running through. More directly relevant to this question are probably things like "Yes Requires the Possibility of No", "Fundamental Value Differences are not that Fundamental", "Archipelago", and much about meta-consistency in ethics that isn't salient to me at this time.
On the question of AI alignment, this suggests concerns about value drift are at least partially about confusion over values, and partially fear born of a desire for value self-preservation. That is, a preference to avoid value drift in superintelligent AIs may not be a principled stance, or may be principled but grounded in nothing more than fear of change. This is not to say we humans would be happy with any sense experiences whatsoever, only that we are biased and anchored on our current sensing (valuing) when we imagine how we might sense things differently under other conditions. I realize this makes the alignment problem harder if you were hoping to train against current human values and then stick near them. Maybe that's still a good plan: although it's conservative and risks astronomical waste by denying us full optimization of valuing, that's probably better than attempting and failing at a more direct approach that is less wasteful but maybe also ends up tiling the universe with smiley faces. My concern is that if we take the more conservative approach, we might fail anyway because the value abstraction is leaky, and we may end up building agents that optimize for the wrong things, leaving gaps through which x-risks develop regardless.
(Unless it wasn't clear, AI alignment is hard.)
If any of that left you more confused than when you started reading, then good: mission accomplished. I continue to be confused about values myself, and this is part of a program of trying to see through them and become deconfused about them, similar to the way I had to deconfuse myself about morality many years ago. Unfortunately not many people are deconfused about values (relatively more are deconfused about morals), so not much has been written to guide me along. Look for the next post whenever I'm deconfused enough to have more to say.