Crossposted from the EA forum

I've been asked variations of this question a few times recently, as I'm studying value drift for my undergraduate thesis, so I thought I would seek out others' thoughts on this.

I suppose part of this depends on how we define value drift. I've seen value drift defined as broadly as changes in values (from the Global Optimum podcast) and as narrowly as becoming less motivated to do altruistic things over time (from Joey Savoie's forum post). While the latter seems almost certainly net-negative, how the former plays out is a little less clear to me.

This leads me to wonder if there might be different kinds of value drift that may be varying degrees of good or bad.

Thoughts?

# 2 Answers

DanArmak

### May 05, 2019

I'll open the discussion for the broadest definition of 'value drift' - 'changes in values over time'.

'Good' and 'bad' are only defined relative to some set of values.

A simplistic (but technically correct) answer: if you had values A, and then changed to have different values B, then from the viewpoint of A this is bad *by definition*, no matter what A and B actually are. And from the viewpoint of B it's good by definition. Values are always optimal according to themselves. (If they're in conflict with one another, there should be a set of optimal balances defined by some meta-values you also need to hold.)

A more complex and human-like scenario: you're not perfectly rational. Knowing this, and wanting to achieve a certain goal, it might be useful to "choose" a set of values other than the trivial set "this goal is good", to influence your own future behavior. Just as it can be instrumentally rational to choose some false beliefs (or to omit some true ones), so it can be instrumentally rational to choose a set of values in order to achieve something those values don't actually claim to promote.

A contrived example: you value donating to a certain charity. If you join a local church and become influential, you could convince others to donate to it. You don't actually value the church. If you were perfectly rational, you could perfectly pretend to value it and act to optimize your real values (the charity). But humans tend to be bad at publicly espousing values (or beliefs) without coming to really believe them to some degree. So you'll get value drift towards really caring about the church. But the charity will get more donations than if you hadn't joined. So from the point of view of your original values (charity above all), the expected value drift to (charity + church) is an instrumentally good choice.

Agreed. In the abstract, value drift is just a change over time, within an agent or across a line of replicated agents. And from the point of view of any given set of values, any other set of values is "bad". If the other set were good, the agent would seek to adopt them, right?

So the question really is whether value diversity has meta-value. Diversity over time isn't that much different from diversity across agents.