Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

In this post, I introduce the last of three premises: the claim that AI systems are (and will become increasingly) capable of affecting people’s value change trajectories. With all three premises in place, we can then go on to articulate the Value Change Problem (VCP) in full. I will briefly recap the full account, and then give an outlook on what is yet to come in posts 4 and 5, where we discuss the risks that come from failing to take the VCP seriously.

Premise three: AI systems can affect value change trajectories

The third and final premise required to put together the argument for the Value Change Problem is the following: AI systems are (and will become increasingly) capable of affecting people’s value change trajectories.

I believe the case for this is relatively straightforward. In the previous post, we saw several examples of how external factors (e.g. other individuals, societal and economic structures, technology) can influence an individual's trajectory of value change, and that they can do so in ways that may or may not be legitimate. The same is true for AI systems.

Value change typically occurs as a result of moral reflection or deliberation, or of learning new information or having new experiences. External factors can affect these processes (e.g. by affecting what information we are exposed to, or by biasing our reflection processes towards some conclusions rather than others), thereby influencing an individual's trajectory of value change. AI systems are another such external factor capable of similar effects. Consider, for example, the use of AI systems in media, advertisement or education, as personal assistants, or to help with learning or decision making. From here, it’s not a big step to recognise that, as the capabilities and deployment of these systems continue to increase, so will the overall effect AI systems come to have on our value change trajectories.

Posts 4 and 5 will discuss all of this in more detail, including by proposing specific mechanisms by which AIs can come to affect value change trajectories, and by taking up the question of when such effects are and aren’t legitimate.

As such, I will leave the discussion of the third premise at this and swiftly move on to putting together the full case for the Value Change Problem.

Putting things together: the Value Change Problem

Let us recap the arguments so far. First, I have argued that human values are malleable rather than fixed. In defence of this claim, I have argued that humans typically undergo value change over the course of their lives; that human values are sometimes uncertain, underdetermined or open-ended, and that some of the ways in which humans typically deal with this involve value change; and, finally, that transformative experiences (as discussed by Paul (2014)) and aspiration (as discussed by Callard (2018)), too, represent examples of value change.

Next, I have argued that some cases of value change can be (il)legitimate. In support of this claim, I have made an appeal to intuition by providing examples of cases of value change which I argue most people would readily accept as legitimate and illegitimate, respectively. I then strengthened the argument by proposing a plausible evaluative criterion (namely, the degree of self-determination involved in the process of value change), which lends further support and rational grounding to our earlier intuition.

Finally, I argued that AI systems are (and will become increasingly) capable of affecting people’s value change trajectories. (While leaving some further details to posts 4 and 5.)

Putting these together, we can argue that the ethical design of AI systems must take seriously, and find ways to address, the problem of (il)legitimate value change. In other words, we ought to avoid building AI systems that disrespect or exploit the malleability of human values, such as by causing illegitimate value changes or by preventing legitimate ones. I will refer to this as the ‘Value Change Problem’.

What does it mean for AI design to take the problem of (il)legitimate value change seriously? Concretely, it means that ethical AI design has to try to i) understand the ways in which AI systems do or can cause value change, ii) understand when a case of value change is legitimate or illegitimate, and iii) build systems that do not cause illegitimate value change and that permit (or enable) legitimate value change.

In the remaining two posts, I will discuss in some more depth the risks that may result from inadequately addressing the VCP. Failing to address it gives rise to two types of risk: risks from causing illegitimate value change, and risks from preventing legitimate value change. For each of these I want to ask: What is the risk? What are plausible mechanisms by which these risks manifest? In what ways do these risks already manifest today, and in what ways are they likely to be exacerbated going forward, as AI systems become more advanced and more widely deployed?

In the first case (risks from causing illegitimate value change), leading with the example of today's recommender systems, I will argue that performative predictors can come to affect that which they set out to predict, including human values. In the second case (risks from preventing legitimate value change), I will argue that value collapse (the idea that hyper-explication of values tends to weaken our epistemic attitudes towards the world and our values) can threaten the possibility of self-determined and open-ended value exploration and, consequently, the possibility of legitimate value change. In both cases, we should expect (unless appropriate countermeasures are taken) the same dynamic to be exacerbated, both in strength and scope, with the development of more advanced AI systems and their increasingly pervasive deployment.

Brief excursion: Directionality of Fit

A different way to articulate the legitimacy question I have described here is in terms of the notion of ‘Directionality of Fit’. In short, the idea is that instead of asking whether a given case of value change is (il)legitimate, we can ask which ‘direction of fit’ ought to apply. Let me explain.

Historically, ‘directionality of fit’ (or ‘direction of fit’) was used to refer to the distinction between values and beliefs. The idea came up, although without mention of the specific term, in Anscombe’s Intention (2000) and was later discussed by Searle (1985) and Humberstone (1992). According to this view, beliefs are precisely those things which change to fit the world, while values are those things which the world should be fitted to.

However, once one accepts the premise that values are malleable, the ‘correct’ (or desirable) direction of fit ceases to be clearly defined. It raises the question of when exactly values should be used as a template for fitting the world to them, and when it is acceptable or desirable for the world to change the values. If I never allow the world to change my values, I forgo any possibility of value replacement, development or refinement. However, as I've argued in part before and will discuss in some more detail in post 5, I might have reason to consider myself morally harmed if I lose the ability to freely undergo legitimate value change.

Finally, this lens also makes more salient the intricate connection between values and beliefs: the epistemic dimensions of value development, as well as the ways values affect our epistemic attitudes and pursuits.

4 comments

cousin_it:

Yeah. It also worries me that a lot of value change has happened in the past, and much of it has been caused by selfish powerful actors for their ends rather than ours. The most obvious example is jingoism, which is often fanned by power-hungry leaders. A more subtle example is valuing career or professional success. The part of it that isn't explained by money or the joy of the work itself seems to be an irrational desire installed into us by employers.

Agree! Examples abound. You can never escape your local ideological context. You can only try to find processes that have some hope of occasionally bumping into the bounds of your current ideology and pressing beyond it. There is no reliable recipe (just as there is no reliable recipe for making yourself notice your own blind spot), but there is hope for things that, in expectation and intertemporally, can help us with this.

Which poses a new problem (or clarifies the problem we're facing): we don't get to answer the question of value change legitimacy in a theoretical vacuum. Instead, we are already historically embedded in a collective value change trajectory, which affects both what we value and what we (can) know.

I think that makes it sound a bit hopeless from one perspective, but on the other hand, we probably also shouldn't let hypothetical worlds we could never have reached weigh us down. There are many hypothetical worlds we can still reach that are worth fighting for.

I really like this articulation of the problem!

To me, a way to point to something similar is to say that preservation (and enhancement) of human agency is important (value change being one important way that human agency can be reduced). https://www.alignmentforum.org/s/pcdHisDEGLbxrbSHD/p/Qi77Tu3ehdacAbBBe

One thing I've been trying to argue for is that we might try to pivot agent foundations research to focus more on human agency instead of artificial agency. For example, I think value change is an example of self-modification, which has been studied a fair bit for artificial agents.

Yes, there are both subtle/slow ways for this to be a problem and abrupt dramatic ways for this to be a problem.

Taken to an extreme by a highly persuasive model, the outcome is people being 'brainwashed' into dramatically different points of view that are clearly not in keeping with their own or society's good. That's a quite overt and near-term-plausible bad outcome.