Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Given that AI systems are becoming increasingly capable and widely deployed, we have to think carefully about the multifarious effects their adoption will have on our individual and collective lives. In this sequence, I will focus specifically on how AI can come to substantially affect what it is that we value, as well as the mechanisms by which this may occur. 

This sequence introduces the “Value Change Problem” (VCP). I will argue that it is a critical aspect of ethical AI design, and outline the risks we should expect if we fail to address it properly. 

This is a shortened and slightly adapted version of academic work I have submitted for publication elsewhere. The original work was written mainly with an academic philosophy audience in mind. If you’re interested in reading the original (longer) version, feel free to reach out.

The core claim of the VCP can be summarised as follows: 

AI alignment must address the problem of (il)legitimate value change; that is, the problem of making sure AI systems neither cause value change illegitimately, nor forestall legitimate cases of value change in humans and society.

Outline (or: how to read this sequence)

This case rests on three core premises: 

  1. human values are malleable (post 1);
  2. some instances of value change are (un)problematic (post 2); 
  3. AI systems are (and will become increasingly) capable of affecting people’s value change trajectories (post 3). 

From this, I conclude that we ought to avoid building AI systems that disrespect or exploit the malleability of human values, for example by causing illegitimate value changes or by preventing legitimate ones. I will refer to this as the ‘Value Change Problem’ (VCP). Having established the core case for the VCP, the remaining two posts discuss in more depth the risks that may result from inadequately addressing it. 

If you are already on board with each of these premises, you may want to skip directly to the discussion of risks (posts 4 and 5).

I consider two main categories of risks: (post 4) risks from causing illegitimate value change, as well as (post 5) risks from (illegitimately) preventing legitimate value change. For each of these I want to ask: What is the risk? What are plausible mechanisms by which these risks manifest? What are ways in which these risks manifest already, and what are the ways in which they are likely to be exacerbated going forward, as AI systems become more advanced and more widely deployed? 

In the first case (risks from causing illegitimate value change), leading with the example of today’s recommender systems, I will argue that performative predictors can come to affect the very thing they set out to predict—human values among them. 

In the second case (risks from preventing legitimate value change), I will argue that value collapse—the idea that hyper-explication of values tends to weaken our epistemic attitudes towards the world and our values—can threaten the possibility of self-determined, open-ended value exploration and, consequently, the possibility of legitimate value change. In both cases, unless appropriate countermeasures are taken, we should expect the same dynamic to be exacerbated, in both strength and scope, as more advanced AI systems are developed and ever more pervasively deployed.

The different posts are more or less self-contained and can be read on their own. 

I welcome thoughts, constructive criticism, and ideas for relevant future work.


In one sense, this analysis is intended to be a contribution to our understanding of the risk landscape related to building increasingly capable, autonomous and widely deployed AI systems. 

In another sense, I believe that some of the reflections on the nature of values contained in this discussion are relevant to proposals for ambitious value learning, by helping us characterise (some of) the outlines of an adequate theory of value. 

I believe that the problem of (il)legitimate value change is an integral part of the problem of aligning advanced AI systems. As such, the question we need to ask about the ethical design of advanced AI systems should not be limited to what values we want to embed in them; it must also cover how we decide which forms of value change are (il)legitimate, and how to design and deploy systems that do not disrespect or exploit value malleability. 

Note that, while I hope to have made initial steps towards a satisfactory characterisation of the Value Change Problem, more work is needed to improve our understanding of the specific ways in which AI systems can (and already do) cause value change, of when cases of value change are legitimate or illegitimate, and of how to build AI systems that reliably avoid causing illegitimate value change and potentially promote legitimate value change. 


Acknowledgements

I am grateful to my thesis supervisor Ioannis Votsis, my colleagues at ACS (in particular Simon and Clem), Hunter Muir, Tsvi BT, TJ, and likely others I’m forgetting here, for useful comments and discussions. 


References

Anderson, E. (1995). Value in ethics and economics. Harvard University Press.

Anscombe, G.E.M., (2000). Intention. Harvard University Press. (First edition published 1957)

Bengio, Y. (2023). How rogue AIs may arise. Retrieved at: 

Bilgrami, A. (2015). The ambitions of classical liberalism: Mill on truth and liberty. Revue internationale de philosophie 2: 175-182.

Bostrom, N. (2012). The superintelligent will: Motivation and instrumental rationality in advanced artificial agents. Minds and Machines, 22(2), 71-85.

Callard, A. (2016). Proleptic reasons. Oxford Studies in metaethics, 11, 129-154.

Callard, A. (2018). Aspiration: The agency of becoming. Oxford University Press.

Chechkin, G. A., Piatnitskiĭ, A. L., & Shamaev, A. S. (2007). Homogenization: Methods and applications (Vol. 234). American Mathematical Society.

Critch, A., & Krueger, D. (2020). AI research considerations for human existential safety (ARCHES). arXiv preprint arXiv:2006.04948.

Critch, A., & Russell, S. (2023). TASRA: A taxonomy and analysis of societal-scale risks from AI. arXiv preprint arXiv:2306.06924.

DiFranzo, D., & Gloria-Garcia, K. (2017). Filter bubbles and fake news. XRDS: Crossroads, The ACM Magazine for Students 23.3: 32-35.

Fuchs, A. E. (2001). Autonomy, slavery, and Mill's critique of paternalism. Ethical Theory and Moral Practice 4: 231-251.

Gabriel, I. (2020). Artificial intelligence, values, and alignment. Minds and machines 30.3: 411-437.

Hardt, M., Jagadeesan, M., & Mendler-Dünner, C. (2022). Performative power. Advances in Neural Information Processing Systems, 35, 22969-22981.

Hazrati, N., & Ricci, F. (2022). Recommender systems effect on the evolution of users’ choices distribution. Information Processing & Management 59.1: 102766.

Hendrycks, D. (2023). Natural selection favors AIs over humans. arXiv preprint arXiv:2303.16200.

Hendrycks, D., Carlini, N., Schulman, J., & Steinhardt, J. (2021). Unsolved problems in ML safety. arXiv preprint arXiv:2109.13916.

Humberstone, I.L. (1992). Direction of fit. Mind. Vol. 101, No. 401, 59–83.

Kant, I. (1993). Grounding for the metaphysics of morals (J. W. Ellington, Trans.). Hackett Publishing. (Original work published 1785.)

Mill, J. S. (2002). On liberty. Dover Publications. (Original work published 1859.)

Nguyen, C. T. (2022). Value collapse. [Video] The Royal Institute of Philosophy Cardiff, Annual Lecture 2022. Retrieved from: 

Paul, L. A. (2014). Transformative experience. OUP Oxford.

Perdomo, J., Zrnic, T., Mendler-Dünner, C., & Hardt, M. (2020). Performative prediction. Proceedings of the 37th International Conference on Machine Learning, PMLR 119:7599-7609, 2020. 

Pettit, P. (2011). A Theory of freedom. Cambridge: Polity Press.

Porter, T. M. (1996). Trust in numbers: The pursuit of objectivity in science and public life. Princeton University Press.

Rawls, J. (2001). Justice as fairness: A restatement. Belknap Press.

Russell, S. J. (2019). Human compatible: artificial intelligence and the problem of control. Viking.

Searle, J.R. (1985). Expression and meaning: Studies in the theory of speech acts. Cambridge University Press.

Sen, A. (2002). Rationality and freedom. Belknap Press of Harvard University Press.

Shevlane, T., Farquhar, S., Garfinkel, B., Phuong, M., Whittlestone, J., Leung, J., Kokotajlo, D., Marchal, N., Anderljung, M., Kolt, N., Ho, L., Siddarth, D., Avin, S., Hawkins, W., Kim, B., Gabriel, I., Bolina, V., Clark, J., Bengio, Y., Christiano, P., & Dafoe, A. (2023). Model evaluation for extreme risks. arXiv preprint arXiv:2305.15324.

Simon, H. A. (1990). Bounded rationality. Utility and probability: 15-18.

Swierstra, T. (2013). Nanotechnology and technomoral change. Etica & Politica, 15(1), 200-219.

Taylor, C. (1985). What’s wrong with negative liberty. In Philosophy and the Human Sciences: Philosophical Papers. Vol. 2, Cambridge: Cambridge University Press. 211–29.

Von Neumann, J., & Morgenstern, O. (1944). Theory of games and economic behavior. Princeton: Princeton University Press.

Wimsatt, W. C. (2007). Re-engineering philosophy for limited beings: Piecewise approximations to reality. Harvard University Press.
