Epistemic status: This is basically a new categorisation scheme for, and analysis of, ideas that other people have proposed previously (both in moral philosophy and in AI alignment). I’m not an expert on the topics I cover here, and I’d appreciate feedback on any mistakes, unclear phrasings, etc. (and just in general!).
We are often forced to make decisions under conditions of uncertainty. This may be empirical uncertainty (e.g., what is the likelihood that nuclear war would cause human extinction?), or it may be moral uncertainty (e.g., is the wellbeing of future generations morally important?).
But what if you don’t believe that “morally important” is a coherent concept? What if you’re a moral antirealist and/or subjectivist, and thus reject the idea that there are any (objective) moral facts? Would existing work on moral uncertainty (see my prior posts) still be relevant to you?
I think that a lot of it is, to a large extent, for the reasons discussed in this footnote. But I think that, to directly discuss how that work is relevant to antirealists and/or subjectivists, it would help to speak not of moral uncertainty but of value uncertainty (VU; i.e., uncertainty about what one “values” or “prefers”). Doing so also helps us to categorise different types of VU, and potential ways of resolving each of these types of VU. A final benefit is that such an analysis of VUs also has substantial relevance for moral realists, and for work on AI alignment.
So this post will:
- Clarify what I mean by “values” in this context
- Name, briefly describe, and briefly suggest responses to four types of VU, and two types of situations that actually aren’t VU, but could look as though they are
- Return to the question of who and what these ideas are relevant for
I should clarify a few points about what I mean by “values” in this post:
- I mean what a person actually values (or would if they knew more, or will in the future, or would if “idealised”), not what a person’s explicitly endorsed moral theory suggests they should value.
  - For example, after a bunch of caveats and moral uncertainty, I roughly identify as a classical utilitarian. However, in reality, psychologically speaking, I also value things other than whatever maximally increases the wellbeing of conscious beings.
- On the other hand, I don’t believe that people actually have any specific, neat, ongoing set of values, of the sort that could be used to form a utility function or something like that (see this thread). So this post is basically about how to move towards better understanding whatever’s the closest equivalent a person does have to a specific, neat, ongoing set of values.
- I’ll often talk about “a person’s” values, which could mean one’s own values, someone else’s values, a group’s values, or humanity as a whole’s values.
Types of value uncertainty
I’ll now name, briefly describe, and briefly suggest responses to four (overlapping) types of VU, and then two types of situations that aren’t VU but could appear to be VU. I hope to later write a post for each type of VU (and the two related situations), where I’ll go into more detail and highlight more connections to prior work (e.g., by Kaj Sotala and Justin Shovelain).
Note that this is not the only (and perhaps not the best) way to categorise types of VU or frame this sort of discussion. It also may leave out important types. I’m open to feedback about all of this, and in fact one motivation for summarising this categorisation scheme here and then later writing more about each type is that doing so allows those later posts to be influenced by feedback on this one.
Present VU
Description/cause: Present VU is uncertainty about a person’s (or group’s) current values. This occurs when multiple different sets of underlying values could explain the data (i.e., the behaviours you’ve observed from the person), essentially creating a standard curve fitting problem.
One cause of Present VU is a lack of knowledge about something about the person other than their values, such as:
- What decision theory they’re using and how rational they are. E.g., does the fact that Alice left studying till the last minute reflect that she values being stressed, or just reflect hyperbolic discounting?
- Their capabilities. E.g., does Kasparov’s loss to Deep Blue indicate that he values losing, or just that he didn’t know how to win?
- Their beliefs. E.g., does this person’s continued smoking indicate that they value smoking more than they value additional years of life, or just that they don’t know the effects of smoking?
But Present VU can also occur when, even holding constant all of the above factors, different sets of values would lead to the same behaviours (e.g., breathing, or following convergent instrumental goals) in a given circumstance.
This type of VU seems similar to the focus of inverse reinforcement learning. However, with Present VU, the “learner” may not be an AI. (In fact, Present VU may involve you trying to learn your own values, in which case it could perhaps be thought of as “Introspective” VU.)
Potential ways to resolve this:
- Gather more data, or do more thinking, regarding the person’s decision theory, rationality, capabilities, and/or beliefs.
  - Think about the assumptions you’re making about those factors. Try making different assumptions, and/or “minimal” assumptions. (Similar ideas, and some difficulties with them, have been discussed before by Armstrong and Worley, among others.)
- Observe more of the person’s behaviours, ideally under different circumstances.

(Similar data and thinking regarding other people’s behaviours, rationality, etc. may also help to some extent. E.g., evidence about the degree to which people in general hyperbolically discount things could help you interpret the behaviour of some other person for whom there is no such data.)
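To make the “multiple sets of values fit the same data” point concrete, here is a minimal sketch that treats Present VU as Bayesian updating over two candidate explanations of Alice’s last-minute studying. The hypothesis names, priors, likelihoods, and the second observation (a commitment device) are all made-up assumptions for illustration, not anything measured or proposed in the sources above.

```python
# Toy illustration of Present VU as an underdetermination problem.
# All hypotheses, priors, and likelihoods are hypothetical numbers.

def posterior(prior, likelihood):
    """Apply Bayes' rule over a discrete set of hypotheses."""
    unnormalised = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalised.values())
    return {h: p / total for h, p in unnormalised.items()}

# Two candidate explanations of Alice leaving her studying to the last minute.
prior = {"values_stress": 0.5, "hyperbolic_discounting": 0.5}

# Observation 1: Alice crams the night before the exam.
# Both hypotheses predict this equally well, so the data cannot separate them.
crams = {"values_stress": 0.9, "hyperbolic_discounting": 0.9}
after_first = posterior(prior, crams)

# Observation 2 (a different circumstance, assumed for illustration):
# weeks in advance, Alice accepts a binding commitment to study early.
# This is unlikely if she values stress, and likely if she merely discounts.
accepts_commitment = {"values_stress": 0.1, "hyperbolic_discounting": 0.7}
after_second = posterior(after_first, accepts_commitment)

print(after_first)   # unchanged from the prior
print(after_second)  # shifts strongly towards hyperbolic discounting
```

The first observation leaves the two hypotheses tied, which is the curve fitting problem in miniature. Only the observation made under different circumstances moves the estimate, which is why observing varied behaviour matters.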
Informational VU
Description/cause: Informational VU is uncertainty about what a person’s (or group’s) values would be if their knowledge or beliefs improved.
We could divide the potential sources of improved knowledge or beliefs into three categories:
- New experiences. E.g., Alan values not eating refried beans, because they look like sewage, but he’d reverse this value if he ever actually tried eating them.
- Learning new facts relevant to certain values. E.g., Betty doesn’t value the wellbeing of octopi, but would if she learned more about their neurology and behaviour.
- Improved ontologies (related to ontological crises). E.g., Cameron values “his future self’s” wellbeing, so he’d have to find a new way to map that value onto the world if he accepted the idea that people don’t fundamentally retain the “same identity” over time (relevant discussion here). See also this and this.
Potential ways to resolve this:
- Think about (or use models/simulations to work out) what new experiences, new facts, or improvements in ontologies would be most likely to affect the person’s values, and how the person’s values would change in response. (This could perhaps be informed by ideas from value of information analysis and sensitivity analysis.)
- Try to expose the person to, or teach them about, these (or other) new experiences, new facts, and improvements in ontologies. (Note that this would involve not just predicting but actually causing changes in values. This should be done cautiously, if at all.)
  - For resolving uncertainty about how your own values would change if your knowledge or beliefs improved, this might look like just learning a lot, particularly about things you’re especially uncertain about and that seem especially relevant to your values.
Predictive VU
Description/cause: Predictive VU is uncertainty about what a person’s (or group’s) values will be in the future.
This overlaps with Present and Informational VU, because:
- Learning more about a person’s current values obviously helps you predict what their values will be in the future.
- One source of the changes that will occur in a person’s values is improvements that will occur in the person’s knowledge and beliefs. Thus, a subset of what Informational VU is about is also a subset of what Predictive VU is about.
But Predictive VU also includes uncertainty about changes that will occur in a person’s values for other reasons, such as:
- changes in knowledge and beliefs that aren’t improvements (e.g., learning misinformation about immigrants)
- being persuaded via the peripheral route (e.g., seeing what attractive or high-status people are doing)
- changes in life circumstances (e.g., nearing the end of life, and thus coming to place greater value on spending quality time with good friends and family, and less value on maintaining large networks)
- what could be called “biological” changes (e.g., changes in hormone levels that lead to changes in libido, risk-taking, etc.)
Potential ways to resolve this: The potential methods for resolving Present and Informational VU are relevant for parts of Predictive VU. E.g., thinking about what a person will learn about, and how it will affect their values, can help resolve parts of both Informational and Predictive VU.
Also, for all parts of this VU, techniques that are effective for prediction in general (e.g., reference class forecasting) should be useful. E.g., an aspiring effective altruist could predict that their values are likely to shift away from “typical EA values” over time, based on data indicating that that’s a common pattern, as well as the more general end-of-history illusion.
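As a rough sketch of what reference class forecasting might look like here, the snippet below estimates the chance of value drift from the base rate in a reference class of similar people. The function name, the smoothing choice, and the counts are hypothetical assumptions for illustration; they are not real data about EA value drift.

```python
def base_rate_forecast(drifted, total, pseudo_yes=1.0, pseudo_no=1.0):
    """Estimate P(values drift) from a reference class.

    Uses additive (Laplace-style) smoothing so that small reference classes
    do not produce overconfident estimates of exactly 0 or 1.
    """
    if total < 0 or not 0 <= drifted <= total:
        raise ValueError("need 0 <= drifted <= total")
    return (drifted + pseudo_yes) / (total + pseudo_yes + pseudo_no)

# Hypothetical reference class: of 100 people with similar values who were
# followed up after some years, 40 had drifted away from those values.
forecast = base_rate_forecast(drifted=40, total=100)
print(round(forecast, 3))

# With no data at all, the smoothed estimate falls back to 0.5.
print(base_rate_forecast(drifted=0, total=0))
```

The point of starting from the base rate is that a person’s own “inside view” (e.g., “my values feel stable”) is exactly what the end-of-history illusion suggests is unreliable.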
Idealised VU
Description/cause: Idealised VU is uncertainty about:
- what values a person’s (or group’s) “idealised self” or “ideal advisor” would have or advise;
- what values a person would have after “a process of idealisation” or after reaching “reflective equilibrium”;
- what values a person would have after “coherent extrapolated volition” (CEV);
- what set of consistent values is closest to a person’s current, inconsistent values (see, e.g., this post); or
- something else along these lines.
(Note that it seems to me that it’s hard to actually specify those key terms, and I won’t properly try to do so here; more details can be found in the links.)
This overlaps with the other types of VU, in that:
- Again, knowing about a person’s current values obviously provides a useful starting point when trying to extrapolate out from there.
- The idealisation/extrapolation process would likely involve some improvements in beliefs and knowledge, and perhaps some of the changes that actually will occur for the person in the future.
Potential ways to resolve this: This depends substantially on what we mean by the hard-to-specify terms involved. Also, it might be impossible or highly impractical to actually work out what values would result from the idealisation or extrapolation process, due to issues such as limited computing power.
But we can perhaps try to approximate such an idealisation or extrapolation process, or predict approximately what it would result in, using methods like:
- Engaging in more moral reflection
- Thinking about what apparent “moral progress” in the past has looked like, and what changes from current values might result from similar processes of change
  - E.g., if a person approves of the moral circle expansions that have occurred so far, perhaps we should expect that the values of the person’s idealised self would reflect an even more expanded moral circle.

(For details, see the sources linked to at the start of this subsection.)
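One of the readings of Idealised VU listed above is “what set of consistent values is closest to a person’s current, inconsistent values”. The toy sketch below takes one possible, and heavily simplified, interpretation of “closest”: the strict ranking that contradicts the fewest stated pairwise preferences. This distance measure, the function name, and the example preferences are my assumptions for illustration, not a definition taken from the linked sources.

```python
from itertools import permutations

def closest_consistent_ranking(items, pairwise_prefs):
    """Return (ranking, violations) for the strict ranking that contradicts
    the fewest stated pairwise preferences.

    Brute force over all orderings, so only suitable for a handful of items.
    """
    def violations(ranking):
        position = {item: i for i, item in enumerate(ranking)}
        # A stated preference (a, b) means "a is preferred to b".
        return sum(1 for a, b in pairwise_prefs if position[a] > position[b])

    best = min(permutations(items), key=violations)
    return list(best), violations(best)

# Hypothetical inconsistent (cyclic) preferences:
# welfare > rights, rights > loyalty, loyalty > welfare.
items = ["welfare", "rights", "loyalty"]
cyclic_prefs = [("welfare", "rights"), ("rights", "loyalty"), ("loyalty", "welfare")]

ranking, n_violations = closest_consistent_ranking(items, cyclic_prefs)
print(ranking, n_violations)
```

In this example several different rankings tie at one violation each, and the function simply returns the first it finds. That tie is itself a small illustration of why Idealised VU can remain even after the extrapolation procedure has been specified.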
Situations that could look like value uncertainty
I’ll now briefly discuss two other types of situations in which a person (or group) actually isn’t uncertain about their values, but could appear to be, or could even believe themselves to be.
Value conflict
Description/cause: Value conflict (VC) is when some or all of the values a person (or group) actually has are in conflict with each other. It’s like the person has multiple, competing utility functions, or different “parts of themselves” pushing them in different directions.
E.g., Dana is someone whose values include both maximising welfare and absolutely respecting people’s rights; it’s not simply that she’s uncertain which value she actually has deep down.
In some ways, the results of this can be the same as the results of VU (particularly Present VU). For this reason, the person’s situation may be misdiagnosed as VU by themselves or by others. (E.g., Dana may try to figure out which of those somewhat conflicting values she “really” has, rather than realising that she really has both.)
Potential ways to respond: It seems unclear whether VC is a “problem”, as opposed to an acceptable result of the fragility and complexity of our value systems. It thus also seems unclear whether and how one should try to “solve” it. That said, it seems like three of the most obvious options for “solving” it are to:
- Engage in, approximate, or estimate the results of “idealisation” with regards to the conflicting values
  - (Note that, if one is trying to help someone else “solve” their VC, one might instead encourage or help that person to use this or the following options)
- Use approaches similar to those meant for decision-making under moral uncertainty (see also this), except that here the person is actually certain about their values, so the “weight” given to the values is based on something like how “important” those values feel, rather than degree of belief in them
- Embrace moral pluralism
  - E.g., decide to keep as values each of the conflicting values, and just give them a certain amount of “say” or “weight” in your decision-making.
  - This may not always work, and it’s unclear how to decide on how to allocate “say” in any case.

(Related discussion can be found in the “Moral pluralism” section of this post.)
If the goal is just to understand the person or predict their behaviours (rather than helping them to “resolve” their conflict), then one might instead think about, model, or simulate what would happen if the person used one or more of the above options.
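The idea of giving each conflicting value a certain amount of “say” can be sketched as a weighted sum over options. Everything here is a hypothetical illustration: the option names, the scores, and the weights are made up, and the sketch assumes that scores under different values sit on a comparable scale, which is a substantial assumption in its own right.

```python
def choose_option(option_scores, weights):
    """Pick the option with the highest weighted score across values.

    option_scores maps each option to {value_name: score in [0, 1]}.
    weights maps each value_name to how much "say" that value gets.
    """
    total_weight = sum(weights.values())

    def weighted(option):
        scores = option_scores[option]
        return sum(weights[v] * s for v, s in scores.items()) / total_weight

    totals = {option: weighted(option) for option in option_scores}
    return max(totals, key=totals.get), totals

# Hypothetical scores for Dana's two conflicting values.
scores = {
    "policy_a": {"welfare": 0.9, "rights": 0.2},
    "policy_b": {"welfare": 0.5, "rights": 0.8},
}

# The choice flips depending on how much "say" each value is given.
equal_say, _ = choose_option(scores, {"welfare": 0.5, "rights": 0.5})
welfare_heavy, _ = choose_option(scores, {"welfare": 0.8, "rights": 0.2})
print(equal_say, welfare_heavy)
```

The fact that the chosen option flips with the weights is the practical form of the worry above: the procedure is only as well-founded as the allocation of “say”, and it is unclear how that allocation should be decided.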
Merely professed VU (or VC)
Description/cause: Merely professed VU (or merely professed VC) is when a person claims to be uncertain about their values (or to have multiple, conflicting values), despite this not being the case. They may do this for game-theoretic, signalling, or bargaining reasons.
An example of merely professed VU: Eric is certain about his values, and really wants to influence you to have similar values. But he also thinks that, if you believe that he’s uncertain and open to changing his mind, you’ll be more open to talking about values with him and to changing your mind. Thus, he feigns VU.
An example of merely professed VC: Fatma actually knows that she values only her own wellbeing, but she wishes to gain resources from altruists. To do so, she claims that there’s “part of her” that values benefiting only herself, and “another part of her” that values helping others.
(Perhaps this sort of thing could also play out on an unconscious level, so that the person themselves genuinely believes that they have VU or VC. But then it seems hard to disentangle this from actual VU or VC.)
Potential ways to respond: I haven’t thought much about this, and I think how to respond would depend a lot on what one wishes the responses to achieve and on the specific situation. It seems like often one should respond in the same ways that are generally useful when someone may be lying to you or trying to manipulate you.
Who and what are these ideas useful for?
Antirealists and/or subjectivists
As noted in the introduction, one purpose of this post is to explicitly discuss the ways in which (something like) moral uncertainty is relevant for moral antirealists and/or subjectivists.
Roughly speaking (see Joyce for details), a moral antirealist is someone who accepts one of the following three claims:
- Noncognitivism: The position that moral sentences are neither true nor false; they are not beliefs or factual claims. For example, moral sentences might express one’s emotions (e.g., “Murder is bad” might mean something like “Murder - boo!”).
- Error theory: The position that moral sentences are meant to be beliefs or factual claims, but are just never true, as there are simply no moral facts. Error theory is similar to “nihilism”.
- Subjectivism (or non-objectivism): “moral facts exist and are mind-dependent (in the relevant sense)” (Joyce). In other words, moral claims can be true, but their truth or falsity depends on someone’s judgement, rather than being simply an objective fact about the universe.
  - A moral subjectivist may be a moral relativist; e.g., they could believe that what’s morally true for me could be different from what’s morally true for you. But they don’t have to be a relativist; e.g., they could believe that the same things are morally true for everyone, but that moral truth depends on one particular person’s judgement, or on the judgement of an “ideal observer”.
(In contrast, a moral realist is someone who rejects all three of those claims. Thus, moral realists believe that moral sentences do (at least sometimes) reflect beliefs or factual claims, that they can sometimes be true, and that their truth or falsity is objective.)
Of these types of antirealism, VU (and VC) is most clearly relevant in the case of subjectivism. For example, many subjectivists think that their own values (or their future or idealised values) are at least part of what determines the truth or falsity of moral claims. For these people, resolving uncertainty about their own values should seem very important. Other subjectivists may want to resolve uncertainty about the present, future, or idealised values of their society, of humanity as a whole, of all intelligent life, or something else like that (depending on what they think determines moral truth).
It’s less clear what special relevance VU would have for noncognitivists or error theorists (ignoring the argument that they should be metaethically uncertain about those positions). That said:
- It’s possible that all work on moral uncertainty should be meaningful to at least some types of noncognitivists (see Sepielli, though see Bykvist and Olson for counterarguments).
- Resolving VU should be important to people who reject the idea of subjective morality, as long as they accept the existence of some form of normativity (see my earlier post for discussion of the overlaps and distinctions between morality, normativity, and related concepts).
- My impression is that most “antirealists” in the LessWrong and EA communities are either moral subjectivists or at least accept the existence of some form of normativity (see bmgarfinkel for somewhat relevant discussion).
Moral realists
For moral realists, standard work on moral uncertainty is already clearly relevant. That said, VU still has additional relevance even for moral realists, because:
- For most plausible moral theories, making things go better according to (some) conscious beings’ values is morally good. In that case, resolving VUs would help you do morally good things.
- Moral philosophy (whether realist-leaning or not) very often uses intuitions about moral matters as pivotal data when trying to work out what’s morally correct. Thus, it could be argued that progress in resolving VUs would lead to better moral theories (and thus better behaviours), even by moral realists’ lights.
- In order to do morally good things, it will often be useful to be able to predict the behaviours of oneself (e.g., should I donate now in case my values drift?) and of others. As values influence behaviours, resolving VU helps you make such predictions.
- Some moral realists may also believe that there’s room for prudential (self-interested, non-moral) “should”s, and may therefore want to understand their own values so that they can more effectively do what’s “prudentially right”.
AI alignment
Ideally, we want our AIs to act in accordance with what we truly value (or what we’d value after some process of idealisation or CEV). Depending on definitions, this may be seen as the core of AI alignment, as one important part of AI alignment, or as at least a nice bonus (e.g., if we use Paul Christiano’s definition).
As such, recognising and resolving VUs (and VCs) seems of very clear relevance to AI alignment work. This seems somewhat evidenced by how many VU-related ideas I found in previous alignment-related work (e.g., value learning, inverse reinforcement learning, Stuart Armstrong’s research agenda, and CEV). Indeed, a major reason why I’m interested in the topic of VU is its relevance to AI alignment, and I hope that this post can provide useful concepts and framings for others who are also interested in AI alignment.
As mentioned earlier, please do comment if you think there are better categorisations/framings for this topic, better names, additional types worth mentioning, mistakes I’ve made, or whatever.
My thanks to Justin Shovelain and David Kristoffersson of Convergence Analysis for helpful discussions and feedback on this post.
Firstly, even someone “convinced” by antirealism and/or subjectivism probably shouldn’t be certain about those positions. Thus, such people should probably act as if metaethically uncertain, and that requires concepts and responses somewhat similar to those discussed in existing work on moral uncertainty. (See this post’s section on “Metaethical uncertainty”.)
Relevantly, MacAskill writes: “even if one endorsed a meta-ethical view that is inconsistent with the idea that there’s value in gaining more moral information, one should not be certain in that meta-ethical view. And it’s high-stakes whether that view is true — if there are moral facts out there but one thinks there aren’t, that’s a big deal! Even for this sort of antirealist, then, there’s therefore value in moral information, because there’s value in finding out for certain whether that meta-ethical view is correct.”
Secondly, a lot of existing work on moral uncertainty isn’t (explicitly) premised on moral realism.
Thirdly, in practice, many similar concepts and principles will be useful for:
- a moral realist who wants to act in accordance with what’s “truly right”, but who doesn’t know what that is
- an antirealist or subjectivist who wants to act in accordance with their “fundamental” or “idealised” values, but who doesn’t know what those are.
Here I use the term “behaviour” very broadly, to include not just our “physical actions” but also what decisions we make, what we say, and what we think (at least on a conscious level). This is because any of these could provide data about underlying values. So some examples of what I’d count as “behaviours” include:
- going for a run
- choosing to pursue a career in existential risk reduction
- saying you love a certain band
- being unable to stop yourself from thinking about ice cream.
I haven’t listed a specific type of VU for uncertainty about what a person’s values would be if their knowledge or beliefs changed, whether for the better or not. This is because I don’t see that as being particularly worth knowing about in its own right, separate from the other types of VU. But the next type of VU (Predictive VU) does incorporate uncertainty about what a person’s values will be after the changes that will occur to their knowledge or beliefs (whether or not these changes are improvements).
Note the (somewhat fuzzy) distinction from Present VU:
- Informational VU is partly about what a person’s values would be if they did have certain new experiences
- Present VU can be partly resolved by exposing people to new experiences and seeing how they behave, as this lets you gather more data about what their values already were.
One could argue that it doesn’t make sense to talk of “a person’s values changing”. Such arguments could be based on the idea that an “agent” is partly defined by its values (or utility function, or whatever), or the idea that people don’t fundamentally retain the “same identity” over time anyway. For this post, I wish to mostly set aside such complexities, and lean instead on the fact that it’s often useful to think and speak as if “the same person” persists over time (and even despite partial changes in values).
But I do think those complexities are worth acknowledging. One reason is that they can remind us to not take it for granted that a person will or should currently care about “their future self’s values” (or “their future self’s” ability to act on their values). This applies especially if the person doesn’t see any reason to care about other people’s values (or abilities to achieve their values). Ruairi Donnelly discusses similar points, and links this to the concept of value drift.
However, some philosophers classify subjectivists as moral realists instead of as antirealists. To account for this, some sources distinguish between minimal moral realism (which includes subjectivists) and robust moral realism (which excludes them). This is why I sometimes write “antirealists and/or subjectivists”.