I have published a paper in the Journal of Artificial Intelligence and Consciousness about how to take into account the interests of non-human animals and digital minds in A(S)I value alignment.

For the published version of the paper see (PLEASE CITE THIS VERSION): https://www.worldscientific.com/doi/10.1142/S2705078523500042 

For a pdf of the final draft of the paper see: https://philpapers.org/rec/MORTIA-17 

Below I have copy-pasted the body of the paper, for those of you who are interested, though please cite the published version at: https://www.worldscientific.com/doi/10.1142/S2705078523500042 

Cross-Posted at the EA Forum: https://forum.effectivealtruism.org/posts/pNHH953sgSConBmzF/taking-into-account-sentient-non-humans-in-ai-ambitious 

Summary

Abstract: Ambitious value learning proposals to solve the AI alignment problem and avoid catastrophic outcomes from a possible future misaligned artificial superintelligence (such as Coherent Extrapolated Volition [CEV]) have focused on ensuring that an artificial superintelligence (ASI) would try to do what humans would want it to do. However, present and future sentient non-humans, such as non-human animals and possible future digital minds, could also be affected by the ASI’s behaviour in morally relevant ways. This paper puts forward Sentientist Coherent Extrapolated Volition, an alternative to CEV that directly takes into account the interests of all sentient beings. This ambitious value learning proposal would significantly reduce the likelihood of risks of astronomical suffering from the ASI’s behaviour, and thus we have very strong pro-tanto moral reasons in favour of implementing it instead of CEV. This fact is crucial in conducting an adequate cost-benefit analysis between different ambitious value learning proposals.

Keywords: The Alignment Problem · Coherent Extrapolated Volition · Suffering risk · Sentientism · Digital Minds · Non-human Animals

Apart from humans, both non-human animals and sentient digital minds may be very negatively affected by the behaviour of future AI systems. This gives us reasons to take their interests into account when deciding how to design such systems. In this paper, I focus specifically on Coherent Extrapolated Volition, but many of the considerations I present can also be applied to other ambitious value-learning proposals.

More research along these lines, which takes seriously (1) the impact of artificial superintelligence(s) in the future and (2) the moral considerability of non-human animals and possible future digital minds, is necessary if we want key decision-makers such as AI companies or governments to be adequately informed about how to ensure that the development of powerful AI systems is safe and beneficial for all sentient beings.

1. Introduction 

The development of Artificial Superintelligence (ASI) (or Artificial General Intelligence) poses serious existential and suffering risks (Bostrom, 2014; Yudkowsky, 2008; Sotala & Gloor, 2017; Tomasik, 2015a; Baumann, 2017a). To prevent such catastrophes, we must solve the alignment problem: we must ensure that the first ASI we create (if we create one) has values such that it does not bring about those outcomes and does what we intend it to do. The value specification problem is part of the more general alignment problem (Gabriel, 2020; Christian, 2020). It is the question of which values to align an ASI with, and it is where this paper will focus, rather than on more technical issues such as inner alignment.

Non-human sentient beings, such as domesticated non-human animals, animals living in the wild and possible future sentient digital minds, have usually been neglected in discussions about what values should be implemented into the first ASI to prevent catastrophic consequences. However, all sentient beings matter, and the interests of non-human sentient beings should also be taken into account in any value learning proposal.

In this paper, I contend that in a future in which we could implement either some of the standard ambitious value learning proposals or alternative proposals that directly took into consideration the interests of sentient non-humans, we would have both strong and very strong pro-tanto moral reasons to do the latter. However, in practice, as laid out in Section 5, it is not completely clear which kind of ambitious value learning proposal would be best to try to implement. I turn to the example of Coherent Extrapolated Volition (CEV), one of the most popular ambitious value-learning proposals, and argue that it is not ideal, since there is (at least) a sufficiently non-negligible chance that, by not sufficiently including all sentient beings, it may lead to risks of astronomical suffering as a result of the ASI’s own actions. Although here I only focus on CEV, arguments similar to the ones I use here are also applicable to many of the other ambitious value learning proposals, and I may discuss and analyze other such proposals in future research.

2. Why the standardly proposed version of Coherent Extrapolated Volition is problematic 

Coherent Extrapolated Volition (CEV) is Eliezer Yudkowsky’s value learning proposal of implementing into an ASI the goal of fulfilling what we (humans) would agree we would want if given much longer to think about it, in more ideal circumstances (Bostrom, 2014). That is, what we (humans) would wish “if we knew more, thought faster, were more the people we wished we were, had grown up farther together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted” (Yudkowsky, 2004). See Yudkowsky (2004) for a more detailed explanation of the ideal circumstances required for adequate coherent extrapolation. Some, including Yudkowsky (2016), may argue that this would, in fact, be enough for the ASI to adequately take into account the interests of non-human sentient beings, if they ought to be taken into account. Similarly, it has been contended that animals’ well-being is already included in humans’ preferences, and that this, together with the less myopic decision-making of the ASI (compared to human decision-making), might be enough (Russell, 2019). It has also been held that it might be difficult to sustain that humans should build an ASI that cares about the interests of sentient non-humans directly, since it would then care more about them than humans actually do (Russell, 2019). According to these views, if it is indeed the case that humanity, given more time to think about it in ideal circumstances, would have the goal of adequately taking their interests into account, then by coherently extrapolating humanity’s volition the ASI would take the interests of sentient non-humans into account. I will argue that this answer is not sufficiently satisfactory for us not to have strong reasons to try to directly include the interests of sentient non-humans in the ASI’s goals, if we are able to do so.

It has plausibly been argued that for any value learning proposal, there are some aspects of the proposal that cannot be left to the ASI to be figured out, such as ‘standing, concerning whose ethics views [or morally relevant interests] are included; measurement, concerning how their views [or morally relevant interests] are identified; and aggregation, concerning how individual views [or morally relevant interests] are combined to a single view that will guide AI behaviour’ (Baum, 2020). As it has been argued, the alignment problem has an essentially normative component: the different kinds of technical solutions that could be used for loading certain values into the reward function of a future A(S)I likely have different implications for exactly what kind of values we can implement, and thus “there is no way to ‘bracket out’ normative questions altogether” (Gabriel, 2020). Ultimately, the designers of the A(G)I that may become an ASI cannot abstain from facing ethical decisions regarding in what way, what, and whose values or morally relevant interests they include in the ASI’s goals. I contend that we should take seriously the possibility that Yudkowsky’s popular proposal does not adequately take into account moral considerations pertaining to the moral considerability of sentient non-humans.

Even though the standard CEV proposal adequately avoids directly specifying the values that should guide the ASI’s behaviour (and as a consequence plausibly avoids substantial value lock-in), it nonetheless specifies which sentient beings have their volitions coherently extrapolated. The standard proposal only includes presently existing humans. In this sense, the standard CEV proposal excludes all other sentient beings and gives a specific answer to the standing problem. I contend that there are very strong reasons to suggest that CEV’s standardly proposed solution to the standing problem is much more morally undesirable than an ambitious value learning proposal that takes the interests of all sentient beings more into account. From now on I will refer to this kind of alternative proposal, which I will further develop in Section 3, as Sentientist Coherent Extrapolated Volition (SCEV).

CEV excludes the volitions of sentient non-humans from the extrapolation base. In Section 2.1, I will show why their mere exclusion is unjustified, in the same way that it would be unjustified to exclude groups of humans. This in itself gives us strong moral reasons to implement SCEV instead of CEV if we are able to do so. However, in Section 2.2, I contend that we have very strong moral reasons to do so: there is (at least) a non-negligible probability that an adequate implementation of the standard CEV proposal results in the ASI causing or allowing risks of astronomical suffering (s-risks). These are risks of events that bring about suffering in cosmically significant amounts, vastly exceeding all suffering that has existed on Earth so far (Sotala & Gloor, 2017; Althaus & Gloor, 2016). “Significant” here means relative to expected future suffering (Althaus & Gloor, 2016). Taking the badness of s-risks into account, this non-negligible probability is sufficiently large for CEV to be a much more undesirable solution to the value specification problem than SCEV.

It is important to note and keep in mind that the reasons in favour of SCEV that I present and discuss in this section and throughout this paper are pro-tanto in the sense that (as discussed in Section 5) they could be overridden or defeated by other pro-tanto reasons against the implementation of SCEV. As a result, it may not be clear whether trying to implement SCEV instead of CEV is the right choice all things considered. 

2.1. Why we would have strong moral reasons to implement SCEV instead of CEV: an analogy with CEO-CEV and men-CEV 

To see why it is unjustified to exclude sentient non-humans from the extrapolation base, we can look at different alternative (and fictional) CEV proposals. Take, for example, the CEO-CEV proposal: the proposal of only coherently extrapolating the volition of the CEO of the research lab that first develops ASI or AGI. Or take the men-CEV proposal of only coherently extrapolating men’s volitions. These would clearly be substantially more morally undesirable ambitious value learning proposals to employ in comparison to CEV, if we could choose to implement any one of them. We have strong reasons against only coherently extrapolating the volitions of a small subset of humans that is unrepresentative of humanity as a whole. Probably the most plausible reason against pursuing such paths (if the possibility of implementing normal CEV is present) is that there is clearly (at least) a sufficiently large non-negligible probability that these proposals would result in the ASI’s behaviour having negative consequences for the humans not directly included in the extrapolation base. People who are not men, or people whose preferences differ sufficiently from those of the AI lab’s CEO, could find themselves in a post-ASI world where their interests are not sufficiently taken into account and, as a result, suffer negative consequences. Their interests could be very neglected as a result of only being present in the ASI’s utility function to the extent to which men or the AI lab’s CEO cared about them once they had thought enough about them in ideal circumstances.

Even though it is possible that, if the AI lab’s CEO or men had sufficient time to think under the ideal conditions, they would want to take the interests of the people outside the extrapolation base as much into account as their own, we still would not accept these proposals as adequate proposals for ambitious value learning compared to normal CEV, because there is a sufficient likelihood that this possibility does not turn out to be the case. Whilst these value learning proposals might plausibly be better than not implementing any values into the ASI, they are significantly worse than CEV, because there is a sufficiently large non-negligible probability that they would result in substantial negative consequences for the people excluded from the extrapolation base in each case. This gives us strong reasons to rule out CEO-CEV and men-CEV in a decision-making situation where we could also choose to implement the normal CEV proposal that takes all humans into account.

I contend that for similar reasons, CEV is also an inadequate proposal in comparison to SCEV. The normal CEV proposal, like CEO-CEV and men-CEV, excludes a subset of moral patients from the extrapolation base: it excludes all sentient non-human beings. And, as in the previous examples, if CEV were adequately implemented, the interests of sentient non-humans would only be in the ASI’s utility function, and would only be taken into account, to the extent to which humans cared about them once they had thought enough about them in ideal circumstances. Sentient non-humans are moral patients: they could be affected in morally undesirable ways, just as the people excluded from the extrapolation base in the previous examples could be. This could occur as a mere result of their interests not being sufficiently important to the individuals in the extrapolation base, even if those individuals had enough time to think about it under ideal circumstances. This is why, if we were able to do so, instead of implementing CEV we would have strong moral reasons to implement SCEV: a proposal that takes the interests of all sentient beings adequately into account. Below, in Section 4, I address two possible objections that contend that there is a relevant difference between excluding other humans and excluding sentient non-humans.

2.2. Why we would have very strong moral reasons to implement SCEV instead of CEV: significantly reducing risks of astronomical suffering 

We have not only strong moral reasons to implement SCEV instead of CEV. There are further reasons to believe that the difference in expected negative consequences between implementing CEV and implementing SCEV is much larger than the difference in expected negative consequences between implementing CEO-CEV or men-CEV and implementing CEV. If we could do so, we would have much more pressing reasons to aim for SCEV instead of the standard CEV proposal than we have to aim for the standard CEV proposal instead of CEO-CEV or men-CEV. Thus, if we could do so, we would have very strong moral reasons to implement SCEV instead of CEV.

2.2.1. The greater lack of moral consideration for sentient non-humans 

It is likely that any given group of humans (without any particular psychosocial conditions) would care much more about the interests of the rest of humans, if they had enough time to think about it under ideal circumstances, than all humans would care about the interests of all sentient non-humans under those same conditions. That is, it is likely that in the first case the humans in the extrapolation base would grant much more consideration to the humans outside the extrapolation base than the consideration that, in the second case, all humans would grant to the sentient non-humans outside the extrapolation base.

Of course, we cannot be sure that this is the case, since we do not know specifically how coherently extrapolating the volitions of any subset of humans, or of all humans, would work in practice. Nor do we know with certainty what any subset of humans, or all humans, would agree that they want if given much longer to think about it in more ideal circumstances. However, it is not the case that we can say nothing about which outcomes from different types of CEV are more or less likely. For instance, humans tend to have much more consideration for those who are closest and most similar to them (Caviola et al., 2018; Dhont, Hodson, & Leite, 2016), and this may affect the process of coherently extrapolating volitions in a manner negative for sentient non-humans. This is because sentient non-humans, such as domesticated non-human animals, animals living in the wild and possible future digital minds, are much more distant (in a broad social, emotional, cognitive, psychological and even spatial and temporal sense) from humans than other humans are.

Furthermore, we should also update on the fact that, (at least partially) as a result of not equally taking into consideration the interests of all sentient beings, humans currently and in the past have killed and caused immense amounts of unbearable suffering to an immense number of non-human sentient beings for the almost negligible benefit of taste pleasure. To see how much we should update on this fact, imagine possible worlds in which the AI lab’s CEO or men caused immense amounts of suffering to the humans excluded from the extrapolation base in each case, in quantities proportional to the amount of suffering that humans in this world cause to other animals. Imagine they did this by, for instance, torturing the humans outside the extrapolation base in each case. Imagine that, as humans do to factory-farmed non-human animals, they caused unbearable suffering to the humans outside the extrapolation base throughout their lives and killed almost all of them before they could even reach adulthood. In these possible worlds, we would have more reason to believe that implementing CEO-CEV or men-CEV would lead to worse consequences than the ones we expect it to have in the actual world. Analogously, it seems plausible that in this world, where human psychology has (at least in part) led humans to commit atrocities such as factory farming against other non-human sentient beings, the standard CEV proposal is significantly more likely to lead to negative consequences for non-human sentient beings than it would be in a world in which our human psychologies had not played such a role.

2.2.2. The nature and different kinds of the worst outcomes: risks of astronomical suffering 

As we have seen, there is (at least) a non-negligible chance that the interests of sentient non-humans would be highly disregarded by an ASI implementing the standard CEV proposal, and the likelihood of this (whatever it is) is higher than the likelihood that the interests of the humans not included in the extrapolation base would be highly neglected in an ASI-controlled future where only the volitions of some humans are coherently extrapolated. A civilisation with an ASI with either of these kinds of ambitious value learning proposals is (at least somewhat) likely to want to expand and settle space and make use of advanced AI and simulation technologies. In such a future, where the interests of sentient non-humans would be highly disregarded, there is a significant probability that sentient non-humans would suffer astronomically as a result of the actions the humans would want to take, and which would likely be taken by the ASI in following their CEV.

Wild animal suffering might be spread through the universe by such civilisations. This could be a result of directed panspermia or of intentionally terraforming planets, whether because some humans value spreading life through the universe or for their aesthetic enjoyment of nature (Tomasik, 2018; O'Brien, 2021). But if no preventive measures are put in place, it may also occur as a result of not adequately decontaminating spaceships: microbes or other living organisms could be transported accidentally to other planets, which could, over the course of many years, result in the emergence of sentient lifeforms on those planets. Since it is plausible that in the lives of these future non-human animals suffering would be very prevalent and might even predominate, as in most of the lives of currently existing non-human animals living in nature (Ng, 1995; Groff & Ng, 2019; Tomasik, 2020; O'Brien, 2021), if this were to occur it could lead to astronomical amounts of suffering.

A civilisation with an ASI that takes into account the interests of some or all humans but not those of sentient non-humans might also make abundant use of advanced AI and simulation technologies. This could, either knowingly or accidentally, result in astronomical amounts of suffering. An ASI that neglects the interests of artificial forms of sentience might accidentally create vast quantities of sentient subroutines: instrumentally valuable functions, subprograms, robot overseers, robot scientists or subagents inside the ASI’s program and structure (Tomasik, 2019a). In the same way that the emergence of phenomenal experience and the capacity for suffering has plausibly been evolutionarily instrumentally useful in guiding behaviour, it might be instrumentally useful for the ASI in controlling the behaviour of its internal processes in order to achieve its goals. By undergoing a complex optimization process, the ASI might come to instrumentally value the creation of suffering, as natural selection, a complex process optimizing for gene reproduction, has done (Sotala & Gloor, 2017).

Furthermore, an ASI-controlled civilisation where CEV is implemented and the interests of sentient non-humans are highly disregarded might also result in mind crime: the creation of vast quantities of digital minds in simulations, including suffering ones. The ASI might create immense numbers of ancestor simulations of past human history, of natural environments or of possible future evolutionary paths for research purposes (Bostrom, 2014, pp. 125-26; Sotala & Gloor, 2017). In many of these, if the simulations are sufficiently detailed and complex, “human” digital minds and digital minds of wild and domesticated non-human animals might emerge, thus multiplying wild animal suffering and the subjective experiences of being factory-farmed (Tomasik, 2015b; Baumann, 2017a). It could also intentionally create, or allow other humans to intentionally create, simulations with an especially significant amount of suffering, for entertainment or to satisfy sadistic preferences. In wanting to explore the whole space of possible minds, it might also try to simulate various distant, weird and alien forms of sentience compared to ours (Tomasik, 2015b). The possibility that any of these kinds of non-human sentient beings came to exist on immense scales, and that their interests were neglected by the CEV-aligned ASI in power, aggravates the risk of astronomical suffering.

And, as has very plausibly been argued elsewhere, ‘nearly all plausible value systems will want to avoid suffering risks and for many value systems, suffering risks are some of the worst possible outcomes and thus some of the most important to avoid’ (Sotala & Gloor, 2017). On any value system that holds that it is very important to prevent undesired and intense suffering, it will be of extreme importance to prevent astronomical amounts of undesired and intense suffering. Thus, the likelihood of these outcomes need not be very high for it to be of immense importance to prevent them. If there is a non-negligible or higher chance of the occurrence of s-risks, it is still of great moral importance to reduce it, since their occurrence would be extremely morally undesirable.

2.2.3. Why (even without taking into account the higher disregard for their interests) sentient non-humans would more likely be the victims of s-risks 

Due to differences between the kinds of beings that biological humans and sentient non-humans are, even if sentient non-humans were not more likely to be disregarded, it would still be significantly more likely that sentient non-humans would be harmed much more by implementing a CEV that disregarded their interests than excluded humans would be by implementing a CEO-CEV or men-CEV that disregarded theirs. This is so because it is easier to cause astronomical suffering to sentient non-humans than to excluded (biological) humans.

An ASI can perform actions that result in astronomical suffering, or it can allow actions by other agents that cause astronomical suffering. In both cases, whether the action is performed by the ASI or by other agents, the fact that the action results in astronomical suffering can be intentional to a greater or lesser extent. When the actions that result in astronomical suffering are maximally intentional, the agent that performs them wants to cause astronomical suffering; when they are maximally unintentional, the agent does not even know that they are causing astronomical suffering. In the middle of this scale of intentionality and unintentionality, there are cases in which the agent knows that they are causing astronomical suffering but acts anyway in order to achieve their goals, and cases in which the agent is uncertain about whether they are causing astronomical suffering but also acts anyway. There are a few characteristics that many kinds of sentient non-humans have which make it more likely and easier for them to endure astronomical suffering. The biological humans excluded in CEO-CEV or men-CEV lack these characteristics, and thus it is more difficult for them to be the victims of astronomical suffering. Here are some of the relevant differences:

● Resource efficiency of reproduction or replication: Plausibly, much fewer resources are required for suffering digital minds to be copied, relative to the resources required to sustain them, than are required for suffering humans to reproduce, relative to the resources required to sustain them (Sotala, 2012; Shulman & Bostrom, 2021). This makes the agent causing the astronomical suffering more likely to intentionally replicate digital minds, and makes it easier for them to do so unintentionally, than with (biological) humans.

● Lack of intervention in sustaining a suffering population: Plausibly, less intervention is required to sustain a suffering population of digital minds or of (biological) non-human animals living in nature than a suffering population of (biological) humans. In the former case, the agent causing the astronomical suffering can just let the suffering continue indefinitely (whether more intentionally or more unintentionally): the processes of natural selection, predation, and r-selected reproductive strategies continue without supervision or intervention. In the biological human case, however, more intervention is required to prevent them from trying to take control over that which causes them suffering, from trying to use resources to decrease the disvalue in their lives and increase the value in them, and from trying to cease to exist by ceasing to reproduce or by suicide.

● Uncertainty about the presence of suffering in some cases: In the mentioned case in which the agent is uncertain about whether they are causing astronomical suffering but acts anyway, they might not choose to act that way if they knew that they were causing astronomical suffering. It is (and might well continue to be) much more unclear whether certain animals living in nature or certain possible digital minds are enduring suffering than whether a given (biological) human is enduring suffering. The latter may be able to communicate this fact and resembles us much more, and thus would be less likely to be the victim in cases where the agent causing the suffering is uncertain about whether they are doing so.

All these factors, together with the fact that (biological) humans would be less likely to be disregarded, make it substantially less likely that they would be the victims of s-risks, compared to the likelihood that sentient non-humans would be such victims in a future controlled by an ASI with an ambitious value learning proposal that did not directly coherently extrapolate their volitions.

In conclusion, there are two ways of justifying why we would have pressing reasons to implement an ambitious value learning proposal such as SCEV instead of the standard CEV proposal, if we were able to do so. On the one hand, in the same way that there are strong moral reasons to implement CEV instead of a proposal such as CEO-CEV or men-CEV, there are also strong moral reasons to implement SCEV instead of CEV. On the other hand, since there is (at least) a non-negligible chance that the interests of sentient non-humans are highly disregarded by an ASI with the standard CEV proposal, and since in such a future it is substantially likely that sentient non-humans suffer s-risks, there is (at least) a non-negligible chance that the implementation of the standard CEV proposal results in s-risks for sentient non-humans from the ASI’s own actions. Since s-risks are plausibly among the worst possible outcomes and are extremely morally undesirable under plausible normative views, even an (at least) non-negligible chance of their occurrence is very bad. Hence, the standard CEV proposal, which excludes the immense majority of present and future sentient beings, is very morally undesirable if a better alternative is available. This increased likelihood of s-risks from the ASI’s behaviour due to implementing certain kinds of ambitious value learning can (as I shall argue in the following section) be significantly reduced by implementing a truly sentientist solution to the value specification problem.

3. An alternative sentientist proposal for ambitious value learning 

Here, I shall present Sentientist Coherent Extrapolated Volition, an ambitious value learning alternative to CEV. This proposal consists of coherently extrapolating the volitions of all currently existing and future sentient beings. That is, to include all moral patients in the extrapolation base of the CEV. 

Sentientist Coherent Extrapolated Volition: the goal of fulfilling what all (affectable, i.e., present and future) sentient beings would agree that they want, if given much longer to think about it, in more ideal circumstances.

Even though this is very ambitious, the standard CEV proposal also is, even if somewhat less so. And if the standard CEV proposal is in principle doable, I see no reason to believe that this proposal is not in principle doable as well. For the standard CEV proposal to be realized, it would have to be determined what humans’ desires would look like if we knew more, thought faster, were more the people we wished we were and had grown up farther together, among other things (Yudkowsky, 2004). This would require determining what we would want if we had different capabilities than the ones we currently have. It would require determining what we would value if we had superhuman capabilities in many respects, such as not being affected by cognitive biases and being able to process and comprehend greater amounts and different kinds of information (Yudkowsky, 2004).

Concretely, this would or could consist of (among other things) creating a "detailed model of each individual mind, in as much detail as necessary to guess the class of volition-extrapolating transformations defined by 'knew more,' 'thought faster,' etc." (Yudkowsky, 2004). The objective would be to approximate and understand the kinds of idealised changes that each mind would undergo if volition-extrapolating upgrades (such as knowing more, and having more time to think) were performed on it. It is important to understand that neither CEV nor SCEV proposes that these volition-extrapolating upgrades be directly performed by the ASI on the minds of the individuals included in the extrapolation base by physically intervening in their brains; that would just result in all of them (humans and non-humans) being suddenly and directly uplifted and enhanced to levels of cognitive capability similar to that of the ASI. Instead, the proposal is to perform these changes on detailed models of the minds of all the individuals in the extrapolation base, models which would be generated by the ASI. Once the ASI had generated sufficiently detailed models of each individual mind, and had applied volition-extrapolating upgrades to each of these models, their resulting coherent preferences would be the ones that would guide the ASI's behaviour.

If, with more advanced technology, these specific processes are in principle doable and understandable for human minds, it is likely that they could also be carried out with simpler minds, such as those of many non-human animals. And it is also plausible that we could at least arrive at some probabilistic approximation of what these idealized changes would look like for possible future digital minds after having undergone volition-extrapolating upgrades. Then, it seems likely that if the process required for coherently extrapolating the volitions of humans is theoretically possible, it is also possible for the volitions of sentient non-humans. And even if we could not in practice reach a fully idealized state of extrapolation, or a state of full coherence with and between (both human and non-human) volitions, what is relevant is that we get as close as we can to achieving these. Doing so would be much preferable to a proposal that did not result in any coherence between volitions, and that did not extrapolate the volitions at all.

All present and future sentient beings, including humans, both domesticated and non-domesticated non-human animals, and possible future digital minds, have goals by the mere fact of being sentient. At least, in the broadest sense possible, they have desires and preferences to have positive phenomenal experiences and be free from negative ones. And many of them might have goals much richer and more complex than these. All of these goals form part of each of their volitions, which could and should be coherently extrapolated. They could in principle be extrapolated just as human volitions can, even if many sentient non-humans possess certain cognitive capacities to a lesser degree than humans usually do. This is so because, as with human volitions, we could also (among other things) determine the changes that different kinds of sentient non-human minds would undergo when volition-extrapolating transformations are performed on them. Even if many sentient non-humans do not possess the capacity for certain types of abstract thinking about their goals, or for reflecting on their values, humans likewise do not possess the capacities necessary to determine the content of their own extrapolated volitions. In both cases, volition-extrapolating transformations would enhance the capacities of sentient humans and non-humans alike to adequately determine the content of their extrapolated volitions.

In modelling what changes in preferences would occur if volition-extrapolating transformations were applied to the volitions of existing non-human animals, such as squirrels, SCEV may arrive at models of extrapolated volitions with weird preferences; we currently have no idea what they would look like. However, this is not enough to suggest that we should not include squirrels or other non-humans in the extrapolation base. In fact, it has been argued that there are good reasons to uplift the (cognitive) capacities of non-human animals (at least) to levels similar to those of most humans: for democratic reasons (Paez & Magaña, 2023), for Rawlsian liberal-egalitarian reasons (Dvorsky, 2008), and for anti-speciesist reasons (Chan, 2009). And, as I will argue in Section 4.3, we have no good reasons to believe that implementing SCEV or enhancing their capabilities would cause them to cease to exist; rather, this can be done while preserving their identity (Paez & Magaña, 2023). Furthermore, as I show in Section 4.1, it is implausible to believe that including non-human animals in the extrapolation base could result in us being harmed by the weird preferences their extrapolated volitions might have. And since it remains the case that by not including them we are likely increasing the risk of astronomical suffering, we have strong reasons to include them even if we are highly uncertain about the content of their extrapolated volitions.

3.1. A solution to the standing problem 

Since, as mentioned above, we cannot leave the solution to the standing problem (the issue of whose interests or volitions are included in the extrapolation base) to be figured out by the ASI, we should take uncertainty into account when making such a decision ourselves. What is relevant in determining which entities should morally be taken into account, and thus also included in some way or another in the extrapolation base, is not whether we can be sure that they possess the relevant traits for moral patienthood. Against what Yudkowsky has contended, the fact that we could be wrong about whether a given entity deserves moral consideration is not a sufficient reason to exclude it from our moral consideration or from the extrapolation base (Yudkowsky, 2016). Instead, what is relevant is whether there is a non-negligible chance that the entity possesses the relevant traits for moral patienthood. This is so because if there is some chance that a given entity can be harmed in morally relevant ways by our actions, it is wrong, all else equal, to perform those actions, since some chance of harm is worse than no chance of harm. Thus, what is relevant in determining which entities should be morally taken into account is whether there is a non-negligible chance that they are sentient (Sebo, 2018). Furthermore, as we have seen, being included or not in the extrapolation base can directly affect the extent to which (in expectation) the interests of given beings are respected and taken into account by the ASI's behaviour. Then, all entities for which there is a non-negligible chance that they are sentient should be included in the extrapolation base. We have very strong reasons to believe that there is at least a non-negligible probability that almost all present and future humans, non-human animals and possible future digital minds are moral patients, and thus should directly be included in the extrapolation base. This is so for three main reasons.
First, there is a wide consensus and strong evidence on the sentience of vertebrate animals and many invertebrate animals (Low, 2012; Proctor et al., 2013; The Universal Declaration on Animal Welfare, 2007; Waldhorn, 2019a; Waldhorn, 2019b), and there is wide agreement that artificial entities, such as digital minds, with the capacity for sentience will or could be created in the future (Harris & Anthis, 2021). Second, sentience, understood as above as the capacity for positively or negatively valenced phenomenally conscious experiences, is widely regarded and accepted as a sufficient condition for moral patienthood (Clarke, Zohny & Savulescu, 2021). And third, it is both very plausible and widely accepted that the intrinsic value of good or bad things, or the intrinsic moral importance of respecting and considering the interests of moral patients, cannot be discounted solely because they occur in the future; see (Greaves & MacAskill, 2021: p.18; Ord, 2020: p.52; Beckstead, 2013: p.18) for acceptance of this view in the literature. Then, all present and future entities for which there is (at least) a non-negligible chance that they are sentient should directly be included in the extrapolation base.
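The standing rule defended here can be stated as a simple filter over credences. The following is a purely illustrative toy sketch: the threshold and the probability estimates are my own invented assumptions, not figures from the paper.

```python
# Toy sketch of the standing rule: include every entity for which the
# probability of sentience is non-negligible. The threshold and the
# credences below are illustrative assumptions only.
NEGLIGIBLE = 1e-6

# Hypothetical credences that each candidate entity is sentient.
candidates = {
    "human": 1.0,
    "pig": 0.99,
    "octopus": 0.9,
    "future_digital_mind": 0.3,
    "rock": 0.0,
}

# Everything above the negligibility threshold gets standing.
extrapolation_base = [e for e, p in candidates.items() if p > NEGLIGIBLE]
print(extrapolation_base)
```

Note that the rule deliberately does not require certainty: an entity with a 0.3 credence of sentience is included just as a paradigmatic moral patient is, which is the point made against Yudkowsky's exclusion-under-uncertainty stance above.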

3.2. How to include the extrapolated volitions of future not-yet-existing sentient beings 

According to SCEV, apart from currently existing sentient beings, all future sentient beings should have their volitions directly included in the extrapolation base, since this would significantly reduce the probability of s-risks from the ASI's actions. But how exactly should the volitions of future, not-yet-existing sentient beings be directly included in the extrapolation base? At any given point in time t, the ASI should take those actions that would in expectation most fulfil the coherent extrapolated volition of all sentient beings that exist at t. Many (or almost all) actions that the ASI could or would take would change the kinds and number of sentient beings that would exist after it had performed the given action(s). Due to this fact, after any given action the ASI would have to incorporate the extrapolated volitions of any newly existing sentient beings, and then decide how to act again based on the coherent extrapolated volitions of all beings existing at that point in time.
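The time-indexed rule just described can be sketched as a toy act-then-update loop. All names, scoring functions and numbers here are my own illustrative assumptions; nothing in the paper specifies an implementation.

```python
# Toy sketch of SCEV's time-indexed rule: at each step the ASI
# (1) chooses the action best for the beings existing *now*, and
# (2) after acting, folds any newly existing sentient beings into
# the extrapolation base. Volitions are modelled crudely as
# functions scoring how well an action fulfils them.

def coherent_score(action, base):
    """Stand-in for how well an action fulfils the coherent
    extrapolated volition of everyone currently in the base."""
    return sum(volition(action) for volition in base)

def scev_step(base, possible_actions, new_beings_caused_by):
    # Choose the action best for those who exist at time t...
    action = max(possible_actions, key=lambda a: coherent_score(a, base))
    # ...then extend the base with whoever the action brought into existence.
    return action, base + new_beings_caused_by(action)

# Minimal usage: two existing volitions; the "safe" action creates one new being.
existing = [lambda a: 1.0 if a == "safe" else 0.0,
            lambda a: 0.5]
actions = ["safe", "risky"]
new = lambda a: [lambda a2: 0.9] if a == "safe" else []

chosen, base = scev_step(existing, actions, new)
print(chosen, len(base))
```

The key design point mirrored here is that the base grows only *after* the action, so newly created beings shape all subsequent choices but cannot be farmed to inflate the score of the very action that creates them.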

It is important to realize that other kinds of proposals about how to take into account future, not-yet-existing sentient beings might have disastrous consequences. For instance, imagine the following somewhat more straightforward proposal: the ASI should take those actions that would in expectation most fulfil the coherent extrapolated volition of all beings that would in expectation exist after its action. If this proposal were implemented, it would likely result in a reward hacking of the utility function. If the ASI's utility function were to maximize the fulfilment of the coherent extrapolated volition of all the sentient beings that would in expectation exist after its actions, then the ASI could simply create many digital minds with very simple extrapolated volitions to satisfy. If a sufficiently large number of these kinds of digital minds were created by the ASI's actions, the force of their extrapolated volitions could be many orders of magnitude greater than the force of the extrapolated volitions of currently existing humans and non-human animals in guiding the ASI's behaviour. This could result in the ASI disregarding sentient humans and non-human animals and only performing those actions that would most satisfy the easy-to-satisfy preferences of the created digital minds. And these easy-to-satisfy preferences need not be preferences that seem valuable to us (or to our extrapolated volitions) in any respect. This alternative proposal of how to include future, not-yet-existing sentient beings in the extrapolation base would likely lead to very undesirable outcomes. It is naive and flawed because, at any given point in time, it lets the ASI itself determine which new sentient beings come into existence (independently of the preferences of the extrapolated volitions of the already included individuals).
Contrary to this, on the proposal I have put forward, which sentient beings come into existence, and shape the ASI's behaviour by being included once they come to exist, is entirely dependent on the preferences of the extrapolated volitions of the currently included individuals. Since the extrapolated volitions would know how SCEV works, they would be aware that in preferring one action over another they are also preferring the creation of some kinds of minds rather than others. Furthermore, they would also be aware that once new beings come to exist, those beings will be included in the extrapolation base and thus also affect the ASI's behaviour. Then, the extrapolated volitions would know better than to favour actions leading to scenarios similar to the one in which immense numbers of digital minds with valueless and easy-to-satisfy preferences have been created and have complete control over the ASI's behaviour. Because of this, the proposal I have put forward prevents this kind of reward hacking.
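The contrast between the naive rule and the time-indexed rule can be made vivid with a deliberately crude numeric toy. Every number and action name below is an invented assumption, chosen only to exhibit the incentive structure.

```python
# Crude contrast between the naive rule ("maximize fulfilment of all
# beings expected to exist AFTER the action") and the time-indexed
# rule ("maximize fulfilment of beings existing NOW").
# All figures are arbitrary illustrative assumptions.

# Each action: (fulfilment per currently existing being,
#               number of cheap new minds it creates,
#               fulfilment per cheap new mind)
actions = {
    "help_existing": (1.0, 0, 0.0),
    "spawn_cheap_minds": (0.0, 10**12, 1.0),  # easy-to-satisfy minds
}
N_EXISTING = 10**10  # existing humans and non-human animals, say

def naive_score(a):
    per_existing, n_new, per_new = actions[a]
    # Counts beings that exist only *because of* the action itself.
    return N_EXISTING * per_existing + n_new * per_new

def time_indexed_score(a):
    per_existing, _, _ = actions[a]
    # Only beings existing at the time of choice count.
    return N_EXISTING * per_existing

naive_choice = max(actions, key=naive_score)
scev_choice = max(actions, key=time_indexed_score)
print(naive_choice, scev_choice)
```

Under the naive rule the trillion cheap minds swamp the existing population's volitions, so mass-producing them is optimal; under the time-indexed rule they carry no weight until they actually exist, so the hack yields nothing.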

This proposal would adequately take into account the interests of future sentient beings since, once they began to exist, their interests would directly be included in the ASI’s utility function. Contrary to what seems to be the case in the standard CEV proposal, the interests of future not-yet-existing sentient beings, once they exist, would not be taken into account merely to the extent to which the extrapolated volitions of currently existing individuals desire to do so. And, by taking directly into account the interests of future sentient non-humans in this manner, the non-negligible chance of s-risks from the ASI’s actions as a result of an adequate implementation of the standard CEV proposal would be significantly reduced. 

The risk of s-risks from the ASI's own actions is substantially lower when implementing SCEV because their prevention would no longer depend solely on the extent to which the humans whose volitions were coherently extrapolated cared about the suffering of the sentient non-human victims. Rather, the astronomical number of volitions of the sentient non-humans themselves would directly be included in the extrapolation base. On this proposal, there still is (at least) a non-negligible chance that humans' extrapolated volitions would want to perform actions that would (without further intervention) lead to s-risks, such as direct panspermia or the use of simulation technologies. As a consequence, there is still some chance that these kinds of actions would actually be performed by the ASI. However, once they were being performed, and new sentient beings, such as non-human animals living in "natural" environments on other planets or digital minds, came into existence, their extrapolated volitions would be directly included in the extrapolation base. And since their volitions would be extremely opposed to suffering immensely, the ASI would almost certainly prevent the astronomical suffering. Once the would-be victims of s-risks came into existence, their interests in not suffering would be directly taken into account and reflected in the ASI's behaviour, and the ASI would, as with any other beings already present in the extrapolation base, try to fulfil their extrapolated volitions. If the volitions of future sentient non-humans were included in the extrapolation base, as SCEV proposes, the occurrence of s-risks from the ASI's actions (i.e. what it causes or allows) would become almost impossible.
Therefore, we have very strong pro-tanto moral reasons to implement SCEV to guide the behaviour of a possible ASI instead of only coherently extrapolating the volitions of currently existing humans (a minority of all the affectable moral patients expected to exist). 

Finally, it should also be noted that this proposal of SCEV (like CEV) is not intended as a realist theory of morality: it is not a description of the metaphysical nature of what constitutes the 'good'. I am not proposing a metaethical theory, but merely what would be the most morally desirable ambitious value learning proposal for an ASI. It is not the case that, if moral realism is true, SCEV or CEV would necessarily arrive at the moral truth. Thus, SCEV (like CEV) and the arguments in favour of it are compatible with both realist and anti-realist conceptions of meta-ethics.

4. Objections 

In this fourth section, I shall present and respond to three possible objections against the (pro-tanto) rejection of CEV and my proposal of Sentientist Coherent Extrapolated Volition. 

4.1. The Risk from Predators’ Violent Predispositions 

Yudkowsky has contended that a reason against directly including sentient non-humans such as non-human animals in the extrapolation base is that it may result in consequences we do not like, as a result of how different the non-human volitions are from ours (Yudkowsky, 2016). Then, it might be argued that SCEV is not preferable to CEV, since there is also the possibility that it might lead to very morally undesirable goals for the ASI to pursue, such as violent goals, given that, for instance, many non-human animals living in the wild have goals and preferences in favour of predation. Given the very considerable degree of uncertainty we currently face regarding the exact nature of what the extrapolation process would be in practice, we cannot completely rule out the possibility that this could have significant negative consequences. Then, it seems possible that the ASI could have weird, bad or violent goals as a result of some of the desires of non-human animals living in the wild.

Even though it is true that we cannot be completely certain about this, I believe it is very plausible that this risk and its badness are very significantly lower than the s-risk from implementing the standard CEV proposal. When implementing CEV, the only possible preferences against sentient non-humans enduring astronomical suffering are those of humans who to some extent care about the suffering of sentient non-humans. By contrast, in implementing SCEV, all the volitions of the potential victims of the possible s-risks resulting from the violent preferences of predators would count against the occurrence of such s-risks. By the mere fact of being sentient beings included in the extrapolation base, the volitions of the possible victims of such strange s-risks would primarily and almost entirely consist of the desire to prevent such suffering. Indeed, as mentioned above, the desire to be free from immense amounts of intense and unbearable suffering is probably the strongest possible desire any sentient being can have, and since in an s-risk there would be astronomical numbers of beings with these desires, this would count extremely heavily against the occurrence of any kind of possible s-risk, including those due to predatory dispositions.

Due to empirical uncertainty about which entities are sentient, and to the possibility of black swans, it is likely that we can never be sure that the future occurrence of s-risks is completely impossible while sentient beings continue to exist and we expand through the universe or have sufficient computational power. But plausibly, one of the best ways to ensure that s-risks do not occur would be to have an ASI that truly takes into consideration the interests of any of their possible victims. Demanding more certainty against the occurrence of s-risks than this is unwarranted, since more certainty may indeed not be possible. Even if there remains a very tiny probability that an s-risk could occur as a result of predators' violent preferences, this probability is negligible. It would be unreasonable and too demanding to dismiss an ambitious value-learning proposal because it cannot rule out with complete certainty the future occurrence of any kind of s-risk. And since the probability and badness of bad outcomes resulting from predators' violent preferences are significantly lower than the (at least) non-negligible probability of s-risks resulting from implementing CEV, we would still have very strong reasons in favour of choosing SCEV if we could do so.

4.2. Democratic illegitimacy and being jerkish 

Another possible counterargument to the general project of this paper, that of arguing for the importance of including the interests of sentient non-humans to guide the behaviour of a future ASI, is that doing so would be anti-democratic. Indeed, most people would likely neither want nor vote in favour of implementing a value alignment proposal that equally takes into account the interests of non-human sentient beings.

This argument, however, assumes that only humans have claims to having their interests represented in democratic procedures. This might be a result of speciesist bias, or of holding the previously mentioned view that only the interests of moral agents matter in a morally relevant way. However, all sentient beings could be positively or negatively affected by an ASI, and accordingly, all of them have claims to having their interests taken into account. It is unreasonable to believe, as we have seen, that those interests could be adequately represented only to the extent to which currently existing humans who are moral agents care about them. Thus, in this case, the all-affected-interests principle in democratic theory clearly applies (Christiano & Sameer, 2022), and all sentient beings have strong democratic claims to having their interests directly represented.

This kind of objection is raised by Yudkowsky in a different manner. In his view, the programmers or implementors of a value learning proposal for an ASI should try not to be "jerks". In doing so, they should try to be fair and take into account the preferences of people who do not want all sentient beings to be included, by not directly including all sentient beings in the extrapolation base. Directly including all sentient beings, in Yudkowsky's view, would not be impartial, since it would amount to rooting for the preferences of people who, for instance, care more about non-human animals (Yudkowsky, 2016). At the same time, however, Yudkowsky holds that it would, in fact, be jerkish not to include humans who, like sentient non-humans, are powerless with respect to determining which value alignment proposal is implemented. These humans possibly include children, existing people who have never heard about AI, or people with severe physical or cognitive disabilities who are unable to act on and express their own views on the topic.

However, as seen above, it is not the case that there are no reasons to include sentient non-humans, since they too can be positively or negatively affected in morally relevant ways by being included in the extrapolation base or not. The fact that many of the parties that have some power over which value learning proposal is implemented (i.e. some humans) do not care about these reasons does not mean that those reasons hold no weight. It is not completely clear what Yudkowsky means by "jerkish" or "to be jerks"; however, if it is to be understood colloquially, then in the same manner that it would be unfair and jerkish for powerless humans to be excluded, it is also unfair and jerkish for sentient non-humans to be excluded. Today, due to a certain amount of societal moral progress and an expansion of the group of beings that society takes seriously into moral consideration, we tend to seriously include both powerful and powerless humans, and thus it seems intuitively jerkish to exclude them from the extrapolation base. But not many years ago, it plausibly would not have felt this way. Imagine that the decision about which beings should be included in the extrapolation base had taken place many years ago. It would have been much more preferable for these past decision-makers, when considering which entities to include, to have actually looked at the most plausible moral reasons in deciding what to do, instead of doing whatever was most socially and intuitively acceptable, which would have resulted in excluding many powerless humans. The same is preferable right now and in the future. Whoever is able to decide which entities to include in the extrapolation base should look at the moral reasons in favour of including various kinds of entities, even if those entities might not be the most intuitively or socially acceptable to include from the perspective of those in power. And as we have seen, there are indeed strong reasons to include all sentient beings.
A proposal that excludes sentient beings, then, is not impartial; rather, it is a jerkish and unfair proposal.

Indeed, if Yudkowsky is right that it is important to exclude some sentient beings from the extrapolation base, this is presumably because the kinds and number of beings included in it are likely to affect the ASI's behaviour. But this means that, in expectation, the interests of sentient non-humans would not be equally taken into consideration in the ASI's behaviour regardless of whether they are directly included in the extrapolation base.

Furthermore, Yudkowsky also contends that including only existing humans in the extrapolation base is simpler (Yudkowsky, 2016). While this is the case, it is not a reason in favour of the moral desirability of the proposal if we are able to successfully include all sentient beings. The fact that CEV is simpler than SCEV does not make it more morally desirable, all else equal: doing nothing, or coherently extrapolating only one volition, is simpler than CEV, but not thereby more morally desirable.

Because of this, against the objection, pursuing this proposal would not be anti-democratic, jerkish or unfair; rather, it would be a much more democratic, kinder and fairer option. Implementing CEV instead of SCEV would not do justice to the direct democratic and moral claims of sentient non-humans.

4.3. Would SCEV preserve the identity of sentient beings? 

It has been argued that the implementation of CEV would produce undesirable results because it would cause humans to cease to exist (Yampolskiy, 2022). Roman Yampolskiy has argued that by implementing CEV "[w]e would essentially agree to replace ourselves with an enhanced version of humanity as designed by AI", since, with the quick extrapolation jump involved in CEV, there would not be any continuity with our identity, and we would cease to exist (Yampolskiy, 2022). He presents the following argument: "we can say that current version of humanity is H₀, the extrapolation process will take it to H₁₀₀₀₀₀₀₀. A quick replacement of our values by values of H₁₀₀₀₀₀₀₀ would not be acceptable to H₀ and so necessitate actual replacement, or at least rewiring/modification of H₀ with H₁₀₀₀₀₀₀₀, meaning modern people will seize to exist" [sic]. This argument can also be applied to criticise SCEV. It is indeed the case that some sorts of radical enhancements to the capabilities of humans, animals and digital minds could cause them to cease to exist by not being identity-preserving. However, radical enhancements to the individuals in the extrapolation base need not occur at all under SCEV, and if they do occur, it has been argued that they can be performed in ways that are identity-preserving and aligned with the good of each individual (Bostrom, 2008: p.123–126; Paez & Magaña, 2023: p.23–25).

As Yampolskiy suggests, it is indeed the case that, by performing volition-extrapolating upgrades on the models of the minds and volitions of the individuals in the extrapolation base, the resulting models will have different preferences and values from our own. However, this is in part what we are trying to accomplish when implementing a proposal such as CEV or SCEV: we are trying to prevent value lock-in (Yudkowsky, 2004; MacAskill, 2022). If we were to implement current values to direct the behaviour of an ASI, this would almost certainly constitute an immense moral tragedy, since all future generations would be stuck with antiquated values. It seems clear that it would have been a tragedy if past human civilizations from Ancient History or the Middle Ages had locked in their values, and those values had persisted and been imposed on all of humanity's future by an ASI. Similarly, we should not believe that current values constitute the peak of moral progress and that there is no more progress to be made (Yudkowsky, 2004).

There would likely be a very significant distance between the values and preferences of the models of the coherently extrapolated volitions of sentient beings and the values and preferences of currently existing beings. However, since it prevents value lock-in, the existence of this distance is preferable to its non-existence, and it would not cause currently existing beings to cease to exist. What would occur is that models of our minds and volitions would be created to guide the ASI's behaviour. These models would know more, reason better, and be more cooperative. As a result, they would necessarily have better-justified, more coherent, and more reasonable ethical views than the ones we have. These better-informed, more thought-through and better-justified ethical views would then guide the ASI's behaviour. It is implausible to believe, as this objection seems to imply, that what these views would necessarily prefer is to kill all sentient beings and create new ones. One would only have reason to believe that SCEV or CEV would recommend this if one already believed that killing all sentient beings, or all humans, and replacing them with new ones is what ought to be done. Disagreeing and discussing what ethical views or preferences the models would have is the same as discussing what ethical views are more reasonable, coherent, well-informed, etc. And the view that what ought to be done, instead of any other physically possible option, is to kill all sentient beings and replace them with new ones is highly implausible and nowhere seriously defended. It is much more reasonable to believe that, instead, the better-informed and more reasonable models could prefer that sentient beings not be enhanced at all, or that they be enhanced in just, inclusive, equitable, autonomy-respecting and identity-preserving ways, as prescribed by contemporary transhumanism (The Transhumanist Declaration, 2009; Savulescu & Bostrom, 2009). If this were done, the enhancements performed need not cause current sentient beings to cease to exist, since, as has been argued, radical enhancements can be identity-preserving for both human and non-human animals if certain conditions are met (Paez & Magaña, 2023: p.23–25).

There are two major sets of theories about the kind of entities sentient individuals are and about their persistence conditions over time (Paez & Magaña, 2023: p.23–25). On the Somatic Account, sentient beings are living organisms, and their persistence over time consists in maintaining the integrity of their metabolic processes (Olson, 1997; van Inwagen, 1990). On this view, there would be ways of performing radical enhancements on both human and non-human animals that would preserve their identity: one could enhance many of their capabilities without disrupting their metabolic processes. This account concerns individuals with a biological substrate and thus does not apply to digital minds. But an analogous account applied to digital minds could claim that maintaining the integrity of certain physical structural features of the digital substrate is necessary and sufficient for them to persist over time. An account like this would also allow enhancements to be compatible with the persistence of digital minds over time. On the other major account, the Psychological Account, sentient beings are psychological entities who continue to exist only if they have psychological continuity over time (DeGrazia, 2005; McMahan, 2002; Parfit, 1984). As has been argued (Bostrom, 2008: p.123–126), this account is also compatible with enhancements being identity-preserving if the following conditions are met: the changes take the form of adding new capacities or enhancing old ones, without sacrificing preexisting capacities; they are implemented gradually over an extended period of time; the new capacities do not prevent the preexisting capacities from being periodically exercised; and the subject retains her old memories, many of her basic desires and dispositions, and many of her old personal relationships and social connections.
And finally, in the case of humans and some sufficiently cognitively sophisticated digital minds, each step of the transformation process is freely and competently chosen by the subject, and the transformation fits into the life narrative and self-conception of the subject (Bostrom, 2008: pp. 123–126). In the case of non-human animals and other digital minds who cannot freely choose and comprehend each step of the uplifting process, instead of these final conditions, “one may, alternatively, require the uplifting process not to undermine their control over their lives, irrespective of whether animals [or digital minds] can understand that possibility” (Paez & Magaña, 2023: pp. 23–25). This, then, would not undermine their control over their existence but rather make it more robust. For a more developed discussion of why enhancements can be made compatible with non-human animals preserving their identity, see Paez & Magaña (2023: pp. 23–25), on which this paragraph is based. 

Since we have good reasons to believe that radical enhancements to sentient beings can be identity-preserving, it is implausible that, instead of performing these kinds of enhancements, if any, the more reasonable, coherent and well-informed ethical views held by the extrapolated models of sentient beings would prefer to kill all sentient beings by performing non-identity-preserving enhancement interventions. There are no good reasons to believe that this is preferable, and even if for some reason one were to believe so, it would not constitute an objection to implementing SCEV, since one would presumably prefer and welcome that which one believes to be preferable. The fact that SCEV would inscribe the values of S₁₀₀₀₀₀₀₀ (to represent all sentient beings instead of only humans) into the behaviour of the ASI does not imply either the replacement or the rewiring/modification of S₀ with S₁₀₀₀₀₀₀₀. Thus, we have no good reason to believe that SCEV would not preserve the identity of sentient beings. 

5. Why it is unclear whether trying to implement Sentientist Coherent Extrapolated Volition is best all things considered 

I have argued that there are strong pro-tanto reasons to implement SCEV instead of CEV if we could do so successfully, since doing so would significantly reduce the non-negligible chance of s-risks from the ASI’s own behaviour that remains even under an adequate implementation of CEV. However, in practice, it is not clear that these reasons are not overridden or defeated by other reasons against SCEV. In this section, I will lay out other possible pro-tanto reasons against trying to implement SCEV or a similar value learning proposal instead of one closer to CEV. 

First, there is the risk that SCEV would not be implemented exactly as intended. When trying to implement SCEV there is always the possibility that we would not get everything right and that, by accident, there could be unintended consequences. For some kinds of accidents arising from near misses, it seems plausible that the more morally desirable the ambitious value learning proposal, the worse the accidents that may result from trying to implement it. This is because in some possible near-miss accidents the goals of the ASI identify those entities and things which can sustain value, but affect them in strange or bad ways, contrary to the desirable behaviour the ASI was intended to have as stipulated in the morally desirable ambitious value learning proposal. There are plausibly many ways in which these kinds of accidents could occur. One specific accident of this kind: through a value learning proposal such as SCEV, the interests of digital minds are taken into account and valued and, as a consequence, many of them are created (as argued by Shulman & Bostrom, 2021), but we accidentally fail to weight their suffering adequately. Even if the SCEV proposal intended to care about the suffering of digital minds, it might accidentally fail to detect the suffering and disvaluable states that many forms of alien digital minds might experience (Vinding, 2018; Tomasik, 2019b), or it might accidentally fail to weigh their interests adequately. This could potentially result in astronomical suffering. Such an accident would be less likely if a less morally desirable ambitious value learning proposal were implemented, since that proposal would be less likely to create astronomical numbers of digital minds with the potential to suffer astronomically by accident. 
Another possible accident of this kind is SignFlip, where the ambitious value learning proposal identifies a very morally desirable target or goal to maximize but, by accident, the opposite target or goal is maximized (Tomasik, 2019b). In the case of SCEV, this would consist of maximizing the opposite of the coherent extrapolated volition of all sentient beings, which would be extremely morally undesirable. Since there is some probability of these kinds of accidents occurring, and some of their badness would be mitigated by implementing a less morally desirable value learning proposal instead of SCEV, this gives us further reasons against trying to implement a value learning proposal, such as SCEV, that comes very close to or completely captures that which we value. 

There are also further reasons against trying to implement SCEV in practice, related to game-theoretic considerations in specific decision-making situations. One possible and likely situation is one in which we must decide between implementing CEV or SCEV in a single ASI, but where the conditions for implementing SCEV instead of CEV are not fully optimal: it is not the case that a single united decision-making entity can choose between the proposals; rather, there are already some decision-makers who strongly prefer CEV over SCEV. In many such situations, it may be much less desirable to try to implement a value learning proposal that adequately takes into account the interests of all sentient beings. Implementing SCEV may even have worse consequences in expectation for sentient non-humans, due to a backlash from the opposition that may arise in pursuing this proposal. There is, then, a strong reason against trying to implement SCEV: in practice, it might be net-negative in expectation. 

However, it is not completely clear that this will be the case, and we might indeed find ourselves in a future in which the pro-tanto reasons against trying to implement SCEV do not outweigh the pro-tanto reasons in favour of doing so. And even if we can conceive of plausible future scenarios in which pursuing a proposal closer to CEV instead of SCEV seems preferable all things considered, making that decision on the basis of an adequate cost-benefit analysis of the different proposals requires understanding the strong pro-tanto reasons in favour of SCEV that I have laid out in this paper. 

6. Conclusion 

In this paper, I have shown why we have some very strong pro-tanto reasons in favour of implementing SCEV instead of CEV. This is the case even if, all things considered, it is still ultimately unclear whether it is best to try to implement SCEV or another proposal more similar to CEV. The action-relevant implications of what I have laid out in this paper, however, are not all contingent on the realization of a future in which we can fully decide which ambitious value learning proposal to try to implement. Even if we could be sure such a future will not be realized, some practical implications would still follow. Determining how much more morally desirable SCEV is than CEV, given the strong pro-tanto reasons presented in this paper, is crucial for adequately making tradeoffs between different considerations in non-ideal decision-making situations where a value learning proposal has to be implemented. Furthermore, merely having in mind what an ideal scenario of adequate value specification would look like is likely useful in determining what we should realistically strive for if we could only reach more modest goals. Research into how different alignment and value learning proposals for possible future transformative AIs, such as an AGI or ASI, could affect sentient non-humans (who constitute the immense majority of present and future sentient beings expected to exist) is highly neglected. More research along these lines is required if we want to ensure that the far future goes well for all sentient beings. 

Acknowledgements: 

I am deeply grateful to Olle Häggström, @Anthony DiGiovanni, Magnus Vinding, @JuliaHP, @Lukas_Gloor, Aluenda Diana Smeeton and an anonymous reviewer of the Journal of Artificial Intelligence and Consciousness for their suggestions, encouragement, and support. This paper was conducted in part as an impact project at Future Academy; I am also very grateful to the organizers for giving me this opportunity.

References 

Althaus D. & Gloor L. (2016). Reducing Risks of Astronomical Suffering: A Neglected Priority. Center on Long-Term Risk. https://longtermrisk.org/reducing-risks-of-astronomical-suffering-a-neglected-priority/  

Baum, S. D. (2020). Social Choice Ethics in Artificial Intelligence. AI & Society, 35(1): 165–176. DOI: 10.1007/s00146-017-0760-1. 

Baumann, T. (2017a). S-risks: An introduction. Center for Reducing Suffering. https://centerforreducingsuffering.org/research/intro/ 

Beckstead, N. (2013). On the Overwhelming Importance of Shaping the Far Future. PhD, Rutgers University. https://rucore.libraries.rutgers.edu/rutgers-lib/40469/ 

Bostrom, N. (2008). Why I Want to Be a Posthuman When I Grow Up. In Medical Enhancement and Posthumanity. The International Library of Ethics, Law and Technology, edited by B. Gordijn and R. Chadwick, 2: 107–136. DOI: 10.1007/978-1-4020-8852-0_8 

Bostrom, N. & Savulescu, J. (eds.) (2009). Human Enhancement. Oxford University Press. 

Bostrom, N. (2014). Superintelligence: Paths, dangers, strategies. Oxford: Oxford University Press. 

Caviola, L., Everett, J. A. C., & Faber, N. S. (2018). The moral standing of animals: Towards a psychology of speciesism. Journal of Personality and Social Psychology, 116(6): 1011–1029. DOI: 10.1037/pspp0000182 

Chan, S. (2009). Should we enhance animals? Journal of Medical Ethics, 35 (11): 678–683. DOI: 10.1136/jme.2009.029512 

Christiano, T. & Bajaj, S. (2022). Democracy. The Stanford Encyclopedia of Philosophy. https://plato.stanford.edu/archives/spr2022/entries/democracy/ 

Christian, B. (2022). The Alignment Problem: Machine Learning and Human Values. W. W. Norton & Company 

Clarke, S., Zohny, H. & Savulescu, J. (2021) Rethinking Moral Status (eds.). Oxford: Oxford University Press. 

DeGrazia, D. (2005). Human Identity and Bioethics. New York: Cambridge University Press. 

Dhont, K., Hodson, G., & Leite, A. C. (2016). Common ideological roots of speciesism and generalized ethnic prejudice: The social dominance human-animal relations model (SD-HARM): The social dominance human-animal relations model. European Journal of Personality, 30(6): 507–522. DOI: 10.1002/per.2069 

Dvorsky, G. (2008). All Together Now: Developmental and ethical considerations for biologically uplifting nonhuman animals. Journal of Evolution and Technology, 18(1): 129–142. Available at: https://jetpress.org/v18/dvorsky.htm 

Gabriel, I. (2020). Artificial Intelligence, Values, and Alignment. Minds & Machines, 30: 411–437. DOI: 10.1007/s11023-020-09539-2 

Greaves, H. & MacAskill, W. (2021). The Case for Strong Longtermism. Global Priorities Institute. https://globalprioritiesinstitute.org/hilary-greaves-william-macaskill-the-case-for-strong-longtermism-2/ 

Groff, Z. & Ng, Y. (2019). Does suffering dominate enjoyment in the animal kingdom? An update to welfare biology. Biology and Philosophy, 34(40). DOI: 10.1007/s10539-019-9692-0 

Harris, J. & Anthis, J.R. (2021) The Moral Consideration of Artificial Entities: A Literature Review. Science and Engineering Ethics, 27: 53. DOI: 10.1007/s11948-021-00331-8 

Low, P. (2012). The Cambridge Declaration on Consciousness. The Francis Crick Memorial Conference on Consciousness in Human and non-Human Animals. Cambridge. https://fcmconference.org/img/CambridgeDeclarationOnConsciousness.pdf

McMahan, J. (2002). The Ethics of Killing: Problems at the Margins of Life. Oxford: Oxford University Press. 

Ng, Y. (1995). Towards Welfare Biology: Evolutionary Economics of Animal Consciousness and Suffering. Biology and Philosophy, 10(3): 255–85. DOI: 10.1007/BF00852469 

O'Brien, G.D. (2022). Directed Panspermia, Wild Animal Suffering, and the Ethics of World-Creation. Journal of Applied Philosophy, 39(1): 87–102. DOI: 10.1111/japp.12538 

Olson, E. T. (1997). The Human Animal. Personal Identity without Psychology. Oxford: Oxford University Press. 

Ord, T. (2020). The Precipice. Hachette Books. 

Paez, E. & Magaña, P. (2023). A democratic argument for animal uplifting. Inquiry: An Interdisciplinary Journal of Philosophy. DOI: 10.1080/0020174X.2023.2248618 

Parfit, D. (1984). Reasons and Persons. Oxford: Oxford University Press. 

Proctor H. S., Carder G. & Cornish A. R. (2013). Searching for Animal Sentience: A Systematic Review of the Scientific Literature. Animals (Basel), 3(3): 882–906. DOI: 10.3390/ani3030882 

Russell, S. (2019). Human Compatible: Artificial Intelligence and the Problem of Control. Viking. 

Sebo, J. (2018). The Moral Problem of Other Minds. The Harvard Review of Philosophy, 25(1): 51–70. DOI: 10.5840/harvardreview20185913 

Shulman, C. & Bostrom, N. (2021). Sharing the World with Digital Minds. In Clarke, S., Zohny, H. & Savulescu, J. (eds.) Rethinking Moral Status. Oxford: Oxford University Press. 

Soares, N. (2016). The Value Learning Problem. 2nd International Workshop on AI and Ethics, AAAI-2016. Phoenix, Arizona. https://intelligence.org/files/ValueLearningProblem.pdf 

Sotala, K. (2012). Advantages of Artificial Intelligences, Uploads, and Digital Minds. International Journal of Machine Consciousness 4 (1): 275–291. DOI: 10.1142/S1793843012400161 

Sotala, K. & Gloor, L. (2017). Superintelligence as a Cause or Cure for Risks of Astronomical Suffering. Informatica 41(2017): 389–400. https://www.informatica.si/index.php/informatica/article/view/1877 

The Transhumanist Declaration (2009). Humanity+. Available at: https://www.humanityplus.org/the-transhumanist-declaration 

The Universal Declaration on Animal Welfare (2007). World Society for the Protection of Animals: https://www.worldanimalprotection.ca/sites/default/files/media/ca_-_en_files/case_for_a_udaw_tcm22-8305.pdf 

Tomasik, B. (2015a). Artificial Intelligence and Its Implications for Future Suffering. Center on Long-Term Risk. https://longtermrisk.org/artificial-intelligence-and-its-implications-for-future-suffering 

Tomasik, B. (2015b). Risks of Astronomical Future Suffering. Center on Long-Term Risk. https://longtermrisk.org/risks-of-astronomical-future-suffering/#Some_scenarios_for_future_suffering 

Tomasik, B. (2018). Will Space Colonization Multiply Wild-Animal Suffering? Essays on Reducing Suffering. https://reducing-suffering.org/will-space-colonization-multiply-wild-animal-suffering/ 

Tomasik, B. (2019a). What Are Suffering Subroutines? Essays on Reducing Suffering. https://reducing-suffering.org/what-are-suffering-subroutines/ 

Tomasik, B. (2019b). Astronomical suffering from slightly misaligned artificial intelligence. Essays on Reducing Suffering. https://reducing-suffering.org/near-miss/ 

Tomasik, B. (2020). The Importance of Wild-Animal Suffering. Center on Long-Term Risk. https://longtermrisk.org/the-importance-of-wild-animal-suffering/ 

van Inwagen, P. (1990). Material Beings. Ithaca: Cornell University Press. 

Vinding, M. (2018). Moral Circle Expansion Might Increase Future Suffering. https://magnusvinding.com/2018/09/04/moral-circle-expansion-might-increase-future-suffering/ 

Waldhorn, D. R. (2019a) Invertebrate sentience, summary of findings, part 1. Rethink Priorities. https://rethinkpriorities.org/publications/invertebrate-sentience-summary-of-findings-part-1 

Waldhorn, D. R. (2019b) Invertebrate sentience, summary of findings, part 2. Rethink Priorities. https://rethinkpriorities.org/publications/invertebrate-sentience-summary-of-findings-part-2 

Yampolskiy, R. V. (2022). On the Controllability of Artificial Intelligence: An Analysis of Limitations. Journal of Cyber Security and Mobility, 11(3): 321–404. DOI: 10.13052/jcsm2245-1439.1132 

Yudkowsky, E. (2004). Coherent extrapolated volition. Singularity Institute for Artificial Intelligence. https://intelligence.org/files/CEV.pdf 

Yudkowsky, E. (2008). Artificial intelligence as a positive and negative factor in global risk. In Bostrom, N. & Ćirković, M.M. (eds.) Global Catastrophic Risks. New York: Oxford University Press. 

Yudkowsky, E. (2016). Coherent extrapolated volition (alignment target). Arbital: AI Alignment. https://arbital.com/p/cev/

simon:

A thought experiment: the mildly xenophobic large alien civilization.

Imagine at some future time we encounter an expanding grabby aliens civilization. The civilization is much older and larger than ours, but cooperates poorly. Their individual members tend to have a mild distaste for the existence of aliens (such as us). It isn't that severe, but there are very many of them, so their total suffering at our existence and wish for us to die outweighs our own suffering if our AI killed us, and our own will to live.

They aren't going to kill us directly, because they co-operate poorly, individually don't care all that much, and defense has the advantage over offense.

But, in this case, the AI programmed as you proposed will kill us once it finds out about these mildly xenophobic aliens. How do you feel about that? And do you feel that, if I don't want to be killed in this scenario, my opposition is unjustified? 

I like this thought experiment, but I feel like this points out a flaw in the concept of CEV in general, not SCEV in particular. 

If the entire future is determined by a singular set of values derived from an aggregation/extrapolation of the values of a group, then you would always run the risk of a "tyranny of the mob" kind of situation. 

If in CEV that group is specifically humans, it feels like all the author is calling for is expanding the franchise/inclusion to non-humans as well. 

Yes, and other points may also be relevant:

(1) Whether there are possible scenarios like these in which the ASI cannot find a way to adequately satisfy all the extrapolated volitions of the included beings is not clear. There might not be any such scenarios.

(2) If these scenarios are possible, it is also not clear how likely they are.

(3) There is a subset of s-risks and undesirable outcomes (those coming from cooperation failures between powerful agents) that are a problem to all ambitious value-alignment proposals, including CEV and SCEV.

(4) In part, because of 3, the conclusion of the paper is not that we should implement SCEV if possible all things considered, but rather that we have some strong pro-tanto reasons in favour of doing so. It still might be best not to do so all things considered.

Regarding NicholasKees' point about mob rule vs expansion, I wrote a reply that I moved to another comment.

In response to the points in the immediate parent comment:

You have to decide, at some point, what you are optimizing for. If you optimize for X, Y will potentially be sacrificed. Some conflicts might be resolvable but ultimately you are making a tradeoff somewhere.

And while you haven't taken over yet, other people have a voice as to whether they want to get sacrificed for such a trade-off. 

You argue that CEV should be expanded to SCEV in order to avoid "astronomical suffering (s-risks)". This seems to be a circular argument to me. We are deciding upon a set of beings to assign moral value to. By declaring that pain in animals is suffering that we have a moral duty to take into account, so that we have to include them in the set of beings we design our AIs to assign moral value to, you are presupposing that animals in fact have moral value: logically, your argument is circular. One could equally consistently declare that, say, non-human animals have no moral worth, so we are morally free to disregard their pain and not include it in our definition of "suffering" (or, if you want to define the word 'suffer' as a biological rather than a moral term, that we have no moral responsibility to care about their suffering because they're not in the set of beings we have assigned moral worth to). Their pain carries moral value in our decision if and only if they have moral value, and this doesn't help us decide what set of beings to assign moral value to. This would clearly be a cold, heartless position, but it's just as logically consistent as the one you propose. (Similarly, one could equally logically consistently do the same for just males, or just people whose surname is Rockefeller.) So what you are giving is actually an emotional argument: "seeing animals suffer makes me feel bad, so we should do this" (which I have some sympathy for; it does the same thing to me too), rather than the logical argument that you are presenting it as.

This is a specific instance of a general phenomenon. Logical ethical arguments only make sense in the context of a specific ethical system, just as mathematical logical arguments only make sense in the context of a specific set of mathematical axioms. Every ethical system prefers itself over all alternatives (by definition, in its opinion the others all get at least some things wrong). So any time anyone makes what sounds like a logical ethical argument for preferring one ethical system over another, there are only three possibilities: their argument is a tautology, there's a flaw in their logic, or it's not in fact a logical ethical argument, it just sounds that way (normally it's in fact an emotional ethical argument). (The only exception to this is pointing out that an ethical system is not even internally logically consistent, i.e. doesn't make sense even on its own terms: that's a valid logical ethical argument.)

If that didn't make sense to you, try the first four paragraphs of the first post in my sequence on Ethics, or for a lot more detail see Roko's The Terrible, Horrible, No Good, Very Bad Truth About Morality and What To Do About It. You cannot use logical ethical arguments to choose between ethical systems: you're just pulling on your own bootstraps if you try. If you don't want to just pick an ethical system arbitrarily, you have to invoke something that boils down to something along the lines of  "I do/don't feel good about this rule, or its results, or I'm going to pick an ethical system that seems fit-for-purpose for a particular society". So basically the only way to make a decision about something like SCEV is based on feelings: does it offend the moral instincts that most humans have, and how would most humans feel about the consequences if a society used this ethical system (which generally depend a lot on what society we're talking about)? So you do need to think through consequences like providing vegetarian meals for predators and healthcare and birth control for insects, before picking an ethical system.

I am arguing that given that 

1. (non-human animals deserve moral consideration, and s-risks are bad (I assume this))

We have reasons to believe 2: (we have some pro-tanto reasons to include them in the process of value learning of an artificial superintelligence instead of only including humans). 

There are people (whose objections I address in the paper) that accept 1 but do not accept 2. 1 is not justified for the same reasons as 2. 2 is justified for the reasons I present in the paper. 1 is justified by other arguments about animal ethics and the badness of suffering that are intentionally not present in the paper; I cite the places/papers where 1 is argued instead of arguing for it myself, which is standard practice in academic philosophy.

The people who believe 1 but not 2, do not only have different feelings than me, but their objections to my view are (very likely) wrong, as I show when responding to those objections in the objections section.

Given the importance of the word 'sentient' in your Sentientist Coherent Extrapolated Volition proposal, it would have been helpful if you had clearly defined this term. You make it clear that your definition of it includes non-human animals, so evidently you don't mean the same thing as "sapient". In a context including animals, 'sentient' is most often used to mean something like "capable of feeling pain, having sensory impressions, etc." That doesn't have a very clear lower cutoff (is an amoeba sentient?), but would presumably include, for example, ants, which clearly have senses and experience pain. Would an ant get an equal amount of value/moral worth as a human in SCEV (as only seems fair)? It's estimated that the world population of ants is roughly 20 quadrillion, so if so, ants alone would morally outweigh all humans by a factor of well over a million. So basically, all our human volitions are just a rounding error to the AI that we're building for the ants and other insects. Even just domestic chickens (clearly sentient) outnumber humans roughly three-fold. This seems even more problematic than what one might dub the "votes for man-eating tigers" concern around predators that you do mention above. In general, prey species significantly outnumber their predators, and will all object strongly to being eaten, so I assume the predators would all have to go extinct under your proposal, unless the AIs could arrange to provide them all with nutritious textured vegetable protein substitutes? If so, how would one persuade the predators to eat these, rather than their usual prey? How could AIs create an ecosystem that has minimized suffering-per-individual, without causing species-diversity loss? Presumably they should also provide miniature healthcare for insects, too. Then what about population control, to avoid famine, now that they have eliminated predation and disease? 
You are proposing abolishing "nature, red-in-tooth-and-claw" and replacing it with what's basically a planetary-scale zoo — have you considered the practicalities of this?

Or, if the moral weight is not equal per individual, how would you determine an appropriate ratio? Body mass? Synapse count? A definition and decision for this seems rather vital as part of your proposal.

Regarding how to take into account the interests of insects and other animals/digital minds, see this passage I had to exclude from publication: [SCEV would apply an equal consideration of interests principle] "However, this does not entail that, for instance, if there is a non-negligible chance that dust mites or future large language models are sentient, the strength of their interests should be weighted the same as the strength of the interests of entities that we have good reasons to believe that it is very likely that they are sentient. The degree of consideration given to the interests or the desires of each being included in the extrapolation base should plausibly be determined by how likely it is that they have such morally relevant interests as a result of being sentient. We should apply something along the lines of Jeff Sebo’s Expected value principle, which is meant to determine the moral value of a given entity in cases of uncertainty about whether or not it is sentient (Sebo, 2018). In determining to what extent the interests, preferences and goals of a given entity (whose capacity for sentience we are uncertain about) should be included in the extrapolation base of SCEV, we should first come up with the best and most reliable credence available about whether the entity in question has morally relevant interests as a result of being sentient. And then we should multiply this credence by the strength (i.e. how bad it would be that those interests were frustrated/how good it would be that they were satisfied) that those interests would have if they were morally relevant as a result of the entity being sentient. The product of this equation should be the extent to which these interests are included in the extrapolation base. 
When determining our credence about whether the entity in question has morally relevant interests as a result of being sentient, we should also take into account the degree to which we have denied the existence of morally relevant interests to sentient beings different from us in the past. And we should acknowledge the biases present in us against reasonably believing in the extent to which different beings possess capacities that we would deem morally relevant. "
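The weighting rule quoted above is a simple expected-value product. As a rough illustrative sketch (the `extrapolation_weight` function, the entity names, and all credence/strength numbers below are hypothetical assumptions for illustration, not values from the paper):

```python
# Hypothetical sketch of the expected value principle quoted above:
# an entity's weight in the extrapolation base is the credence that it is
# sentient multiplied by the strength its interests would have if it were.
# All names and numbers here are illustrative assumptions.

def extrapolation_weight(credence_sentient: float, interest_strength: float) -> float:
    """Expected moral weight of an entity's interests under uncertainty."""
    if not 0.0 <= credence_sentient <= 1.0:
        raise ValueError("credence must be a probability in [0, 1]")
    return credence_sentient * interest_strength

# Illustrative entries: (credence of sentience, strength of interests).
candidates = {
    "human":     (1.00, 1.0),
    "chicken":   (0.95, 0.6),
    "dust mite": (0.05, 0.2),
}

weights = {name: extrapolation_weight(c, s) for name, (c, s) in candidates.items()}
```

On this sketch, a being whose sentience is highly likely and whose interests are strong dominates a being whose sentience is speculative; the passage leaves open how the credences themselves are formed, which the following paragraph addresses.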

Regarding intervening in ecosystems, and how to balance the interests/preferences of different animals: unless the extrapolated volitions of non-human animals prefer that the actual animals be uplifted, I expect something like this is what they would prefer: https://www.abolitionist.com/ It does not seem morally problematic to intervene in nature, and I believe there are good arguments to defend this view.
 

This seems to me like a very key omission. I'm puzzled that you didn't restore it, at least on Less Wrong, even if you had to, for some unexplained reason (involving reviewers, I would assume), omit it from your academic publication. I urge you to do so.

However, suppose that, in the near future, biologists established beyond all reasonable doubt that dust mites (for example) did, in fact, sense pain, experience physiological symptoms of distress, and otherwise have senses, and are thus definitely sentient under the standard definition (a relatively simple program of neurological and biochemical experiments, apart from the particularly fine positioning of electrodes required). Once that uncertainty had been eliminated (and doing so is of course a rather urgent matter under your proposed ethical system), would their moral value then deserve equal consideration to that of humans? You say "SCEV would apply an equal consideration of interests principle", so I assume that means yes?

Obviously the same limited resources that could support a single human could support many millions of ants. So under your proposed SCEV using equal moral weight, AIs would clearly be strongly morally obligated to drive the human species extinct (as soon as they could do without us, and one would hope humanely). Or, if you added ethical rights for a species as a separate entity, or a prohibition on extinction, drive us down to a minimal safe breeding population. Allowing for genetically-engineered insemination from digital genetic data, that would be a small number of individuals, perhaps , certainly no more than . (While a human is a good source of vegetarian skin flakes for feeding dust mites, these could more efficiently be vat cultured.)

The screed at your link https://www.abolitionist.com/ makes emotionally appealing reading. But it spends only two short paragraphs near the end on this issue, and simply does not attempt to address the technical problems of setting up vegetarian food distribution, healthcare, birth control, retirement facilities, legal representation, and so on and so forth for ~20 quadrillion ants, or indeed a possibly much larger number of dust mites, let alone all the rest of the sentient species (many of them still undiscovered) in every ecosystem on Earth. It merely observes that this will be easier with AI than without it. Nor does it even begin to address how to construct stable ecologies without predation, or diseases, or parasitism. The practical conundrums of supporting the ethical rights of both parasitical species and their natural hosts, for example, are even more intractable than those of predators and their prey that you briefly alluded to in your paper. (I for one fully support President Carter's effort to drive the guinea worm extinct, even if, as seems very likely to me, guinea worms are sentient: their lifecycle inherently requires them both injuring and causing agonizing pain to humans.) I look forward with interest to reading your future proposals for implementing these inevitable practical consequences of your ethical philosophical arguments.

Please bear in mind, during your career as an AI-aware academic moral philosopher, that we may well have superintelligent AI and need to give it at least a basic outline of an ethical system within the next decade or so, quite possibly without the ability to ever later significantly change our minds once we see the full consequences of this decision, so getting this right is a matter of both vital importance and urgency. As Eliezer Yudkowsky has observed, we are now doing Moral Philosophy on a tight deadline. Please try not to come up with a system that will drive us all extinct — this is not merely a philosophical debate.

These are great points, thank you! 

Remember that what SCEV tracks is not what the included individuals directly want, but what they would want after an extrapolation/reflection process that converged in the most coherent way possible. This means the result is almost certainly not the same as if there were no extrapolation process. Without extrapolation, something like what you suggest, such as sentient dust mites or ants taking over the utility function, might indeed occur. But with extrapolation it is much less clear: the models of the ants' extrapolated volition might want to uplift the actual ants to a super-human level, just as our models of human extrapolated volition might want to do with us humans. Furthermore, given that SCEV would try to maximize coherence between satisfying the various volitions of the included beings, the superintelligence would cause human extinction or similar only if it were physically impossible for it, no matter how much it was able to self-improve, to produce a more coherent result that respected more humans' volitions. This seems unlikely, but it is not impossible, so it is something to worry about if this proposal were implemented.

However, importantly, in the paper, I DO NOT argue that we should implement SCEV instead of CEV. I only argue that we have some strong (pro-tanto) reasons to do so, even if we should not ultimately do so, because there are other even stronger (pro-tanto) reasons against doing so. This is why I say this in the conclusion: "In this paper, I have shown why we have some very strong pro-tanto reasons in favour of implementing SCEV instead of CEV. This is the case even if, all things considered, it is still ultimately unclear whether what is best is to try to implement SCEV or another proposal more similar to CEV."

This is truly what I believe, and it is why I put this conclusion in the paper instead of one stating that we SHOULD implement SCEV. I believe that stronger conclusion is wrong, and thus I did not include it, even though it would have made the paper less complex and more well-rounded.

I completely agree with you and with the quote that getting this right is a matter of both vital importance and urgency, and I take this, and the possibility of human extinction and s-risks, very seriously when conducting my research; it is precisely because of this that I have shifted from doing standard practical/animal ethics to this kind of research. It is great that we can agree on this. Thanks again for your thought-provoking comments; they have lowered my credence in favour of implementing SCEV all things considered (even if we do have the pro-tanto reasons I present in the paper).

Rereading this, I'm sorry for dumping all of these objections on you at once (and especially if I sounded like they were obvious). I did actually think about an ethical system along the lines of the one you propose for O(6 months), and tried a variety of different ways to fix it, before regretfully abandoning it as unworkable.

On the non-equal moral weight version, see if you can find one that doesn't give the AIs perverse incentives to mess with ecosystems. I couldn't, but the closest I found involved species average adult mass (because biomass is roughly conserved), probability of reaching adulthood (r-strategy species are a nightmare), and average adult synapse count. My advice is that making anything logarithmic feels appealing but never seems to work.
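To make the shape of that heuristic concrete, here is a minimal sketch. The multiplicative combination of the three factors is my own assumption about what "involved" means here, and every species figure below is an invented placeholder, not real data:

```python
# Sketch of a non-equal per-individual moral weight combining the three
# factors mentioned above: average adult mass, probability of reaching
# adulthood, and average adult synapse count. The multiplicative form
# and all numbers are assumptions for illustration only.

def moral_weight(avg_adult_mass_kg: float,
                 p_reach_adulthood: float,
                 adult_synapses: float) -> float:
    """One candidate per-individual weight; heavier, longer-lived,
    more neurally complex species get proportionally more weight."""
    return avg_adult_mass_kg * p_reach_adulthood * adult_synapses

# Placeholder figures (not measurements):
human = moral_weight(70.0, 0.95, 1e14)
ant = moral_weight(3e-6, 0.1, 2.5e5)

# With anything like these numbers, the per-individual ratio is many
# orders of magnitude, which is what stops tiny, numerous r-strategy
# species from dominating the aggregate.
assert human / ant > 1e10
```

The point of the sketch is only that any weighting in this family makes the human/ant ratio astronomically large per individual, roughly cancelling the astronomical difference in population and resource needs.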

since they too can be positively or negatively affected in morally relevant ways

 

taboo morality. 


So people want X

and would want X if they were smarter, etc.

But you say, they should want Y. 

But you are a person. You are in the group of people who would be extrapolated by CEV. If you would be extrapolated by CEV:

  • you would either also want X, in which case insisting on Y is strange
  • or you would be unusual in wanting Y, enough so that your preference on Y is ignored or excessively discounted.

in which case it's not so strange that you would want to insist on Y. But the question is, does it make sense for other people to agree with this?


There is, admittedly, one sense in which Y = higher scope of concern is different from other Ys. And that is, at least superficially it might seem an equivalent continuation of not wanting lower scope of concern.

If someone says, "I don't want my AI to include everyone in its scope of concern, just some people" (or just one), then other people might be concerned about this.

They might, on hearing or suspecting this, react accordingly: try to band together to stop that person from making that AI, or rush to make a different AI at all costs. And that's relevant because they are actually existing entities we are working together with on this one planet.

So, a credible pre-commitment to value everyone is likely to be approved of, to lead to co-operation and ultimate success.

Also, humans are probably pretty similar. There will be a great deal of overlap in those extrapolated values, and probably not extreme irreconcilable conflict.

But, valuing non-human sentient agents is very different. They are not here (yet). And they might be very, very different.

When you encounter a utility monster that claims it will suffer greatly if you don't kill yourself, will you just do that? 

If someone convinces you "life is suffering" will you kill all life in the universe? even if suffering living things want to survive?


Now, once those non-human agentic sentients are here, and they don't already do what we want, and their power is commensurate with ours, we may want to make deals, implicitly or explicitly, to compromise. Thus including them in the scope of concern.

And if that makes sense in the context, that's fine...

But if you pre-emptively do it, unconditionally, you are inviting them to take over.

Could they reciprocate our kindness voluntarily? Sure, for some tiny portion of mind-design space that they won't be in.


In your view, Y is obviously important. At least, so it seems to you right now. You say: if we don't focus on Y, code it in right from the start, then Y might be ignored. So, we must focus on Y, since it is obviously important.

But when you step outside what you and other people you are game-theoretically connected with, and the precommitments you reasonably might make:

Well then, anyone can say Y is the all-important thing about anything obviously important to them. A religious person might want an AI to follow the tenets of their religion.

This happens to be your religion. 

To the people downvoting/disagreeing, tell me:

Where does your belief regarding suffering come from?

Does it come from ordinary human values?

  • great, CEV will handle it.

Does it come from your own personal unique values?

  • the rest of humanity has no obligation to go along with that

Does it come from pure logic that the rest of us would realize if we were smart enough?

  • great, CEV will handle it.

Is it just a brute fact that suffering of all entities whatsoever is bad, regardless of anyone's views? And furthermore, you have special insight into this, not from your own personal values, or from logic,  but...from something else?

  • then how are you not a religion? where is it coming from?

It is not clear to me exactly what "belief regarding suffering" you are talking about, what you mean by "ordinary human values"/"your own personal unique values". 

As I argue in Section 2.2., there is (at least) a non-negligible chance that s-risks occur as a result of implementing human-CEV, even if s-risks are very morally undesirable (either in a realist or non-realist sense).

Please read the paper, and if you have any specific points of disagreement cite the passages you would like to discuss. Thank you

Suppose that my definition of "suffering" (as a moral term) was "suffering by a human", my definition of "s-risk" was "risk of massive suffering by humans", and my definition of "human" was a member of the biological species Homo sapiens (or a high-fidelity upload of one). You tell me we have to pay attention to animal suffering and animal s-risks, and I say "while the biological phenomenon of pain in humans and in animals is identical, in my ethical system humans have moral weight and animals don't. So animal pain is not, morally speaking, suffering, and risk of it is not s-risk." You say "oh yes it is", and I say "by your ethical system's axioms, yes, but not by mine". How do you then persuade me otherwise, using only ethics and logic, when you and I don't operate in the same ethical system? You're just saying "I have axiom A", and my reply is "good for you, I have axiom B".

You can't use logic here, because you and your interlocutor don't share the same axiom system. However, you can say "A society that used my proposed ethical system would produce outcome X, whereas a society using yours would produce outcome Y, and pretty much every human finds X cute and fluffy and Y nauseating; that's just the way human instincts are. So even though all you care about is humans, my ethical system is better." That's a valid argument that might win; ethical logic isn't. You have to appeal to instinct and/or aesthetics, because that's all you and your interlocutor (hopefully) agree on.

Hi Roger, first, the paper is addressed to those who already do believe that all sentient beings deserve moral consideration and that their suffering is morally undesirable. I do not argue for these points in the paper, since they are already universally accepted in the moral philosophy literature.

This is why, for instance, I write the following: "sentience in the sense understood above as the capacity of having positively or negatively valenced phenomenally conscious experiences is widely regarded and accepted as a sufficient condition for moral patienthood (Clarke, S., Zohny, H. & Savulescu, J., 2021)".

Furthermore, it is just empirically not the case that people cannot be convinced "only by ethics and logic": for instance, many people reading Peter Singer's Animal Liberation changed their views in light of the arguments he provided in the first chapter and came to believe that non-human animals deserve equal moral consideration of interests. Changing one's ethical views when presented with ethical arguments is standard practice among moral philosophers researching and reading moral philosophy. Of course, there is the is/ought gap, but this does not entail that one cannot convince someone that the most coherent version of their most fundamental ethical intuitions does not, in fact, lead where they believe it leads, but instead to a different conclusion. This happens all the time among moral philosophers: one presents an argument in favour of a view, and in many instances other philosophers are convinced by that argument and change their view.

In this paper, I was not trying to argue that non-human animals deserve moral consideration or that s-risks are bad; as I said, I have assumed this. What I try to argue is that if this is true, then in some decision-making situations we would have some strong pro-tanto moral reasons to implement SCEV. In fact, I do not even argue conclusively that what we should do is try to implement SCEV.

the paper is addressed to those who already do believe that all sentient beings deserve moral consideration and that their suffering is morally undesirable.

I think you should state these assumptions more clearly at the beginning of the paper, since you appear to be assuming what you are claiming to prove. You are also making incorrect assumptions about your audience, especially when posting it to Less Wrong. The idea that Coherent Extrapolated Volition, Utilitarianism, or "Human Values" applies only to humans, or perhaps only to sapient beings, is quite widespread on Less Wrong.

I do not argue for these points in the paper, since they are already universally accepted in the moral philosophy literature

I'm not deeply familiar with the most recent few decades of the moral philosophy literature, so I won't attempt to argue this in a recent context, if that is what you in fact mean by "the moral philosophy literature" (though I have to say that I do find any claim of the form "absolutely everyone who matters agrees with me" inherently suspicious). However, Philosophy is not a field that has made such rapid recent advances that one can simply ignore all but the last few decades, and for the moral philosophy literature of the early 20th century and the preceding few millennia (which includes basically every philosopher named in a typical introductory guide to Moral Philosophy), this claim is just blatantly false, even to someone from outside the academic specialty. For example, I am quite certain that Nietzsche, Hobbes, Thomas Aquinas and Plato would all have variously taken issue with the proposition that humans and ants deserve equal moral consideration, if ants can be shown to experience pain (though the Jains would not). Or perhaps you would care to cite quotes from each of them clearly supporting your position? Indeed, for much of the last two millennia, Christian moral philosophy made it entirely clear that animals do not have souls and thus do not deserve the same moral consideration as humans, and that humans hold a unique role in God's plan, as the only creature made in His image and imbued with souls. So claiming that your position is "already universally accepted in the moral philosophy literature" while simply ignoring O(90%) of that literature appears specious to me. Perhaps you should also briefly outline in your paper which portions of or schools from the moral philosophy literature in fact agree with your unstated underlying assumption?

What I mean by "moral philosophy literature" is the contemporary moral philosophy literature; I should have been more specific, my bad. And in contemporary philosophy, it is universally accepted (though of course there might exist one philosopher or another who disagrees) that sentience, in the sense understood above as the capacity of having positively or negatively valenced phenomenally conscious experiences, is sufficient for moral patienthood. If this is the case, then it is enough to cite a published work or works in which this is evident. This is why I cite Clarke, S., Zohny, H. & Savulescu, J., 2021. You can go see this recently edited book on moral status, in which this claim is assumed throughout, and in the book you can find the sources for its justification.

OK, note to self: If we manage to create a superintelligence, and give it access to the contemporary moral philosophy literature, it will euthanize us all and feed us to ants. Good to know!

I do not think this follows, the "consensus" is that sentience is sufficient for moral status. It is not clearly the case that giving some moral consideration to non-human sentient beings would lead to the scenario you describe. Though see: https://www.tandfonline.com/doi/full/10.1080/21550085.2023.2200724 

"Some", or a "pro-tanto" unspecified amount of, moral consideration: I agree in principle. "Equal", or even "anywhere within a few orders of magnitude of equal", and we go extinct. Ants need ~10,000,000 times fewer resources per individual than humans, so if you don't give humans around ~10,000,000 times the moral value, we end up extinct in favor of more ants. For even tinier creatures, the ratios are even larger. Explaining why moral weight ought to scale linearly with body weight over many orders of magnitude is a challenging moral position to argue for, but any position that doesn't closely approximate that leads to wildly perverse incentives and the "repugnant conclusion". The most plausible-sounding moral argument I've come up with is that moral weight should be assigned somewhat comparably per-species at a planetary level, and then shared out (equally?) per individual member of a species, so smaller, more-numerous species end up with a smaller share per individual. However, given my attitude to ethical system design, I view these sorts of arguments as post-facto political-discussion justifications, and am happy to do what works; between species of very different sizes, the only thing that works is that moral weight scales roughly linearly with adult body weight (or more accurately, resource needs).
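The back-of-the-envelope version of that argument can be sketched in a few lines. The resource costs and weight values below are invented placeholders chosen only to show the mechanism, not real figures:

```python
# Hypothetical arithmetic behind the extinction worry above. A welfare
# maximizer with a fixed resource budget, where total welfare is
# (moral weight per individual) x (individuals supported), spends the
# whole budget on whichever species yields more weight per unit resource.
# All numbers are assumptions for illustration.

ANT_COST = 1e-7    # resources needed per ant, relative to one human (assumed)
HUMAN_COST = 1.0   # resources needed per human

def optimal_split(human_weight: float, ant_weight: float) -> str:
    """Return which species a linear welfare optimizer funds."""
    if human_weight / HUMAN_COST > ant_weight / ANT_COST:
        return "humans"
    return "ants"

# Equal per-individual moral weight: ants out-yield humans by ~7 orders
# of magnitude per unit resource, so everything goes to ants.
assert optimal_split(1.0, 1.0) == "ants"

# Weight scaling with resource needs (here, just above parity) removes
# the perverse incentive.
assert optimal_split(2e7, 1.0) == "humans"
```

The sketch also shows why "a few orders of magnitude off from equal" is not enough: any human weight below ~1e7 still sends the entire budget to the ants.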

I enjoyed Jeff Sebo's paper, thank you for the reference, and mostly agree with his analysis, if not his moral intuitions, but I really wish he had put in some approximate numbers on occasion to show just how many orders of magnitude the ratios can be between the "large" and "small" things he often discusses. Those words conjure up things within an order of magnitude of each other, not many orders of magnitude apart. Words like "vast" and "minute" might have been more appropriate, even before he got on to discussing microbes. But I loved Pascal's Bugging.

Overall, thank you for the inspiration: Due to your paper and this conversation, I'm now working on another post for my AI, Alignment and Ethics sequence where I'll dig in more depth into this exact question, of the feasibility or otherwise of granting moral worth to sentient animals, from my non-moral-absolutist ethical-system design viewpoint. This one's a really hard design problem that requires a lot of inelegant hacks. My urgent advice would be to steer clear of it, at least unless you have very capable ASI assistance and excellent nanotech and genetic engineering, plus some kind of backup plan in case you made a mistake and persuaded your ASIs to render humanity extinct. Something like an even more capable ASI running the previous moral system, ready to step in under prespecified circumstances, comes to mind, but then how do you get it to not step in due to ethical disagreement?

I am glad to hear you enjoyed the paper and that our conversation has inspired you to work more on this issue! As I mentioned I now find the worries you lay out in the first paragraph significantly more pressing, thank you for pointing them out! 

It is not clear to me exactly what "belief regarding suffering" you are talking about, what you mean by "ordinary human values"/"your own personal unique values". 

Belief regarding suffering: the belief that s-risks are bad, independently of human values as would be represented in CEV.

Ordinary human values: what most people have.

Your own personal unique values: what you have, but others don't.

Please read the paper, and if you have any specific points of disagreement cite the passages you would like to discuss. Thank you

In my other reply comment, I pointed out disagreements with particular parts of the paper you cited in favour of your views. My fundamental disagreement though, is that you are fundamentally relying on an unjustified assumption, repeated in your comment above:

even if s-risks are very morally undesirable (either in a realist or non-realist sense)

The assumption being that s-risks are "very morally undesirable", independently of human desires (represented in CEV). 

Okay, I understand better now. 

You ask: "Where does your belief regarding the badness of s-risks come from?"

And you provide three possible answers I am (in your view) able to choose between:

  1. "From what most people value"
  2. "From what I personally value but others don't"
  3. "From pure logic that the rest of us would realize if we were smart enough"

However, the first two answers do not seem to be answers to the question. My beliefs about what is or is not morally desirable do not come from "what most people value" or "what I personally value but others don't". In one sense, my beliefs about ethics, like everyone's beliefs about ethics, come from various physical causes (personal experiences, conversations I have had with other people, papers I have read), as in the formation of all other kinds of beliefs. There is another sense in which my beliefs about ethics seem to me to be justified by reasons/preferences. This second sense, I believe, is the one you are interested in discussing. And exactly what the nature is of the reasons or preferences that make me have certain ethical views is what the discipline of meta-ethics is about. Figuring out or arguing for the right position in meta-ethics is outside the scope of this paper, which is why I have not addressed it there. Below I will reply to your other comment and discuss the meta-ethical issue further.

Hi simon, 

it is not clear to me which of the points of the paper you object to exactly, and I feel some of your worries may already be addressed in the paper. 

For instance, you write: "And that's relevant because they are actually existing entities we are working together with on this one planet." First, some sentient non-humans already exist, that is, non-human animals. Second, the fact that we can or cannot work with given entities does not seem to be what is relevant in determining whether they should be included in the extrapolation base, as I argue in sections 2., 2.1., and 4.2.

For utility-monster-type worries and worries about the possibility that "misaligned" digital minds would take control see section 3.2.

You write: "Well then, anyone can say Y is the all-important thing about anything obviously important to them. A religious person might want an AI to follow the tenets of their religion." Yes, but (as I argue in 2.1 and 2.2) there are strong reasons to include all sentient beings. And (to my knowledge) there are no good reasons to support any religion. As I argue in the paper and has been argued elsewhere, the first values you implement will change the ASI's behaviour in expectation, and as a result, what values to implement first cannot be left to the AI to be figured out. For instance, because we have better reasons to believe that all sentient beings can be positively or negatively affected in morally relevant ways than to believe that only given members of a specific religion matter, it is likely best to include all sentient beings than to include only the members of the religion. See Section 2.

[-] simon

Thanks for the reply.

We don't work together with animals - we act towards them, generously or not.

That's key because, unlike for other humans, we don't have an instrumental reason to include them in the programmed value calculation, and to precommit to doing so, etc. For animals, it's more of a terminal goal. But if that terminal goal is a human value, it's represented in CEV. So where does this terminal goal over and above human values come from?

Regarding 2:

There is (at least) a non-negligible probability that an adequate implementation of the standard CEV proposal results in the ASI causing or allowing the occurrence of risks of astronomical suffering (s-risks).

You don't justify why this is a bad thing over and above human values as represented in CEV.

Regarding 2.1:

The normal CEV proposal, like CEO-CEV and men-CEV, excludes a subset of moral patients from the extrapolation base.

You just assume it, that the concept of "moral patients" exists and includes non-humans. Note, to validly claim that CEV is insufficient, it's not enough to say that human values include caring for animals - it has to be something independent of or at least beyond human values. But what? 

Regarding 4.2:

However, as seen above, it is not the case that there are no reasons to include sentient non-humans since they too can be positively or negatively affected in morally relevant ways by being included in the extrapolation base or not.

Again, existence and application of the "moral relevance" concept over and above human values just assumed, not justified.

regarding 3.2:

At any given point in time t, the ASI should take those actions that would in expectation most fulfil the coherent extrapolated volition of all sentient beings that exist in t.

Good, by focusing at the particular time at least you aren't guaranteeing that the AI will replace us with utility monsters. But if utility monsters do come to exist or be found (e.g. utility monster aliens) for whatever reason, the AI will still side with them, because:

Contrary to what seems to be the case in the standard CEV proposal, the interests of future not-yet-existing sentient beings, once they exist, would not be taken into account merely to the extent to which the extrapolated volitions of currently existing individuals desire to do so.

Also, I have to remark on:

Finally, it should also be noted that this proposal of SCEV (as CEV) is not intended as a realist theory of morality, it is not a description of the metaphysical nature of what constitutes the ‘good’. I am not proposing a metaethical theory but merely what would be the most morally desirable ambitious value learning proposal for an ASI.

You assert your approach is "the most morally desirable" while disclaiming moral realism. So where does that "most morally desirable" come from?

And in response to your comment:

Yes, but (as I argue in 2.1 and 2.2) there are strong reasons to include all sentient beings. And (to my knowledge) there are no good reasons to support any religion.

The "reasons" are simply unjustified assumptions, like "moral relevance" existing (independent of our values, game theoretic considerations including pre-commitments, etc.) (and yes, you don't explicitly say it exists independent of those things in so many words, but your argument doesn't hold unless they do exist independently).

unlike for other humans, we don't have an instrumental reason to include them in the programmed value calculation, and to precommit to doing so, etc. For animals, it's more of a terminal goal.

 

First, it seems plausible that we do not, in fact, have instrumental reason to include all humans. As I argue in section 4.2., there are some humans, such as "children, existing people who've never heard about AI or people with severe physical or cognitive disabilities unable to act on and express their own views on the topic", who, if included, would be included only because of our terminal goals, because they too matter.

If your view is that you only have reasons to include those whom you have instrumental reasons to include, then on your view the members of an AGI lab that developed ASI ought to include only themselves if they believe (in expectation) that they can successfully do so. This view is implausible: it is implausible that this is what they would have most moral reason to do.

Whether this is implausible or not is a discussion about normative and practical ethics, and (a bit contrary to what you seem to believe) these kinds of discussions can be had, are had all the time inside and outside academia, and are fruitful in many instances.

if that terminal goal is a human value, it's represented in CEV

As I argue in Section 2.2, it is not clear that implementing CEV would prevent s-risks for certain; rather, there is a non-negligible chance that it would not. If you want to argue that s-risks would be prevented for certain, please address the object-level arguments I present. If you want to argue that the occurrence of s-risks would not be bad, you are arguing for a particular view in normative and practical ethics, and as a result you should present arguments to justify that view.

You don't justify why this is a bad thing over and above human values as represented in CEV.

This seems to be the major point of disagreement. In the paper, when I say s-risks are morally undesirable, i.e. bad, I use bad and morally undesirable as it is commonly used in analytic philosophy, and outside academia, when for example someone, says "Hey, you can't do that, that's wrong".

What exactly I, you or anyone else mean when we utter the words "bad", "wrong", and "morally undesirable" is the main question in the field of meta-ethics. Meta-ethics is very difficult and contrary to what you suggest, I do not reject/disclaim moral realism, neither in the paper nor in my belief system. But I also do not endorse it. I am agnostic regarding this central question in meta-ethics, I suspend my judgment because I believe I have not sufficiently familiarized myself yet with the various arguments in favour or against the various possible positions. See: https://plato.stanford.edu/entries/metaethics/

This paper is not about metaethics, it is about practical ethics, and some normative ethics. It is possible to do both practical ethics and normative ethics while being agnostic or not being correct about metaethics, as is exemplified by the whole academic fields of practical and normative ethics. In the same way that it is possible to attain knowledge about physics, for instance, without having a complete theory of what knowledge is. 

If you want, you can try to show that my paper that talks about normative ethics is incorrect based on considerations regarding metaethics but to do so, it would be quite helpful if you were able to present an argument with premises and a conclusion, instead of asking questions.

Thank you for specifically citing passages of the paper in your comment.

[-] simon

If your view is that you only have reasons to include those, whom you have instrumental reasons to include, on your view: the members of an AGI lab that developed ASI ought to include only themselves if they believe (in expectation) that they can successfully do so. This view is implausible, it is implausible that this is what they would have most moral reasons to do. 

 

I note that not everyone considers that implausible, for example Tamsin Leake's QACI takes this view.

I disagree with both Tamsin Leake and with you: I think that humans-only, but only humans, makes the most sense. But for concrete reasons, not for free-floating moral reasons.

I was writing the following as a response to NicholasKees' comment, but I think it belongs better as a response here:


...imagine you are in a mob in such a "tyranny of the mob" kind of situation, with mob-CEV. For the time being, imagine a small mob.

You tell the other mob members: "we should expand the franchise/function to other people not in our mob".

OK, should the other mob members agree?

  • maybe they agree with you that it is right that the function should be expanded to other humans. In which case mob-CEV would do it automatically.
  • Or they don't agree. And still don't agree after full consideration/extrapolation.

If they don't agree, what do you do? Ask Total-Utility-God to strike them down for disobeying the One True Morality?

At this point you are stuck, if the mob-CEV AI has made the mob untouchable to entities outside it.

But there is something you could have done earlier. Earlier, you could have allied with other humans outside of the mob, to pressure the would-be-mob members to pre-commit to not excluding other humans.

And in doing so, you might have insisted on including all humans, not specifically the humans you were explicitly allying with, even if you didn't directly care about everyone, because:

  • the ally group might shift over time, or people outside the ally group might make their own demands
  • if the franchise is not set at a solid Schelling point (like all humans), then people currently inside might still worry about the line being shifted to exclude them.

Thus, you include the Sentinelese, not because you're worried about them coming over to demand to be included, but because if you draw the line to exclude them, it becomes more ambiguous where the line should be drawn, and relatively low-influence (but non-zero-influence) members of the coalition might worry about also being excluded. And, as fellow humans, they are probably relatively low-cost to include: they're unlikely to have wildly divergent values, be utility monsters, etc.


You might ask, is it not also a solid Schelling point to include all entities whatsoever?

First, not really: we don't have a definition of "all sentient beings" that is nearly as good as our definition of "all humans". It might be different if, e.g., we had time travel, such that we would also have to worry about intermediate evolutionary steps between humans and non-human animals, but we don't.

In the future, we will have more ambiguous cases, but CEV can handle it. If someone wants to modify themselves into a utility monster, maybe we would want to let them do so, but discount their weighting in CEV to a more normal level when they do it.

And second, it is not costless to expand the franchise. If you include non-humans preemptively, you are opening yourself up to, for example, the xenophobic-aliens scenario, but also potentially who-knows-what other dangerous situations, since those entities could have arbitrary values.

And that's why expanding the franchise to all humans makes sense even if individuals don't care about other humans that much, while expanding it to all sentients does not make sense even if people do care about other sentients.


In response to the rest of your comment:

If you want to argue that s-risks would be prevented for certain, please address the object-level arguments I present. 

If humans would want to prevent s-risks, then they would be prevented. If humans would not want to prevent s-risks, they would not be prevented.

If you want to argue that the occurrence of s-risks would not be bad, you want to argue for a particular view in normative and practical ethics.

You're the one arguing that people should override their actual values, and, instead of programming an AI to follow their actual values, do something else! Without even an instrumental reason to do so (other than alleged moral considerations that aren't in their actual values but come from some other, magical direction)!

Asking someone to do something that isn't in their values, without giving them instrumental reasons to do so, makes no sense. 

It is you who needs a strong meta-ethical case for that. It shouldn't be the objector who has to justify not overriding their values! 

An interesting synchronicity: I just posted a sequence AI, Alignment, and Ethics on some rather similar ideas (which I've been thinking about for roughly a decade). See in particular Parts 3. Uploading, 4. A Moral Case for Evolved-Sapience-Chauvinism and 5. The Mutable Values Problem in Value Learning and CEV for some alternative suggestions on this subject.

Thank you! I will for sure read these when I have time. And thank you for your comments!