# Example population ethics: ordered discounted utility

AI Alignment Forum


Here I'll present an old idea for a theory of population ethics. This post exists mainly so that I can have something to point to when I need this example.

Given a total population $N$, each with total individual utility $u_i$ over the whole of their lives, order them from lowest utility to the highest so that $i < j$ implies $u_i \le u_j$. These utilities are assumed to have a natural zero point (the "life worth living" standard, or similar).

Then pick some discount factor $0 < \gamma < 1$, and define the total utility of the world with population $N$ (which is the total population of the world across all time) as

• $U = \sum_{i=1}^{N} \gamma^{i-1} u_i$.

This is a prioritarian utility that gives greater weight to those least well off. It is not average utilitarianism: it would advocate creating a human with utility larger than that of all other humans (as long as it was positive), and would advocate against creating a human with negative utility (for a utility in between, it depends on the details). In the limit $\gamma \to 1$, it's total utilitarianism. Increasing someone's individual utility always improves the score. It (sometimes) accepts the "sadistic conclusion", but I've argued that that conclusion is misnamed (the conclusion is a choice between two negative outcomes, meaning that calling it "sadistic" is a poor choice - the preferred outcome is not a good one, just a less bad one). Killing people won't help, unless they will have future lifetime utility that is negative (as everyone that ever lived is included in the sum). Note that this sets up a minor asymmetry between not-creating people and killing them.
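As a rough illustration of the properties listed above, here is a minimal Python sketch. It assumes the lowest lifetime utility gets weight 1, the next lowest gets weight γ, and so on; the function name, the example population and the value γ = 0.9 are made up for the example, and γ = 1 is only allowed so the total-utilitarian limit can be shown.

```python
from typing import Sequence


def ordered_discounted_utility(utilities: Sequence[float], gamma: float) -> float:
    """Sum gamma**rank * u over lifetime utilities sorted from lowest to highest."""
    if not 0 < gamma <= 1:
        raise ValueError("gamma must lie in (0, 1]")
    return sum(gamma**rank * u for rank, u in enumerate(sorted(utilities)))


# Hypothetical lifetime utilities; zero is the "life worth living" point.
world = [3.0, 10.0, 25.0]
gamma = 0.9

base = ordered_discounted_utility(world, gamma)

# Adding someone better off than everyone else (and positive) raises the score.
with_top = ordered_discounted_utility(world + [40.0], gamma)

# Adding someone with negative lifetime utility lowers it.
with_negative = ordered_discounted_utility(world + [-2.0], gamma)

# Raising one existing person's utility (10 -> 12) raises the score.
raised = ordered_discounted_utility([3.0, 12.0, 25.0], gamma)

# With gamma = 1 the score is just the plain total.
total = ordered_discounted_utility(world, 1.0)
```

Sorting inside the function means callers never have to maintain the ordering themselves, which also makes ties harmless.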

Do I endorse this? No; I think a genuine population ethics will be more complicated, and needs a greater asymmetry between life and death. But it's good enough for an example in many situations that come up.


It does recommend against creating humans with lives barely worth living, and equivalently recommends painlessly killing such people as well. If your population is a single person with utility 1000 and γ=.99, then this would recommend against creating a person with utility 1.
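To spell out the arithmetic in that example, assuming the lowest utility gets weight 1 and the next one gets weight $\gamma$ (the same convention as the $a + \gamma b$ expression used elsewhere in this thread):

$$
\begin{aligned}
U(\{1000\}) &= 1000, \\
U(\{1, 1000\}) &= 1 + 0.99 \times 1000 = 991 < 1000.
\end{aligned}
$$

More generally, a newcomer with utility $u < 1000$ changes the score by $u - (1 - \gamma) \times 1000 = u - 10$, so under these numbers the break-even point is a lifetime utility of 10.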

EDIT: I realised I wasn't clear that the sum was over everyone that ever lived. I've clarified that in the post.

Actually, it recommends killing only people whose future lifetime utility is going to be negative, as the sum is over all humans in the world in total.

You're correct on the "not creating" incentives.

Now, this doesn't represent what I'd endorse (I prefer more asymmetry between life and death), but it's good enough as an example for most cases that come up.

It's interesting. A few points:

Is there a natural extension for infinite population? It seems harder than most approaches to adapt.

I'm always suspicious of schemes that change what they advocate massively based on events a long time ago in a galaxy far, far away - in particular when it can have catastrophic implications. If it turns out there were 3^^^3 Jedi living in a perfect state of bliss, this advocates for preventing any more births now and forever.
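A small numerical sketch of this worry, under assumed numbers: if $n$ far-away people each have utility $B$ and a newcomer has utility $u < B$, the lowest-first weighting gives a change in score of $u - B(1 - \gamma^n)$, which turns negative once $n$ is large enough. The values of `GAMMA`, `BLISS` and `NEWCOMER` below are hypothetical, and the effect only applies to births below the blissful level.

```python
def ordered_discounted_utility(utilities, gamma):
    """Weight the i-th lowest lifetime utility by gamma**i (i starting at 0)."""
    return sum(gamma**i * u for i, u in enumerate(sorted(utilities)))


def gain_from_birth(existing, newcomer, gamma):
    """Change in the score from adding one person with lifetime utility `newcomer`."""
    return (ordered_discounted_utility(existing + [newcomer], gamma)
            - ordered_discounted_utility(existing, gamma))


GAMMA = 0.99      # illustrative discount factor
BLISS = 100.0     # hypothetical utility of each far-away blissful person
NEWCOMER = 50.0   # a good life, but below the blissful level

few_far_away = [BLISS] * 50
many_far_away = [BLISS] * 1000

# With only 50 blissful people, the birth still improves the score...
gain_few = gain_from_birth(few_far_away, NEWCOMER, GAMMA)

# ...but with 1000 of them, the same birth makes the score worse.
gain_many = gain_from_birth(many_far_away, NEWCOMER, GAMMA)
```

The sign flips purely because of how many better-off people exist elsewhere, which is the dependence on distant events being objected to here.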

Do you know a similar failure case for total utilitarianism? All the sadistic/repugnant/very-repugnant... conclusions seem to be comparing highly undesirable states - not attractor states. If we'd never want world A or B, wouldn't head towards B from A, and wouldn't head towards A from B (since there'd always be some preferable direction), does an A-vs-B comparison actually matter at all?

Total utilitarianism is an imperfect match for our intuitions when comparing arbitrary pairs of worlds, but I can't recall seeing any practical example where it'd lead to clearly bad decisions. (perhaps birth-vs-death considerations?)

In general, I'd be interested to know whether you think an objective measure of per-person utility even makes sense. People's take on their own situation tends to adapt to their expectations (as you'd expect, from an evolutionary fitness point of view). A zero-utility life from our perspective would probably look positive 1000 years ago, and negative (hopefully) in 100 years. This is likely true even if the past/future people were told in detail how the present-day 'zero' life felt from the inside: they'd assume our evaluation was simply wrong.

Or if we only care about (an objective measure of) subjective experience, does that mean we'd want people who're all supremely happy/fulfilled/... with their circumstances to the point of delusion?

Measuring personal utility can be seen as an orthogonal question, but if I'm aiming to match my intuitions I need to consider both. If I consider different fixed personal-utility-metrics, it's quite possible I'd arrive at a different population ethics. [edited from "different population utilities", which isn't what I meant]

I think you're working in the dark if you try to match population ethics to intuition without fixing some measure of personal utility (perhaps you have one in mind, but I'm pretty hazy myself :)).

> Is there a natural extension for infinite population? It seems harder than most approaches to adapt.

None of the population ethics have decent extensions to infinite populations. I have a very separate idea for infinite populations here. I suppose the extension of this method to infinite population would use the same method as in that post, but use instead of (where and are the limsup and liminf of utilities, respectively).

> I'm always suspicious of schemes that change what they advocate massively based on events a long time ago in a galaxy far, far away - in particular when it can have catastrophic implications. If it turns out there were 3^^^3 Jedi living in a perfect state of bliss, this advocates for preventing any more births now and forever.

You can always zero out those utilities by decree, and only consider utilities that you can change. There are other patches you can apply. By talking this way, I'm revealing the principle I'm most willing to sacrifice: elegance.

> Do you know a similar failure case for total utilitarianism? All the sadistic/repugnant/very-repugnant... conclusions seem to be comparing highly undesirable states - not attractor states. If we'd never want world A or B, wouldn't head towards B from A, and wouldn't head towards A from B (since there'd always be some preferable direction), does an A-vs-B comparison actually matter at all?

If A is repugnant and C is now, you can get from C to A by doing improvements (by the standard of total utilitarianism) every step of the way. Similarly, if B is worse than A on that standard, there is a hypothetical path from B to A which is an "improvement" at each step (most population ethics have this property, but not all - you need some form of "continuity").

It's possible that the distribution of matter in the universe with the highest total utility is a repugnant one; in that case, a sufficiently powerful AI may find a way to reach it.

> In general, I'd be interested to know whether you think an objective measure of per-person utility even makes sense.

a) I don't think it makes sense in any strongly principled way, b) I'm trying to build one anyway :-)

> You can always zero out those utilities by decree, and only consider utilities that you can change. There are other patches you can apply. By talking this way, I'm revealing the principle I'm most willing to sacrifice: elegance.

It's been a long time since you posted this, but if you see my comment, I'd be curious about what other patches one could apply. I have pretty severe scrupulosity issues around population ethics and often have trouble functioning because I can't stop thinking about them. I dislike pure total utilitarianism, but I have trouble rejecting it precisely because of "galaxy far far away" type issues. I spend a lot of time worrying about the idea that I am forced to choose between two alternatives:

1) That (to paraphrase what you said in your critique of total utilitarianism) it is a morally neutral act to kill someone if you replace them with someone whose lifetime utility is equal to the first person's remaining lifetime utility (and on a larger scale, the Repugnant Conclusion), or

2) That the human race might be obligated to go extinct if it turns out there is some utopia in some other branch of the multiverse, or the Andromeda Galaxy, or in some ancient, undiscovered fallen civilization in the past. Or that if the Earth was going to explode and I could press a button to save it, but it would result in future generations living slightly lower quality lives than present generations, I shouldn't push the button.

I'd really like to know some ways that I can reject both 1 and 2. I really admire your work on population ethics and find that your thinking on the subject is really closely aligned with my own, except that you're better at it than me :)

Hey there!

I haven't been working much on population ethics (I'm more wanting to automate the construction of values from human preferences so that an AI could extract a whole messy theory from it).

My main thought on these issues is to set up a stronger divergence between killing someone and not bringing them into existence. For example, we could restrict preference-satisfaction to existing beings (and future existing beings). So if they don't want to be killed, that counts as a negative if we do that, even if we replace them with someone happier.

This has degenerate solutions too - it incentivises producing beings that are very easy to satisfy and that don't mind being killed. But note that "create beings that score max on this utility scale, even if they aren't conscious or human" is a failure mode for average and total utilitarianism as well, so this isn't a new problem.

> So if they don't want to be killed, that counts as a negative if we do that, even if we replace them with someone happier.

I have that idea as my "line of retreat."  My issue with it is that it is hard to calibrate it so that it leaves as big a birth-death asymmetry as I want without degenerating into full-blown anti-natalism. There needs to be some way to say that the new happy person's happiness can't compensate for the original person's death without saying that the original person's own happiness can't compensate for their own death, which is hard.  If I calibrate it to avoid anti-natalism it becomes such a small negative that it seems like it could easily be overcome by adding more people with only a little more welfare.

There's also the two-step "kill and replace" method, where in step one you add a new life barely worth living without affecting anyone. Since the new person exists now, they count the same as everyone else, so then in the second step you kill someone and transfer their resources to the new person. If this process gives the new person the same amount of utility as the old one, it seems neutral under total utilitarianism. I suppose under total preference utilitarianism it's somewhat worse, since you now have two people dying with unsatisfied preferences instead of one, but it doesn't seem like a big enough asymmetry for me.

I feel like in order to reject the two-step process, and to have as big an asymmetry as I want, I need to be able to reject "mere addition" and accept the Sadistic Conclusion. But that in turn leads to "galaxy far far away" issues where it becomes wrong to have children because of happy people in some far off place. Or "Egyptology" issues where it's better for the world to end than for it to decline so future people have somewhat worse lives, and we are obligated to make sure the Ancient Egyptians didn't have way better lives than ours before we decide on having children. I just don't know. I want it to stop hurting my brain so badly, but I keep worrying about how there's no solution that isn't horrible or ridiculous.

> This has degenerate solutions too - it incentivises producing beings that are very easy to satisfy and that don't mind being killed.

For this one, I am willing to just decree that creating creatures with a diverse variety of complex human-like psychologies is good, and creating weird, minmaxing, unambitious creatures is bad (or at least massively sub-optimal). To put it another way, Human Nature is morally valuable and needs to be protected.

Another resource that helped me on this was Derek Parfit's essay "What Makes Someone's Life Go Best." You might find it helpful; it parallels some of your own work on personal identity and preferences. The essay describes which of our preferences we feel count as part of our "self interest" and which do not. It helped me understand things, like why people generally feel obligated to respect people's "self interest" preferences (e.g. being happy, not dying), but not their "moral preferences" (e.g. making the country a theocracy, executing heretics).

Parfit's "Success Theory," as he calls it, basically argues that only preferences that are "about your own life" count as "welfare" or "self interest." So that means that we would not be making the world a better place by adding lives who prefer that the speed of light stay constant, or that electrons keep having negative charges. That doesn't defuse the problem entirely, since you could still imagine creating creatures with super unambitious life goals. But it gets part of the way; the rest, again, I deal with by "defending Human Nature."

> I'm more wanting to automate the construction of values from human preferences

I had a question about that. It is probably a silly question since my understanding of decision and game theory is poor. When you were working on that you said that there was no independence of irrelevant alternatives.  I've noticed that IIA is something that trips me up a lot when I think about population ethics.  I want to be able to say something like "Adding more lives might be bad if there is still the option to improve existing ones instead, but might be good if the existing ones have already died and that option is foreclosed." This violates IIA because I am conditioning whether adding more lives is good on whether there is another alternative or not.

I was wondering if my brain might be doing the thing you described in your post on no IIA, where it is smashing two different values together and getting different results if there are more alternatives. It probably isn't; I am probably just being irrational, but reading that post just felt familiar.

Thanks. I'll check out the infinite idea.

On repugnance, I think I've been thinking too much in terms of human minds only. In that case there really doesn't seem to be a practical problem: certainly if C is now, continuous improvements might get us to a repugnant A - but my point is that that path wouldn't be anywhere close to optimal. Total-ut prefers A to C, but there'd be a vast range of preferable options every step of the way - so it'd always end up steering towards some other X rather than anything like A.

I think that's true if we restrict to human minds (the resource costs of running a barely content one being a similar order of magnitude to those of running a happy one).

But of course you're right as soon as we're talking about e.g. rats (or AI-designed molecular scale minds...). I can easily conceive of metrics valuing 50 happy rats over 1 happy human. I don't think rat-world fits most people's idea of utopia.

I think that's the style of repugnance that'd be a practical danger: vast amounts of happy-but-simple minds.

> I think that's the style of repugnance that'd be a practical danger: vast amounts of happy-but-simple minds.

Yep, that does seem a risk. I think that's what the "muzak and potatoes" formulation of repugnance is about.

It seems odd to me that it is so distribution-dependent. If there is a large number of people, with a large gap between the highest and the lowest, then it's worth killing (potentially most people) just to move the high utility individual down the preference ordering. One solution might be to fix the highest power of γ (for any population), and approach it across the summation in a way weighted by the flatness of the distribution.

Another issue is that two individuals with the same unweighted utility can become victims of the ordering, although that could be patched by grouping individuals by equal unweighted utility, and then summing over the weighted sums of the group utilities.

EDIT: I realised I wasn't clear that the sum was over everyone that ever lived. I've clarified that in the post.

Killing people with future lifetime non-negative utility won't help, as they will still be included in the sum.
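To make this concrete, here is a minimal sketch. It assumes, purely for illustration, that each person's lifetime utility splits additively into a past part already lived and a future part still to come, and that killing someone simply removes the future part while they stay in the sum. The population and γ = 0.9 are hypothetical.

```python
def ordered_discounted_utility(utilities, gamma):
    """Weight the i-th lowest lifetime utility by gamma**i (i starting at 0)."""
    return sum(gamma**i * u for i, u in enumerate(sorted(utilities)))


GAMMA = 0.9  # illustrative discount factor

# Hypothetical (past utility, future utility) pairs for everyone who ever lives.
people = [(4.0, 1.0), (12.0, 8.0), (25.0, -5.0), (30.0, 10.0)]


def score_if_killed(index=None):
    """Score when person `index` loses their future utility (None: nobody is killed)."""
    lifetime = [past if i == index else past + future
                for i, (past, future) in enumerate(people)]
    return ordered_discounted_utility(lifetime, GAMMA)


score_alive = score_if_killed(None)

# Person 1 has a positive future (+8): killing them lowers the score.
score_kill_positive_future = score_if_killed(1)

# Person 2 has a negative future (-5): only here does killing raise the score.
score_kill_negative_future = score_if_killed(2)
```

Because the victim stays in the sum with whatever utility they had already accumulated, killing only ever changes one term, and it changes it in the direction of the sign of the lost future.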

> Another issue is that two individuals with the same unweighted utility can become victims of the ordering

No. If $a = b$, then $a + \gamma b = b + \gamma a$. The ordering between identical utilities won't matter for the total sum, and the individual that is currently behind will be prioritised.

My mistake with respect to the sum being over all time, thank you for clarifying.

> No. If a=b, then a+γb=b+γa. The ordering between identical utilities won't matter for the total sum, and the individual that is currently behind will be prioritised.

While the ordering between identical utilities does not affect the total sum, it does affect the individual valuation. a can be prioritized over b just by the ordering, even though they have identical utility. Unless I am missing something obvious.

> a can be prioritized over b just by the ordering, even though they have identical utility.

Nope. Their ordering is only arbitrary as long as they have exactly the same utility. As soon as a policy would result in one of them having higher utility than the other, their ordering is no longer arbitrary. So if we ignore other people, $a < b$ means the term in the sum is $a + \gamma b$. If $b < a$, it's $b + \gamma a$. If $a = b$, it can be either term (and they are equal).
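A quick way to check this numerically, as a sketch with made-up values (γ = 0.9, utilities around 10, and a hypothetical helper `pair_term` for the two-person term):

```python
def pair_term(a: float, b: float, gamma: float) -> float:
    """Two-person term of the sum: the lower utility gets weight 1, the higher gets gamma."""
    low, high = sorted((a, b))
    return low + gamma * high


GAMMA = 0.9  # illustrative discount factor
EPS = 1.0    # illustrative change in one person's utility

tied = pair_term(10.0, 10.0, GAMMA)

# From a tie, helping either person by EPS gives exactly the same gain...
gain_if_a_raised = pair_term(10.0 + EPS, 10.0, GAMMA) - tied
gain_if_b_raised = pair_term(10.0, 10.0 + EPS, GAMMA) - tied

# ...while once a < b, helping the worse-off a is worth more than helping b.
unequal = pair_term(8.0, 10.0, GAMMA)
gain_helping_lower = pair_term(8.0 + EPS, 10.0, GAMMA) - unequal
gain_helping_higher = pair_term(8.0, 10.0 + EPS, GAMMA) - unequal
```

The weights attach to ranks rather than to named individuals, so neither person is favoured by the labelling when their utilities are equal.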

(I can explain in more detail if that's not enough?)

I have realized that I am coming off like I don't understand algebra, which is a result of my failure to communicate. As unlikely as I am making it sound, I understand what you are saying and already knew it.

What I mean is this:

Despite a = b, it could "look like" a < b or a > b if you didn't have access to the world but only to the (expanded) sum: for example, if you can ask for the difference between the total sum and the sum ignoring a, but not for the actual value of a.

I can't think of a non-pathological case where this would actually matter, but it seems like a desirable desideratum that a = b will always "look like" a = b regardless of what kind of (sufficiently fine-grained) information that you have.

EDIT : After reading your above comment about willingness to sacrifice elegance, I kind of wish I hadn't said anything at all, considering my comments are all in the interest of what I would consider elegance. To be sure, I think elegance is a legitimate practical concern, but I wouldn't have engaged with you initially had I known your view.

Hum, not entirely sure what you're getting at...

I'd say that $a = b$ always "looks like $a = b$", in the sense that there is a continuity in the overall $U$; small changes to our knowledge of $a$ and $b$ make small changes to our estimate of $U$.

I'm not really sure what stronger condition you could want; after all, when $a = b$, we can always write

• $a + \gamma b$

as:

• $\frac{1+\gamma}{2}(a + b)$.

We could equivalently define $U$ that way, in fact (it generalises to larger sets of equal utilities).

Would that formulation help?
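As a sketch of how a tie-symmetric form could generalise (an assumed formulation, using the convention that the person at rank $i$ gets weight $\gamma^{i-1}$): for two equal utilities $a = b$,

$$
a + \gamma b = \frac{1 + \gamma}{2}\,(a + b),
$$

and if $m$ people share the same utility $u$ and occupy ranks $k, \dots, k + m - 1$, their combined contribution is

$$
\sum_{j=0}^{m-1} \gamma^{k-1+j}\, u
= \gamma^{k-1}\,\frac{1 - \gamma^{m}}{1 - \gamma}\, u
= \sum_{j=0}^{m-1} \bar{w}\, u,
\qquad
\bar{w} = \frac{\gamma^{k-1}\,(1 - \gamma^{m})}{m\,(1 - \gamma)} .
$$

So each tied person can be given the same average weight $\bar{w}$ without changing the total, and no one in the tied group is singled out by the ordering.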
