I've been thinking lately about the Repugnant Conclusion. For those who are not already aware, it's a problem in Population Ethics where one is seemingly forced to say that a world entirely populated with happy, well-off people is less preferable (all else being equal) than a world consisting of a comparatively larger number of folk who experience a lower quality of life.

This doesn't sound so bad at first (many philosophers would presumably be fine with reducing their quality of life on the condition that more babies with mild depression or something be brought into existence), until you realize that this can be applied iteratively. At some point, the larger (but less-well-off-per-individual) world is incredibly populous, but consists only of people who all have lives barely worth living. This world is "objectively" better than our first world, according to many formal ethical frameworks.[1]

Okay, but that isn't so bad, is it? After all, "lives barely worth living" are still worth living! It's not like we're talking about a world full of suicidal people...right? Well, enter the so-called Very Repugnant Conclusion: 

For any perfectly equal population with very high positive welfare, and for any number of lives with very negative welfare, there is a population consisting of the lives with negative welfare and lives with very low positive welfare which is better than the high welfare population, other things being equal.[2]

In other words, the Very Repugnant Conclusion considers a semi-hellish world. This world is populated by some people suffering so badly that they'd be better off not existing,[3] while the rest of the population has the same quality of life as the people from the end of the Repugnant Conclusion (i.e. only marginally worth living). Assuming a high enough population, this semi-hellish world is somehow better than one containing only extremely happy, well-off people.
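To make the "high enough population" step concrete, here is a minimal sketch assuming a *total* utilitarian aggregation rule (the world with the larger sum of individual welfare counts as better) and a single number per life. This is just one framework that produces the conclusion, not the formal argument from the literature. The `Group` type, `total_welfare`, `very_repugnant_world`, and all the welfare figures are hypothetical, chosen only for illustration; integer welfare units avoid floating-point rounding.

```python
from dataclasses import dataclass


@dataclass
class Group:
    size: int     # number of lives in this group
    welfare: int  # welfare of each life; negative means worse than never existing


def total_welfare(population: list[Group]) -> int:
    """Total utilitarian score (an assumed aggregation rule): sum of welfare over every life."""
    return sum(g.size * g.welfare for g in population)


# Hypothetical high-welfare world: ten billion lives, each at welfare 100.
world_a = [Group(size=10_000_000_000, welfare=100)]


def very_repugnant_world(suffering: Group, barely_positive: int) -> list[Group]:
    """Add just enough barely-worth-living lives to a group of suffering
    lives that the combined total strictly exceeds world A's total."""
    # The extra lives must cover world A's total plus the suffering group's deficit.
    target = total_welfare(world_a) - suffering.size * suffering.welfare
    # Smallest number of such lives whose combined welfare strictly exceeds the target.
    n = target // barely_positive + 1
    return [suffering, Group(size=n, welfare=barely_positive)]


if __name__ == "__main__":
    # A billion lives in deep misery (welfare -100) plus many lives at welfare 1.
    world_z = very_repugnant_world(Group(size=1_000_000_000, welfare=-100), barely_positive=1)
    print(total_welfare(world_a))  # 1000000000000
    print(total_welfare(world_z))  # 1000000000001
```

Because the size and misery of the suffering group are free parameters, the function returns a higher-scoring world for any amount of suffering you plug in, which mirrors the "for any number of lives with very negative welfare" clause of the definition above.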

The Very Repugnant Conclusion has been proven to follow from a very small set of basic moral/logical axioms,[2] all of which seem intuitively, obviously true to many people. Therefore, if you want a self-consistent ethical framework, you must either "bite the bullet" on the Very Repugnant Conclusion and accept it as correct, or reject one of the axioms it rests on, and rejecting any of them would seemingly have far-reaching consequences for other basic moral intuitions.[1][4]

This probably hasn't convinced you that formal ethics is a contradictory illusion, if you didn't already think so. After all, perhaps there's some clever way around the Very Repugnant Conclusion we haven't discovered yet, or perhaps you're simply willing to bite the bullet on it and say "yeah sure, maybe my moral intuition is flawed here."[5]

More generally, it seems intuitively plausible that a formal system can (in theory) be devised[6] which, if followed, will always lead one to choose the "most ethical" option available, or at least to avoid choosing an "ethical atrocity."[7] Consider that creating an AI which understands human ethics[8] seems at least theoretically doable. We also know that neural networks can, at the end of the day, be simulated by incredibly complex Turing machines, and aren't our brains basically fancy neural nets as well? If AI can (presumably) do it, and brains can do it, what's stopping philosophers from doing it, and writing down a self-consistent, intuitively moral ethics? (Beyond the lack of paper and funding, of course...)

All sarcasm aside, I believe that the formal self-consistency (or lack thereof) of Ethics is quite possibly a fundamental problem for AI Alignment, among other fields. What would it mean for us if Ethics were fundamentally inconsistent?

Note that this is a very different question from asking whether Ethics is "objective" or not; at this point it seems pretty obvious that our Ethics is in large part a product of human psychology and culture.[9] However, just because it's a largely subjective "man-made" framework doesn't mean it can't have its own internally consistent logic.

Why is this distinction important? Well, if you're trying to build an AI which is aligned with commonsense ethical values, I strongly suspect that the question of self-consistency will impact the sorts of challenges such a project will have to face.[10] I'm having trouble formulating exactly what the implications here are, which is why I'm turning to the community for help. I feel like there's a really important insight somewhere around here, but it's just out of reach...


  1. ^

    I'm not going to get into the details of the formal logic here, since I'm lazy and it isn't necessary to understand my main point.

  2. ^
  3. ^

    By this I mean that it would be better had they never been brought into existence in the first place, not that they would (or necessarily should) choose to commit suicide once alive.

  4. ^

    The Repugnant Conclusion rests on fewer assumptions, but it is more acceptable to many people, so it makes for a less compelling case study.

  5. ^

    Or even "I don't share the intuition that this is bad in the first place," though I don't know how many people would seriously say that about the Very Repugnant Conclusion.

  6. ^

    Keep in mind that such a system is allowed to be ad-hoc and super complex, as long as it's self-consistent.

  7. ^

    From the perspective of at least one human possessing an ethical intuition. I'm leaving the quoted terms deliberately vague, so if you want to be pedantic here, mentally replace those quotes with whatever you think would make this post better.

  8. ^

    Even if it doesn't follow said ethics itself; what matters is whether it can consistently reason about it.

  9. ^
  10. ^

    Whereas the question of the "objectivity" of human ethics is basically the Orthogonality thesis debate, which is a different can of worms entirely.




Utilitarianism is just an approximate theory. I don’t think it’s truly possible to compare happiness and pain, and certainly one cannot balance the other. The lesson of the Repugnant Conclusion should be that Utilitarianism is being stretched outside of its bounds. It’s not unlike Laplace’s demon in physics: it’s impossible to know enough about the system to make those sorts of choices.

You would have to look at each individual, and getting a sufficiently detailed picture of their life takes a lot of time. Happiness isn’t a number. It’s more like a vector in high-dimensional space, where it can depend on any number of factors, including the mental state of one’s neighbors. Comparing such vectors requires combinatorics, so again, these hypothetical computations would blow up to impracticality.

Utilitarianism is instead an approximate theory. We are accepting the approximation that happiness and pain are one-dimensional. It’s not real, but it makes the math easier to deal with. It’s useful, because that approximation works for most cases, without knowing the details, similar to statistical mechanics, but once you start getting into edge cases, the wheels fall off. That shouldn’t be surprising, as we are collapsing a high-dimensional vector into a single point. We’re losing fidelity, to gain computability.
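A minimal sketch of what "collapsing a high-dimensional vector into a single point" can mean, assuming welfare is modeled as a vector over a few dimensions and collapsed by a weighted sum. The dimension names, weights, people, and the `scalarize` helper are all hypothetical, not taken from any real model of wellbeing. It shows two things: distinct vectors can collapse to the same number, and the ranking of two people can flip depending on weights that the single number hides.

```python
# Hypothetical welfare dimensions; a real model would have far more.
DIMENSIONS = ("health", "relationships", "meaning", "comfort")


def scalarize(state: dict[str, int], weights: dict[str, int]) -> int:
    """Collapse a multidimensional welfare state into one number (weighted sum)."""
    return sum(weights[d] * state[d] for d in DIMENSIONS)


# Two hypothetical people with visibly different lives.
alice = {"health": 9, "relationships": 1, "meaning": 5, "comfort": 5}
bob = {"health": 5, "relationships": 5, "meaning": 5, "comfort": 5}

equal_weights = {d: 1 for d in DIMENSIONS}
health_heavy = {"health": 3, "relationships": 1, "meaning": 1, "comfort": 1}
social_heavy = {"health": 1, "relationships": 3, "meaning": 1, "comfort": 1}

if __name__ == "__main__":
    # Different vectors, identical scalar: the collapse discards information.
    print(scalarize(alice, equal_weights), scalarize(bob, equal_weights))  # 20 20
    # The ranking of the two people flips with the (hidden) weighting.
    print(scalarize(alice, health_heavy), scalarize(bob, health_heavy))  # 38 30
    print(scalarize(alice, social_heavy), scalarize(bob, social_heavy))  # 22 30
```

None of this shows that a vector-based model would avoid the paradoxes; it only illustrates what the one-number approximation throws away.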

I think it’s fair to say that humans are incapable of truly understanding each other. Relationships with even an approximation of that level of knowledge take years to develop, and in most cases never do. Without that knowledge you don’t know their preferences, without their preferences you can’t know the vectors of their mental state, and therefore you can’t compare at the level of detail needed to truly know whether the world would be better in one state or another.

So, we approximate. Which is fine, as long as you remember that it is an approximation. I don’t think that it is possible to have a perfect ethical system with no contradictions. The best we can do is hold several ethical models, and see how they compare as a guide for our actions in an uncertain world.


I agree with you that, when it comes to humans, an approximation is totally fine for [almost] all purposes. I'm not sure that this holds when it comes to thinking about potential superintelligent AI, however. If it turns out that even a super high-fidelity multidimensional ethical model still contains inherent self-contradictions, would that impact the Alignment problem, and if so, how?

Given the state of AI, I think AI systems are more likely to infer our ethical intuitions by default.



Utilitarianism is not supposed to be applied like this. It is only one perspective. If you apply it everywhere, then there's a much quicker shortcut: we should kill a healthy person and use this person's organs to save several other people who would otherwise be healthy if not for some organ dysfunction.

For a society to function, lives (especially human lives) must in general be treated as not comparable by amount. This is why the person who pulls the handle in the trolley problem commits a crime.

This is where intuition can go wrong. Intuitions are not necessarily consistent (most people avoid the trolley problem at all costs), so it is no wonder that an ethics built on intuition is futile.


If in extreme situations the ethical ideas fall apart, it might make sense to add an extra rule to stay away from the extreme situations. Like maybe not forever, but to proceed sufficiently slowly so that we have time to reflect on how we feel about that.

Thanks for the post. I don’t know the answer to whether a self-consistent ethical framework can be constructed, but I’m working on it (without funding). My current best framework is a utilitarian one that incorporates the effects of rights, self-esteem (personal responsibility) and conscience. It doesn’t “fix” the repugnant or very repugnant conclusions, but it says that how you transition from one world to another could matter in terms of the conscience(s) of the person/people who bring it about.

It’s an interesting question as to what the implications are if it’s impossible to make a self-consistent ethical framework. If we can’t convey ethics to an AI in a self-consistent form, then we’ll likely rely in part on giving it lots of example situations (that not all humans/ethicists will agree on) to learn from, and hope it’ll augment this with learning from human behavior and then generalize well beyond all this not-perfectly-consistent training data. (Sounds a bit sketchy, doesn't it? At least for the first AGIs, but perhaps ASIs could fare better?) Generalizing “well” could be taken to mean that an AI won’t do anything that most people would strongly disapprove of if they understood the true implications of the action.

[This paragraph I'm less sure of, so take it with a grain of salt:] An AI that was trying to act ethically and taking the approval of relatively wise humans as some kind of signal of this might try to hide/avoid ethical inconsistencies that humans would pick up on. It would probably develop a long list of situations where inconsistencies seemed to arise and of actions it thought it could "get away with" versus not. I'm not talking about deception with malice, just sneakiness to try to keep most humans more or less happy, which, I assume, would be part of what its ethics system would deem as good/valuable. It seems to me that problems may come to the surface if/when an "ethical" AI is defending against bad AI, when it may no longer be able to hide inconsistencies in all the situations that could rapidly come up.

If it is possible to construct a self-consistent ethical framework and we haven't done it in time, or laid the groundwork for it to be done quickly by the first "transformative" AIs, then, in my opinion, we'll basically have dug our own grave and will have only ourselves to blame for the consequences. Work to try to come up with a self-consistent ethical framework seems to me to be a very underexplored area for AI safety.