At least deontological theories propose specific precepts about what is probably good: not murdering, not stealing, and so on. All of those are on the table if you’re a utilitarian and that makes you someone I don’t want to be around or to have moral authority over me.
Utilitarianism says that murdering is bad because it reduces utility. You're concerned that if murdering increased utility, then utilitarianism would endorse murdering.
But doesn't that objection apply to deontology too? Deontology says that murdering is bad because you have a duty to refrain from murdering. So we should be concerned about circumstances in which a person would infer that they have a duty to murder.
In practice, there are cases where people come to believe that doing a particular murder would increase utility ... and there are also cases in which people come to believe that they have a duty to murder. Some serial killers, spree killers, and terrorists seem to hold this position. The guy who threw the firebomb at Sam Altman's house seems to have held this position: that consistency entailed a duty to use violent force.
So are utilitarians really worse to be around, or to exercise moral authority, than deontologists? They seem to suffer very similar failure cases! Perhaps what we should care about is not whether people are utilitarians or deontologists, but rather whether they have the sort of cognitive habits that keep them safely far away from the error of believing that a murder is what the world needs right now.
When talking about definitions of good, it may be one’s first instinct to reach for philosophy.
But there is a fairly obvious conception of good which is a simple extension of the evolutionary “survival” objective.
Evolutionarily, every organism is in competition for finite resources, and those which fit their environment best are those which tend to exist for longer.
We would like an advanced civilisation that lasts a long time. Our civilisation is already collapsing due to a lack of knowledge and wisdom transfer. So it is unlikely to last much longer in its current form.
The survival objective, conceived narrowly, is about avoiding ruin for as long as possible. Ruin is the situation where you “no longer get to play”: permanent death, in other words. Evolution merely says, tautologically, that whatever is best at causing itself to exist in the next time interval will exist more in total, if you add up all the time intervals.
But thinking in terms of “survival” is limiting, because it’s binary. It can work for an “at least this much” fitness function, but ultimately there must be a greater basis for action which results in “more survival.” This is antiruin - action taken in opposition to the direction of ruin.
The basic statistical fact is that if you make a habit of betting everything on less-than-sure-things, you will be ruined with probability approaching 1.
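As a rough illustration of that statistical fact, here is a minimal sketch under stated assumptions: each bet is independent, each has the same win probability `p_win`, and losing any single all-in bet means permanent ruin. The function names and the parameters are hypothetical, chosen only for this example.

```python
def survival_probability(p_win: float, n_bets: int) -> float:
    """Chance of winning every one of n_bets independent all-in bets.

    Assumption: a single loss is permanent ruin, so surviving means
    winning all n_bets in a row.
    """
    if not 0.0 <= p_win <= 1.0:
        raise ValueError("p_win must be a probability in [0, 1]")
    if n_bets < 0:
        raise ValueError("n_bets must be non-negative")
    return p_win ** n_bets


def ruin_probability(p_win: float, n_bets: int) -> float:
    """Chance of having been ruined at least once within n_bets bets."""
    return 1.0 - survival_probability(p_win, n_bets)
```

Under these assumptions, even a bet that wins 99% of the time leaves you ruined with probability above 0.9999 after 1000 repetitions, and the ruin probability only grows as the number of bets increases. Only a genuinely sure thing (`p_win = 1.0`) keeps it at zero.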
This produces a simple signal for good: actions which are non-probabilistically antiruinous. That is, actions opposed to the direction of ruin, which every organism is headed in by default, considering that it costs constant energy input just to continue existing.
So a better, more general, and more reliable way of thinking about good is as actions which are definitely not bad.
This has wide-reaching implications. For example, the Repugnant Conclusion is easy to reject: It’s bad because it’s not definitely not bad. This accords with a common-sense understanding.
Online, I feel there is something of a disdain for deontological theories of ethics. But I have to say, I respect those theories a lot more than people who “just multiply,” arrive at the Repugnant Conclusion, and then accept it because that’s what the numbers say. At least deontological theories propose specific precepts about what is probably good: not murdering, not stealing, and so on. All of those are on the table if you’re a utilitarian and that makes you someone I don’t want to be around or to have moral authority over me.
Of course, in practice sometimes hard decisions must be made about who to prioritise. In New Zealand, we have a centralised medication-buying agency which has to decide how to allocate limited funds in order to help the most people the most.
But even here, they do not “simply multiply.” They make specific hypotheses about what might be good, and they can sometimes get it wrong, but at least they’re trying.
If you outsource your morality to a formula, the result is atrocious conclusions.
Doing a good thing is good, that is, when you know in doing it that it is definitely not bad. This is the antiruin conception of good. It can be small, or large, but fundamentally the principle is about creating the space to live, to survive more, to thrive for a longer time.
Nonabsolutism
There is a second failure mode which necessitates further reasoning than just good as antiruin. The structure of reality is such that mass cooperation is always going to be more resource-efficient than competition or neutrality. You can think of cooperation as being like “voluntarily letting yourself be exploited” in the VNM, Dutch book, money pump kind of sense.
This works until it doesn’t, when some part tries to take more than another part is willing to part with. So there is a path there where one part tries to take “ownership” of all the other parts of an organism and thereby gain control for itself of all the resources accessible by all the parts together.
Why is this a failure mode? Well, essentially by doing this, one part has killed (brought to ruin) every other part by taking control of its resources. The whole is reduced to a machine, not a living organism.
This suggests the principle that diversity is a good in itself. We see in machine learning that this is indeed the case:
For our purposes here of moral philosophy, it is enough to note that diversity is a requirement for good performance. So allowing the gelated “All cooperate at the behest of one,” i.e., absolutism, is a ruinous failure mode. It is worth highlighting separately even though it is implicit in good as antiruin, as the connection is not obvious.
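The text does not say which machine-learning result is meant. One standard illustration, offered here purely as an assumption, is ensemble averaging: when several estimators are averaged, the benefit depends on how different their errors are. The sketch below uses the textbook formula for the variance of the mean of `n` estimators that each have variance `sigma2` and share a pairwise correlation `rho`. The function name and parameters are hypothetical.

```python
def ensemble_variance(sigma2: float, rho: float, n: int) -> float:
    """Variance of the average of n equally noisy estimators.

    Assumptions: every estimator has variance sigma2, and every pair of
    estimators has the same correlation rho.
    Formula: sigma2 * (rho + (1 - rho) / n).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if sigma2 < 0:
        raise ValueError("sigma2 must be non-negative")
    # A shared pairwise correlation below -1/(n-1) is not achievable.
    lower = -1.0 / (n - 1) if n > 1 else -1.0
    if not lower <= rho <= 1.0:
        raise ValueError("rho is outside the feasible range for this n")
    return sigma2 * (rho + (1.0 - rho) / n)
```

With `rho = 1.0` (every member makes the same errors, the analogue of “all cooperate at the behest of one”), averaging gains nothing no matter how many members are added. With `rho = 0.0`, the variance falls in proportion to `1 / n`. On this reading, the diversity of the parts is what makes the whole perform better than any single part.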
What does this mean?
In my view, we now have a basic philosophical framework for evaluating what is good, and so this is perhaps a step toward making machines that reliably do good, and not bad. This is a pressing question given the situation with AI today.
In The Value of Information I gave the conceptual outline of a criterion that could be used to make LLMs truthful, in the sense of faithfully representing their training data. In this document, I have given a conceptual outline of how to make them behave as good members of society: they must deduce that an action is other than ruinous.
I would like to see AI labs take note of these ideas and make better machines. Personally, I was surprised that RLHF, the current “alignment” training paradigm, works at all. But no, to be reliably good you must be definitely not bad. This is a difficult question, but then again, we already have automatic differentiation and theorem provers, so I feel the pieces are likely already there to make a computer system that doesn’t lie, is robust to even adversarial attempts to make it do bad things, and remains generally helpful in everyday life.
If large AI labs won’t do this, I will do it for myself (ironically likely using AI) because these types of systems will be so much and so obviously better.
Stepping aside from AI for a second, this also brings clarity to some behaviours that we “just feel” are wrong. I will leave examples to the reader, but in particular note that the carrying on of tradition, the transfer of ideas across generations, is important, because these were ideas that stood the test of time, that were antiruinous, and therefore might still be, unless something fundamental has changed.
So society is collapsing under the weight of inveridical information, but we do actually have all the pieces to put together something new and better. I would like to do that, preferably with other smart people who I perceive as already doing the right thing.