Moral Golems Assume Uncertainty

by Erich_Grunewald, 28th Aug 2021



[T]he Christian humanist and Hebraist Johannes Reuchlin (1492) report[ed] the creation of an artificial anthropoid whose forehead bore the Hebrew inscription "YHVH Elohim emeth" (The name of God is truth) and who fell dead to the ground when the aleph, the first letter forming the Hebrew word for truth, was removed.[1]

– Cathy Gelbin

I used to consider myself vaguely utilitarian, but after reading Fellow Creatures[2] and other texts by Christine M. Korsgaard I began to see myself more as a Kantian. Her ethics make a lot of sense to me, though I still find myself thinking at times in utilitarian or even Schopenhauerian terms. The weird thing is that I want to pick a favourite. Or maybe I should say "find myself wanting", because of course from some distance I can see that, when it comes to ethics, where evidence is scarce and p-values rarely cross the magical threshold, a stiff dose of uncertainty is a good thing. I suppose I find myself wanting to pick a favourite partly because I want to make it part of my identity (hmm, hmm) and partly because I feel that choosing one is the groundwork one needs to lay before one can decide what to think about object-level moral issues.

That was the tension I was trying to resolve when I wrote the post about moral golems. In it I argued, or perhaps suggested, that we "should not search for the one true moral system, [but] seek many moral systems that each have a truth in them" and that "the project of ethics is not dissimilar from that of the social sciences, say, in that it tries to produce answers by modelling the essential characteristics of very complicated human structures". I likened such a moral system to a golem that we inscribe with truth, animate and instruct, and which, like a statistical model, does no more and no less than what we tell it to do. I was not aware at the time that a group of philosophers had been working on exactly this question under the heading of "moral uncertainty", but I am happy to report that discovering this caused me no annoyance, though it might have long ago.

Drawing of a huddling golem by Viktoriia Shcherbak.

How would the moral golem approach work in practice? Say Suleiman has decided that Utilitarianism, Kantianism and the ethics of Arthur Schopenhauer all have some truth in them. He is not sure which of them is true, if any, but they seem like likely candidates. An Ottoman general knocks on his door and asks him whether there are any Armenians in his house. Suleiman is in fact hiding an Armenian family in his attic. The utilitarian calculation favours lying to the Ottoman general; the Kantian calculation (in the particular variant of Kantianism to which he subscribes, at least) favours telling the truth; Schopenhauer would probably recommend lying, too, as that seems to be more compassionate and to negate the will-to-life within Suleiman in that it (1) puts his own life in danger and (2) recognises the oneness of all striving things. Having thought about this before and heard two of the three golems answer in favour of lying, Suleiman goes ahead and lies to the Ottoman general.
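The majority vote Suleiman runs can be sketched in a few lines of Python. This is a minimal sketch built on two simplifying assumptions: each golem returns a single recommended action, and every golem gets one equal vote. The names `GOLEM_VERDICTS` and `majority_verdict` are hypothetical. The verdicts are the ones from the example above.

```python
from collections import Counter

# Each "golem" (moral system) and the action it recommends in the Suleiman
# example; the verdicts are taken from the text above.
GOLEM_VERDICTS = {
    "Utilitarianism": "lie",
    "Kantianism": "tell the truth",
    "Schopenhauer": "lie",
}


def majority_verdict(verdicts: dict[str, str]) -> str:
    """Return the action recommended by the most golems, one vote each.

    Counter.most_common orders equal counts by first appearance, so a tie
    goes to whichever action was encountered first.
    """
    action, _count = Counter(verdicts.values()).most_common(1)[0]
    return action


print(majority_verdict(GOLEM_VERDICTS))  # lie
```

A plain headcount ignores how confident Suleiman is in each system and how strongly each system cares about the outcome. Weighting the votes is what the expected-value approach discussed below adds.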

Moral Uncertainty

Moral golems only make sense if moral uncertainty makes sense. Without uncertainty, there would be no need for combining multiple moral systems, because we would (or should) be certain that one of them is true. But herein lies a problem. If we could somehow detach ourselves from our human perspective and see things through the eyes of a god, objectively, all-knowingly, then there would be no uncertainty. So it seems that, for there to be uncertainty about moral systems, there needs to be a subject with imperfect information; uncertainty is always uncertainty for a particular agent. But in that case, do people not always and everywhere act correctly when they act according to how they see the world?

Put differently, the moral golem approach assumes that the right thing to do is to act with uncertainty in mind, and therefore according to our chosen moral systems. But what if Suleiman had accepted a different set of moral theories, such that he would have decided that the right thing to do was to tell the Ottoman general the truth about the Armenian family? In that case, he would surely have been worthy of blame. Does the moral golem approach say that Suleiman acted correctly in this second scenario? If so, something must be wrong with the moral golem approach: it commits us to the view that people acting according to their subjective moral systems are always acting blamelessly.

I think one solution is simply to say that using the moral golem approach is the right thing to do (if it is the right thing to do) meta-ethically. It is an answer to the question "which moral system should I use?", not "how should I act in this particular situation?". Once the ensemble of moral golems is in place, the blameworthiness of an action depends on the moral systems that make up the ensemble, that is, on whether the person made a good-faith effort to find the most plausible systems given the available evidence. That is why we do not blame people who have been indoctrinated in the way we blame people who "know fully what they are doing". In Harman[3], where I came across this objection, the argument seems to me to amount almost to "taking uncertainty seriously is bad, because it is not as good as certainty given perfect moral knowledge". To me, it is hard to imagine morality without uncertainty.

(There are a number of other objections to taking moral uncertainty seriously; for an overview and responses, see MacAskill et al.[4])

The moral golem approach is, so I now know, a crude version of Nick Bostrom and Toby Ord's Parliamentary Approach to moral uncertainty. It is also similar to (and again far less fleshed out than) another of the main approaches to moral uncertainty, namely that of maximising the expected value.[5] (Oh well, rediscovery is the mind's curare.) Maximising expected value is the textbook solution for deciding between options with varying probabilities and payoffs. To the extent that there is a difference at all between this and the moral golem approach, I suppose it is that the latter says that, when calculating expected value, it is better to take into account many simple moral systems than a few complicated ones, because this guards against overfitting.
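For comparison, here is a minimal sketch of the expected-value approach, which MacAskill et al. call "maximising expected choiceworthiness". It applies the approach to Suleiman's case. The credences and scores are made up for illustration. The code also makes the contentious assumption that all three theories can score actions on one common scale. `CREDENCES`, `CHOICEWORTHINESS` and the two functions are hypothetical names.

```python
# HYPOTHETICAL credences Suleiman assigns to each moral system (they sum to 1).
CREDENCES = {"Utilitarianism": 0.40, "Kantianism": 0.35, "Schopenhauer": 0.25}

# HYPOTHETICAL choiceworthiness of each action according to each system,
# ASSUMING all three systems can be scored on one common scale.
CHOICEWORTHINESS = {
    "lie": {"Utilitarianism": 10, "Kantianism": -5, "Schopenhauer": 8},
    "tell the truth": {"Utilitarianism": -10, "Kantianism": 5, "Schopenhauer": -8},
}


def expected_choiceworthiness(action: str) -> float:
    """Credence-weighted sum of an action's scores across moral systems."""
    return sum(
        CREDENCES[system] * score
        for system, score in CHOICEWORTHINESS[action].items()
    )


def best_action() -> str:
    """Return the action with the highest expected choiceworthiness."""
    return max(CHOICEWORTHINESS, key=expected_choiceworthiness)


for action in CHOICEWORTHINESS:
    print(action, round(expected_choiceworthiness(action), 2))
print("best:", best_action())
```

With these invented numbers, lying scores about 4.25 and telling the truth about -4.25, so lying wins again. Unlike the headcount, however, a single theory held with high credence and strong stakes could now outweigh the other two.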

Underfitting and Overfitting

Probably the more useful idea in the original blog post was that of overfitting or underfitting moral systems. An overfitted moral system is one that has been tailored to specific moral problems to such an extent that it does not generalise well to other moral problems. An underfitted moral system is one that is too simple to be able to handle many different moral problems. As I wrote in the original post:

Say Eithne has philosophised herself a moral system. Now she comes across an interesting thought experiment. When she applies her moral system to the thought experiment (this is the new context), it produces an outlandish outcome which she cannot possibly accept. Maybe it fails, in the new situation, to judge wanton murder to be wrong. So she adjusts her moral system to account for the new data. It now produces sensible outcomes on both the new example and the old examples that she has tested it against. Neat! But Eithne has also made her system more complex. If she keeps doing that, it may become closely tied to the contexts she has considered and won't transfer well to new contexts.

That was an example of overfitting. Here is one of underfitting. Scott Alexander talks about crystallised metaphysical heuristics, where a complicated model for preventing antisocial behaviour, to use one of his examples, is distilled into a simple maxim ("avenge wrongdoing") that regular people can use. That seems like a very simple model indeed. In fact, it is so simple that it will underfit – it will not be able to capture the complexity of the real world: heuristics like "avenge wrongdoing" will under some circumstances recommend seemingly immoral actions. "Do not lie" is another heuristic that serves us pretty well most of the time, but had Suleiman adhered to it when talking to the Ottoman general, he would have done something blameworthy.

(I am reminded that Arthur Schopenhauer described pedantry as what happens when people "lack confidence in their own understanding" and, instead of looking at the particular details of their situation, "start out from universal concepts, rules and maxims and seek to hold themselves exactingly to these".[6] In other words, pedantry is over-reliance on heuristics; compare with the German Prinzipienreiter, literally "principle-rider", used to describe a person who pettishly insists on their principles.)

To capture complexity using simple models, you need to combine many of them. With heuristics, this gets a little weird, because unlike moral systems like Utilitarianism or Kantianism they typically apply only to specific situations. I can imagine Suleiman, in the example above, using a variety of heuristics to decide on the right course of action, such as "do not lie" and "obey authority" (favouring telling the truth) and "do not betray somebody's trust", "act courageously" and "do not aid in wrongdoing and do not murder" (favouring lying), but I am not sure whether this would be a good approach. You might need a lot of them to capture as much complexity as, say, pure Utilitarianism does. You may also run into problems where a new heuristic, perhaps because it has been derived from the same source as all the others (like all of Scott's examples being derived from Utilitarianism), is too correlated with the others to add any new information.
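The correlation worry can be made concrete with a toy vote over the heuristics listed above. This sketch rests on loud assumptions: each heuristic casts one equal vote, and the two entries in `REDUNDANT` ("be honest", "never deceive") are hypothetical. They are restatements of "do not lie", invented here to stand in for heuristics derived from the same source. `headcount_vote` is likewise a hypothetical name.

```python
from collections import Counter

# Heuristics from the Suleiman example and the action each one favours.
HEURISTICS = {
    "do not lie": "tell the truth",
    "obey authority": "tell the truth",
    "do not betray somebody's trust": "lie",
    "act courageously": "lie",
    "do not aid in wrongdoing and do not murder": "lie",
}

# HYPOTHETICAL restatements of "do not lie": derived from the same source,
# so they always agree with it and contribute no independent information.
REDUNDANT = {
    "be honest": "tell the truth",
    "never deceive": "tell the truth",
}


def headcount_vote(heuristics: dict[str, str]) -> str:
    """Return the action favoured by the most heuristics, one vote each."""
    return Counter(heuristics.values()).most_common(1)[0][0]


print(headcount_vote(HEURISTICS))                   # lie
print(headcount_vote({**HEURISTICS, **REDUNDANT}))  # tell the truth
```

The two restatements add no new consideration, yet they flip the verdict from lying (3 to 2) to telling the truth (4 to 3). This is one reason a plain headcount is a poor way to combine correlated heuristics.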

Moral Golems as an Advanced Stage of Moral Reasoning

Timothy Johnson pointed me to a summary by David Chapman of Robert Kegan's model of human psychological development. As I understand the summary, Kegan's model describes five stages through which humans can pass as they get better at moral reasoning. Stages one and two are for children; most but not all humans reach stage three during adolescence; few reach stage four at all; and it is rarer still for anybody to reach stage five.

At stage three, we find ourselves able to see things from others' points of view, and we are able to have intrinsically valuable relationships with them. I understand this stage as essentially focusing on order, harmony, fairness, norms and social circles.

However, because their focus is local, people at stage three are basically unable to solve the sorts of large coordination problems that often come up in the modern world. At stage four, we begin to think in terms of systems. We are able to bracket ancillary concerns and reason about problems in isolation. Morality now means not so much taking everyone's feelings into account (as in stage three) as taking their interests into account, and even generating moral frameworks for doing so.

But at stage five – and here it gets interesting – we begin to look at things from a meta perspective. We see contradictions between systems and learn how to resolve those contradictions:

For stage 4, a system is justified by an ideology that grounds out in some set of ultimate principles. When you realize that the system doesn't work as well as the ideology claims it should, you look for an alternative set of principles. This can motivate adopting a series of political or religious affiliations, each of which seems at first to be right; and each of which eventually fails you.

But at some point you realize that all principles are somewhat arbitrary or relative. There is no ultimately true principle on which a correct system can be built. It's not just that we don't yet know what the absolute truth is; it is that there cannot be one. All systems come to seem inherently empty.

Fluid epistemology can relate systems to each other, in a way that the systematic mode cannot. Systems become objects of creative play rather than constitutive of self, other, and groups. Fluidity can hold contradictions between systems comfortably while respecting the specific functioning and justification-structure of each. All ideologies are relativized as tools rather than truths. [...] Stage 5 can, therefore, conjure with systems, as animated characters in a magical shadow-play drama.

I suppose I should be flattered that the moral golem approach to ethics that I thought of is a sign of the highest stage of moral development, one which fewer than one in twenty American adults have reached. But I am not sure I am really entitled to, because in the end I am more of a Kantian than anything else.

  1. Gelbin, C. S. (2011). The Golem Returns: From German Romantic Literature to Global Jewish Culture, 1808–2008. University of Michigan Press.

  2. Korsgaard, C. M. (2018). Fellow Creatures: Our Obligations to the Other Animals. Oxford University Press.

  3. Harman, E. (2015). The irrelevance of moral uncertainty. Oxford Studies in Metaethics, 10, 53–79.

  4. MacAskill, W., Bykvist, K., & Ord, T. (2020). Moral Uncertainty. Oxford University Press.

  5. Ibid.

  6. Schopenhauer, A. (2016). The World as Will and Presentation: Volume I. Routledge.


Comments

I love how you're working through these tricky questions. If I read you correctly, you are assuming that there is one true morality, we just don't know what it is (uncertainty in the map rather than in the territory, you could say). Would be great if you wrote about what grounds the one true morality about which we lack certainty, i.e., where does it come from? (If you have, sorry for not being familiar with that post of yours.)

thanks for commenting. i haven't written about anything like that because my thoughts about it are rudimentary at best! i think you're correct that these speculations are premised on some sort of moral realism (if i understand you correctly). to be clear, i really don't know whether moral realism or anti-realism is more plausible. just from a very shallow knowledge of metaethics, i think something like constructivism seems most plausible to me, but i'm not sure about how that maps onto the realism/anti-realism question.