Many people - people we want to persuade - are essentialists about morality and ethics. They give weight to the idea that, just as knowing facts about my shoes is possible because my shoes exist and interact with me, knowing what is right or good is possible because there is some essence of rightness or goodness "out there" that we somehow interact with.

This isn't totally wrong. But I think it reflects the usual human heuristic of assuming that every category has an essence that makes it what it is. Put a human race in a world with water, air, and fire, and the first thing they'll think is that the basic categories of water, air, and fire must somehow be reflected in the fundamental order of the world. And why does heat behave the way it does? The flow of heat substance. Why does opium make you drowsy? It has sleepiness essence.

This sort of heuristic passes for everyday use, but if you assume that goodness is a fundamental part of the world, the idea of an AI learning to do the right thing is going to sound a bit weird. How can we talk about value learning without even once checking if the AI interacts with the goodness-essence?

This post outlines the sort of strategy I'd use to try to get people following essentialist intuitions on board with value learning. I'll indirectly rely on some properties of morality that behave more like a pattern and less like an essence. I'd definitely be interested in feedback, and if you have moral essentialist intuitions, please forgive me for framing this post as talking about you, not to you.

1: Why bother?

AI is going to be the pivotal technology of the coming century. AI that does good things can help us manage existential risks like climate change, bio-engineered diseases, asteroids, or rogue AIs built by less careful people.

In a future where very clever AI does good things, everything is probably pretty great. In a future where very clever AI does things without any regard to their goodness, even if humanity doesn't get swept away to extinction, we're certainly not realizing the potential boon that AI represents.

Therefore, it would be really handy if you could program a computer to figure out what the right thing to do is, and then do it - or at least to make an honest try on both counts. The status quo of humanity is not necessarily sustainable (see above about climate change, disease, asteroids, other AIs), so the point is not to design an AI that we're 100% certain will do the right thing; the point is just to design an AI that we're more confident in than the status quo.

2: It goes from God, to Jerry, to me.

Suppose that humans have knowledge of morality. Then we just need to have the AI learn that knowledge, in the same way that you can know lithium is the third element without having experimentally verified it yourself: you learn it from a trustworthy source. Hence, "value learning."

Problem solved, post over, everyone go home and work on value learning? Well...

The basic question someone might have about this is "what about moral progress?" For example, if one models the transition from slavery being legal to illegal as moral progress made by interacting with the external goodness-essence, what if there are further such transitions in our future that the AI can't learn about? Or what if futuristic technology will present us with new dilemmas, moral terra incognita, which can only be resolved correctly by consultation of the goodness-essence?

This rather depends on how the AI works. If what the AI learns from humans is some list of rules that humans follow, then absolutely it can become morally outdated. But what if the AI learns the human intuitions and dispositions that lead humans to make moral judgments? Then maybe if you sent this AI back to the 1700s, it would become an abolitionist.

In other words, the real goal of value learning isn't just to regurgitate human opinions; it's to become connected to moral judgments in the same way humans are. If you think that the human connection to morality is supernatural, even then there are conceptions of the supernatural that would allow AI to reach correct moral conclusions. But I think that even people who say they think morality is supernatural still share a lot of the intuitions that would let an AI learn morality, such as "moral reasoning can be correct or incorrect independently of the person doing the reasoning."

If the human connection to the goodness-essence is something that depends on the details of how humans reason about morality, then I think the success of value learning is very much still on the table, and we should be looking for ways to achieve it. If you think human morality isn't supernatural but don't think that merely copying human moral reasoning to the extent that you could be an early abolitionist is sufficient, don't use that as an excuse to give up! Try to figure out how an AI could learn to do the right thing, because it might be important!


3 comments

So I want to poke at you and see if I'm understanding you correctly.

First, are you just talking about strong moral essentialism (morals are powered by real, possibly observable, facts or processes that are causally connected to moral judgements) here, or all moral realism (moral facts exist, even if they are unknowable)?

Second, what makes you think a moral realist would not be in favor of AI value learning such that you need to argue for it?

I think there are ways in which believing in moral realism may make you sloppy about value learning, and more likely to pick learning designs that end up learning values we would, in retrospect, not endorse. But I don't see that as suggesting a moral realist would be against value learning (preprint on this point). In fact, I expect them to be for it; it's just that they will expect values to converge naturally no matter the input, whereas an anti-realist would expect the input to matter a lot. The standard objection from an anti-realist would be "seems like data matters a lot to outcomes so far," and the standard realist reply might be "need more data, or less biased data."

Yeah, I'm not 100% sure my caricature of a person actually exists or is worth addressing. They're mostly modeled on Robert Nozick, who is dead and cannot be reached for comment on value learning. But I had most of these thoughts and the post was really easy to write, so I decided to post it. Oh well :)

The person I am hypothetically thinking about is not very systematic - on average, they would admit that they don't know where morality comes from. But they feel like they learn about morality by interacting in some mysterious way with an external moral reality, and that an AI is going to be missing something important - maybe even be unable to do good - if it doesn't do that too. (So 90% overlap with your description of strong moral essentialism.)

I think these people plausibly should be for value learning, but are going to be dissatisfied with it and feel like it sends the wrong philosophical message.

But they feel like they learn about morality by interacting in some mysterious way with an external moral reality,

What does it mean for a reality to be moral?