"Moral" as a preference label

by Stuart_Armstrong, 26th Mar 2019



Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Note: I am working on a research agenda, hence the large number of small individual posts, which give me things to link to in the main documents.

In my quest to synthesise human preferences, I've occasionally been asked whether I distinguish moral preferences from other types of preferences - for example, whether preferences for Abba or Beethoven, or avocado or sausages, should rank as high as human rights or freedom of speech.

The answer is that, of course, they should not rank as high. But that ranking is not the sort of thing that should be built into the system by hand; it should be reflected in the meta-preferences. We label certain preferences "moral", and we often have the belief that these should have priority, to some extent, over merely "selfish" preferences (the extent of this belief varies from person to person, of course).

I deliberately wrote the wrong word there for this formalism - we don't have the "belief" that moral preferences are more important, we have the meta-preference that a certain class of preferences, labelled "moral", whatever that turns out to mean, should be given greater weight. This matters all the more because there are many cases where it is very unclear whether a preference is moral or not (many people have strong moral-ish preferences over mainstream cultural and entertainment choices).

This is an example of the sort of challenge that a preference synthesis process should be able to figure out on its own. If the method needs to be constantly tweaked to get over every small problem of definition, then it cannot work. As always, however, it need not get everything exactly right; indeed, it needs to be robust enough that it doesn't change much if a borderline meta-preference such as "everyone should know their own history" gets labelled as moral or not.
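As a toy illustration of that robustness requirement, here is a minimal Python sketch. Everything in it is an assumption made for illustration, not part of the research agenda: the names `Preference`, `synthesise` and `label_sensitivity`, the numeric strengths, and above all the choice to model the meta-preference as a single multiplier (`moral_boost`) applied to whatever carries the "moral" label. It simply checks how far the synthesised weights move when one borderline label is flipped.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Preference:
    name: str
    strength: float  # hypothetical scale: how strongly the preference is held
    moral: bool      # whether the person labels this preference "moral"


def synthesise(prefs, moral_boost):
    """Return normalised weights for each preference.

    moral_boost stands in for the meta-preference that "moral" preferences
    get greater weight; a plain multiplier is an assumption of this sketch.
    """
    raw = {
        p.name: p.strength * (moral_boost if p.moral else 1.0)
        for p in prefs
    }
    total = sum(raw.values())
    return {name: weight / total for name, weight in raw.items()}


def label_sensitivity(prefs, borderline, moral_boost):
    """Largest change in any weight when the borderline label is flipped."""
    flipped = [
        Preference(p.name, p.strength, not p.moral) if p.name == borderline else p
        for p in prefs
    ]
    before = synthesise(prefs, moral_boost)
    after = synthesise(flipped, moral_boost)
    return max(abs(before[name] - after[name]) for name in before)


# Made-up strengths, purely for illustration.
prefs = [
    Preference("human rights", 1.0, True),
    Preference("freedom of speech", 1.0, True),
    Preference("Beethoven over Abba", 0.5, False),
    Preference("avocado over sausages", 0.3, False),
    Preference("everyone should know their own history", 0.2, False),
]

weights = synthesise(prefs, moral_boost=3.0)
shift = label_sensitivity(
    prefs, "everyone should know their own history", moral_boost=3.0
)
print(weights)
print("largest weight change after relabelling:", shift)
```

In this toy setup the clearly moral preferences outrank the taste preferences without any hand-coded ranking, and relabelling the borderline item moves the weights only slightly. A real synthesis process would need that property to hold without anyone choosing the numbers by hand.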