Combining individual preference utility functions



Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Note: working on a research agenda, hence the large amount of small individual posts, to have things to link to in the main documents.

I've been working on a way of synthesising the preferences of a single given human.

After this is done, there is the business of combining these utilities into a single utility function for the whole human population. One obvious way of doing this is to weigh the different utilities by some individual measure of intensity, then add them together, and maximise the sum.
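This weight-and-sum approach can be sketched in a few lines of code. Everything here is an invented illustration: the function names, the scalar "outcome", and the intensity weights are all assumptions, not part of any worked-out proposal.

```python
# Hypothetical sketch: aggregate individual utility functions into one
# population-level utility by weighting each by an "intensity" measure
# and summing the results. The representation (utilities as functions
# of a scalar outcome) is chosen purely for illustration.

def combined_utility(individual_utilities, intensities):
    """Return a single utility function: the intensity-weighted sum."""
    def u(outcome):
        return sum(w * ui(outcome)
                   for ui, w in zip(individual_utilities, intensities))
    return u

# Toy example: two people with directly opposed preferences.
alice = lambda x: x    # prefers larger x
bob = lambda x: -x     # prefers smaller x

u = combined_utility([alice, bob], intensities=[2.0, 1.0])
# With the higher intensity weight on alice, u still increases in x,
# so maximising the sum favours her preference.
```

The toy example already shows the worry discussed below: whoever gets more weight (or whoever is in the majority) simply wins outright over opposed preferences.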

But there are some arguments against this. For example, many people's preferences are over the behaviour or experiences of other people. This includes things like promoting religious or cultural beliefs (and it can cover preferences like wanting others to have basic human rights or to be free from oppressive situations).

Apart from simple summing, which may give excessive strength to a majority, there are two other obvious solutions. The first is to remove preferences over other people entirely; note that this would also remove most preferences over social and cultural systems. The second is to remove any anti-altruistic components from the system: preferences over the suffering of others would no longer be valid (note that this would allow punishment-as-a-deterrence, or punishment-as-retribution-to-a-victim, but not punishment-as-abstract-justice). This may be a bit tricky to define - what's the clear difference between wanting to be of higher status and wanting others to be of lower status? - but might be a desirable compromise. It might, indeed, be the kind of compromise that different people would "negotiate to" in some sort of "moral parliament", since people often prefer their own individual desires over desires to control other people, so this might be a solution with majority support.
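The second compromise can be made slightly more concrete with a toy model. This is a minimal sketch under strong assumptions: preferences are represented as components `(weight, target, sign)`, where `target` is `"self"` or `"other"` and `sign` is `+1` (wants the target better off) or `-1` (wants the target worse off). The representation, and the idea that anti-altruistic components can be cleanly tagged this way, are invented for illustration - the post itself notes that drawing this line is tricky.

```python
# Hypothetical sketch: filter out "anti-altruistic" preference
# components before aggregation. A component is a tuple
# (weight, target, sign); we drop exactly those components that
# assign negative value to other people's situation.

def strip_anti_altruistic(components):
    """Keep self-regarding preferences and positive other-regarding
    ones; drop preferences for others to be worse off."""
    return [c for c in components
            if not (c[1] == "other" and c[2] < 0)]

prefs = [
    (1.0, "self", +1),    # wants own status to be higher: kept
    (0.5, "other", +1),   # wants others to have basic rights: kept
    (0.7, "other", -1),   # wants others to be of lower status: dropped
]
filtered = strip_anti_altruistic(prefs)
```

The hard part, of course, is the tagging step this sketch assumes away: deciding whether "I want higher status" secretly encodes "I want others to have lower status".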