Very different, very adequate outcomes

by Stuart_Armstrong, 2nd Aug 2019



Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Let $U_P$ be the utility function that - somehow - expresses your preferences[1]. Let $U_H$ be the utility function that expresses your hedonistic pleasure.

Now imagine an AI is programmed to maximise $\alpha U_P + (1-\alpha) U_H$. If we vary $\alpha$ in the range of $0.1$ to $0.9$, then we will get very different outcomes. At $\alpha = 0.1$, we will generally be hedonically satisfied, and our preferences will be followed if they don't cause us to be unhappy. At $\alpha = 0.9$, we will accomplish any preference that doesn't cause us huge amounts of misery.
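As a toy illustration of how the weighting changes what gets chosen, here is a minimal Python sketch. It assumes the objective is a weighted sum with weight `alpha` on the preference utility and `1 - alpha` on the hedonic utility. The outcome names and their scores are invented purely for illustration; nothing here models a real AI or real utilities.

```python
# Hypothetical outcomes with made-up (preference, hedonic) scores in [0, 1].
OUTCOMES = {
    "comfortable_world": (0.5, 0.9),    # very pleasant, preferences partly met
    "balanced_world":    (0.75, 0.75),  # a compromise between the two
    "striving_world":    (0.9, 0.5),    # preferences mostly met, less pleasant
    "disaster":          (0.0, 0.0),    # neither utility is served
}


def mixed_utility(alpha, u_pref, u_hed):
    """Weighted sum: alpha on preferences, (1 - alpha) on hedonism."""
    return alpha * u_pref + (1 - alpha) * u_hed


def best_outcome(alpha):
    """Return the name of the outcome that maximises the mixed utility."""
    return max(OUTCOMES, key=lambda name: mixed_utility(alpha, *OUTCOMES[name]))


# Illustrative weights only: low, middle, and high values of alpha.
for alpha in (0.1, 0.5, 0.9):
    print(alpha, best_outcome(alpha))
```

In this toy setup the chosen world changes with the weight, but the "disaster" entry is never selected for any weight between 0 and 1, since every other outcome scores higher on both utilities.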

It's clear that, extrapolated over the whole future of the universe, these could lead to very different outcomes[2]. But - and this is the crucial point - none of these outcomes are really that bad. None of them are the disasters that could happen if we picked a random utility $U$. So, for all their differences, they reside in the same nebulous category of "yeah, that's an ok outcome." Of course, we would have preferences as to where $\alpha$ lies exactly, but few of us would risk the survival of the universe to yank $\alpha$ around within that range.

What happens when we push $\alpha$ towards the edges? Pushing $\alpha$ towards $0$ seems a clear disaster: we're happy, but none of our preferences are respected; we basically don't matter as agents interacting with the universe any more. Pushing $\alpha$ towards $1$ might be a disaster: we could end up always miserable, even as our preferences are fully followed. The only thing protecting us from that fate is the fact that our preferences include hedonistic pleasure; but this might not be the case in all circumstances. So moving to the edges is risky in the way that moving around in the middle is not.
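The edge cases can be sketched the same way. The snippet below again assumes a weighted sum with weight `alpha` on preferences and `1 - alpha` on hedonism, and uses three invented outcomes whose scores are chosen by hand so that the lopsided ones only win when the weight sits at (or very near) an edge. It is an illustration of the argument, not evidence for it.

```python
# Hypothetical outcomes with made-up (preference, hedonic) scores in [0, 1].
OUTCOMES = {
    "blissful_but_ignored": (0.0, 1.0),    # happy, no preferences respected
    "adequate":             (0.95, 0.95),  # both utilities largely served
    "obeyed_but_miserable": (1.0, 0.0),    # preferences followed, no pleasure
}


def choose(alpha):
    """Pick the outcome maximising alpha * preference + (1 - alpha) * hedonic."""
    def score(name):
        u_pref, u_hed = OUTCOMES[name]
        return alpha * u_pref + (1 - alpha) * u_hed
    return max(OUTCOMES, key=score)


for alpha in (0.0, 0.5, 1.0):
    name = choose(alpha)
    u_pref, u_hed = OUTCOMES[name]
    print(f"alpha={alpha}: {name} (preference={u_pref}, hedonic={u_hed})")
```

At the two edges the maximiser is free to drive one of the utilities to zero, because that utility no longer contributes anything to the objective; at the middle weight it has no incentive to.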

In my research agenda, I talk about adequate outcomes, given a choice of parameters, or acceptable approximations. I mean these terms in the sense of the example above: the outcomes may vary tremendously from one another, given the parameters or the approximation. Nevertheless, all the outcomes avoid disasters and are clearly better than maximising a random utility function.

  1. This being a somewhat naive form of preference utilitarianism, along the lines of "if the human chooses it, then it's ok". In particular, you can end up in equilibria where you are miserable, but unwilling to choose not to be (see, for example, some forms of depression). ↩︎

  2. This fails to be true if preference and hedonism can be maximised independently; e.g. if we could take an effective happy pill and still follow all our preferences. I'll focus on the situation where there are true tradeoffs between preference and hedonism. ↩︎

