Satisficers' undefined behaviour

by Stuart_Armstrong 1 min read5th Mar 201512 comments


I previously posted an example of a satisficer (an agent seeking to achieve a certain level of expected utility u) transforming itself into a maximiser (an agent wanting to maximise expected u) to better achieve its satisficing goals.

But the real problem with satisficers isn't that they "want" to become maximisers; the real problem is that their behaviour is undefined. We conceive of them as agents that would do the minimum required to reach a certain goal, but we don't specify "minimum required".

For example, let A be a satisficing agent. It has a utility u that is quadratic in the number of paperclips it builds, except that after building 10100, it gets a special extra exponential reward, until 101000, where the extra reward becomes logarithmic, and after 1010000, it also gets utility in the number of human frowns divided by 3↑↑↑3 (unless someone gets tortured by dust specks for 50 years).

A's satisficing goal is a minimum expected utility of 0.5, and, in one minute, the agent can press a button to create a single paperclip.

So pressing the button is enough. In the coming minute, A could decide to transform itself into a u-maximiser (as that still ensures the button gets pressed). But it could also do a lot of other things. It could transform itself into a v-maximiser, for many different v's (generally speaking, given any v, either v or -v will result in the button being pressed). It could break out, send a subagent to transform the universe into cream cheese, and then press the button. It could rewrite itself into a dedicated button pressing agent. It could write a giant Harry Potter fanfic, force people on Reddit to come up with creative solutions for pressing the button, and then implement the best.

All these actions are possible for a satisficer, and are completely compatible with its motivations. This is why satisficers are un(der)defined, and why any behaviour we want from it - such as "minimum required" impact - has to be put in deliberately.

I've got some ideas for how to achieve this, being posted here.