The low cost of human preference incoherence

by Stuart_Armstrong, 27th Mar 2019

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Note: I am working on a research agenda, hence the large number of small individual posts, written so that the main documents have things to link to.

In aiming for an adequate synthesis of human preferences, I tend to preserve quirks and even biases of humans, rather than aiming for elegant simplicity as many philosophers do. The main reason is that, feeling that value is fragile, I fear losing a valid preference more than allowing an illegitimate bias to get promoted to preference.

But another point is that I don't feel that the cost of excessive complexity is very high.

The small cost of one-off restrictions

An analogy could be with government interventions in the market: most naive restrictions will raise costs only slightly and will not have a large effect, as the market routes around the restrictions. Consider a government that created price controls on all bread, for some reason. If that was a one-off rule, then we know what would happen: loaves of bread would get gradually smaller, while bakeries would start producing more cake or cake-like products, which would now occupy the niche that more expensive bread products would otherwise have filled.

There is likely an efficiency loss here - the balance between cake and bread is different from what the market would have given, resulting in small losses to bakers and customers. But these are not huge losses, and they are certainly much smaller than would have happened if the bakers and customers had simply continued as before, except with the price control.

Note that if the government is dynamic with its regulations, then it can impose effective price controls. It could add regulations about bread size, then bread quality, repeatedly; it could apply price controls to cake as well or ban cakes; as the market adjusts, regulations could adjust too. This would impose large efficiency losses (possibly coupled with equity gains, in certain circumstances).

But the cost of one-off restrictions tends to be low.
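
The routing-around argument above can be illustrated with a small numerical sketch. Everything in it is an assumption made for illustration: the quadratic `welfare` function, its coefficients, and the modelling of the restriction as a simple cap on the quantity of bread are all hypothetical, and this is not a model of real price controls. It only compares three cases: re-optimising the unrestricted variable after a one-off cap ("rerouted"), carrying on as before apart from the cap ("frozen"), and a regulator that also caps cake at its old level ("blocked").

```python
from itertools import product

def welfare(bread, cake):
    """Assumed toy welfare: two partly substitutable goods with diminishing returns."""
    return 11 * bread + 10 * cake - bread**2 - cake**2 - 1.5 * bread * cake

# Candidate quantities 0.0, 0.5, ..., 8.0 for each good (hypothetical units).
GRID = [i / 2 for i in range(17)]

def best(bread_cap=None, cake_cap=None):
    """Grid-search the welfare-maximising (bread, cake) pair under optional caps."""
    feasible = [
        (b, c)
        for b, c in product(GRID, GRID)
        if (bread_cap is None or b <= bread_cap)
        and (cake_cap is None or c <= cake_cap)
    ]
    return max(feasible, key=lambda bc: welfare(*bc))

BREAD_CAP = 2.0  # the one-off restriction (assumed value)

free = best()                                # no restriction
rerouted = best(bread_cap=BREAD_CAP)         # cake output re-optimised around the cap
frozen = (min(free[0], BREAD_CAP), free[1])  # cake output left where it was
blocked = best(bread_cap=BREAD_CAP, cake_cap=free[1])  # regulator also caps cake

loss_rerouted = welfare(*free) - welfare(*rerouted)
loss_frozen = welfare(*free) - welfare(*frozen)
loss_blocked = welfare(*free) - welfare(*blocked)

print("unrestricted optimum:", free, "welfare", welfare(*free))
print("loss when rerouting: ", loss_rerouted)
print("loss when frozen:    ", loss_frozen)
print("loss when blocked:   ", loss_blocked)
```

With these assumed numbers, the unrestricted optimum is 4 units of bread and 2 of cake; after the cap, cake rises to 3.5 and the loss is 1.75, against 4.0 if nothing else adjusts. Blocking the cake route as well brings the loss back up to 4.0, which mirrors the point about dynamic regulation. The size of the gap depends entirely on how substitutable the two goods are assumed to be.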

Quirks and biases in human preferences

It seems to me that including quirks and biases in human preferences is akin to one-off restrictions in the market. Firstly, these quirks will be weighted and balanced against more standard preferences, so they would be weaker than "government regulations". Secondly, they would be one-off: quirks and biases tend to be, by definition, less coherent than standard preferences, so they wouldn't be particularly consistent or generalisable, making them easy to "route around".

Consider for example how governments deal with these biases today. People tend to be prejudiced against utilitarian calculations for life-and-death issues, and in favour of deontological rules. Yet healthcare systems are very much run on utilitarian lines, sometimes explicitly, sometimes implicitly. We resolve trolley problems (which most people hate) by ensuring that individuals don't encounter important trolley-like problems in their everyday lives (eg the Hippocratic oath, rules against vigilantism), while larger organisations take trolley-like decisions in bureaucratic ways that shield individuals from the pain of taking any specific decision. In effect, large organisations have become skilled at both respecting and getting around human quirks and biases.

Now, there are a few examples where human biases impose a large cost - such as our obsession with terrorism (and to a lesser extent, crime), or our fear of flying. But, to extend the analogy, governments are far from perfectly efficient, and have their own internal dynamics, blame, and power struggles. An AI dedicated to a) maximising human survival, and b) respecting people's fear of terrorism, would be much more efficient than our current system - providing the most efficient and spectacular uses of security theatre to make people feel safer, while prioritising terrorism deaths over other deaths as little as is compatible with human goals. For example, the AI might be able to defuse the visceral human response to terrorist attacks, which would deprive those attacks of much of their impact. This resembles the nearest unblocked strategy approach: the AI is respecting the bias as little as it needs to.

And, of course, as the AI's power increases, quirks and biases will become relatively cheaper to satisfy, as the space of possible strategies opens up. It is very hard to design a patch that AIs cannot route around; human quirks and biases are very far from being well-designed patches for this purpose.

Conclusion

This is reasoning by weak analogy, so the issue is worth investigating further. However, at first glance, it seems that the cost of extra complexity in preferences, especially of the quirks and biases type, may be rather low.

5 comments

The bread example is really worrying: AI may find ways to route around not only biases but also preferences.

Note that it "routes around" but it also satisfies those preferences; ultimately, AI should prevent humans from dying of terrorist attacks and from other causes.

Returning to the bread example: it basically means that the market will optimise short-term profits, no matter which rules we try to impose on it.

Now the question arises: what will AI optimise no matter what?

One answer is "its own reward function", which means that any sufficiently advanced AI will quickly find a way to wirehead itself and halt. This means that there is an upper limit on an AI's optimisation power, above which it wireheads itself almost immediately.

An interesting question is how this upper limit relates to the level an AI needs to tile the universe with paperclips. If the wireheading level is above the universe-tiling level, then a paperclipper is possible. Otherwise, a single paperclipper can't tile the universe, but a society of AIs could still do it.

I am not sure that any type of bias could be added up to a better result. For example, if I want something sweet (my real goal) and ask for a cucumber (my biased idea of sweet), I will get the cucumber, but not sweet food.

I feel it's more like "you want something sweet, except between 2 and 3 pm". In that case, one solution is for the shop to only stock sweet things, and not let you in between 2 and 3 (or just ignore you during that time).