The low cost of human preference incoherence

by Stuart_Armstrong · 2 min read · 27th Mar 2019 · 5 comments


Personal Blog
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Note: I'm working on a research agenda, hence the large number of small individual posts, to have things to link to in the main documents.

In aiming for an adequate synthesis of human preferences, I tend to preserve quirks and even biases of humans, rather than aiming for elegant simplicity as many philosophers do. The main reason is that, feeling that value is fragile, I fear losing a valid preference more than allowing an illegitimate bias to get promoted to preference.

But another point is that I don't feel that the cost of excessive complexity is very high.

The small cost of one-off restrictions

An analogy could be with government interventions in the market: most naive restrictions will only slightly raise the cost, but will not have a large effect, as the market routes around the restrictions. Consider a government that created price controls on all bread, for some reason. If that was a one-off rule, then we know what would happen: loaves of bread would get gradually smaller, while bakeries would start producing more cake or cake-like products, which now occupy the niche that more expensive bread products would otherwise have filled.

There is likely an efficiency loss here - the balance between cake and bread is different from what the market would have given, leading to small losses for bakers and customers. But these are not huge losses, and they are certainly much smaller than would have happened if the bakers and customers had simply continued as before, except with the price control.

Note that if the government is dynamic with its regulations, then it can impose effective price controls. It could add regulations about bread size, then bread quality, repeatedly; it could apply price controls to cake as well or ban cakes; as the market adjusts, regulations could adjust too. This would impose large efficiency losses (possibly coupled with equity gains, in certain circumstances).

But the cost of one-off restrictions tends to be low.

Quirks and biases in human preferences

It seems to me that including quirks and biases in human preferences is akin to one-off restrictions in the market. Firstly, these quirks will be weighted and balanced against more standard preferences, so they would be weaker than "government regulations". Secondly, they would be one-off: quirks and biases tend to be, almost by definition, less coherent than standard preferences, so they wouldn't be particularly consistent or generalisable, making them easy to "route around".

Consider for example how governments deal with these biases today. People tend to be prejudiced against utilitarian calculations for life-and-death issues, and in favour of deontological rules. Yet healthcare systems are very much run on utilitarian lines, sometimes explicitly, sometimes implicitly. We resolve trolley problems (which most people hate) by ensuring that individuals don't encounter important trolley-like problems in their everyday lives (eg the Hippocratic oath, rules against vigilantism), while larger organisations take trolley-like decisions in bureaucratic ways that shield individuals from the pain of taking any specific decision. In effect, large organisations have become skilled at both respecting and getting around human quirks and biases.

Now, there are a few examples where human biases impose a large cost - such as our obsession with terrorism (and, to a lesser extent, crime), or our fear of flying. But, to extend the analogy, governments are far from perfectly efficient, and have their own internal dynamics, blame, and power struggles. An AI dedicated to a) maximising human survival, and b) respecting people's fear of terrorism, would be much more efficient than our current system - providing the most efficient and spectacular uses of security theatre to make people feel safer, while prioritising terrorism deaths over other deaths as little as is compatible with human goals. For example, the AI might be able to defuse the visceral human response to terrorist attacks, which would deprive them of much of their impact. This resembles the nearest unblocked strategy approach: the AI is respecting the bias as little as it needs to.

And, of course, as the AI's power increases, quirks and biases will become relatively cheaper to satisfy, as the space of possible strategies opens up. It is very hard to design a patch that AIs cannot route around; human quirks and biases are very far from being well-designed patches for this purpose.
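This last claim can be sketched with a toy model (entirely my own illustration, not from the post: strategies with uniform random utility, and a "bias" that arbitrarily rules out about half of them, standing in for a one-off restriction). As the strategy space grows, the utility gap between the unconstrained optimum and the bias-respecting optimum shrinks towards zero:

```python
import random

random.seed(0)

def sample_strategies(n):
    # Each strategy: (utility, respects_bias). Utility is uniform on [0, 1];
    # the one-off bias arbitrarily excludes roughly half the strategies.
    return [(random.random(), random.random() < 0.5) for _ in range(n)]

def best(strategies, require_bias):
    # Highest utility achievable, optionally restricted to bias-respecting strategies.
    vals = [u for u, ok in strategies if ok or not require_bias]
    return max(vals) if vals else 0.0

for n in (10, 100, 10_000):
    s = sample_strategies(n)
    gap = best(s, False) - best(s, True)
    print(f"{n:>6} strategies: cost of respecting the bias = {gap:.4f}")
```

With 10 strategies, respecting the bias can be costly; with 10,000, the constrained optimum is almost as good as the unconstrained one, matching the intuition that an incoherent restriction is cheap once there are many ways to route around it.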


This is reasoning by weak analogy, so the issue is worth investigating further. However, at first glance, it seems that the cost of extra complexity in preferences, especially of the quirks-and-biases type, may be rather low.