Three anchorings: number, attitude, and taste

Stuart_Armstrong

I've shown that one cannot simultaneously deduce the preferences and rationality of an irrational agent - neither in theory, nor in practice.

To get around that problem, one needs to add extra assumptions - assumptions that cannot be deduced from observations. The approach that seems the most promising to me is to use the internal models that humans have - models of themselves and of other humans. Note that this approach violates algorithmic equivalence, since some of the internal structures of the human algorithm are relevant (this means it also violates extensionality and some versions of functionalism).
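As a toy illustration of why observations alone are not enough, here is a minimal sketch. The action names, the reward numbers and the two planner functions are all hypothetical, made up for this example. It shows two opposite explanations of an agent, a rational one with reward R and an anti-rational one with reward -R, producing exactly the same behaviour.

```python
# Hypothetical toy setup: two actions with an assumed reward for each.
REWARD = {"buy_chocolate": 1.0, "walk_away": 0.0}


def rational_planner(reward):
    """Pick the action with the highest reward."""
    return max(reward, key=reward.get)


def anti_rational_planner(reward):
    """Pick the action with the lowest reward."""
    return min(reward, key=reward.get)


def negate(reward):
    """Return the reward function with every value's sign flipped."""
    return {action: -value for action, value in reward.items()}


# Explanation A: the agent is rational and has reward R.
observed_a = rational_planner(REWARD)
# Explanation B: the agent is anti-rational and has reward -R.
observed_b = anti_rational_planner(negate(REWARD))

# Both explanations predict the same observed action.
print(observed_a, observed_b)
```

Since both pairs predict the same action, watching the action cannot tell them apart; something beyond the observations has to break the tie.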

Humans very often agree...

One thing that gives me some hope for that approach is that humans very often agree with each other about when other humans, or they themselves, are being irrational or reasonable. The agreement isn't perfect by any means - and we spend a lot of time debating the uncertain cases - but in many, many areas, most humans agree with each other, implying that most humans use similar internal models.

My favourite example of this is the anchoring bias, where, for example, a human is asked to consider the last two digits of their social security number, asked whether they would pay that amount for chocolates, then asked what price they would actually pay for those chocolates. The bias is that the price they name is influenced by those two digits, even though the digits have nothing to do with the chocolates.

What's interesting is that almost everyone agrees that this is a bias; no-one argues that people genuinely value pricing things close to numbers they've heard recently.

More preference, less bias

Now let's imagine a situation where there is no quoting of social security numbers, but there are two potential chocolate vendors, one of them of standard politeness, the second very rude. Without running the experiment, I'm confident that people will be willing to pay more in the first case than in the second.

This is very similar to the anchoring bias: the two situations differ by one detail, and the price is different.

Is this a bias? Here I expect more disagreement. Yes, technically, the rudeness of the vendor should be independent of the quality of the chocolates, but there are arguments that, in social situations, one should take things like rudeness into account.

All on taste

Now let's consider a third "anchoring" situation, where the chocolates are sold in the same way, the only difference being that the first batch of chocolates is delicious, the second is disgusting. Again, I predict the delicious chocolates would sell for more.

Is this a bias? I'd expect that there would be almost universal agreement that it is not; indeed, "taste" is a loose synonym of "preference".

So that's one of the setups I'm considering when elucidating human preference. Three pairs of situations, each pair differing by one detail, with the chocolates priced differently in consequence. The first is clearly a bias, the second is debatable, the third is clearly a preference difference. Once AIs can figure out differences like this, we can start doing inverse reinforcement learning of human preferences.
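To make the three-way distinction concrete, here is a minimal sketch of how agreement between human judges might be turned into a label. Everything in it is an assumption for illustration: the function name, the 0.9 consensus threshold, and the fractions, which are made-up placeholders and not experimental data.

```python
def classify_difference(fraction_calling_it_bias, consensus=0.9):
    """Label a price difference from the share of judges who call it a bias.

    `consensus` is an assumed threshold, not a value taken from the post.
    """
    if not 0.0 <= fraction_calling_it_bias <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    if fraction_calling_it_bias >= consensus:
        return "bias"
    if fraction_calling_it_bias <= 1.0 - consensus:
        return "preference"
    return "debatable"


# Placeholder fractions of judges saying "this is a bias" (invented numbers).
SITUATIONS = {
    "social security digits": 0.98,
    "rude vendor": 0.50,
    "disgusting chocolates": 0.02,
}

labels = {name: classify_difference(share) for name, share in SITUATIONS.items()}
for name, label in labels.items():
    print(f"{name}: {label}")
```

The point of the sketch is only that the label comes from how much humans agree, not from the price difference itself, which looks the same in all three cases.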

>Now let's imagine a situation where [...] there are two potential chocolate vendors, one of them of standard politeness, the second very rude. Without running the experiment, I'm confident that people will be willing to pay more in the second case than in the first.

Do you mean the first case (standard politeness)? If not, I'm not sure why you think people would be willing to pay more. Is it that the rude vendor is doing some sort of handicapping, and the fact that they're still in business proves they have great chocolate?

Judging by the structure of your post, though, I think you're pointing out that although the rudeness doesn't change the quality of the chocolate, it still reasonably (according to most humans) changes preferences about where to buy it.

>If not, I'm not sure why you think people would be willing to pay more.

It's more that they will pay less in the other case. Since the setup is experimental, I doubt people would reason about handicapping.

Sorry for the obtuseness, but I'm still not sure if you made a typo-ish mistake or not. You say (a) "there are two potential chocolate vendors, one of them of standard politeness, the second very rude" and (b) "willing to pay more in the second case than in the first". Subbing (a) into (b) I get "willing to pay more in the very rude case than the standard politeness case". I think this was just a flipped word, but thought I'd check.

Sorry, you are correct, that was a typo, now corrected.