I was struck by that question while reading one of the responses to the post polling the merits of several AI alignment research ideas.

I have not really thought this through, but it seems that requiring a preference ordering to satisfy transitivity must also assume the alternatives being ranked can be distilled to some common denominator (economics would probably suggest utility per unit, or more accurately marginal utility per dollar, MU/$).
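As a rough illustration of that point (toy numbers of my own, not from the post): if every alternative really can be collapsed into a single scalar such as MU/$, then "preferred to" is just ">" on real numbers, and transitivity follows automatically.

```python
# Hypothetical illustration: if every alternative can be collapsed to one
# scalar (here made-up "marginal utility per dollar" figures), preference is
# just comparison of real numbers, and transitivity comes for free.

alternatives = {
    "A": {"marginal_utility": 9.0, "price": 3.0},
    "B": {"marginal_utility": 8.0, "price": 4.0},
    "C": {"marginal_utility": 5.0, "price": 5.0},
}

def mu_per_dollar(name):
    a = alternatives[name]
    return a["marginal_utility"] / a["price"]

def prefers(x, y):
    # x is preferred to y iff its MU/$ is strictly higher.
    return mu_per_dollar(x) > mu_per_dollar(y)

# Because prefers() is defined by ">" on real numbers, A>B and B>C can never
# coexist with C>A: the ordering is transitive by construction.
print(prefers("A", "B"), prefers("B", "C"), prefers("A", "C"))  # True True True
```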

I'm not sure that really covers all cases, and perhaps not even the majority of them.

If we're really comparing different sets of attributes that we label A, B and C, transitive preferences might well be the exception rather than the rule.

The rule A>B, B>C, therefore A>C is often violated when considering group choices -- in political science that produces a voting cycle.
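For readers who haven't seen the voting-cycle result, here is a minimal toy example (voter rankings invented for illustration): each voter's own ranking is perfectly transitive, yet pairwise majority voting over the group cycles.

```python
# Toy Condorcet cycle: each voter's ranking is individually transitive,
# but pairwise majority voting over the group yields A>B, B>C, C>A.

voters = [
    ["A", "B", "C"],  # voter 1: A > B > C
    ["B", "C", "A"],  # voter 2: B > C > A
    ["C", "A", "B"],  # voter 3: C > A > B
]

def majority_prefers(x, y):
    wins = sum(1 for ranking in voters if ranking.index(x) < ranking.index(y))
    return wins > len(voters) / 2

for x, y in [("A", "B"), ("B", "C"), ("C", "A")]:
    print(f"{x} beats {y}: {majority_prefers(x, y)}")
# Prints True for all three pairs -- the group preference is a cycle.
```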

I just wonder whether the same sort of result might not also be claimed within one person's head, given that we're comparing different things -- and so likely use/consumption in slightly different contexts as well.

Could that internal voting cycle be a source of indecision (which is a bit different than indifference), and why we will often avoid a pair-wise decision process and opt instead for putting all the alternatives up against one another to pick the preferred one?
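To make the worry concrete, here is a hypothetical sketch (my own toy preferences, not from the post) of why a pair-wise process sits badly with a cycle: with the cyclic preferences A>B, B>C, C>A, a sequential pair-wise elimination ends up selecting whichever alternative happens to enter the comparison last, so the "choice" is really an artifact of the agenda.

```python
# Sketch of agenda-dependence under cyclic pairwise preferences.
pairwise = {("A", "B"): "A", ("B", "C"): "B", ("C", "A"): "C"}

def winner(x, y):
    return pairwise.get((x, y)) or pairwise.get((y, x))

def sequential_choice(agenda):
    # Compare the first two, keep the winner, compare it with the next, etc.
    current = agenda[0]
    for challenger in agenda[1:]:
        current = winner(current, challenger)
    return current

for agenda in [["A", "B", "C"], ["B", "C", "A"], ["C", "A", "B"]]:
    print(agenda, "->", sequential_choice(agenda))
# ['A', 'B', 'C'] -> C   ['B', 'C', 'A'] -> A   ['C', 'A', 'B'] -> B
```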

If so, would that be something an AGI will also find occurs naturally -- not an error to be corrected, but rather a situation where applying a pair-wise choice or some transitivity check would itself be the error?


1 Answer

You can also protect yourself against money pumping by having vague preferences and unstable preferences. Money pumping doesn't seem to happen IRL.

Right, it does seem that we have found ways, being bounded and irrational agents, to get closer to rationality by using our boundedness to protect ourselves from our irrationality (and vice versa!).

This seems to be a case of using boundedness to avoid the bad results of maximizing on irrational preferences: not being precise, maintaining uncertainty that isn't resolved until the last moment, and probably also exhaustion (if you try to lead me through a pump, after a few steps I'll give up before you can take too much advantage of me).
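For concreteness, here is the standard money-pump story the thread is alluding to, with made-up items and fees: an agent with cyclic strict preferences who will pay a small fee for each "upgrade" can be walked in a circle back to its starting bundle, a little poorer each lap. The defences above (vagueness, giving up after a few steps) amount to breaking the loop early.

```python
# Classic money-pump sketch (toy numbers of my own): an agent with cyclic
# strict preferences A>B>C>A will pay a small fee for each "upgrade",
# so leading it around the cycle returns it to where it started, poorer.

cycle_preference = {"B": "A", "C": "B", "A": "C"}  # current item -> item it will pay to get

fee = 1.0
wealth = 10.0
holding = "A"

for step in range(6):  # two full trips around the cycle
    upgrade = cycle_preference[holding]
    wealth -= fee           # the agent pays to trade up to the "better" item
    holding = upgrade
    print(f"step {step + 1}: holding {holding}, wealth {wealth}")

# After every three trades the agent holds A again but is 3.0 poorer.
# The "exhaustion" defence above amounts to breaking out of this loop
# after a couple of iterations.
```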

The opp... (read more)

4 comments

Here's a rather out-there hypothesis.

I'm sure many LessWrong members have had the experience of arguing some point piecemeal, where they've managed to get weak agreement on every piece of the argument, but as soon as they step back and point from start to end, their conversation partner ends up less than convinced. In this sense, in humans even implication isn't transitive. Mathematics offers an example, with some fun tales I'm struggling to find sources for: pre-mathematical societies might have had people unwilling to trade two of A for two of B, but happy to trade A for B twice, or other such oddities.

It's plausible to me that the need for consistent models of the world only comes about as intelligence grows and allows people to arbitrage value between these different parts of their thoughts. Early humans and their lineage before that weren't all that smart, so it makes sense that evolution didn't force their beliefs to be consistent all that much—as long as it was locally valid, it worked. As intelligence evolved, occasionally certain issues might crop up, but rather than fixing the issue in a fundamental way, which would be hard, minor kludges were put in place.

For example, I don't like being exploited. If someone leads me around a pump, I'm going to value the end state less than its ‘intrinsic’ value. You can see this behaviour a lot in discussions of trolley problem scenarios: people take objection to having these thoughts traded off against each other to the degree it often overshadows the underlying dilemma. Similarly, I find gambling around opinions intrinsically uncomfortable, and notice that fairly frequently people take objection to me asking them to more precisely quantify their claims, even in cases where I'm not staking an opposing claim. Finally, since some people are better at sounding convincing than I am, it's completely reasonable to reject some things more broadly because of the possibility the argument is an exploit—this is epistemic learned helplessness, sans ‘learned’.

There are other explanations for all the above, so this is hardly bulletproof, but I think there is merit to considering evolved defenses to exploitation that don't involve being exploit-free, as well as whether there is any benefit to something of this form. Behaviours that avoid and back away from these exploits seem fairly obvious places to look into. One could imagine (sketchily, non-endorsingly) an FAI built on these principles, so that even without a bulletproof utility function, the AI would still avoid self-exploitation.
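One very rough way to picture such a defence (entirely my own sketch, not anything proposed in the comment above): instead of repairing its preferences, the agent simply remembers states it has already occupied and refuses any trade that would recreate one of them with less money. That blocks the pump without making the underlying preferences transitive.

```python
# Kludge-style defence: don't fix the cycle, just refuse trades that would
# put the agent back in a previously held state with less money.
# (Hypothetical class and names, purely for illustration.)

class PumpResistantAgent:
    def __init__(self, holding, wealth, prefers_to_trade):
        self.holding = holding
        self.wealth = wealth
        self.prefers_to_trade = prefers_to_trade  # possibly cyclic preferences
        self.seen = {(holding, wealth)}

    def consider_trade(self, offered_item, fee):
        if not self.prefers_to_trade(self.holding, offered_item):
            return False
        new_state = (offered_item, self.wealth - fee)
        # Refuse if we've held this item before with at least this much money:
        # accepting would mean we were led in a circle and lost wealth.
        if any(item == offered_item and wealth >= new_state[1]
               for item, wealth in self.seen):
            return False
        self.holding, self.wealth = new_state
        self.seen.add(new_state)
        return True

# Usage with the cyclic preferences from the money-pump sketch above:
prefers = lambda holding, offered: {"A": "C", "B": "A", "C": "B"}[holding] == offered
agent = PumpResistantAgent("A", 10.0, prefers)
print(agent.consider_trade("C", 1.0))  # True  -- first step of the pump
print(agent.consider_trade("B", 1.0))  # True
print(agent.consider_trade("A", 1.0))  # False -- closing the circle is refused
```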

You've given me two things to think about now: your comment about intelligence, and how models fit in.

It could be that as intelligence grows (and I'm using intelligence loosely, to include both raw analytic capacity and knowledge and information) we become better at distilling those bundles of attributes we labeled A, B and C into a common denominator. But I can also see that working the other way -- we gain an ability to further differentiate alternatives and so see more intransitive relationships.

I wonder if there are settings -- not sure if that would be specific to the characteristics of the alternatives or characteristics relating to uses of the alternatives (means-driven versus ends-driven) -- where we might predict which of the two paths would be taken.

Since we only interact with the external world via the models in our head, I am now wondering about the relationship between the consistency of the models and the consistency of the decisions or observed behavior/choices. But this is even less thought out than my original questions, so I think I'll stop there.


Can you define "rational/consistent"? The terms are a bit overloaded, especially in this community, and making your definition precise is itself most of the answer to your question.

For instance, you give some good examples of nontransitive decider-actors, and if some of them are "rational", then nontransitive preferences can be rational, as you point out. Alternatively, one definition of "consistent" is that a decider-actor will always reach the same decision when it has the same information, regardless of what other options it has previously rejected, which requires transitive preferences.
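As a toy operationalisation of that second definition (my own sketch, with an invented set of stated preferences): a pairwise relation supports menu-independent choice only if it is transitive, so one simple "consistency" test is just to look for any A>B, B>C, C>A triple.

```python
from itertools import permutations

# Toy consistency check: look for any cyclic triple in a set of stated
# pairwise preferences. A cycle means the chosen option can depend on which
# alternatives were rejected earlier.

def has_cycle_of_three(prefers, options):
    return any(prefers(a, b) and prefers(b, c) and prefers(c, a)
               for a, b, c in permutations(options, 3))

stated = {("A", "B"), ("B", "C"), ("C", "A")}
print(has_cycle_of_three(lambda x, y: (x, y) in stated, ["A", "B", "C"]))  # True
```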

Thought I had replied... but I'm not seeing that now.

You are showing me an error I make in casual thought and speech. I should not have linked the two terms as I did -- but I do seem to carry the two concepts around in my head in largely the same bucket, as it were. I should stop doing that!

Thanks!

I really should have just stayed with the question of consistency, and whether a violation of transitivity is really a sufficient condition to conclude that inconsistency is present -- which seemed to be implied in the comment I read that sparked the thought.