Formalizing reflective inconsistency

byJohnicholas9y13th Sep 200913 comments


In the post Outlawing Anthropics, there was a brief and intriguing scrap of reasoning, which used the principle of reflective inconsistency, which so far as I know is unique to this community:

If your current system cares about yourself and your future, but doesn't care about very similar xerox-siblings, then you will tend to self-modify to have future copies of yourself care about each other, as this maximizes your expectation of pleasant experience over future selves.

This post expands upon and attempts to formalize that reasoning, in hopes of developing a logical framework for reasoning about reflective inconsistency.

In diagramming and analyzing this, I encountered a difficulty. There are probably many ways to resolve it, but in resolving it, I basically changed the argument. You might have reasonably chosen a different resolution. Anyway, I'll explain the difficulty and where I ended up.

The difficulty: The text "...maximizes your expectation of pleasant experience over future selves.". How would you compute expectation of pleasant experience? It ought to depend intensely on the situation. For example, a flat future, with no opportunity to influence my experience or that of my sibs for better or worse, would argue that caring for sibs has exactly the same expectation as not-caring. Alternatively, if a mad Randian was experimenting on me, rewarding selfishness, not-caring for my sibs might well have more pleasant experiences than caring. Also, I don't know how to compute with experiences - Total Utility, Average Utility, Rawlsian Minimum Utility, some sort of multiobjective optimization? Finally, I don't know how to compute with future selves. For example, imagine some sort of bicameral cognitive architecture, where two individuals have exactly the same percepts (and therefore choose exactly the same actions). Should I count that as one future self or two?

To resolve this, I replace EY's reason with an argument from analogy, like so:

If your current system cares about yourself and your future, but doesn't care about very similar xerox-siblings, then you will tend to self-modify to have future copies of yourself care about each other, for the same reasons that the process of evolution created kin altruism.

Here is the same argument again, "expanded". Remember, the primary reason to expand it is not readability - the expanded version is certainly less readable - it is as a step towards a generally applicable scheme for reasoning using the principle of reflective inconsistency.

At first glance, the mechanism of natural selection seems to explain selfish, but not unselfish behavior. However, the structure of the EEA seems to have offered sufficient opportunities for kin to recognize kin with low-enough uncertainty and assist (with small-enough price to the helper and large-enough benefit to the helped) that unselfish entities do outcompete purely selfish ones. Note that the policy of selfishness is sufficiently simple that it was almost certainly tried many times. We believe that unselfishness is still a winning strategy in the present environment, and will continue to be a winning strategy in the future.

The two policies, caring about sibs or not-caring, do in fact behave differently in the EEA, and so they are incompatible - we cannot behave according to both policies at once. Also, since caring about sibs outcompetes not-caring in the EEA, if a not-caring agent, X, were selecting a proxy (or "future self") to compete in an EEA-tournament to for utilons (or paperclips), X would pick a caring agent as proxy. The policy of not-caring would choose to delegate to an incompatible policy. This is what "reflectively inconsistent" means. Given a particular situation S1, one can always construct another situation S2 where the choices available in S2 correspond to policies to send as proxies into S1. One might understand the new situation as having an extra "self-modification" or "precommitment" choice point at the beginning. If a policy chooses an incompatible policy as its proxy, then that policy is "reflectively inconsistent" on that situation. Therefore, not-caring is reflectively inconsistent on the EEA.

The last step to the conclusion is less interesting than the part about reflective inconsistency. The conclusion is something like: "Other things being equal, prefer caring about sibs to not-caring".

Enough handwaving - to the code! My (crude) formalization is written in Automath, and to check my proof, the command (on GNU/Linux) is something like:

aut reflective_inconsistency_example.aut