This is very cool, and I haven't digested it yet, but I wonder if it might be open to the criticism that you're effectively postulating the favored answer to Newcomb's Problem (and other such scenarios) by postulating that when you surgically alter one of the nodes, you correspondingly alter the nodes for the other instances of the computation. After all, the crux of the counterfactual-reasoning dilemma in Newcomb's Problem (and similarly in the Prisoner's Dilemma) is to justify the inference "If I choose both boxes, then (probably) so does the simulation (even if in fact I/it do not)" rather than "If I choose both boxes, then the simulation doesn't necessarily match my choice (even though in fact it does)". It could be objected that your formalism postulates the desired answer rather than giving a basis for deriving it--an objection that becomes more important when we move away from identical or functionally equivalent source code and start to consider approximate similarities. (See my criticism of Leslie (1991)'s proposal that you should make your choice as though you were also choosing on behalf of other agents of similar causal structure. If I'm not mistaken, your proposal seems to be a formalization of that idea.)

Here's an alternative proposal.

Metacircular Decision Theory (MCDT)

For purposes of this discussion, let me just stipulate that subjective probabilities will be modeled as though they were quantum under MWI--that is, we'll regard the entire distribution as part of the universe. That move will help with dual-simulation/counterfactual-mugging scenarios; but also, as I argued in Good and Real, we effectively make that move whenever we assign value to probabilistic outcomes even in nonesoteric situations (so we may as well avail ourselves of that move in the weird scenarios too, though eventually we need to justify the move).

Say we have an agent embodied in the universe. The agent knows some facts about the universe (including itself), has an inference system of some sort for expanding on those facts, and has a preference scheme that assigns a value to the set of facts, and is wired to select an action--specifically, the/an action that implies (using its inference system) the/a most-preferred set of facts.

But without further constraint, this process often leads to a contradiction. Suppose the agent's repertoire of actions is A1, ..., An, and the value of action Ai is simply i. Say the agent starts by considering the action A7, and dutifully evaluates it as 7. Next, it contemplates the action A6, and reasons as follows: "Suppose I choose A6. I know I'm a utility-maximizing agent, and I already know there's another choice that has value 7. Therefore, it follows from my (hypothetical) choice of A6 that A6 has a value of at least 7." But that inference, while valid, contradicts the fact that A6's value is 6.

Unsurprisingly, a false premise leads to a contradiction. To avoid contradiction, we need to limit the set of facts that the agent is allowed to reason from when making inferences about a hypothetical action. But which facts do we omit? Different choices yield different preferred actions. If we omit the fact that val(A6)=6, then we can infer val(A6)>=7; if instead we omit the fact that the agent utility-maximizes, then we can infer val(A6)=6 without contradiction (or at least without the particular contradiction above).
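To make the A6/A7 example concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption: the function name `infer_value`, its two flags, and the reduction of the agent's "inference system" to two hard-coded rules are hypothetical, not part of any stated formalism. It only shows that which fact is omitted determines what the agent concludes about the hypothetical choice.

```python
def infer_value(choice, values, keep_own_value, keep_maximizer):
    """Conclusions available under the hypothesis 'I choose `choice`'.

    Returns (exact, lower_bound, consistent):
      exact       -- the known value of the choice, or None if that fact is omitted
      lower_bound -- bound implied by 'I am a utility maximizer', or None if omitted
      consistent  -- False when the retained facts contradict each other
    """
    # Fact 1: the value already computed for this action.
    exact = values[choice] if keep_own_value else None

    # Fact 2: a maximizer only picks an action at least as good as every
    # alternative it has already evaluated.
    others = [v for a, v in values.items() if a != choice]
    lower_bound = max(others) if (keep_maximizer and others) else None

    consistent = exact is None or lower_bound is None or exact >= lower_bound
    return exact, lower_bound, consistent


evaluated = {"A6": 6, "A7": 7}

# Reasoning from every fact: val(A6)=6 clashes with val(A6)>=7.
print(infer_value("A6", evaluated, keep_own_value=True, keep_maximizer=True))
# Omit val(A6)=6: the agent infers val(A6)>=7.
print(infer_value("A6", evaluated, keep_own_value=False, keep_maximizer=True))
# Omit 'I utility-maximize': the agent infers val(A6)=6.
print(infer_value("A6", evaluated, keep_own_value=True, keep_maximizer=False))
```

The three calls correspond to the three cases in the text: a contradiction, the inference val(A6)>=7, and the inference val(A6)=6.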

So this is the usual full-blown problem of counterfactual inference: which things do we "hold fixed" when contemplating a counterfactual antecedent, and which do we "let vary" for consistency with that antecedent? Different choices here correspond to different decision theories. If the agent allows inferences (only) from all facts about physical law as applied to the future, and all facts about the past and present universe-state, except for facts about the agent's internal decision-making state, then we get CDT. If we leave the criteria unspecified/ambiguous, we get EDT. If we allow the agent to reason from facts about the future as well as the past and present, we get FDT (Fatalist Decision Theory: choice is futile, which most people think follows from determinism).

MCDT's proposed criterion is this: the agent makes a meta-choice about which facts to omit when making inferences about the hypothetical actions, and selects the set of facts which lead to the best outcome if the agent then evaluates the original candidate actions with respect to that choice of facts. The agent then iterates that meta-evaluation as needed (probably not very far) until a fixed point is reached, i.e. the same choice (as to which facts to omit) leaves the first-order choice unchanged. (It's ok if that's intractable or uncomputable; the agent can muddle through with some approximate algorithm.)

EDIT1: The algorithm also needs to check, when it evaluates a meta-level choice candidate, that the winning choice at the next level down is consistent with all known facts. If not, the meta-level candidate is eliminated from consideration. (Otherwise, the A6 choice could remain stable in the example above.)

EDIT2: Or rather, that consistency check can probably substitute for the additional meta-iterations.

So e.g. in Newcomb's Problem or the Prisoner's Dilemma, the agent can calculate that it does better if it retains the fact that its dispositional-state/source-code is functionally equivalent to the simulation's/other's (but omits facts about which particular choice is made by both) than if it makes the CDT choice and omits the fact about equivalence, but keeps the facts about the simulation's/other's choice (or keeps some probability distribution about the simulation's/other's choice).
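A rough Python sketch of that meta-choice on Newcomb's Problem follows. It is a toy under stated assumptions, not the MCDT algorithm itself: the policy names (`keep-equivalence`, `keep-prediction`), the payoff amounts, the 50% prior used by the CDT-like policy, and the function names are all hypothetical, and the consistency check from EDIT1 is left out. The meta-level simply scores each fact policy by the outcome its first-order choice actually produces, given the stipulation that the simulation matches the agent.

```python
BOX_A = 1_000        # transparent box, always available (assumed amount)
BOX_B = 1_000_000    # opaque box, filled only if one-boxing was predicted
ACTIONS = ("one-box", "two-box")


def payoff(action, prediction):
    filled = BOX_B if prediction == "one-box" else 0
    return filled + (BOX_A if action == "two-box" else 0)


def first_order_choice(policy, p_predicted_one_box=0.5):
    """Best action when reasoning only from the facts that `policy` retains."""
    def value(action):
        if policy == "keep-equivalence":
            # Retain 'the simulation computes what I compute' and omit which
            # choice is made, so the prediction varies with the hypothesis.
            return payoff(action, action)
        # "keep-prediction": retain a fixed distribution over the prediction
        # and omit the equivalence fact (the CDT-like option).
        p = p_predicted_one_box
        return p * payoff(action, "one-box") + (1 - p) * payoff(action, "two-box")
    return max(ACTIONS, key=value)


def actual_outcome(action):
    # As stipulated in the problem, the simulation really does match the agent.
    return payoff(action, action)


def meta_choice(policies=("keep-equivalence", "keep-prediction")):
    """Pick the fact policy whose induced first-order choice does best."""
    return max(policies, key=lambda pol: actual_outcome(first_order_choice(pol)))


for pol in ("keep-equivalence", "keep-prediction"):
    act = first_order_choice(pol)
    print(pol, "->", act, "->", actual_outcome(act))
print("meta-choice:", meta_choice())
```

In this toy the equivalence-retaining policy one-boxes and the prediction-retaining policy two-boxes, so the meta-level selects the former. Whether the meta-level is entitled to evaluate outcomes this way is exactly the kind of question the full proposal would have to settle.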

In other words, metacircular consistency isn't just a test that we'd like the decision theory to pass. Metacircular consistency is the theory; it is the algorithm.

Gary_Drescher: To clarify: the agent in MCDT is a particular physical instantiation, rather than being timeless/Platonic (well, except insofar as physics itself is Platonic).

Ingredients of Timeless Decision Theory

by Eliezer Yudkowsky, 19th Aug 2009


Followup to: Newcomb's Problem and Regret of Rationality, Towards a New Decision Theory

Wei Dai asked:

"Why didn't you mention earlier that your timeless decision theory mainly had to do with logical uncertainty? It would have saved people a lot of time trying to guess what you were talking about."


All right, fine, here's a fast summary of the most important ingredients that go into my "timeless decision theory".  This isn't so much an explanation of TDT, as a list of starting ideas that you could use to recreate TDT given sufficient background knowledge.  It seems to me that this sort of thing really takes a mini-book, but perhaps I shall be proven wrong.

The one-sentence version is:  Choose as though controlling the logical output of the abstract computation you implement, including the output of all other instantiations and simulations of that computation.

The three-sentence version is:  Factor your uncertainty over (impossible) possible worlds into a causal graph that includes nodes corresponding to the unknown outputs of known computations; condition on the known initial conditions of your decision computation to screen off factors influencing the decision-setup; compute the counterfactuals in your expected utility formula by surgery on the node representing the logical output of that computation.

To obtain the background knowledge if you don't already have it, the two main things you'd need to study are the classical debates over Newcomblike problems, and the Judea Pearl synthesis of causality.  Canonical sources would be "Paradoxes of Rationality and Cooperation" for Newcomblike problems and "Causality" for causality.

For those of you who don't condescend to buy physical books, Marion Ledwig's thesis on Newcomb's Problem is a good summary of the existing attempts at decision theories, evidential decision theory and causal decision theory.  You need to know that causal decision theories two-box on Newcomb's Problem (which loses) and that evidential decision theories refrain from smoking on the smoking lesion problem (which is even crazier).  You need to know that the expected utility formula is actually over a counterfactual on our actions, rather than an ordinary probability update on our actions.

I'm not sure what you'd use for online reading on causality.  Mainly you need to know:

  • That a causal graph factorizes a correlated probability distribution into a deterministic mechanism of chained functions plus a set of uncorrelated unknowns as background factors.
  • Standard ideas about "screening off" variables (D-separation).
  • The standard way of computing counterfactuals (through surgery on causal graphs).
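For readers who want something runnable, the three ideas in the list above can be sketched in a few lines of Python. This is a toy model invented for illustration, not taken from Pearl: a hidden common cause C drives both X and Y, each mechanism is a deterministic function of its parent plus an independent background unknown, and the names (`NOISE`, `mechanisms`, `worlds`, `prob`) and the noise rates are assumptions.

```python
from itertools import product

# Independent background unknowns, each with its own P(u = 1). Assumed values.
NOISE = {"u_c": 0.5, "u_x": 0.1, "u_y": 0.1}


def mechanisms(u, do_x=None):
    """Deterministic chained functions: C -> X and C -> Y (no arrow X -> Y)."""
    c = u["u_c"]
    # Surgery: when do_x is given, X's own mechanism is cut out and replaced.
    x = (c ^ u["u_x"]) if do_x is None else do_x
    y = c ^ u["u_y"]
    return {"C": c, "X": x, "Y": y}


def worlds(do_x=None):
    """Enumerate every setting of the unknowns together with its probability."""
    names = list(NOISE)
    for bits in product((0, 1), repeat=len(names)):
        u = dict(zip(names, bits))
        p = 1.0
        for name, bit in u.items():
            p *= NOISE[name] if bit else 1 - NOISE[name]
        yield p, mechanisms(u, do_x)


def prob(event, given=lambda v: True, do_x=None):
    """P(event | given), optionally in the graph after surgery on X."""
    num = sum(p for p, v in worlds(do_x) if given(v) and event(v))
    den = sum(p for p, v in worlds(do_x) if given(v))
    return num / den


y1 = lambda v: v["Y"] == 1
print("P(Y=1 | X=1)      =", prob(y1, lambda v: v["X"] == 1))
print("P(Y=1 | do(X=1))  =", prob(y1, do_x=1))
print("P(Y=1 | C=1)      =", prob(y1, lambda v: v["C"] == 1))
print("P(Y=1 | C=1, X=1) =", prob(y1, lambda v: v["C"] == 1 and v["X"] == 1))
```

Observing X=1 raises the probability of Y=1 (to about 0.82 here) because X is evidence about C, while setting X=1 by surgery leaves it at about 0.5. Once C is observed, additionally observing X changes nothing, which is the screening-off property.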

It will be helpful to have the standard Less Wrong background of defining rationality in terms of processes that systematically discover truths or achieve preferred outcomes, rather than processes that sound reasonable; understanding that you are embedded within physics; understanding that your philosophical intuitions are how some particular cognitive algorithm feels from inside; and so on.

The first lemma is that a factorized probability distribution which includes logical uncertainty - uncertainty about the unknown output of known computations - appears to need cause-like nodes corresponding to this uncertainty.

Suppose I have a calculator on Mars and a calculator on Venus.  Both calculators are set to compute 123 * 456.  Since you know their exact initial conditions - perhaps even their exact initial physical state - a standard reading of the causal graph would insist that any uncertainties we have about the output of the two calculators should be uncorrelated.  (By standard D-separation; if you have observed all the ancestors of two nodes, but have not observed any common descendants, the two nodes should be independent.)  However, if I tell you that the calculator on Mars flashes "56,088" on its LED display screen, you will conclude that the Venus calculator's display is also flashing "56,088".  (And you will conclude this before any ray of light could communicate between the two events, too.)

If I was giving a long exposition I would go on about how if you have two envelopes originating on Earth and one goes to Mars and one goes to Venus, your conclusion about the one on Venus from observing the one on Mars does not of course indicate a faster-than-light physical event, but standard ideas about D-separation indicate that completely observing the initial state of the calculators ought to screen off any remaining uncertainty we have about their causal descendants so that the descendant nodes are uncorrelated, and the fact that they're still correlated indicates that there is a common unobserved factor, and this is our logical uncertainty about the result of the abstract computation.  I would also talk for a bit about how if there's a small random factor in the transistors, and we saw three calculators, and two showed 56,088 and one showed 56,086, we would probably treat these as likelihood messages going up from nodes descending from the "Platonic" node standing for the ideal result of the computation - in short, it looks like our uncertainty about the unknown logical results of known computations, really does behave like a standard causal node from which the physical results descend as child nodes.
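The three-calculator case can be sketched as ordinary Bayesian updating on a single parent node. The following Python is a toy under stated assumptions: the candidate set, the uniform prior, the 1% error rate, and the function name `posterior_over_result` are all made up for illustration; the point is only that the displays behave like children of one node standing for the ideal result.

```python
def posterior_over_result(observations, candidates, error_rate=0.01):
    """Posterior over the ideal result of the computation, given noisy displays.

    Each display is modeled as a child of the 'Platonic' result node: it shows
    the true result with probability 1 - error_rate, and otherwise shows one of
    the other candidates, chosen uniformly.
    """
    prior = 1.0 / len(candidates)  # uniform logical uncertainty (assumed)
    unnormalized = {}
    for result in candidates:
        likelihood = 1.0
        for shown in observations:
            if shown == result:
                likelihood *= 1 - error_rate
            else:
                likelihood *= error_rate / (len(candidates) - 1)
        unnormalized[result] = prior * likelihood
    total = sum(unnormalized.values())
    return {result: w / total for result, w in unnormalized.items()}


candidates = [56086, 56087, 56088]
displays = [56088, 56088, 56086]  # two calculators agree, one is off

posterior = posterior_over_result(displays, candidates)
for result, p in posterior.items():
    print(result, round(p, 4))
```

Each display contributes one likelihood factor to the shared parent, so the two agreeing calculators outweigh the dissenting one, and the belief about any further calculator set to the same computation moves with the parent.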

But this is a short exposition, so you can fill in that sort of thing yourself, if you like.

Having realized that our causal graphs contain nodes corresponding to logical uncertainties / the ideal result of Platonic computations, we next construe the counterfactuals of our expected utility formula to be counterfactuals over the logical result of the abstract computation corresponding to the expected utility calculation, rather than counterfactuals over any particular physical node.

You treat your choice as determining the result of the logical computation, and hence all instantiations of that computation, and all instantiations of other computations dependent on that logical computation.

Formally you'd use a Godelian diagonal to write:

Argmax[A in Actions] in Sum[O in Outcomes](Utility(O)*P(this computation yields A []-> O|rest of universe))

(where P( X=x []-> Y | Z ) means computing the counterfactual on the factored causal graph P, that surgically setting node X to x, leads to Y, given Z)
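As a very rough illustration of how the formula behaves, here is a Python sketch on Newcomb's Problem. It does not model the Godelian self-reference at all, and the causal graph is hand-specified rather than derived: the payoff amounts, the predictor accuracy of 0.99, the 0.5 prior used for comparison, and all function names are assumptions. It contrasts surgery on the logical output node (from which both the agent's action and the prediction descend) with surgery on the physical action node alone.

```python
ACTIONS = ("one-box", "two-box")


def utility(action, prediction):
    box_b = 1_000_000 if prediction == "one-box" else 0
    box_a = 1_000 if action == "two-box" else 0
    return box_a + box_b


def other(action):
    return "two-box" if action == "one-box" else "one-box"


def logical_surgery(a, accuracy=0.99):
    """P(outcome) after setting the logical output to `a`.

    The prediction is a child of the same logical node, so it follows `a`
    except when the predictor errs.
    """
    return {(a, a): accuracy, (a, other(a)): 1 - accuracy}


def physical_surgery(a, p_predicted_one_box=0.5):
    """P(outcome) after setting only the physical action node to `a`.

    The prediction is cut off from the action and keeps its prior.
    """
    p = p_predicted_one_box
    return {(a, "one-box"): p, (a, "two-box"): 1 - p}


def expected_utility(a, surgery):
    # Sum over outcomes of Utility(O) * P(output = a []-> O)
    return sum(p * utility(*outcome) for outcome, p in surgery(a).items())


def best_action(surgery):
    # Argmax over actions of the expected utility under the given surgery.
    return max(ACTIONS, key=lambda a: expected_utility(a, surgery))


print("logical-node surgery :", best_action(logical_surgery))
print("physical-node surgery:", best_action(physical_surgery))
```

With these assumed numbers, surgery on the logical node favors one-boxing and surgery on the physical node alone favors two-boxing. The only difference between the two is which node the surgery is performed on.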

Setting this up correctly (in accordance with standard constraints on causal graphs, like noncircularity) will solve (yield reflectively consistent, epistemically intuitive, systematically winning answers to) 95% of the Newcomblike problems in the literature I've seen, including Newcomb's Problem and other problems causing CDT to lose, the Smoking Lesion and other problems causing EDT to fail, Parfit's Hitchhiker which causes both CDT and EDT to lose, etc.

Note that this does not solve the remaining open problems in TDT (though Nesov and Dai may have solved one such problem with their updateless decision theory).  Also, although this theory goes into much more detail about how to compute its counterfactuals than classical CDT, there are still some visible incompletenesses when it comes to generating causal graphs that include the uncertain results of computations, computations dependent on other computations, computations uncertainly correlated to other computations, computations that reason abstractly about other computations without simulating them exactly, and so on.  On the other hand, CDT just has the entire counterfactual distribution rain down on the theory as manna from heaven (e.g. James Joyce, Foundations of Causal Decision Theory), so TDT is at least an improvement; and standard classical logic and standard causal graphs offer quite a lot of pre-existing structure here.  (In general, understanding the causal structure of reality is an AI-complete problem, and so in philosophical dilemmas the causal structure of the problem is implicitly given in the story description.)

Among the many other things I am skipping over:

  • Some actual examples of where CDT loses and TDT wins, EDT loses and TDT wins, both lose and TDT wins, what I mean by "setting up the causal graph correctly" and some potential pitfalls to avoid, etc.
  • A rather huge amount of reasoning which defines reflective consistency on a problem class; explains why reflective consistency is a rather strong desideratum for self-modifying AI; explains why the need to make "precommitments" is an expensive retreat to second-best and shows lack of reflective consistency; explains why it is desirable to win and get lots of money rather than just be "reasonable" (that is, conform to pre-existing intuitions generated by a pre-existing algorithm); and notes that, considering the many pleas from people who want, but can't find, any good intermediate stage between CDT and EDT, it's a fascinating little fact that if you were rewriting your own source code, you'd rewrite it to one-box on Newcomb's Problem and smoke on the smoking lesion problem...
  • ...and so, having given many considerations of desirability in a decision theory, shows that the behavior of TDT corresponds to reflective consistency on a problem class in which your payoff is determined by the type of decision you make, but not sensitive to the exact algorithm you use apart from that - that TDT is the compact way of computing this desirable behavior we have previously defined in terms of reflectively consistent systematic winning.
  • Showing that classical CDT, given self-modification ability, modifies into a crippled and inelegant form of TDT.
  • Using TDT to fix the non-naturalistic behavior of Pearl's version of classical causality in which we're supposed to pretend that our actions are divorced from the rest of the universe - the counterfactual surgery, written out Pearl's way, will actually give poor predictions for some problems (like someone who two-boxes on Newcomb's Problem and believes that box B has a base-rate probability of containing a million dollars, because the counterfactual surgery says that box B's contents have to be independent of the action).  TDT not only gives the correct prediction, but explains why the counterfactual surgery can have the form it does - if you condition on the initial state of the computation, this should screen off all the information you could get about outside things that affect your decision; then your actual output can be further determined only by the Godel-diagonal formula written out above, permitting the formula to contain a counterfactual surgery that assumes its own output, so that the formula does not need to infinitely recurse on calling itself.
  • An account of some brief ad-hoc experiments I performed on IRC to show that a majority of respondents exhibited a decision pattern best explained by TDT rather than EDT or CDT.
  • A rather huge amount of exposition of what TDT actually corresponds to in terms of philosophical intuitions, especially those about "free will".  For example, this is the theory I was using as hidden background when I wrote in "Causality and Moral Responsibility" that factors like education and upbringing can be thought of as determining which person makes a decision - that you rather than someone else makes a decision - but that the decision made by that particular person is up to you.  This corresponds to conditioning on the known initial state of the computation, and performing the counterfactual surgery over its output.  I've actually done a lot of this exposition on OBLW without explicitly mentioning TDT, like Timeless Control and Thou Art Physics for reconciling determinism with choice (actually effective choice requires determinism, but this confuses humans for reasons given in Possibility and Could-ness).  But if you read the other parts of the solution to "free will", and then furthermore explicitly formulate TDT, then this is what utterly, finally, completely, and without even a tiny trace of confusion or dissatisfaction or a sense of lingering questions, kills off entirely the question of "free will".
  • Some concluding chiding of those philosophers who blithely decided that the "rational" course of action systematically loses; that rationalists defect on the Prisoner's Dilemma and hence we need a separate concept of "social rationality"; that the "reasonable" thing to do is determined by consulting pre-existing intuitions of reasonableness, rather than first looking at which agents walk away with huge heaps of money and then working out how to do it systematically; people who take their intuitions about free will at face value; assuming that counterfactuals are fixed givens raining down from the sky rather than non-observable constructs which we can construe in whatever way generates a winning decision theory; et cetera.  And celebrating the fact that rationalists can cooperate with each other, vote in elections, and do many other nice things that philosophers have claimed they can't.  And suggesting that perhaps next time one should extend "rationality" a bit more credit before sighing and nodding wisely about its limitations.
  • In conclusion, rational agents are not incapable of cooperation, rational agents are not constantly fighting their own source code, rational agents do not go around helplessly wishing they were less rational, and finally, rational agents win.

Those of you who've read the quantum mechanics sequence can extrapolate from past experience that I'm not bluffing.  But it's not clear to me that writing this book would be my best possible expenditure of the required time.