(This started as a reply to Gary Drescher's comment here in which he proposes a Metacircular Decision Theory (MCDT); but it got way too long so I turned it into an article, which also contains some amplifications on TDT which may be of general interest.)

*Part 1:* How timeless decision theory does under the sort of problems that Metacircular Decision Theory talks about.

Say we have an agent embodied in the universe. The agent knows some facts about the universe (including itself), has an inference system of some sort for expanding on those facts, and has a preference scheme that assigns a value to the set of facts, and is wired to select an action--specifically, the/an action that implies (using its inference system) the/a most-preferred set of facts.

But without further constraint, this process often leads to a contradiction. Suppose the agent's repertoire of actions is A1, ..., An, and the value of action Ai is simply i. Say the agent starts by considering the action A7, and dutifully evaluates it as 7. Next, it contemplates the action A6, and reasons as follows: "Suppose I choose A6. I know I'm a utility-maximizing agent, and I already know there's another choice that has value 7. Therefore, it follows from my (hypothetical) choice of A6 that A6 has a value of at least 7." But that inference, while sound, contradicts the fact that A6's value is 6.

This is why timeless decision theory is a causality-based decision theory. I don't recall if you've indicated that you've studied Pearl's synthesis of Bayesian networks and causal graphs(?) (though if not you should be able to come up to speed on them pretty quickly).

So in the (standard) formalism of causality - just causality, never mind decision theory as yet - causal graphs give us a way to formally compute counterfactuals: We set the value of a particular node *surgically*. This means we *delete* the structural equations that would ordinarily give us the value at the node N_i as a function of the parent values P_i and the background uncertainty U_i at that node (which U_i must be uncorrelated to all other U, or the causal graph has not been fully factored). We delete this structural equation for N_i and make N_i parentless, so we don't send any likelihood messages up to the former parents when we update our knowledge of the value at N_i. However, we do send prior-messages from N_i to all of *its* descendants, maintaining the structural equations for the children of which N_i is a parent, and their children, and so on.
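The surgery can be illustrated with a minimal sampling sketch. This is my own toy model - a three-node chain A -> B -> C with invented probabilities, not anything from Pearl's book - showing that conditioning on a node sends likelihood messages up to its parents, while surgically setting it does not, even though descendants update either way:

```python
import random

random.seed(0)

# Toy three-node chain A -> B -> C with independent background noise
# at each node. All nodes are booleans; probabilities are invented.
def sample(do_b=None):
    a = random.random() < 0.5
    b = a if random.random() < 0.9 else (not a)   # B usually copies A
    if do_b is not None:
        b = do_b          # surgery: B's structural equation is deleted
    c = b if random.random() < 0.9 else (not b)   # C usually copies B
    return a, b, c

def estimate(pred, cond, do_b=None, n=100_000):
    """Monte Carlo estimate of P(pred | cond) under optional surgery."""
    hits = total = 0
    for _ in range(n):
        w = sample(do_b=do_b)
        if cond(w):
            total += 1
            hits += pred(w)
    return hits / total

# Conditioning on B=True sends a likelihood message UP to A:
p_a_given_b = estimate(lambda w: w[0], lambda w: w[1])               # ~0.9
# Surgically setting B=True sends no message up; A stays at its prior:
p_a_do_b = estimate(lambda w: w[0], lambda w: True, do_b=True)       # ~0.5
# Descendants of the surgically-set node still update:
p_c_do_b = estimate(lambda w: w[2], lambda w: True, do_b=True)       # ~0.9
```

The asymmetry between `p_a_given_b` and `p_a_do_b` is exactly the difference between observing N_i and performing counterfactual surgery on it.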

That's the standard way of computing counterfactuals in the Pearl/Spirtes/Verma synthesis of causality, as found in "Causality: Models, Reasoning, and Inference" and "Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference".

Classical causal decision theory says that your expected utility formula is over the *counterfactual* expectation of your *physical* act. Now, although the CDTs I've read have *not* in fact talked about Pearl - perhaps because it's a relatively recent mathematical technology, or perhaps because I last looked into the literature a few years back - and have just taken the counterfactual distribution as intuitively obvious manna rained from heaven - nonetheless it's pretty clear that their intuitions are operating in pretty much the Pearlian way, via counterfactual surgery on the physical act.

So in *calculating* the "expected utility" of an act - the computation that classical CDT uses to *choose* an action - CDT assumes the act to be *severed from its physical causal parents*. Let's say that there's a Smoking Lesion problem, where the same gene causes a taste for cigarettes and an increased probability of cancer. Seeing someone else smoke, we would infer that they have an increased probability of cancer - this sends a likelihood-message upward to the node which represents the probability of having the gene, and this node in turn sends a prior-message downward to the node which represents the probability of getting cancer. But the counterfactual surgery that CDT performs on its physical acts means that it calculates the expected utility as though the physical act is severed from its parent nodes. So CDT calculates the expected utility as though it has the base-rate probability of having the cancer gene regardless of its act, and so chooses to smoke, since it likes cigarettes. This is the common-sense and reflectively consistent action, so CDT appears to "win" here in terms of giving the winning answer - but it's worth noting that the *internal* calculation performed is *wrong*; if you act to smoke cigarettes, your probability of getting cancer is *not* the base rate.
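The two calculations can be made concrete with a toy sketch. The base rates, likelihoods, and utilities below are invented illustrative numbers, not part of the standard problem statement:

```python
# Invented numbers for the Smoking Lesion: P(gene) = 0.2,
# P(cancer | gene) = 0.8, P(cancer | no gene) = 0.1;
# utility +10 for smoking, -100 for cancer.
P_GENE = 0.2
P_CANCER = {True: 0.8, False: 0.1}
U_SMOKE, U_CANCER = 10.0, -100.0

def eu_causal(smoke):
    """CDT: surgery severs the act from the gene node, so the act
    carries the BASE RATE of the gene either way."""
    p_cancer = P_GENE * P_CANCER[True] + (1 - P_GENE) * P_CANCER[False]
    return (U_SMOKE if smoke else 0.0) + p_cancer * U_CANCER

def eu_evidential(smoke, p_gene_if_smoke=0.95, p_gene_if_not=0.05):
    """EDT: the act is evidence about the gene (hypothetical
    likelihoods chosen for illustration)."""
    p_gene = p_gene_if_smoke if smoke else p_gene_if_not
    p_cancer = p_gene * P_CANCER[True] + (1 - p_gene) * P_CANCER[False]
    return (U_SMOKE if smoke else 0.0) + p_cancer * U_CANCER

# Under CDT the cancer term is identical for both acts, so smoking wins
# by exactly U_SMOKE; under EDT the inflated cancer probability swamps
# the taste for cigarettes, and EDT refuses to smoke.
```

CDT's internal counterfactual distribution is wrong (smoking really does correlate with cancer), but the severed cancer term cancels out of the comparison, which is why CDT still picks the winning act here.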

And on Newcomb's Problem this internal error comes out into the open; the inside of CDT's counterfactual expected utility calculation expects box B to contain a million dollars at the base rate, since it surgically severs the act of taking both boxes from the parent variable of your source code, which correlates to your previous source code at the moment Omega observed it, which correlates to Omega's decision whether to leave box B empty.

Now turn to timeless decision theory, in which the (Godelian diagonal) expected utility formula is written as follows:

Argmax[A in Actions] in Sum[O in Outcomes](Utility(O)*P(this computation yields A []-> O | rest of universe))

The interior of this formula performs counterfactual surgery to sever the *logical output* of the expected utility formula, from the *initial conditions* of the expected utility formula. So we do *not* conclude, *in the inside of the formula as it performs the counterfactual surgery*, that if-counterfactually A_6 is chosen over A_7 then A_6 must have higher expected utility. If-evidentially A_6 is chosen over A_7, then A_6 has higher expected utility - but this is not what the interior of the formula computes. As we *compute* the formula, the logical output is divorced from all parents; we cannot infer anything about its immediate logical precedents. This counterfactual surgery may be *necessary*, in fact, to stop an infinite regress in the formula, as it tries to model its own output in order to decide its own output; and this, arguably, is exactly *why* the decision counterfactual has the form it does - it is *why* we have to talk about counterfactual surgery within decisions in the first place.

*Descendants* of the logical output, however, continue to update their values within the counterfactual, which is why TDT one-boxes on Newcomb's Problem - both your current self's physical act, and Omega's physical act in the past, are logical-causal *descendants* of the computation, and are recalculated accordingly inside the counterfactual.
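A toy rendering of that computation on Newcomb's Problem - my own sketch, using the conventional illustrative payoffs and an assumed `accuracy` parameter for Omega's prediction:

```python
def tdt_newcomb(accuracy=1.0):
    """Toy TDT expected-utility maximizer for Newcomb's Problem.
    The logical output is the parent of BOTH the physical act and
    Omega's earlier prediction, so counterfactually setting it
    recomputes both descendants."""
    actions = ['one-box', 'two-box']

    def utility(action, omega_predicts):
        # Omega fills box B ($1,000,000) iff it predicts one-boxing;
        # box A always holds $1,000.
        box_b = 1_000_000 if omega_predicts == 'one-box' else 0
        return box_b if action == 'one-box' else box_b + 1_000

    def expected_utility(a):
        # Inside the counterfactual, Omega's prediction tracks the
        # logical output with probability `accuracy`.
        other = 'two-box' if a == 'one-box' else 'one-box'
        return (accuracy * utility(a, omega_predicts=a)
                + (1 - accuracy) * utility(a, omega_predicts=other))

    return max(actions, key=expected_utility)
```

With a reliable Omega this one-boxes; if Omega's "prediction" is an uncorrelated coin flip (`accuracy=0.5`), the same formula two-boxes, which is the right behavior when there is in fact no logical link.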

If you desire to smoke cigarettes, this would be observed and screened off by conditioning on the *fixed initial conditions* of the computation - the fact that the utility function had a positive term for smoking cigarettes would already tell you that you had the gene. (Eells's "tickle".) If you can't observe your own utility function then you are actually taking a step outside the timeless decision theory as formulated.

So from the perspective of Metacircular Decision Theory - what is done with various facts - timeless decision theory can state very definitely how it treats the various facts, within the interior of its expected utility calculation. It does not *update* any physical or logical parent of the logical output - rather, it *conditions* on the initial state of the computation, in order to screen off outside influences; then no further inferences about them are made. And if you already know anything about the consequences of your logical output - its descendants in the logical causal graph - you will *re*compute what they *would have been* if you'd had a different output.

This last codicil is important for cases like Parfit's Hitchhiker, in which Omega (or perhaps Paul Ekman), driving a car through the desert, comes across yourself dying of thirst, and will give you a ride to the city only if they expect you to pay them $100 *after* you arrive in the city. (With the whole scenario being trued by strict selfishness, no knock-on effects, and so on.) There is, of course, no way of forcing the agreement - so will you compute, *in the city*, that it is *better for you* to give $100 to Omega, after having *already* been saved? Both evidential decision theory and causal decision theory will give the losing (dying in the desert, hence reflectively inconsistent) answer here; but TDT answers, "*If I had decided not to pay,* then Omega *would have* left me in the desert." So the expected utility of not paying $100 remains lower, *even after you arrive in the city,* given the way TDT computes its counterfactuals inside the formula - which is the dynamically and reflectively consistent and winning answer. And note that this answer is arrived at in one natural step, without needing explicit reflection, let alone precommitment - you will answer this way even if the car-driver Omega made its prediction without you being aware of it, so long as Omega can credibly establish that it was predicting you with reasonable accuracy rather than making a pure uncorrelated guess. (And since it's not a very complicated calculation, Omega knowing that you are a timeless decision theorist is credible enough.)
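A minimal sketch of that comparison, with toy utilities of my own choosing (dying in the desert at -1,000,000, paying at -100):

```python
# Invented utilities for Parfit's Hitchhiker.
U_DIE, U_PAY = -1_000_000, -100

def cdt_in_city(pay):
    """CDT evaluated in the city: the rescue already happened and is
    severed from the act, so paying is a pure $100 loss."""
    return U_PAY if pay else 0

def tdt(pay):
    """TDT: Omega's earlier decision to give you a ride is a descendant
    of your decision computation (Omega predicted it), so it is
    recomputed inside the counterfactual - even in the city."""
    rescued = pay              # a perfect predictor, for simplicity
    if not rescued:
        return U_DIE
    return U_PAY if pay else 0

# CDT in the city refuses to pay (and so, predictably, was left in
# the desert); TDT pays and is rescued.
```

The point of the sketch is that TDT's counterfactual keeps `rescued` tied to the decision even after the rescue is in the past, which is what makes the answer dynamically consistent without precommitment.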

I wonder if it might be open to the criticism that you're effectively postulating the favored answer to Newcomb's Problem (and other such scenarios) by postulating that when you surgically alter one of the nodes, you correspondingly alter the nodes for the other instances of the computation.

This is where one would refer to the omitted extended argument about a calculator on Mars and a calculator on Venus, where both calculators were manufactured at the same factory on Earth and observed before being transported to Mars and Venus. If we manufactured two envelopes on Earth, containing the same letter, and transported them to Mars and Venus without observing them, then indeed the contents of the two envelopes would be correlated in our probability distribution, even though the Mars-envelope is not a cause of the Venus-envelope, nor the Venus-envelope a cause of the Mars-envelope, because they have a common cause in the background. But if we *observe* the common cause - look at the message as it is written, before being Xeroxed and placed into the two envelopes - then the standard theory of causality *requires* that our remaining uncertainty about the two envelopes be *uncorrelated*; we have observed the common cause and screened it off. If N_i is not a cause of N_j or vice versa, and you *know* the state of all the common ancestors A_ij of N_i and N_j, and you do *not* know the state of any mutual descendants D_ij of N_i and N_j, then the standard rules of causal graphs (d-separation) show that your probabilities at N_i and N_j must be independent.

However, if you manufacture on Earth two calculators both set to calculate 123 * 456, and you have not yet performed this calculation in your head, then you can *observe completely the physical state of the two calculators* before they leave Earth, and yet still have *correlated* uncertainty about what result will flash on the screen on Mars and the screen on Venus. So this situation is simply *not* compatible with the mathematical axioms on causal graphs if you draw a causal graph in which the only common ancestor of the two calculators is the physical factory that made them and produced their correlated initial state. If you are to preserve the rules of causal graphs at all, you must have an additional node - which would logically seem to represent one's logical uncertainty about the abstract computation 123 * 456 - which is the parent of both calculators. Seeing the Venusian calculator flash the result 56,088, this physical event sends a likelihood-message to its parent node representing the logical result of 123 * 456, which sends a prior-message to its child node, the physical message flashed on the screen at Mars.
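A sketch of the calculator setup, in my own toy model: logical uncertainty is represented as a prior over a few invented candidate answers, and the single logical node is the common parent of both screens.

```python
import random

random.seed(1)

# Toy logical uncertainty about 123 * 456: a prior over invented
# candidate answers (we "haven't done the arithmetic in our head").
CANDIDATES = [56_088, 56_098, 55_088]

def sample_screens():
    # One draw from the LOGICAL node; both physical calculators are
    # its children and display whatever it is.
    answer = random.choice(CANDIDATES)
    return answer, answer            # (mars_screen, venus_screen)

worlds = [sample_screens() for _ in range(30_000)]

# Prior uncertainty about the Mars screen:
p_mars_prior = sum(m == 56_088 for m, _ in worlds) / len(worlds)   # ~1/3

# Seeing Venus flash 56,088 sends a likelihood message up to the
# logical node, which sends a prior message down to Mars:
mars_when_venus = [m for m, v in worlds if v == 56_088]
p_mars_given_venus = (sum(m == 56_088 for m in mars_when_venus)
                      / len(mars_when_venus))                      # 1.0
```

Nothing physical travels between the planets; the correlation is carried entirely by the extra node for the abstract computation, exactly as the causal-graph axioms demand.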

A similar argument shows that if we have completely observed our own *initial* source code, and perhaps observed Omega's *initial* source code which contains a copy of our source code and the intention to simulate it, but we do not yet know our own decision, then the only way in which our uncertainty about our own physical act can possibly be correlated *at all* with Omega's past act to fill or leave empty the box B - given that neither act physically causes the other - is if there is some common ancestor node unobserved; and having already seen that our causal graph must include logical uncertainty if it is to stay factored, we can (must?) interpret this unobserved common node as the logical output of the known expected utility calculation.

From this, I would argue, TDT follows. But of course it's going to be difficult to exhibit an algorithm that computes this - guessing unknown causal networks is an extremely difficult problem in machine learning, and only small such networks can be learned. In general, determining the causal structure of reality is AI-complete. And by interjecting logical uncertainty into the problem, we really are heading far beyond the causal networks that known machine learning algorithms can *learn*. But it *is* the case that if you rely on humans to learn the causal algorithm, then it is pretty clear that the Newcomb's Problem setup, if it is to be analyzed in causal terms at all, must have nodes corresponding to logical uncertainty, on pain of violating the axioms governing causal graphs. Furthermore, in being told that Omega's leaving box B full or empty correlates to our *decision* to take only one box or both boxes, *and* that Omega's act lies in the past, *and* that Omega's act is not directly influencing us, *and* that we have not found any other property which would screen off this uncertainty even when we inspect our own source code / psychology in advance of knowing our actual decision, *and* that our computation is the only *direct* ancestor of our logical output, then we're being told in unambiguous terms (I think) to make our own physical act and Omega's act a common descendant of the unknown logical output of our known computation. (A counterexample in the form of another causal graph compatible with the same data is welcome.) And of course we could make the problem very clear by letting the agent be a computer program and letting Omega have a copy of the source code with superior computing power, in which case the logical interpretation becomes explicit.

So these are the facts which TDT takes into account, and the facts which it ignores. The Nesov-Dai updateless decision theory is even stranger - as far as I can make out, it ignores *all* facts except for the fact about which inputs have been received by the logical version of the computation it implements. If combined with TDT, we would interpret UDT as having a never-updated weighting on all possible universes, and a causal structure (causal graph, presumably) on those universes. Any given logical computation in UDT will count all instantiations of itself in all universes which have received exactly the same inputs - even if those instantiations are being imagined by Omega in universes which UDT would ordinarily be interpreted as "known to be logically inconsistent", like universes in which the third decimal digit of pi is 3. Then UDT calculates, using its causal graph on each of those universes, the counterfactual consequences of setting the logical act to A_i, weighted across all imagined universes. Then it maximizes on A_i.

I would ask if, applying Metacircular Decision Theory from a "common-sense human base level", you see any case in which additional facts should be taken into account, or other facts ignored, apart from those facts used by TDT (UDT). If not, and if TDT (UDT) are reflectively consistent, then TDT (UDT) is the fixed point of MCDT starting from a human baseline decision theory. Of course this can't actually be the case because TDT (UDT) are incomplete with respect to the open problems cited earlier, like logical ordering of moves, and choice of conditional strategies in response to conditional strategies. But it would be the way I'd pose the problem to you, Gary Drescher - MCDT is an interesting way of looking at things, but I'm still trying to wrap my mind around it.

*Part 2: Metacircular Decision Theory as reflection criterion.*

MCDT's proposed criterion is this: the agent makes a meta-choice about which facts to omit when making inferences about the hypothetical actions, and selects the set of facts which lead to the best outcome if the agent then evaluates the original candidate actions with respect to that choice of facts. The agent then iterates that meta-evaluation as needed (probably not very far) until a fixed point is reached, i.e. the same choice (as to which facts to omit) leaves the first-order choice unchanged. (It's ok if that's intractable or uncomputable; the agent can muddle through with some approximate algorithm.)

...In other words, metacircular consistency isn't just a test that we'd like the decision theory to pass. Metacircular consistency *is* the theory; it *is* the algorithm.
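The iteration Drescher describes can be sketched as code. This is my own toy rendering, not his specification - the function signatures, the scoring scheme, and the example fact-sets below are all invented for illustration:

```python
def mcdt_fixed_point(omission_sets, choose, score, start, max_iters=100):
    """Toy MCDT iteration.
    omission_sets: candidate sets of facts to omit when evaluating acts.
    choose(omit): the first-order action selected when ignoring `omit`.
    score(base, omit): how the theory that ignores `base` values the
        outcome of adopting the theory that ignores `omit`.
    Iterates the meta-choice until it leaves the first-order choice
    unchanged, or gives up and muddles through with the current answer."""
    current = start
    for _ in range(max_iters):
        best = max(omission_sets, key=lambda o: score(current, o))
        if choose(best) == choose(current):
            return best               # fixed point reached
        current = best
    return current                    # no convergence: approximate answer

# Invented Smoking-Lesion-flavored instance: omitting the act-gene
# correlation makes "smoke" the first-order choice, which this scoring
# scheme prefers, so that omission-set is the fixed point.
sets = [frozenset(), frozenset({'act-gene correlation'})]
choose = lambda omit: 'smoke' if omit else 'abstain'
score = lambda base, omit: {'smoke': 10.0, 'abstain': 0.0}[choose(omit)]
```

Note how much is smuggled in through `score`: the value of "adopting" a fact-treatment is computed by some base theory, which is exactly the worry raised below about MCDT needing a starting point.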

But it looks to me like MCDT has to start from some particular base theory, and different base theories may have different fixed points (or conceivably, cycles). In which case we can't yet call MCDT itself a complete theory specification. When you talk about which facts *would* be wise to take into account, or ignore, (or recompute counterfactually even if they already have known values?), then you're imagining different source codes (or MCDT specifications?) that an agent could have; and calculating the benefits of adopting these different source codes, relative to the way the *current* base theory computes "adopting" and "benefit".

For example, if you start with CDT and apply MCDT at 7am, it looks to me like "use TDT (UDT) for all cases where my source code has a physical effect after 7am, and use CDT for all cases where the source code had a physical effect before 7am or a correlation stemming from common ancestry" is a reflectively stable fixed point of MCDT. Whenever CDT asks "*What if* I took into account these different facts?", it will say, "But Omega would not be physically affected by my self-modification, so clearly it can't benefit me in any way." If the MCDT criterion is to be applied in a different and intuitively appealing way that has only one fixed point (up to different utility functions), then this would establish MCDT as a good candidate for *the* decision theory, but right now it does look to me like *a* reflective consistency test. But maybe this is because I haven't yet wrapped my mind around the MCDT's fact-treatment-based decomposition of decision theories, or because you've already specified further mandatory structure in the base theory for how the *effect of* ignoring or taking into account some particular fact is to be computed.

Thanks, Eliezer--that's a clear explanation of an elegant theory. So far, TDT (I haven't looked carefully at UDT) strikes me as more promising than any other decision theory I'm aware of (including my own efforts, past and pending). Congratulations are in order!

I agree, of course, that TDT doesn't make the A6/A7 mistake. That was just a simple illustration of the need, in counterfactual reasoning (broadly construed), to specify somehow what to hold fixed and what not to, and that different ways of doing so specify different senses of counterfactual inference (i.e., that there are different kinds of 'if-counterfactually'). If counterfactual inference is construed a la Pearl, for example, then such inferences (causal-counterfactual) correspond to causal links (if-causally).

As you say, TDT's utility formula doesn't perform general logical inferences (or evidential-counterfactual inferences) from the antecedents it evaluates (i.e. the candidate outputs of the Platonic computation). Rather, the utility formula performs causal-counterfactual inferences from the set of nodes that designate the outputs of the Platonic computation, in all places where that Platonic computation is approximately physically instantiated.

However, it seems to me we can, if we wish, use TDT to define what we can call a TDT-counterfactual that tells us what would be true 'if-timelessly' were a particular physical agent's particular physical action to occur. In particular, whereas CDT says that what would be true (if-causally) consists of what's causally downstream from that action, TDT says that what would be true (if-timelessly) consists of what's causally downstream from the output of the suitably-specified Platonic computation that the particular physical agent approximately implements, and also what's causally downstream from that same Platonic computation in all other places where that computation is approximately physically instantiated. (And the physical TDT agent argmaxes over the utilities of the TDT-counterfactual consequences of that agent's candidate actions.)
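The contrast between the two consequence sets can be sketched as a toy graph computation (the graph and node names are invented for illustration; 'logic' stands for the Platonic computation's output):

```python
# Toy causal graph for Newcomb's Problem, edges parent -> children.
# 'logic' is the Platonic computation's output, instantiated both in
# the agent's act and in Omega's earlier prediction.
GRAPH = {
    'logic': ['my-act', 'omega-prediction'],
    'my-act': ['my-payoff'],
    'omega-prediction': ['box-b'],
    'box-b': ['my-payoff'],
}

def downstream(graph, node):
    """All strict descendants of `node` (simple DFS reachability)."""
    seen, stack = set(), [node]
    while stack:
        for child in graph.get(stack.pop(), []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

# CDT (if-causally): only what's downstream from the physical act.
cdt_consequences = downstream(GRAPH, 'my-act')
# TDT (if-timelessly): everything downstream from the Platonic
# computation's output, at every place it is instantiated.
tdt_consequences = downstream(GRAPH, 'logic')
```

CDT's consequence set contains only the payoff node; TDT's also sweeps in Omega's prediction and box B, which is exactly the difference between the two counterfactuals.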

I think there are a few reasons we might sometimes find it useful to think in terms of the TDT-counterfactual consequences of a physical agent's actions, rather than directly in terms of the standard TDT formulation (even though they're merely two different ways of expressing the same decision theory, unless I've misunderstood).

The TDT-counterfactual perspective places TDT in a common framework with other decision theories that (implicitly or explicitly) use other kinds of counterfactual reasoning, starting with a physical agent's action as the antecedent. Then we can apply some meta-criterion to ask which of those alternative theories is correct, and why. (That was the intuition behind my MCDT proposal, although MCDT itself was hastily specified and too simpleminded to be correct.)

Plausibly, people are agents who think in terms of the counterfactual consequences of an action, rather than being hardwired to use TDT. If we are to choose to act in accordance with TDT from now on (or, equivalently, if we are to build AIs who act in accordance with TDT), we need to be persuaded that doing so is for the best (even if e.g. a Newcomb snapshot was already taken before we became persuaded). (I'm assuming here that our extant choice machinery allows us the flexibility to be persuaded about what sort of counterfactual to use; if not, alas, we can't necessarily get there from here).

In the standard formulation of TDT, you effectively view yourself as an abstract computation with one or more approximate physical instantiations, and you ask what you (thus construed) cause (i.e. what follows causal-counterfactually). In the alternative formulation, I view myself as a particular physical agent that is among one or more approximate instantiations of an abstract computation, and I ask what follows TDT-counterfactually from what I (thus construed) choose.

The original formulation seems to require a precommitment to identify oneself with all instantiations (in the causal net) of the abstract computation (or at least seems to require that in order for us non-TDT agents to decide to emulate TDT). And that identification is indeed plausible in the case of fairly exact replication. But consider, say, a 1-shot PD game between Eliezer and me. Our mutual understanding of reflexive consistency would let us win. And I agree that we both approximately instantiate, at some level of abstraction, a common decision computation, which is what lets the TDT framework apply and lets us both win.

But (in contrast with an exact-simulation case) that common computation is at a level of abstraction that does not preserve our respective personal identities. (That's kind of the point of the abstraction. My utility function for the game places value on Gary's points and not Eliezer's points; the common abstract computation lacks that bias.) So I would hesitate to identify either of us with the common abstraction. (And I see in other comments that Eliezer explicitly agrees.) Rather, I'd like to reason that if-timelessly I, Gary, choose 'Cooperate', then so does Eliezer. That way, "I am you as you are me" emerges as a (metaphorical) conclusion about the situation (we each have a choice about the other's action in the game, and are effectively acting together) rather than being needed as the point of departure.

Again, the foregoing is just an alternative but equivalent (unless I've erred) way of viewing TDT, an alternative that may be useful for some purposes.