According to Ingredients of Timeless Decision Theory, when you set up a factored causal graph for TDT, "You treat your choice as determining the result of the logical computation, and hence all instantiations of that computation, and all instantiations of other computations dependent on that logical computation", where "the logical computation" refers to the TDT-prescribed argmax computation (call it C) that takes all your observations of the world (from which you can construct the factored causal graph) as input, and outputs an action in the present situation.

I asked Eliezer to clarify what it means for another logical computation D to be either the same as C, or "dependent on" C, for purposes of the TDT algorithm. Eliezer answered:

For D to depend on C means that if C has various logical outputs, we can infer new logical facts about D's logical output in at least some cases, relative to our current state of non-omniscient logical knowledge. A nice form of this is when supposing that C has a given exact logical output (not yet known to be impossible) enables us to infer D's exact logical output, and this is true for every possible logical output of C. Non-nice forms would be harder to handle in the decision theory but we might perhaps fall back on probability distributions over D.

I replied as follows (which Eliezer suggested I post here).

If that's what TDT means by the logical dependency between Platonic computations, then TDT may have a serious flaw.

Consider the following version of the transparent-boxes scenario. The predictor has an infallible simulator D that predicts whether I one-box here [EDIT: if I see $1M]. The predictor also has a module E that computes whether the ith digit of pi is zero, for some ridiculously large value of i that the predictor randomly selects. I'll be told the value of i, but the best I can do is assign an a priori probability of .1 that the specified digit is zero.

...reasoning under logical uncertainty using limited computing power... is another huge unsolved open problem of AI. Human mathematicians had this whole elaborate way of believing that the Taniyama Conjecture implied Fermat's Last Theorem at a time when they didn't know whether the Taniyama Conjecture was true or false; and we seem to treat this sort of implication in a rather different way than '2=1 implies FLT', even though the material implication is equally valid.

*Good and Real*) are: the predictor conducts a simulation that tentatively presumes there will be $1M in the large box, and then puts $1M in the box (for real) iff that simulation showed one-boxing. Thus, if the large box turns out to be

*empty*, there is no requirement for that to be predictive of the agent's choice under those circumstances. The present variant is the same, except that (D xor E) determines the $1M, instead of just D. (Sorry, I should have said this to begin with, instead of assuming it as background knowledge.)

That's very elegant! But the trick here, it seems to me, lies in the rules for setting up the world program in the first place.

First, the world-program's calling tree should match the structure of TDT's graph, or at least match the graph's (physically-)causal links. The physically-causal part of the structure tends to be uncontroversial, so (for present purposes) I'm ok with just stipulating the physical structure for a given problem.

But then there's the choice to use the same variable S in multiple places in the code. That corresponds to a choice (in TDT) to splice in a logical-dependency link from the Platonic decision-computation node to other Platonic nodes. In both theories, we need to be precise about the criteria for this dependency. Otherwise, the sense of dependency you're invoking might turn out to be wrong (it makes the theory prescribe incorrect decisions) or question-begging (it implicitly presupposes an answer to the key question that the theory itself is supposed to figure out for us, namely what things are or are not counterfactual consequences of the decision-computation).

So the question, in UDT1, is: under what circumstances do you represent two real-world computations as being tied together via the same variable in a world-program?

That's perhaps straightforward if S is implemented by literally the same physical state in multiple places. But as you acknowledge, you might instead have distinct Si's that diverge from one another for some inputs (though not for the actual input in this case). And the different instances need not have the same physical substrate, or even use the same algorithm, as long as they give the same answers when the relevant inputs are the same, for some mapping between the inputs and between the outputs of the two Si's. So there's quite a bit of latitude as to whether to construe two computations as "logically equivalent".

So, for example, for the conventional transparent-boxes problem, what principle tells us to formulate the world program as you proposed, rather than having:

(along with a similar program P2 that uses constant S2, yielding a different output from Omega_Predict)?

This alternative formulation ends up telling us to two-box. In this formulation, if S and S1 (or S and S2) are in fact the same, they would (counterfactually) differ if a different answer (than the actual one) were output from S—which is precisely what a causalist asserts. (A similar issue arises when deciding what facts to model as “inputs” to S—thus forbidding S to “know” those facts for purposes of figuring out the counterfactual dependencies—and what facts to build instead into the structure of the world-program, or to just leave as implicit background knowledge.)

So my concern is that UDT1 may covertly beg the question by selecting, among the possible formulations of the world-program, a version that turns out to presuppose an answer to the very question that UDT1 is intended to figure out for us (namely, what counterfactually depends on the decision-computation). And although I agree that the formulation you've selected in this example is correct and the above alternative formulation isn't, I think it remains to explain why.

(As with my comments about TDT, my remarks about UDT1 are under the blanket caveat that my grasp of the intended content of the theories is still tentative, so my criticisms may just reflect a misunderstanding on my part.)

First, to clear up a possible confusion, the S in my P is not supposed to be a variable. It's a constant, more specifically a piece of code that implements UDT1 itself. (If I sometimes talk about it as if it's a variable, that's because I'm trying to informally describe what is going on inside the computation that UDT1 does.)

For the more general question of how do we know the structure of the world program, the idea is that for an actual AI, we would program it to care about all possible world programs (or more generally, mathematical structures, see examp... (read more)