Timeless Decision Theory and Meta-Circular Decision Theory

[-]Gary_Drescher16y140

Thanks, Eliezer--that's a clear explanation of an elegant theory. So far, TDT (I haven't looked carefully at UDT) strikes me as more promising than any other decision theory I'm aware of (including my own efforts, past and pending). Congratulations are in order!

I agree, of course, that TDT doesn't make the A6/A7 mistake. That was just a simple illustration of the need, in counterfactual reasoning (broadly construed), to specify somehow what to hold fixed and what not to, and that different ways of doing so specify different senses of counterfactual inference (i.e., that there are different kinds of 'if-counterfactually'). If counterfactual inference is construed a la Pearl, for example, then such inferences (causal-counterfactual) correspond to causal links (if-causally).

As you say, TDT's utility formula doesn't perform general logical inferences (or evidential-counterfactual inferences) from the antecedents it evaluates (i.e. the candidate outputs of the Platonic computation). Rather, the utility formula performs causal-counterfactual inferences from the set of nodes that designate the outputs of the Platonic computation, in all places where that Platonic computation is approximately physically instantiated.

However, it seems to me we can, if we wish, use TDT to define what we can call a TDT-counterfactual that tells us what would be true 'if-timelessly' a particular physical agent's particular physical action were to occur. In particular, whereas CDT says that what would be true (if-causally) consists of what's causally downstream from that action, TDT says that what would be true (if-timelessly) consists of what's causally downstream from the output of the suitably-specified Platonic computation that the particular physical agent approximately implements, and also what's causally downstream from that same Platonic computation in all other places where that computation is approximately physically instantiated. (And the physical TDT agent argmaxes over the utilities of the TDT-counterfactual consequences of that agent's candidate actions.)
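
As a rough illustration of the if-causally versus if-timelessly distinction, the sketch below compares what counts as "downstream" under the two readings. The graph, its node names, and the helper `downstream` are assumptions invented for this example (a toy Newcomb setup); they are not part of the TDT formalism itself.

```python
from collections import deque

# Hypothetical toy causal graph for Newcomb's problem: parent -> children.
CAUSAL_GRAPH = {
    "platonic_computation": ["agent_action", "omega_prediction"],
    "omega_prediction": ["box_b_contents"],
    "agent_action": ["payout"],
    "box_b_contents": ["payout"],
    "payout": [],
}

def downstream(graph, start):
    """Return every node reachable from `start`, excluding `start` itself."""
    seen = set()
    queue = deque(graph[start])
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(graph[node])
    return seen

# if-causally: only what follows from the physical action.
cdt_consequences = downstream(CAUSAL_GRAPH, "agent_action")
# if-timelessly: what follows from the computation's output, wherever it is instantiated.
tdt_consequences = downstream(CAUSAL_GRAPH, "platonic_computation")
```

In this toy graph, `box_b_contents` belongs to the if-timelessly set but not to the if-causally set, which is where the two kinds of counterfactual come apart.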

I think there are a few reasons we might sometimes find it useful to think in terms of the TDT-counterfactual consequences of a physical agent's actions, rather than directly in terms of the standard TDT formulation (even though they're merely two different ways of expressing the same decision theory, unless I've misunderstood).

The TDT-counterfactual perspective places TDT in a common framework with other decision theories that (implicitly or explicitly) use other kinds of counterfactual reasoning, starting with a physical agent's action as the antecedent. Then we can apply some meta-criterion to ask which of those alternative theories is correct, and why. (That was the intuition behind my MCDT proposal, although MCDT itself was hastily specified and too simpleminded to be correct.)
Plausibly, people are agents who think in terms of the counterfactual consequences of an action, rather than being hardwired to use TDT. If we are to choose to act in accordance with TDT from now on (or, equivalently, if we are to build AIs who act in accordance with TDT), we need to be persuaded that doing so is for the best (even if e.g. a Newcomb snapshot was already taken before we became persuaded). (I'm assuming here that our extant choice machinery allows us the flexibility to be persuaded about what sort of counterfactual to use; if not, alas, we can't necessarily get there from here).
In the standard formulation of TDT, you effectively view yourself as an abstract computation with one or more approximate physical instantiations, and you ask what you (thus construed) cause (i.e. what follows causal-counterfactually). In the alternative formulation, I view myself as a particular physical agent that is among one or more approximate instantiations of an abstract computation, and I ask what follows TDT-counterfactually from what I (thus construed) choose.

The original formulation seems to require a precommitment to identify oneself with all instantiations (in the causal net) of the abstract computation (or at least seems to require that in order for us non-TDT agents to decide to emulate TDT). And that identification is indeed plausible in the case of fairly exact replication. But consider, say, a 1-shot PD game between Eliezer and me. Our mutual understanding of reflexive consistency would let us win. And I agree that we both approximately instantiate, at some level of abstraction, a common decision computation, which is what lets the TDT framework apply and lets us both win.

But (in contrast with an exact-simulation case) that common computation is at a level of abstraction that does not preserve our respective personal identities. (That's kind of the point of the abstraction. My utility function for the game places value on Gary's points and not Eliezer's points; the common abstract computation lacks that bias.) So I would hesitate to identify either of us with the common abstraction. (And I see in other comments that Eliezer explicitly agrees.) Rather, I'd like to reason that if-timelessly I, Gary, choose 'Cooperate', then so does Eliezer. That way, "I am you as you are me" emerges as a (metaphorical) conclusion about the situation (we each have a choice about the other's action in the game, and are effectively acting together) rather than being needed as the point of departure.

Again, the foregoing is just an alternative but equivalent (unless I've erred) way of viewing TDT, an alternative that may be useful for some purposes.

[-]Gary_Drescher16y50

[In TDT] If you desire to smoke cigarettes, this would be observed and screened off by conditioning on the fixed initial conditions of the computation - the fact that the utility function had a positive term for smoking cigarettes, would already tell you that you had the gene. (Eells's "tickle".) If you can't observe your own utility function then you are actually taking a step outside the timeless decision theory as formulated.

Consider a different scenario where people with and without the gene both desire to smoke, but the gene makes that desire stronger, and the stronger it is, the more likely one is to smoke. Even when you observe your own utility function, you don't necessarily have a clue whether the utility assigned to smoking is the level caused by the gene or else by the gene's absence. So your observation of your utility function doesn't necessarily help you to move away from the base-level probability of having cancer here.
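
A small worked example may make this concrete. All numbers are made up: assume a baseline desire level that is uniformly distributed over 1..10, assume the gene adds exactly 1 to it, and assume a base rate of 1/5 for the gene. The function names are hypothetical.

```python
from fractions import Fraction

BASELINES = range(1, 11)       # assumed baseline desire levels, equally likely
GENE_BOOST = 1                 # assumed extra desire caused by the gene
PRIOR_GENE = Fraction(1, 5)    # assumed base rate of the gene

def likelihood(observed, has_gene):
    """P(observed desire level | gene status) under the toy model."""
    baseline = observed - (GENE_BOOST if has_gene else 0)
    return Fraction(1, len(BASELINES)) if baseline in BASELINES else Fraction(0)

def posterior_gene(observed):
    """P(gene | observed desire level) by Bayes' rule."""
    with_gene = PRIOR_GENE * likelihood(observed, True)
    without_gene = (1 - PRIOR_GENE) * likelihood(observed, False)
    return with_gene / (with_gene + without_gene)
```

Under these assumptions, any observed level from 2 to 10 leaves the posterior equal to the prior; only the extreme levels (1 or 11) tell you anything about the gene.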

[-]Ronny Fernandez14y40

Quick stupid question: what does "A[]->O" stand for? specifically "[]->"? Is that a material implication? Should I read it as "P(O|A is the output of this computation and rest of universe)"?

(edit): could someone please help with this?

[-]chaosmosis14y30

I suck at symbolic logic or computer logic or whatever so I'm commenting in the hope that someone else sees my comment and answers your question.

[-][anonymous]12y10

It means "If A were true, then O would be true." Note that this is a counterfactual statement.

[-]thomblake14y10

I could be wrong about this, but I believe the arrow is intended to indicate a functional mapping, and the [] is some noise about types. So: The probability that (this computation yields a lambda mapping A to O) given (rest of universe).

It would be nice if someone weighed in with something more definitive. Various reference materials, along with search tools such as Google and Symbolhound, are not particularly helpful.

[-]Ronny Fernandez13y30

Does

Argmax[A in Actions] in Sum[O in Outcomes] (Utility(O)*P(this computation yields A []-> O|rest of universe))

evaluate to:

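def Tdt(Actions, Outcomes):
    currentMax = float("-inf")  # start below any possible expected utility
    output = None
    for A in Actions:
        total = 0
        for O in Outcomes:
            # pseudo-Python: P is given a conditional statement as its argument
            total += U(O)*P(O|Tdt(Actions,Outcomes) == A and background knowledge)
        if total >= currentMax:
            currentMax = total
            output = A
    return output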

Or am I missing some subtlety? I am assuming that "P" and "U" have been defined elsewhere, and that Python can deal with referencing the outcome of a computation inside itself before it has been completed (or at least that the probability function halts when given a yet-to-be-computed function evaluating to a certain output as its input statement). (edit): couldn't get the tabs to work, it's supposed to be pseudo python, but it's probably just as readable. Is there a way to typeset tabs in the comments?
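
For reference, here is a runnable variant that sidesteps the self-reference by taking the counterfactual probability as an argument instead of calling the function inside itself. This is only a sketch: `counterfactual_prob(o, a)` is a stand-in for P(this computation yields A []-> O | rest of universe), and the Newcomb payoffs and the 0.99 predictor accuracy are assumed for illustration.

```python
def tdt_choice(actions, outcomes, utility, counterfactual_prob):
    """Return the action maximizing the sum over O of U(O) * P(A []-> O)."""
    def expected_utility(action):
        return sum(utility(o) * counterfactual_prob(o, action) for o in outcomes)
    return max(actions, key=expected_utility)

# Toy Newcomb problem (assumed numbers).
ACCURACY = 0.99
PAYOFF = {"full+small": 1_001_000, "full": 1_000_000, "small": 1_000, "nothing": 0}

def newcomb_prob(outcome, action):
    """Assumed table: the predictor tracks the computation's output."""
    table = {
        "one-box": {"full": ACCURACY, "nothing": 1 - ACCURACY},
        "two-box": {"small": ACCURACY, "full+small": 1 - ACCURACY},
    }
    return table[action].get(outcome, 0.0)

best = tdt_choice(["one-box", "two-box"], PAYOFF, PAYOFF.get, newcomb_prob)
```

All of the difficulty is hidden inside `counterfactual_prob`; the argmax itself is the easy part.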

[-]alex_zag_al13y10

From the "Comment formatting" page on the wiki:

To make a paragraph where your indentation is preserved and no characters are treated specially, precede each line with (at least) four spaces. This is commonly used for computer program source code.

[-]RHollerith16y30

Bravo Eliezer! The material is extremely crispy, and I have never seen anyone who can explain technical material as well as you.

Quick question, please!

if we have completely observed our own initial source code, and perhaps observed Omega's initial source code which contains a copy of our source code and the intention to simulate it, but we do not yet know our own decision, then the only way in which our uncertainty about our own physical act can possibly be correlated at all with Omega's past act to fill or leave empty the box B - given that neither act physically causes the other - is if there is some common ancestor node unobserved;

Identifying this common ancestor node as the logical output of the expected-utility calculation is what you referred to earlier as Godelian diagonalization; is it not?

[-]Eliezer Yudkowsky16y40

No, the Godelian diagonal is the self-replicating recipe you use to have the computation talk about itself when it says "my own result". See p.3 of here.

Bravo Eliezer! The material is extremely crispy

Really? I thought I was frantically blurting out a huge blog-comment response that I didn't really have time to edit all that well.

[-]thomblake16y30

Seconding Richard's comment. You seem hesitant to explain technical things here for fear of being imprecise, but you're actually very good at explaining yourself and many of the folks here can fill in the gaps.

[-]RHollerith16y10

Really? I thought I was frantically blurting out a huge blog-comment response that I didn't really have time to edit all that well.

I was taking the signs that the response was blurted out quickly into account in my evaluation of skill level.

Maybe I should have put a "probably" in my statement. Certainly you are particularly good at explaining technical material to me.

[-]Johnicholas16y30

Gary Drescher wrote: "Unsurprisingly, a false premise leads to a contradiction. To avoid contradiction, ..."

I was under the impression that the relevant logicians (e.g. Anderson, Belnap, Dunn, Meyer) had solved this problem (of having to avoid irrelevant contradictions) decisively. Instead, EY uses the gadgetry of surgery on causal Bayesian networks to address this. Is there a sense in which relevant logics are doing screening and/or surgery? Does anyone know of an exposition that connects relevant logics to Pearl's counterfactuals?

[-]ChrisHibbert16y20

Eli, you are doing an amazingly good job of putting Pearl's calculus into a verbal form, but I can't help feeling that this would be clearer if you had a few graphs. Do you have tools that would let you draw the causal diagrams? Why not use them? Is it that the move from Pearl's causal calculus to TDT is hard to express in the graph notation? I still think, in that case, that the causal surgery part of the argument would be clearer in Pearl's notation.

[-]Eliezer Yudkowsky16y20

Do you have tools that would let you draw the causal diagrams?

No. Do you have recommendations?

[-]anonym16y20

I looked through a paper of Pearl's to see what causal diagrams look like, and what I saw seemed like a good match for Graphviz. I noticed that Shalizi used it for many of the diagrams in his thesis too.

[-]Daniel_Lewis16y00

Graphviz is the LaTeX of graph-drawing tools. You'll get professional-looking output immediately, but the customization options aren't as discoverable as they would be in a visual editor.

If you plan on making lots of graphs or want them to look very pretty, I'd recommend it. If you're just looking for a quick way to draw a graph or two explaining TDT vs. CDT it may not be worth the time relative to a generic (vector) drawing program.

(The Python bindings might make things marginally easier if you know Python and don't want to learn more syntax.)

[-]anonym16y00

I think you're exaggerating how difficult it is to use graphviz for simple things by comparing it to LaTeX. Consider this diagram in the gallery and look at how trivially simple the source file that generates that image is.

I don't disagree that doing complex things can be difficult, but for graphs that consist of a handful of nodes and edges with assorted labels, and some boxes to group nodes together, it's hard to beat graphviz.
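
To give a sense of scale, here is a sketch that writes out DOT source for a small causal diagram using plain Python string formatting, so it does not depend on any Graphviz bindings. The node names and the helper `to_dot` are made up for illustration.

```python
def to_dot(edges, name="newcomb"):
    """Build Graphviz DOT source for a directed graph from (parent, child) pairs."""
    lines = [f"digraph {name} {{"]
    for parent, child in edges:
        lines.append(f'    "{parent}" -> "{child}";')
    lines.append("}")
    return "\n".join(lines)

# Hypothetical causal diagram for Newcomb's problem.
EDGES = [
    ("platonic computation", "agent action"),
    ("platonic computation", "Omega prediction"),
    ("Omega prediction", "box B contents"),
    ("agent action", "payout"),
    ("box B contents", "payout"),
]

dot_source = to_dot(EDGES)
print(dot_source)
```

Saving the printed text to a file such as `newcomb.dot` and running `dot -Tpng newcomb.dot -o newcomb.png` should render it.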

[-]Vladimir_Golovin16y20

If you're under Windows, Microsoft Visio will do just fine. Also, there are tools like Smartdraw and Gliffy, but I don't have any experience with them.

[-]ChrisHibbert16y00

I use OmniGraffle for such things on a Mac. Many people seem happy with the drawing packages in their word processor or presentation program, though. The advantage of an object based editing program is that you can keep arrows connected as you drag things around.

[-]gwern16y00

As a graphics doofus, I found the basics of Inkscape relatively easy to pick up. But honestly, even a MS Paint/GNU Paint diagram would be better than nothing.

[-]Wei Dai16y20

Here's how my initial formulation of UDT (let's call it UDT1 for simplicity) would solve Drescher's problem.

Among the world programs embedded (and given a weight) in S, would be the following:

def P():
    action = S("the value of action Ai is simply i")
    S_utility = ActionToValue(action) # maps Ai to i

If this is the only world program that calls S with "the value of action Ai is simply i", and S's utility function has a component for S_utility at the end of this P, then upon that input, S would iterate over the Ai's, and for each Ai, compute what S_utility would be at the end of P under the assumption that S returns Ai. Finally it returns An since that maximizes S_utility.
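
A runnable sketch of that procedure, under the simplifying assumption that the "mathematical intuition" step amounts to re-running P with a constant function substituted for S. The names `world_program`, `action_to_value`, and `udt1_choose`, and the action set A1..A10, are invented for this example.

```python
ACTIONS = [f"A{i}" for i in range(1, 11)]   # assumed action set A1..A10

def action_to_value(action):
    """Map 'Ai' to i, as in the problem statement."""
    return int(action[1:])

def world_program(agent):
    """The world program P: calls the agent once and returns S_utility."""
    action = agent("the value of action Ai is simply i")
    return action_to_value(action)

def udt1_choose(actions, world):
    """Evaluate each candidate output by substituting 'return Ai' for S."""
    def utility_if_output(candidate):
        # The stand-in agent ignores its input and always returns `candidate`.
        return world(lambda _input: candidate)
    return max(actions, key=utility_if_output)

choice = udt1_choose(ACTIONS, world_program)
```

Nothing in this loop infers from "suppose I choose A6" that A6 is worth at least 7; the value for A6 is simply computed as 6.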

Eliezer, the way you described it is:

If combined with TDT, we would interpret UDT as having a never-updated weighting on all possible universes, and a causal structure (causal graph, presumably) on those universes. Any given logical computation in UDT will count all instantiations of itself in all universes which have received exactly the same inputs - even if those instantiations are being imagined by Omega in universes which UDT would ordinarily be interpreted as "knowing to be logically inconsistent", like universes in which the third decimal digit of pi is 3. Then UDT calculates the counterfactual consequences, weighted across all imagined universes, using its causal graphs on each of those universes, of setting the logical act to A_i. Then it maximizes on A_i.

The "causal graph" part doesn't sound like UDT1. Is it equivalent?

ETA: To respond to Drescher's

"Suppose I choose A6. I know I'm a utility-maximizing agent, and I already know there's another choice that has value 7. Therefore, if follows from my (hypothetical) choice of A6 that A6 has a value of at least 7."

S is simply not programmed to think that. For A6 it would simulate P with "return A6" substituting for S, and calculate the utility of A6 that way.

ETA2: The previous sentence assumes that's what the "mathematical intuition" black box does.

[-]Eliezer Yudkowsky16y20

Wei, if you want to calculate the consequence of an action, you need to know that this computation outputting A1 has something to do with box B containing a million dollars (and being obtained by you, for that matter) or that A2 has something to do with the driver in Parfit's Hitchhiker deciding to pick you up and take you to the city. (And yet hypothetically choosing A6 is not used to infer, inside the counterfactual, that A6 actually was better than A7.)

This is what I am saying would get computed via the causal graphs, and which may require actual counterfactual surgery a la Pearl - at least the part where you don't believe that A6 actually was better than A7 or that (hypothetically) deciding to cross the road makes it safe - though you may not need to recompute Parfit's Hitchhiker, since this is an updateless decision theory to begin with.

[-]Wei Dai16y10

I'm afraid I don't understand you. Can you look at my solution to Drescher's problem and point out which part is wrong or problematic? Or give a sample problem that UDT1 can't deal with because it doesn't use causal graphs?

Last time I tried to read Pearl's book, I didn't get very far. I'll try again if given sufficient motivation. I guess you can either explain to me some more about what problem it solves, or I can just take your word for it, if you think it's really a necessary component for UDT, and I'll understand that after I comprehend Pearl.

[-]Eliezer Yudkowsky16y90

We're taking apart your "mathematical intuition" into something that invents a causal graph (this part is still magic) and a part that updates a causal graph "given that your output is Y" (Pearl says how to do this).

If you literally have the ability to run all of reality excluding yourself as a computer program, I suppose the causal graph part might be moot, since you could just simulate elementary particles directly, instead of approximating them with a high-level causal model. But then it's not clear how to literally simulate out the whole universe in perfect detail when the inside of your computer is casting gravitational influences outward based on transistors whose exact value you haven't yet computed (since you can't compute all of yourself in advance of computing yourself!).

With different physics and a perfect Cartesian embedding (a la AIXI) you could do this, perhaps. With a perfect Cartesian embedding and knowledge of the rest of the universe outside yourself, there would be no need for causal graphs of any sort within the theory, I think. But you would still have to factor out your logical uncertainty in a way which prevented you from concluding "if I choose A6, it must have had higher utility than A7" when considering A6 as an option (as Drescher observes). After all, if you suffered a brief bout of amnesia afterward, and I told you with trustworthy authority that you really had chosen A6, you would conclude that you really must have calculated higher expected utility for it relative to your probability distribution and utility function.

If I believably tell you that Lee Harvey Oswald really didn't shoot JFK, you conclude that someone else did. But in the counterfactual on our standard causal model, if LHO hadn't shot JFK, no one else would have. So when postulating that your output is A6 inside the decision function, you've got to avoid certain conclusions that you would in fact come to, if you observed in reality that your output really was A6, like A6 having higher expected utility than A7. This sort of thing is the domain of causal graphs, which is why I'm assuming that the base model is a causal graph with some logical uncertainty in it. Perhaps you could come up with a similar but non-causal formalism for pure logical uncertainty, and then this would be very interesting.
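
The contrast between observing and counterfactually supposing can be sketched by enumeration over a tiny model. The priors below are assumed numbers, the two exogenous variables are assumed independent, and the helper names are hypothetical; this is a toy version of Pearl-style abduction, surgery, and prediction, not a general implementation.

```python
from fractions import Fraction
from itertools import product

P_OSWALD = Fraction(9, 10)    # assumed prior that Oswald shoots
P_BACKUP = Fraction(1, 100)   # assumed prior that someone else shoots

def prob(oswald, backup):
    """Prior probability of one setting of the two exogenous variables."""
    p_o = P_OSWALD if oswald else 1 - P_OSWALD
    p_b = P_BACKUP if backup else 1 - P_BACKUP
    return p_o * p_b

def jfk_shot(oswald, backup):
    """Structural equation: JFK is shot if either party shoots."""
    return oswald or backup

WORLDS = list(product([True, False], repeat=2))   # (oswald, backup)

def conditional(event, given):
    """P(event | given) by enumeration over worlds."""
    num = sum(prob(*w) for w in WORLDS if given(*w) and event(*w))
    den = sum(prob(*w) for w in WORLDS if given(*w))
    return num / den

# Observation: JFK was shot, and we learn that Oswald did not shoot.
indicative = conditional(
    event=lambda o, b: b,
    given=lambda o, b: jfk_shot(o, b) and not o,
)

# Counterfactual: we know Oswald did shoot. Keep what that evidence says about
# the backup shooter, then sever Oswald's node and set it to False.
counterfactual = conditional(
    event=lambda o, b: jfk_shot(False, b),
    given=lambda o, b: o,
)
```

With these numbers the observational answer is that someone else certainly shot, while the counterfactual probability that JFK is shot anyway stays at the 1/100 prior for a backup shooter.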

[-]Wei Dai16y30

Eliezer, one of your more recent comments finally prodded me into reading http://bayes.cs.ucla.edu/IJCAI99/ijcai-99.pdf (don't know why I waited so long), and I can now understand this comment much better. Except this part:

But you would still have to factor out your logical uncertainty in a way which prevented you from concluding "if I choose A6, it must have had higher utility than A7" when considering A6 as an option (as Drescher observes).

Under UDT1, when I'm trying to predict the consequences of choosing A6, I do want to assume that it has higher expected utility than A7. Because suppose my prediction subroutine sees that there will be another agent who is very similar to me, about to make the same decision, it should predict that it will also choose A6, right?

Now when the prediction subroutine returns, that assumption pops off the stack and goes away. I then call my utility evaluation routine to compute a utility for those predictions. There is no place for me to conclude "if I choose A6, it must have had higher utility than A7" in a form that would cause any problems.

Am I missing something here?

[-]Eliezer Yudkowsky16y20

Why bother predicting the counterfactual consequences of choosing A6 since you already "know" the EU is higher than A7 and all the other options?

On the other hand, if you actually do see a decision process similar to your decision choose A6, then you know that A6 really does have EU higher than A7.

[-]Wei Dai16y40

Why bother predicting the counterfactual consequences of choosing A6 since you already "know" the EU is higher than A7 and all the other options?

Are you sure you're not anthropomorphizing the decision procedure? If I actually run through the steps that it specifies in my head, I don't see any place where it would say "why bother" or fail to do the prediction.

On the other hand, if you actually do see a decision process similar to your decision choose A6, then you know that A6 really does have EU higher than A7.

No, in UDT1 you don't update on outside computations like that. You just recompute the EU.

[-]Vladimir_Nesov16y00

In any case, you shouldn't know wrong things at any point. The trick is to be able to consider what's going on without assuming (knowing) that you result from an actual choice.

No, in UDT1 you don't update on outside computations like that. You just recompute the EU.

This doesn't seem right. You update quite fine, in the sense that you'd prefer a strategy where observing utility-maximizer choose X leads you to conclude that X is the highest-utility choice, in the sense that all the subsequent actions are chosen as if it's so.

[-]Psy-Kosh16y30

Looking over this... maybe this is stupid, but... isn't this sort of a use/mention issue?

When simulating "if I choose A6", then simulate "THEN I would have believed A6 has higher EU", without having to escalate that to "actual I (not simulated I) actually currently now believes A6 has higher EU"

Just don't have a TDT agent consider the beliefs of the counterfactual simulated versions of itself to be a reliable authority on actual noncounterfactual reality.

Am I missing the point? Am I skimming over the hard part, or...?

[-]Eliezer Yudkowsky16y40

That's one possible approach. But then you have to define what exactly constitutes a "use" and what constitutes a "mention" with respect to inferring facts about the universe. Compare the crispness of Pearl's counterfactuals to classical causal decision theory's counterfactual distributions falling from heaven, and you'll see why you want more formal rules saying which inferences you can carry out.

[-]Psy-Kosh16y30

Seems to me that it ought be treatable as "perfectly ordinary"...

That is, if you run a simulation, there's no reason for you to believe the same things that the modeled beings believe, right? If one of the modeled beings happens to be a version of you that's acting and believing in terms of a counterfactual that is the premise of the simulation, then... why would that automatically lead to you believing the same thing in the first place? If you simulate a piece of paper that has written upon it "1+1=3", does that mean that you actually believe "1+1=3"? So if instead you simulate a version of yourself that gets confused and believes that "1+1=3"... well, that's just a simulation. If there's a risk of that escalating into your actual model of reality, that would suggest something is very wrong somewhere in how you set up a simulation in the first place, right?

ie, simulated you is allowed to make all the usual inferences from, well, other stuff in the simulated world. It's just that actual you doesn't get to automatically equate simulated you's beliefs with actual you's beliefs.

So allow the simulated version to make all the usual inferences. I don't see why any restriction is needed other than the level separation, which doesn't need to treat this issue as a special case.

ie, simulated you in the counterfactual in which A6 was chosen believes that, well, A6 is what the algorithm in question would choose as the best choice. So? You calmly observe/model the actions simulated you takes if it believes that and so on without having to actually believe that yourself. Then, once all the counterfactual modelings are done and you apply your utility function to each of those to determine their actual expected utility, thus finding that A7 produces the highest EU, you actually do A7.

It simply happens to be that most of the versions of you from the counterfactual models that arose in the process of doing the TDT computation had false beliefs about what the actual output of the computation actually is in actual reality.
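
A toy sketch of the level separation described here, with assumed utilities (action 'Ai' is worth i) and hypothetical names throughout. Each simulated world records what the simulated agent would believe, but the actual agent only reads off utilities.

```python
def value(action):
    """Assumed toy utilities: action 'Ai' is worth i."""
    return int(action[1:])

def simulate(assumed_output):
    """Model the world in which the computation outputs `assumed_output`."""
    return {
        # What the simulated agent would believe about its own computation.
        "simulated_belief_best": assumed_output,
        # What the actual agent computes about that world.
        "utility": value(assumed_output),
    }

ACTIONS = [f"A{i}" for i in range(1, 8)]   # A1..A7
simulations = {a: simulate(a) for a in ACTIONS}

# The actual agent compares utilities; simulated beliefs are data, not premises.
actual_choice = max(ACTIONS, key=lambda a: simulations[a]["utility"])
mistaken = [a for a in ACTIONS
            if simulations[a]["simulated_belief_best"] != actual_choice]
```

Here six of the seven simulated agents hold a false belief about the real output, and that causes no trouble because their beliefs never feed into the outer comparison.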

Am I missing the point still, or...?

(wait... I'm understanding this issue to be something that you consider an unsolved issue in TDT and I'm saying "no, seems to me to be simple to make TDT do the right thing here. The Pearl style counterfactual stuff oughtn't cause any problem here, no special cases, no forbidden inferences need to be hard coded here", but now, looking at your comment, maybe you meant "This issue justifies TDT because TDT actually does the right thing here", in which case there was no need for me to say any of this at all. :))

[-]Vladimir_Nesov16y00

The belief that A6 is highest-utility must come from somewhere. Strategy that includes A6 is not guaranteed to be real (game semantics: winning, ludics: without a daemon), that is it's not guaranteed to hold without assuming facts for no reason. The action of A6 is exactly such an assumption that is given no reason to be actually found in the strategy, and the activity of the decision-making algorithm is exactly in proving (implementing) one of the actions to be actually carried out. Of course, the fact that A6 is highest-utility may also be considered counterfactually, but then you are just doing something not directly related to proving this particular choice.

[-]Psy-Kosh16y00

Sorry, I'm not sure I follow what you're saying.

I meant when dealing with the logical uncertainty of not yet knowing the outcome of the calculation that your decision process consists of, and counterfactually modelling each of the outcomes it "could" output, then when modeling the results of your own actions/beliefs as a result of that, simply don't escalate that from a model of you to, well, actually you. The simulated you that conditions based on you (counterfactually) having decided A6 would presumably believe A6 has higher utility. So? You, who are also running the simulation for if you had chosen A7, etc etc, would compare and conclude that A7 has highest utility, even though simulated you believes (incorrectly) A6. Just keep separate levels, don't do use/mention style errors, and (near as I can tell) there wouldn't be a problem.

Or am I utterly missing the point here?

[-]Vladimir_Nesov16y10

Remember the counterfactual zombie principle: you are only implication, your decision or your knowledge only says what it would be if you exist, but you can't assume that you do exist.

When you counterfactual-consider A6, you consider how the world-with-A6 will be, but don't assume that it exists, and so can't infer that it's of highest utility. You are right that your copy in world-with-A6 would also choose A6, but that also doesn't have to be an action of maximum utility, since it's not guaranteed the situation will exist. For the action that you do choose, you may know that you've chosen it, but for the action you counterfactually-consider, you don't assume that you do choose it. (In causal networks, this seems to correspond to cutting off the action-node from yourself before setting it to a value.)

[-]thomblake16y10

But then it's not clear how to literally simulate out the whole universe in perfect detail when the inside of your computer is casting gravitational influences outward based on transistors whose exact value you haven't yet computed (since you can't compute all of yourself in advance of computing yourself!).

Somewhat tangentially, this is a way to grok how the information-processing capabilities of markets are computationally intractable to simulate (or predict their outputs via experts).

[-]Wei Dai16y00

Thanks, that's helpful.
