What Program Are You?

25Eliezer Yudkowsky

9MichaelGR

3RobinHanson

2Eliezer Yudkowsky

15RobinHanson

1Wei Dai

1Eliezer Yudkowsky

2mormon2

1whpearson

0Psy-Kosh

1SilasBarta

0Psy-Kosh

1SilasBarta

6Ramana Kumar

0Kutta

4Wei Dai

2jimrandomh

2emil_mieilica

1RobinHanson

1Benquo

1Jonii

1RobinHanson

2Wei Dai

1Tyrrell_McAllister

1whpearson

-1wedrifid

1whpearson

2Vladimir_Nesov

0whpearson

0wedrifid

0whpearson

0wedrifid

1RobinHanson

2Tyrrell_McAllister

1[anonymous]

1RobinHanson

2Tyrrell_McAllister

7whpearson

2cousin_it

1Tyrrell_McAllister

0jimrandomh

-2shibl

-2timtyler


Methodological remark: One should write at some point on a very debilitating effect that I've noticed in decision theory, philosophy generally, and Artificial Intelligence, which one might call Complete Theory Bias. This is the academic version of Need for Closure, the desire to have a complete theory with all the loose ends sewn up for the sake of appearing finished and elegant. When you're trying to eat Big Confusing Problems, like anything having to do with AI, then Complete Theory Bias torpedoes your ability to get work done by preventing you from navigating the space of partial solutions in which you can clearly say what you're trying to solve or not solve at a given time.

This is very much on display in classical causal decision theory; if you look at Joyce's *Foundations of Causal Decision Theory*, for example, it has the entire counterfactual distribution falling as manna from heaven. This is partially excusable because Pearl's book on how to compute counterfactual distributions had only been published, and hence only really started to be popularized, one year earlier. But even so, the book (and any other causal decision theories that did the same thing) should have carried a big sign saying, "This counterfactual distribution, where all the interesting work of the theory gets carried out, falls on it as manna from heaven - though we do consider it obvious that a correct counterfactual for Newcomb ought to say that if-counterfactual you one-box, it has no effect on box B." But this would actually get less credit in academia, if I understand the real rules of academia correctly. You do not earn humility points for acknowledging a problem unless it is a convention of the field to acknowledge that particular problem - otherwise you're just being a bother, and upsetting the comfortable pretense that nothing is wrong.

Marcello and I have all sorts of tricks for avoiding this when we navigate the space of fragmentary solutions in our own work, such as calling things "magic" to make sure we remember we don't understand them.

TDT is very much a partial solution, a solution-fragment rather than anything complete. After all, if you had the *complete* decision process, you could run it as an AI, and I'd be coding it up right now.

TDT does say that you ought to use Pearl's formalism for computing counterfactuals, which is progress over classical causal decision theory; but it doesn't say how you get the specific causal graph... since factoring the causal environment is a very open and very large AI problem.

Just like the entire problem of factoring the environment into a causal graph, there's a whole entire problem of reasoning under logical uncertainty using limited computing power. Which is another huge unsolved open problem of AI. Human mathematicians had this whole elaborate way of believing that the Taniyama Conjecture implied Fermat's Last Theorem at a time when they didn't know whether the Taniyama Conjecture was true or false; and we seem to treat this sort of implication in a rather different way than "2=1 implies FLT", even though the material implication is equally valid.

TDT assumes there's a magic module bolted on that does reasoning over impossible possible worlds. TDT requires this magic module to behave in certain ways. For the most part, my methodology is to show that the magic module has to behave this way *anyway* in order to get commonsense logical reasoning done - i.e., TDT is nothing *special*, even though the whole business of reasoning over impossible possible worlds is an unsolved problem.

To answer Robin's particular objection, what we want to do is drop out of TDT and show that an analogous class of reasoning problems apply to, say, pocket calculators. Let's say I know the transistor diagram for a pocket calculator. I type in 3 + 3, not knowing the answer; and upon the screen flashes the LED structure for "6". I can interpret this as meaning 3 + 3 = 6, or I can interpret it as a fact about the output of this sort of transistor diagram, or I can interpret it as saying that 3 + 3 is an even number, or that 2 * 3 is 6. And these may all tell me different things, at first, about the output of another, similar calculator. *But* all these different interpretations should generally give me *compatible* logical deductions about the other calculator and the rest of the universe. If I arrive at *contradictory* implications by forming different abstractions about the calculator, then my magic logic module must not be sound.
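The calculator point can be made concrete with a small sketch. Everything here is illustrative: the candidate range, the three abstractions, and the helper names are assumptions introduced for the example, not part of TDT. Each abstraction of the observed "6" is treated as a constraint on what a second, similar calculator could display for 3 + 3, and soundness is read as "the constraints can all hold at once".

```python
from typing import Callable, Dict, Set

# Assumed range of displays the second calculator might show for 3 + 3.
CANDIDATES: Set[int] = set(range(0, 13))

# Three different abstractions of the same observation ("6" appeared).
ABSTRACTIONS: Dict[str, Callable[[int], bool]] = {
    "3 + 3 = 6": lambda n: n == 6,
    "3 + 3 is even": lambda n: n % 2 == 0,
    "3 + 3 = 2 * 3": lambda n: n == 2 * 3,
}


def compatible_outputs(constraints: Dict[str, Callable[[int], bool]]) -> Set[int]:
    """Displays of the other calculator that satisfy every abstraction at once."""
    return {n for n in CANDIDATES if all(check(n) for check in constraints.values())}


def is_sound(constraints: Dict[str, Callable[[int], bool]]) -> bool:
    """Sound here just means the abstractions do not contradict each other."""
    return len(compatible_outputs(constraints)) > 0
```

The abstractions differ in how much they say (the "even" reading alone leaves seven candidates), but holding all of them together never produces an empty set unless one of them is wrong.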

The idea that you want to regard "all computations similar to yourself as having the same output" is just a gloss on the real structure. In the real version, there's a single *canonical* mathematical fact of which you are presently logically uncertain, the output of the Godelian diagonal:

Argmax[A in Actions] in Sum[O in Outcomes](Utility(O)*P( *this computation* yields A []-> O|rest of universe))

The **this computation** above is not a reference to your entire brain. It is a reference to that *one equation* above, the canonical diagonal form. It's *assumed*, in TDT, that you're implementing that particular equation - that TDT *is* how you make your decisions.

Then you assume *that particular equation* has a particular output, and update your view of the rest of the physical universe accordingly. In "almost" the same way you would update your view of the universe when you saw the calculator output "6". It might indeed depend on your logical reasoning engine. There might be things similar to yourself that you did not know were similar to yourself. If so, then you'll (all) do worse, because your logical reasoning engine is weaker. But you should at least not arrive at a contradiction, if your logical reasoning engine is at least sound.
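As a rough sketch of the shape of the formula above, here is the argmax over actions of a probability-weighted sum of utilities. The hard part, the counterfactual probability P(this computation yields A []-> O), is simply passed in as a function, standing in for the "magic module". The Newcomb numbers (a 0.99 predictor accuracy and the usual dollar amounts) and all names are assumptions made for the example.

```python
from typing import Callable, Iterable, Sequence


def expected_utility(
    action: str,
    outcomes: Iterable[int],
    utility: Callable[[int], float],
    counterfactual_prob: Callable[[str, int], float],
) -> float:
    """Sum over outcomes of Utility(O) * P(this computation yields A []-> O)."""
    return sum(utility(o) * counterfactual_prob(action, o) for o in outcomes)


def tdt_choice(
    actions: Sequence[str],
    outcomes: Sequence[int],
    utility: Callable[[int], float],
    counterfactual_prob: Callable[[str, int], float],
) -> str:
    """Argmax over actions of the expected utility defined above."""
    return max(
        actions,
        key=lambda a: expected_utility(a, outcomes, utility, counterfactual_prob),
    )


# Assumed toy Newcomb setup: outcomes are dollars received.
ACCURACY = 0.99  # assumption: how often the predictor matches the computation's output
NEWCOMB = {
    "one-box": {1_000_000: ACCURACY, 0: 1 - ACCURACY},
    "two-box": {1_000: ACCURACY, 1_001_000: 1 - ACCURACY},
}
OUTCOMES = [0, 1_000, 1_000_000, 1_001_000]


def newcomb_prob(action: str, outcome: int) -> float:
    """Stand-in for the magic module: a hand-written counterfactual table."""
    return NEWCOMB[action].get(outcome, 0.0)


def dollars(outcome: int) -> float:
    """Assumed utility: linear in dollars."""
    return float(outcome)
```

With this hand-written table, `tdt_choice(["one-box", "two-box"], OUTCOMES, dollars, newcomb_prob)` selects one-boxing; the sketch says nothing about how the table itself should be derived, which is the open part.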

What if you can only approximate that equation instead of computing it directly, so that it's possible that you and the equation will have different outputs? Should the equation be about your approximation of it, or should you just try to approximate the original equation? This is an open problem in TDT, which reflects the underlying open problem in AI; I just assumed there was enough computing power to do the above finite well-ordered computation directly. If you could show me a particular approximation, I might be able to answer better. Or someone could deliver a decisive argument for why any approximation ought to be treated a particular way, and that would make the problem less open in TDT, even though which approximation to use would still be open in AI.

(I also note at this point that the only way your counterfactual can apparently control the laws of physics, is if you know that the laws of physics imply that at least one answer is not compatible with physics, in which case you already *know* that option is not the output of the TDT computation, in which case you know it is not the best thing to do, in which case you are done considering it. So long as all answers *seem* not-visibly-incompatible with physics relative to your current state of logical knowledge, supposing a particular output should not tell you anything about physics.)

An example of a much more unsolved problem within TDT, which is harder to dispose of by appeal to normal non-TDT logical reasoning, is something that I only realized existed after reading Drescher; you actually can't update on the subjunctive / counterfactual output of TDT in exactly the same way you can update on the actually observed output of a calculator. In particular, if you actually observed something isomorphic to your decision mechanism output action A2, you could infer that A2 had higher expected utility than A1, including any background facts about the world or one's beliefs about it, that this would require; but if we only *suppose* that the mechanism is outputting A2, we don't want to presume we've just calculated that A2 > A1, but we do want to suppose that other decision mechanisms will output A2.

The two ways that have occurred to me for resolving this situation would be to (1) stratify the deductions into the physical and the logical, so that we can deduce within the counterfactual that other physical mechanisms will output "A2", but not deduce within our own logic internal to the decision process that A2 > A1. Or (2) to introduce something akin to a causal order within logical deductions, so that "A2 > A1" is a parent of "output = A2" and we can perform counterfactual surgery on "output = A2" without affecting the parent node.
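Option (2) can be illustrated with a toy causal model. This is only a sketch under assumed names: a three-node chain in which "A2 > A1" is the parent of "output", and a second, isomorphic mechanism copies the output. Surgery on "output" overrides that node's own mechanism and propagates to its children, while the parent keeps whatever value it had.

```python
from typing import Callable, Dict, Optional

Values = Dict[str, object]

# Nodes listed in causal order; each mechanism reads already-computed values.
MODEL: Dict[str, Callable[[Values], object]] = {
    # Logic-internal fact (assumed False here): is EU(A2) > EU(A1)?
    "a2_better": lambda v: False,
    # The decision mechanism outputs A2 exactly when A2 is better.
    "output": lambda v: "A2" if v["a2_better"] else "A1",
    # Another mechanism, isomorphic to ours, produces the same output.
    "other_output": lambda v: v["output"],
}


def evaluate(
    model: Dict[str, Callable[[Values], object]],
    do: Optional[Values] = None,
) -> Values:
    """Evaluate the model; nodes named in `do` are set by surgery, not computed."""
    do = do or {}
    values: Values = {}
    for node, mechanism in model.items():
        values[node] = do[node] if node in do else mechanism(values)
    return values
```

Calling `evaluate(MODEL, do={"output": "A2"})` yields "A2" for both `output` and `other_output` while `a2_better` stays False, which is the behavior the second proposal asks for.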

So is the "input" to this computation the functions U and P? Is "that computation" all places in spacetime when this particular input was considered, or all uses of the TDT framework at all?

"This computation" is exactly equal to the Godelian diagonal and anything you can *deduce* from making assumptions about it. If I assume the output of a calculator into which I punched "3 + 3" is "6", then the question is not "What computation do I believe this to be, exactly?" but just "What else can I logically infer from this given my belief about how various other logical facts are connected to this logical fact?" You could regard the calculator as being a dozen different calculations simultaneously, and if your inferences are sound they ought not to tangle up.

With that said, yes, you could view the TDT formula as being parameterized around U, P, and the action set A relative to P. But it shouldn't matter how you view it, any more than it matters how you view a calculator for purposes of making inferences about arithmetic and hence other calculators. The key inferences are not carried out through a reference class of computations which are all assumed to be correlated with each other and not anything else. The key inferences are carried out through more general reasoning about logical facts, such as one might use to decide that the Taniyama Conjecture implied Fermat's Last Theorem. In other words, I can make inferences about other computations without seeing them as "the same computation" by virtue of general mathematical reasoning.

"That computation" is just a pure abstract mathematical fact about the maximum of a certain formula.

Counterexample request: can you give me a specific case where it matters which computation I view myself as, given that I'm allowed to make general mathematical inferences?

I really have a lot of trouble figuring out what you are talking about. I thought I could take just one concept you referred to and discuss that, but apparently this one concept is in your mind deeply intertwined with all your other concepts, leaving me without much ground to stand on to figure out what you mean. I guess I'll just have to wait until you write up your ideas in a way presentable to a wider audience.

I agree that if we had a general theory of logical uncertainty, then we wouldn't need to have an answer to Robin's question.

Counterexample request: can you give me a specific case where it matters which computation I view myself as, given that I'm allowed to make general mathematical inferences?

I think the old True PD example works here. Should I view myself as controlling the computation of both players, or just player A, assuming the two players are not running completely identical computations (i.e. same program *and* data)? If I knew how I should infer the decision of my opponent given my decision, then I wouldn't need to answer this question.

What I would generally say at this point is, "What part of this is a *special* problem to TDT? Why wouldn't you be faced with just the same problem if you were watching two *other* agents in the True PD, with some particular partial knowledges of their source code, and I told you that one of the agents' computations had a particular output? You would still need to decide what to infer about the other. So it's not *TDT's* problem, it legitimately modularizes off into a magical logical inference module..."

(Of course there *are* problems that are special to TDT, like logical move ordering, how not to infer "A1 has EU of 400, therefore if I output A2 it must have EU > 400", etc. But "Which computation should I view myself as running?" is not a special problem; you could ask it about any calculator, and if the inference mechanism is sound, "You can use multiple valid abstractions at the same time" is a legitimate answer.)

"TDT is very much a partial solution, a solution-fragment rather than anything complete. After all, if you had the complete decision process, you could run it as an AI, and I'd be coding it up right now."

I must nitpick here:

First, you say TDT is an unfinished solution, but from all the stuff that you have posted there is no evidence that TDT is anything more than a vague idea; is this the case? If not, could you post some math and example problems for TDT?

Second, I hope it was said in haste, and not in complete seriousness, that if TDT were complete you could run it as an AI and you'd be coding it up. Does this mean that you believe TDT is all that is required for the theory end of AI? Or are you stating that the other hard problems, such as learning, sensory input and recognition, and knowledge representation, are all solved for your AI? If that is the case, I would love to see a post on it.

Thanks

Have you defined the type/interface of the magic modules? In Haskell at least you can define a function as undefined with a type signature and check whether it compiles.
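A loose Python analogue of that Haskell trick, for illustration only: the interface of a "magic" module is written down as a `typing.Protocol`, a placeholder implements the signature but raises `NotImplementedError`, and the surrounding decision code can be written (and, assuming a static type checker is run, checked) against the interface before anyone knows how to fill it in. The class and method names are hypothetical.

```python
from typing import Dict, Protocol, Sequence


class LogicalInferenceModule(Protocol):
    """Hypothetical interface for the 'magic' module."""

    def counterfactual_prob(self, action: str, outcome: str) -> float:
        """P(this computation yields `action` []-> `outcome`)."""
        ...


class MagicModule:
    """Typed placeholder: the signature is fixed, the body is not written."""

    def counterfactual_prob(self, action: str, outcome: str) -> float:
        raise NotImplementedError("magic: we do not understand this yet")


class CoinFlipModule:
    """Toy stand-in, used only to show that the interface can be satisfied."""

    def counterfactual_prob(self, action: str, outcome: str) -> float:
        return 0.5


def choose(
    module: LogicalInferenceModule,
    actions: Sequence[str],
    utility: Dict[str, float],
) -> str:
    """Pick the action with the highest expected utility under `module`."""

    def expected_utility(action: str) -> float:
        return sum(u * module.counterfactual_prob(action, o) for o, u in utility.items())

    return max(actions, key=expected_utility)
```

Running `choose` with `MagicModule()` fails loudly at the unsolved step, while swapping in any object with the same method signature lets the rest of the code run.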

What if you can only approximate that equation instead of computing it directly, so that it's possible that you and the equation will have different outputs? Should the equation be about your approximation of it, or should you just try to approximate the original equation?

Incidentally, that's essentially a version of the issue I was trying to deal with here (and in the linked conversation between Silas and me).

Ooh! Good point! And for readers who follow through, be sure to note my causal graph and my explanation of how Eliezer_Yudkowsky has previously accounted for how to handle errors when you can't compute *exactly* what your output will be due to the hardware's interference [/shameless self-promotion]

If you're right, I'd be extra confused, because then Eliezer could account for the sort of error I was describing, in terms of ambiguity of what algorithm you're actually running, but could *not* deal with the sort of errors due to one merely approximating the ideal algorithm, which I'd think to be somewhat of a subset of the class of issues I was describing.

Well, either way, as long as the issue is brought to the front and solved (eventually) *somehow*, I'm happy. :)

The difference is that Newcomb's problem **allows** you to assume that your (believed) choice of output is guaranteed to be your actual decision.

Post-computation interference only occurs in **real-life** scenarios (or hypotheticals that assume this realistic constraint), and it is those scenarios where Eliezer_Yudkowsky shows that you should pick a different computation output, given its robustness against interference from your "corrupted hardware".

Does it bother anyone else that the world doesn't even decompose uniquely into physical objects?

There is a damn lot of regularity at human levels. Well, even flatworms are able to navigate through their lives, with their dismal intellect - in which intellect's design stable *environmental regularities* played a big role. I think this universe is actually rather friendly to reasoners embedded in it.

Robin, until we solve this problem (and I do agree that you've identified a problem that needs to be solved), is there anything wrong with taking the decomposition of an agent into program and data as an external input to the decision theory, much like how priors and utility functions are external inputs to evidential decision theory, and causal relationships are an additional input to causal decision theory?

It seems that in most decision problems there are intuitively obvious decompositions, even if we can't yet formalize the criteria that we use to do this, so this doesn't seem to pose a practical problem as far as using TDT/UDT to make everyday decisions. Do you have an example where the decomposition is not intuitively obvious?

It seems that in most decision problems there are intuitively obvious decompositions, even if we can't yet formalize the criteria that we use to do this

I propose the following formalization. The "program" is everything that we can control fully and hold constant between all situations given in the problem. The "data" is everything else.

Which things we want to hold constant and which things vary depend on the problem we're considering. In ordinary game theory, the program is a complete strategy, which we assume is memorized before the beginning and followed perfectly, and the data is some set of observations made between the start of the game and some decision point within it. Problems may force us to move things that are normally part of the program into the state, by taking them out of our control. For example, when reasoning about how a company should act in relation to a market, we treat everything that decides what the corporation does as a black box program, and the observations it makes of the market as its input data. If internal politics matter, then we have to narrow the black-boxing boundary to only ourselves. If we're worried about akrasia or mind control, then we draw the boundary inside our own mind.

Whether something is Program or Data is not a property of the object itself, but rather of how we reason about it. If it can be fully modeled as a black box function, then it's part of the program; otherwise it's data.

If functional programming and LISP have taught me anything, it is that all "programs" are "data". The boundary between data and code is blurry at least. We are all instances of "data" that is executed on the machine known as the "Universe". (I think this kind of Cartesian duality will lead to other dualities, and I don't think we need "soul" and "body" mixed into this talk.)

The decomposition rarely seems intuitively obvious to me. For example, what part of me is program vs. data? And are there any constraints on acceptable decompositions? Is it really all right to act as if you were controlling the actions of all physical objects, for example?

I wonder if it would help to try to bracket the uncertain area with less ambiguous cases, and maybe lead to a better articulation of the implicit criteria by which people distinguish program and data.

On one side, I propose that if the behavior you're talking about would also be exhibited by a crash dummy substituted for your body, then it's data and not program. For example, if someone pushes me off a cliff, it's not my suicidal "program" that accelerates me downwards at 32 ft/s^2, but the underlying "data."

On the other, if you write down a plan beforehand and actually locomote (e.g. on muscle power) to enact the plan, then it is program.

Are these reasonable outer bounds to our uncertainty? If not, why? If so, can we narrow them further?

The advice to "choose as though controlling the logical output of the abstract computation you implement" might have you choose as if you controlled the actions of all physical objects, if you viewed the laws of physics as your program, or choose as if you only controlled the actions of the particular physical state that you are, if every distinct physical state is a different program.

This seems to be about identity. The first case is where one thinks that one is the Universe itself; the latter is a complete denial that time passes in any sense at all, with our experience being a non-connected solid state, so that one refuses to accept that one is in any meaningful sense the person one was 5 minutes earlier. I don't think it's wrong that these kinds of differences in views of self-identity should change our decisions.

A positive theory of human behavior may well depend on self-assigned identity. But a normative theory of agent behavior will need a normative theory of identity if identity is to be a central element.

But a normative theory of agent behavior will need a normative theory of identity if identity is to be a central element.

Doesn't causal decision theory also require a theory of identity, which you have to use in order to provide CDT with a set of possible choices? For example, in a Prisoner's Dilemma game, you could identify with both players, and make your choice set the four pairs {<C,C>, <C,D>, <D,C>, <D,D>} instead of {C,D}, but presumably you don't. If you're puzzled about whether you should view yourself as controlling the actions of all physical objects under TDT/UDT, why aren't you puzzled about this same question under CDT?
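A small sketch of how the choice set enters as an input. The payoff numbers are an assumed, conventional Prisoner's Dilemma table and the function names are hypothetical; the only point is that the same maximization rule returns different recommendations depending on whether the choice set is one player's move or a pair of moves. It says nothing about which choice set is the right one.

```python
from itertools import product
from typing import Dict, Tuple

# Assumed payoffs to me, indexed by (my move, other player's move).
MY_PAYOFF: Dict[Tuple[str, str], int] = {
    ("C", "C"): 3,
    ("C", "D"): 0,
    ("D", "C"): 5,
    ("D", "D"): 1,
}


def best_single_move(their_move: str) -> str:
    """Choice set {C, D}: I control only my own move; theirs is held fixed."""
    return max("CD", key=lambda mine: MY_PAYOFF[(mine, their_move)])


def best_joint_move() -> Tuple[str, str]:
    """Choice set of four pairs: I 'identify with' both players."""
    return max(product("CD", repeat=2), key=lambda pair: MY_PAYOFF[pair])
```

Under the first choice set the rule recommends D whatever the other player does; under the second it recommends the pair in which I defect and the other player cooperates.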

Don't these kinds of considerations apply to any decision theory? Don't they all suppose that you're given some kind of carving-up of the world into various things with various manipulable properties? Don't they all suppose that you have some kind of identity criteria for saying when things are "the same", and for partitioning up events to assign payoffs to them? Is any decision theory responsible for dictating what your initial carving-up of the world should be?

I think that TDT and UDT assume that the agent, for whatever reason, starts out with a given decomposition of itself into program and data. If it had started with a different decomposition, it would have been a different agent, and so, unsurprisingly, might have made different decisions.

This is the reason why I don't think decision theory is that fundamental to AI. Suppose you have an AI: how should it decide what decision problem it is facing, e.g. what its options are? In reality the choice is never as stark as just one- or two-boxing. We are often exhorted to think outside the box, if you'll pardon the pun.

It is not as if we humans get told what we are supposed to be doing or deciding between in this life. We have to invent it for ourselves.

Please do not ever create an AI capable of recursive self-improvement. 'Thinking outside the box' is a *bug*.

Systems without the ability to go beyond the mental model their creators have (at a certain point in time), are subject to whatever flaws that mental model possesses. I wouldn't classify them as full intelligences.

I wouldn't want a flawed system to be the thing to guide humanity to the future.

Systems without the ability to go beyond the mental model their creators have (at a certain point in time), are subject to whatever flaws that mental model possesses.

Where does the basis for deciding something to be a flaw reside?

In humans? No one knows. My best guess at the moment for the lowest level of model choice is some form of decentralised selectionist system, which is as much a decision-theoretic construct as real evolution is.

We do of course have higher level model choosing systems that might work on a decision theoretic basis, but they have models implicit in them which can be flawed.

Improving the mental model is right there at the centre of the box. Creating a GAI that doesn't operate according to some sort of decision theory? That's, well, out of the box crazy talk.

We might be having different definitions of thinking outside of the box, here.

Are you objecting to the possibility of a General intelligence not based on a decision theory at its foundation, or do you just think one would be unsafe?

Do you think us humans are based on some form of decision theory?

Are you objecting to the possibility of a General intelligence not based on a decision theory at its foundation, or do you just think one would be unsafe?

Unsafe.

Do you think us humans are based on some form of decision theory?

No. And I wouldn't trust a fellow human with that sort of uncontrolled power.

My understanding is that TDT and UDT are supposed to be used by an agent that we design. In all likelihood, *we* will have decomposed the agent into program and data in the process of designing it. When the agent starts to use the decision theory, it can take that decomposition as given.

This consideration applies to ourselves, insofar as we have a hand in designing ourselves.

My understanding is that TDT and UDT are supposed to be used by an agent that we design. In all likelihood, we will have decomposed the agent into program and data in the process of designing it.

Reading this statement, it comes across as quite objectionable. I think that this is because dividing something into program and data seems like something that cannot be done in a non-arbitrary manner--many programming languages don't distinguish between code and data, and a universal Turing machine must interpret its input as program at some point.

Perhaps one could have a special "how to write a program" decision theory, but that would not be a general decision theory applicable to all other decisions.

Isn't this like criticizing Bayesianism because it doesn't tell you how to select your initial prior? For practical purposes, that doesn't matter because you *already have* a prior; and once you have a prior, Bayesianism is enough to go on from there.

Similarly, you already decompose at least some part of yourself into program and data (don't you?). This is enough for that part of yourself to work with these decision theories. And using them, you can proceed to decide how to decompose the rest of yourself, or even to reflect on the original decomposition and choose a new one.

*The following is slightly tongue in cheek, but I don't normally place a stable boundary between program and data on myself, I revise it depending on purpose. The following is one view I find useful sometimes*

Nope, I'm all program. What you would call data is just programming in weaker languages than Turing complete ones. I can rewrite my programming, do meta analysis on it.

The information streaming into my eyes is a program, and I don't know what it will make me do; it could make me flinch, or it could change the conceptual way that I see the world. The visual system is just an interpreter for the programming optical signals.

"Prior" is like a get out of jail card. Whenever the solution to some problem turns out to conveniently depend on an unknown probability distribution, you can investigate further, or you can say "prior" and stop there. For example, the naive Bayesian answer to game theory would be "just optimize based on your prior over the enemy's actions", which would block the route to discovering Nash equilibria.

It's true that it's worthwhile to investigate where priors ought to come from. My point is only that you can still put Bayesianism to work even before you've made such investigations.

To reason sensibly under abnormal conditions like mind copying or dealing with perfect predictors, we first separate the things that we control and which can be kept constant through all the situations in the problem (the program) from the things which we can't control or which vary in the course of the problem (the state/data).

Which things we want to hold constant and which things vary depend on the problem we're considering. If we're trying to choose a strategy for a game - the Iterated Prisoner's Dilemma, for example - then the program is a complete strategy which we assume is memorized before the beginning and followed perfectly, and the state is all the observations made between the start of the game and some decision point within it. Problems may force us to move things that are normally part of the program into the state, by taking them out of our control. For example, when reasoning about how a company should act in relation to a market, we treat everything that decides what the corporation does as a black box program, and the observations it makes of the market as its input data. If internal politics matter, then we have to narrow the black-boxing boundary to only ourselves. If we're worried about procrastination, then we draw the boundary inside our own mind.

Whether something is Program or Data is not a property of the object itself, but rather of how we reason about it. If it can be fully modeled as a black box function, then it's part of the program; otherwise it's data.

While I agree that humans can be decomposed into a number of program/data systems, I think there is a decomposition that preserves our 'self's.

This decomposition is one where our conscious mind is the program and our memories and sensory input are the data.

In other words, we can be copied into another substrate that has different program/data layers; as long as this decomposition is maintained, we will still think that our 'self' is preserved.

Note that I am not suggesting that we have a preference for preserving the 'self' in this fashion, but that we will have a subjective experience of continuity.

Disposition-Based Decision Theory has the feature that it requires a conceptual split between an initial state and subsequent activity as well. However, exactly where the initial state snapshot is taken turns out not to be critical - provided it is "early enough" - and the paper introducing DBDT describes what "early enough" means.

I've been trying for a while to make sense of the various alternate decision theories discussed here at LW, and have kept quiet until I thought I understood something well enough to make a clear contribution. Here goes.

You simply cannot reason about what to do by referring to what program you run, and considering the other instances of that program, for the simple reason that:

there is no unique program that corresponds to any physical object.

Yes, you can think of many physical objects O as running a program P on data D, but there are many, many ways to decompose an object into program and data, as in O = <P,D>. At one extreme you can think of every physical object as running exactly the same program, i.e., the laws of physics, with its data being its particular arrangements of particles and fields. At the other extreme, one can think of each distinct physical state as a distinct program, with an empty unused data structure. In between there is an astronomical range of other ways to break you into your program P and your data D.
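To make the non-uniqueness concrete, here is a toy sketch in which one "physical object" is split into <P,D> three ways that all reproduce the same behavior. The object (a pair of integers), the stand-in `physics` function, and the labels are all assumptions made up for the illustration.

```python
from typing import Any, Callable, Dict, Tuple

State = Tuple[int, int]                           # (increment, counter): the full "physical state"
Decomposition = Tuple[Callable[[Any], int], Any]  # <P, D>


def physics(state: State) -> int:
    """Stand-in 'laws of physics': the object's next output given its full state."""
    increment, counter = state
    return counter + increment


def decompositions(state: State) -> Dict[str, Decomposition]:
    """Three ways to split the same object into a program P and data D."""
    increment, counter = state
    return {
        # Extreme 1: every object runs the same program; all detail is data.
        "physics as program": (physics, state),
        # Extreme 2: each state is its own program; the data is empty and unused.
        "state as program": (lambda _unused: physics(state), None),
        # One of many in-between splits.
        "in between": (lambda c: c + increment, counter),
    }


def run(decomposition: Decomposition) -> int:
    """Apply P to D."""
    program, data = decomposition
    return program(data)
```

For the state `(2, 5)` every split outputs 7, so behavior alone cannot single one out. Which other objects count as "running my program" does depend on the split: under the first, any two objects share `physics`; under the second, no two distinct states share a program.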

Eliezer's descriptions of his "Timeless Decision Theory", however, often refer to "the computation" as distinguished from "its input" in this "instantiation", as if there were some unique way to divide a physical state into these two components. For example:

The one-sentence version is: Choose as though controlling the logical output of the abstract computation you implement, including the output of all other instantiations and simulations of that computation.

The three-sentence version is: Factor your uncertainty over (impossible) possible worlds into a causal graph that includes nodes corresponding to the unknown outputs of known computations; condition on the known initial conditions of your decision computation to screen off factors influencing the decision-setup; compute the counterfactuals in your expected utility formula by surgery on the node representing the logical output of that computation.

And also:

Timeless decision theory, in which the (Godelian diagonal) expected utility formula is written as follows: Argmax[A in Actions] in Sum[O in Outcomes](Utility(O)*P(this computation yields A []-> O|rest of universe)) ... which is why TDT one-boxes on Newcomb's Problem - both your current self's physical act, and Omega's physical act in the past, are logical-causal descendants of the computation, and are recalculated accordingly inside the counterfactual. ... Timeless decision theory can state very definitely how it treats the various facts, within the interior of its expected utility calculation. It does not update any physical or logical parent of the logical output - rather, it conditions on the initial state of the computation, in order to screen off outside influences; then no further inferences about them are made.

These summaries give the strong impression that one cannot use this decision theory to figure out what to decide until one has first decomposed one's physical state into one's "computation" as distinguished from one's "initial state" and its followup data structures eventually leading to an "output." And since there are many many ways to make this decomposition, there can be many many decisions recommended by this decision theory.

The advice to "choose as though controlling the logical output of the abstract computation you implement" might have you choose as if you controlled the actions of all physical objects, if you viewed the laws of physics as your program, or choose as if you only controlled the actions of the particular physical state that you are, if every distinct physical state is a different program.