# My Take on a Decision Theory

by ygert4 min read9th Jul 201330 comments

# 4

Personal Blog

Finding a good decision theory is hard. Previous attempts, such as Timeless Decision Theory, work, it seems, in providing a stable, effective decision theory, but are mathematically complicated. Simpler theories, like CDT or EDT, are much more intuitive, but have deep flaws. They fail at certain problems, and thus violate the maxim that rational agents should win. This makes them imperfect.

But it seems to me that there is a relatively simple fix one could make to them, in the style of TDT, to extend their power considerably. Here I will show an implementation of such an extension of CDT, that wins on the problems that classic CDT fails on. It quite possibly could turn out that this is not as powerful as TDT, but it is a significant step in that direction, starting only from the naivest of decision theories. It also could turn out that this is nothing more than a reformulation of TDT or a lesser version thereof. In that case, this still has some value as a simpler formulation, easier to understand. Because as it stands, TDT seems like a far cry from a trivial extension of the basic, intuitive decision theories, as this hopes to be.

We will start by remarking that when CDT (or EDT) tries to figure out the expected value or a action or outcome, the naive way which it does so drops crucial information, which is what TDT manages to preserve. As such, I will try to calculate a CDT with this information not dropped. This information is, for CDT, the fact that Omega has simulated you and figured out what you are going to do. Why does a CDT agent automatically assume that it is the "real" one, so to speak? This trivial tweak seems powerful. I will, for the purpose of this post, call this tweaked version of CDT "Simulationist Causal Decision Theory", or SCDT for short.

Let's run this tweaked version though Newcomb's problem. Let Alice be a SCDT agent. Before the problem begins, as is standard in Newcomb's problem, Omega looks at Alice and calculates what choice Alice will make in the game. Without to much loss of generality, we can assume that Omega directly simulates Alice, and runs the simulation through the a simulation of the game, in order make the determination of what choice Alice will make. In other formulations of Newcomb's problem, Omega figures in out some other way what Alice will do, say by doing a formal analysis of her source code, but that seems intuitively equivalent. This is a possible flaw, but if the different versions of Newcomb's problem are equivalent (as they seem to be) this point evaporates, and so we will put it aside for now, and continue.

We will call the simulated agent SimAlice. SimAlice does not know, of course, that she is being simulated, and is an exact copy of Alice in all respects. In particular, she also uses the same SCDT thought processes as Alice, and she has the same utility function as Alice.

So, Alice (or SimAlice, she doesn't know which one she is) is presented with the game. She reasons thusly:

There are two possible cases: Either I am Alice or I am SimAlice.

• If I am Alice: Choosing both boxes will always get me exactly \$1000 more then choosing just one. Regardless of whether or not there is \$1,000,000 in box 2, by choosing box 1 as well, I am getting an extra \$1000. (Note that this is exactly the same reasoning standard CDT uses!)
• If I am SimAlice: Then "I" don't actually get any money in this game, regardless of what I choose. But my goal is not SimAlice getting money it is is Alice getting money, by the simple fact that this is what Alice wants, and we assumed above that SimAlice uses the same utility function as Alice.And depending what I choose now, that will affect the way Omega sets up the boxes, and so affects the amount of money Alice will get. Specifically, if I one box, Omega will put an extra \$1,000,000 in box 2, and so Alice will get an extra \$1,000,000, no matter what she chooses. (Because in both the choices Alice could make (taking either box 2 or boxes 1&2), she takes box 2, and so will wind up with a bonus \$1,000,000 above what she would get if box 2 was empty, which is what would happen if SimAlice didn't two box.)
So, as I don't know whether I am Alice or SimAlice, and as there is one of each, there is a 0.5 probability of me being either one, so by the law of total expectation,
E[money|I one box]=0.5 * E[money|(I one box)&(I am Alice)] + 0.5 * E[money|(I one box)&(I am SimAlice)]
So my expected return off one boxing (above what I would get by two boxing) is 0.5 * -\$1000 + 0.5 * \$1,000,000 = \$450,000, which is positive, so I should one box.

As you can see, just by acknowledging the rules of the game, by admitting that Omega has the power to simulate her (as the rules of Newcomb's problem insist), she will one box. This is unlike a CDT agent, which would ignore Omega's power to simulate her (or otherwise figure out what she will do), and say "Hey, what's in the boxes is fixed, and my choice does not affect it". That is only valid reasoning if you know you are the "original" agent, and Alice herself uses that reasoning, but only in the case where she is assuming she is the "original". She takes care, unlike a CDT agent, to multiply the conditional expected value by the chance of the condition occurring.

This is not only limited to Newcomb's problem. Let's take a look at Parfit's Hitchhiker, another scenario CDT has trouble with. There are again two identical agents making decisions: The "real" Alice, as soon as she gets home; and the "Alice-after-she-gets-home-as simulated-by-the-driver-offering-her-a-ride, which I will again call SimAlice for short.

Conditional on an agent being Alice and not SimAlice, paying the driver loses that agent her \$100 and gains her nothing compared to refusing to pay. Conditional on an agent being SimAlice and not Alice, agreeing to pay the driver loses her nothing (as she, being a simulation, cannot give the driver real money), and gains her a trip out of the desert, and so her life. So, again, the law of total expectation gives us that the expected value of paying the driver (considering you don't know which you are), is 0.5 * -\$100 + 0.5 * (Value of Alice's life). This gives us that Alice should pay if and only if she values her life at more than \$100, which is, once again, the correct answer.

So, to sum up, we found that SCDT can not only solve Newcomb's problem, which standard CDT cannot, but also solve Parfit's Hitchhiker, which neither CDT nor EDT can do. It does so at almost no cost in complexity compared to CDT, unlike, say, TDT, which is rather more complex. In fact, I kind of think that it is entirely possible that this SCDT is nothing more than a special case of something similar to TDT. But even if it is, it is a very nice, simple, and relatively easy to understand special case, and so may deserve a look for that alone.

There are still open problems for SCDT. If, rather than a simulation, you are analysed in a more direct way, should that change anything? What if, in Newcomb's problem, Omega simulates many simulations of you in parallel? Should that change the weights you place on the expected values? This ties in deeply with the philosophical problem of how you assign measure to identical, independent agents. I can not give a simple answer, and a simple answer to those questions is needed before SCDT is complete. But, if we can figure out the answer to these questions, or otherwise bypass them, we have a trivial extrapolation of CDT, the naivest decision theory, which solves correctly most or all of the problems that trip up CDT. That seems quite worthwhile.

# 4

30 comments, sorted by Highlighting new comments since
New Comment

What do you mean by "simulated"? In Parfit's hitchhiker scenario, as I understand it, the driver isn't doing anything remotely like a full simulation of Alice; just understanding her thinking well enough to make a guess at whether he'll get paid later. In which case, why should anyone think they are equally likely to be Alice or SimAlice?

[EDITED to add:] For that matter, in Newcomb's problem it's not generally stated explicitly that simulation is involved. Maybe Omega is just really good at proving theorems about people's behaviour and can do it at a high enough level of abstraction that no simulation is needed. Or maybe Omega has a chronoscope and is just looking at the future, though it's not obvious that this case doesn't need treating differently.

So perhaps SCDT needs to be reformulated in terms of some principle to the effect that when someone or something might try to predict your actions you need to imagine yourself as being simulated. And I suspect that formalizing this ends up with it looking a lot like TDT.

Maybe Omega is just really good at proving theorems about people's behaviour and can do it at a high enough level of abstraction that no simulation is needed.

How is it not a simulation? How do you define simulation in a way that includes manipulating bits (running a program) but excludes manipulating strings according to a set of rules (proving theorems)?

Or maybe Omega has a chronoscope and is just looking at the future

This is basically isomorphic to simulation, only now Omega is included in it. How would such a device work? The opaque box is full/empty before you make your choice, so the chronotron can either run two timelines (full/empty) and then terminate the one where its guess was wrong, or run one where Omega does not place anything, look into the future, then, if needed, go back and rerun the events with the box filled, again terminating the original timeline if the guess is incorrect.

In both cases there is a "simulation" going on, only the chronotron is acting as a super-Omega.

How is it not a simulation? [...] How do you define simulation in a way that [...]

I'm not sure how to answer either question, and I'm not sure anyone has a perfectly satisfactory definition of "simulation". I'm pretty sure I've never seen one. But consider the following scenario.

Omega looks at the structure of your brain, and deduces a theorem of the following sort: For a broad class of individuals with these, and those, and these other, features, they will almost all make the one-box-or-two decision the same way. (The theorem doesn't say which way. In fact, the actual form of the theorem is: "Given features A, B and C, any two individuals for which parameters X, Y and Z are within delta will go the same way with probability at least 1-epsilon", and features A,B,C are found about as often in one-boxers as in two-boxers. Finding and proving such a theorem doesn't in itself give Omega much information about whether you're likely to one-box or two-box.) Omega then picks one of those individuals which isn't you and simulates it; then Omega knows with high confidence which way you will choose.

Let's suppose -- since so far as I can see it makes no difference to the best arguments I know of either for one-boxing or for two-boxing -- that Omega tells you all of the above. (But not, of course, which way the prediction ended up.)

In this scenario, what grounds are there for thinking that you could as easily be in Omega's simulation as in the outside world? You know that the actual literal simulation was of someone else. You know that the theorem-proving was broad enough to cover a wide class of individuals besides yourself, including both one-boxers and two-boxers. So what's going on here that's a simulation you could be "in"?

(Technical note: If you take the form I gave for the theorem too literally, you'll find that things couldn't actually be quite as I described. Seeing why and figuring out how to patch it are left as exercises for the reader.)

How would such a device work?

There's a wormhole between our present and our future. Omega looks through it and sees your lips move.

Omega then picks one of those individuals which isn't you and simulates it; then Omega knows with high confidence which way you will choose.

Seems like Omega is simulating me for the purposes which matter. The "isn't you" statement is not as trivial as it sounds, it requires a widely agreed upon definition of identity, something that gets blurred easily once you allow for human simulation. For example, how do you know you are not a part of the proof? How do you know that the statement that Omega tells you that it simulated "someone else" is even truth-evaluable? (And not, for example, a version of "this statement is false".)

There's a wormhole between our present and our future. Omega looks through it and sees your lips move.

Ah, Novikov's self-consistent time loops. Given that there is no way to construct those using currently known physics, I tend to discard those. Given how they reduce all NP-hard problems to O(1) and such, making the world too boring to live in.

In a less convenient universe, Omega tells you that it has discovered that someone's zodiac sign predicts their response to Newcomb's problem with 90% accuracy. What do you do?

You don't need any of that magic for imperfect predictors, people are pretty predictable as it is, just not perfectly so. And the effort required to improve the prediction accuracy diverges as the accuracy goes to certainty.

To clarify, I don't see why SCDT would one-box in the zodiac setting.

Why do you stipulate 90% accuracy, not 100%?

Because 100% accuracy would just be weird.

They fail at certain problems, and thus violate the maxim that rational agents should win. This makes them imperfect.

Surely there's something No Free Lunch-like that says a decision theory can't possibly work for all situations. Though obviously, some can be better for a given domain than others.

Here is an example:

Omega: Here are the rules, make your choice.
Decision Theory: makes the choice.
Omega: Actually, I lied. You get the opposite of what I told you, so now you have lost.

Obviously, from the set of decision theories assuming that Omega never lies, the better decision theory gets worse results in this situation.

Even worse example:

Omega: We have two boxes and... oh, I realized I don't like your face. You lose.

For each decision theory there can be an Omega disliking this specific theory, and then this theory does not win.

So, does the No Free Lunch-like theorem only predict these results, or something stronger than this?

Omega: Here are two opaque boxes. One contains \$1, one contains \$1M. You can open one box and keep whatever is in it. I have predicted your decision and put the big money in the other box. I've played this game a lot and haven't lost the million yet.

This reminds me of the non-existence of a perfect encryption algorithm, where an encryption algorithm is a bijective map S -> S, where S is the set of finite strings on a given alphabet. The image of strings of length at most n cannot lie in strings of length at most n-1, so either no string gets compressed (reduced in length) or there will be some strings that will become longer after compression.

I think this can be solved in practice by heeding the assumption that a very sparse subset of all such strings will be mapped by our encryption algorithm when embedded physically. Then if we low-dimensionally parametrize hash functions of the form above, we can store the parameters for choosing a suitable hash function along with the encrypted text, and our algorithm only produces compressed strings of greater length if we try to encrypt more than some constant percentage of all possible length <= n strings, with n fixed (namely, when we saturate suitable choices of parameters). If this constant is anywhere within a few orders of magnitude of 1, the algorithm is then always compressive in physical practice by finiteness of matter (we won't ever have enough physical bits to represent that percentage of strings simultaneously).

Maybe a similar argument can be made for Omega? If Omega must be made of matter, we can always pick a decision theory given the finiteness of actual Omega's as implemented in physics. Of course, there may be no algorithm for choosing the optimal decision theory if Omega is allowed to lie unless we can see Omega's source code, even though a good choice exists.

I don't understand what you're saying here. You mean you pick a has function, and then store both the hashed value and the parameters for the hash function? But hash functions are lossy.

Why does perfection in an encryption algorithm require compression? Did you mean to say "compression algorithm"?

Some notes on compression algorithms: If one defines a compression algorithm as a bijective map on all strings, and a decompression algorithm as an inverse of a compression algorithm, then according to this definition, all decompression algorithm are also compression algorithms. Suppose we apply a compression algorithm to a string, and then apply the algorithm to the result, and so on and so on. Suppose we call this the orbit of the string. Every orbit will be finite or infinite. A finite orbit will eventually come back to the original string. The length of strings in an orbit cannot be strictly monotonically decreasing, and if they are constant, then the orbit must be finite. In an infinite orbit, the length of the strings will tend towards infinity. So every compression algorithm will either simply cycle between some strings, or eventually makes things bigger.

Yes, thank you, I meant compression algorithm.

That's the precise NFL I had in mind.

I don't see how this is comparable. It's not like you're shuffling around bad decisions and good decisions into an optimal arrangement. You just try to make better decisions, and the bad decisions can go take a hike.

(edit: basically, what Wedifrid said at 2:25)

Yeah, as I noted above. It's reasonably obvious I'm not actually any sort of mathematician :-)

So, does the No Free Lunch-like theorem only predict these results, or something stronger than this?

You have it. (Roughly speaking.)

Of No Free Lunch relevance is the additional consideration that most decision theories perform poorly at tasks other than maximising expected utility according to the supplied values. That is, No Free Lunch tells us that decision theories are very bad at making bad decisions.

Problems where Omega is adversarial and punishes a specific decision theory are not very interesting. None of the standard problems where naive CDT/EDT fail are set up this way.

I don't have an actual theorem, I'm rambling by analogy. I was thinking of something analogous to No Free Lunch in the compression field. There is no compression algorithm that is ideal for all data, but you can say "this one is better than this one" in a given domain. e.g. bzip2 compresses smaller than gzip for almost any input (though it's more work to apply). Analogously, you shouldn't expect any decision theory to be "perfect" for all input, just better for a sensible range of inputs.

Of course, I've just realised that analogy is flawed, since the compression NFL is that no algorithm can make all possible inputs smaller, whereas for decision theories they can be highly inefficient as long as they're correct, that being the question asked. But I'm wondering if there's something NFL-like regardless.

Omega: Here are ten numbered boxes, of which you may choose one. Nine have \$100 inside, the tenth has nothing. I have guessed the response of Deterministic Algorithm A (source code available upon request) to this very problem, and have made their choice the empty box.

This problem is fair in a sense, because it does not reference the source code of the player as such. And not only is it fair, but it's a very easy problem - 9 out of 10 chance of free money. Unless you happen to be implementing a version of algorithm A, in which case you're screwed. Even a sophisticated choice-maker can lose to AlphabeticalBot, who chooses the alphabetically first option - as long as the problem is one of those no-free-lunch problems.

I don't know whether I am Alice or SimAlice, and as there is one of each, there is a 0.5 probability of me being either one

I'm having a hard time justifying the 50% here. I don't buy it at all, actually. But that does't mean SCDT shouldn't use it: if we take it to be proportional to the number of known simulations, we run into a lot of trouble (let's call this #SCDT)

Omega: In transparent box 1, here's \$1000. You also get what's in the opaque box, which is 1 cent if I predicted you would choose to only take the opaque box, otherwise it's empty. I determine this by simulating you one trillion times.

what #SCDT says:

• 1 real Alice: take the transparent box for \$1k

• 1 trillion simulated Alices: if we take only the opaque box, real Alice will get 1 more cent! \$0.01

Net outcome: Alice walks away with 1 cent.

Meanwhile, SCDT has Alice walks away with \$1k.

I suspect that by weighting them equally, you're simply allowing yourself to find whichever option works out better on the net. The 50% probability is specious.

Can you explicitly give your definition of what it means for a decision theory to "fail"?

This would have been helpful to my 11-year-old self. As I had always been rather unnecessarily called precocious, I developed the pet hypothesis that my life was a simulation of someone whose life in history had been worth re-living: after all, the collection of all possible lives is pretty big, and mine seemed to be extraordinarily neat, so why not imagine some existential video game in which I am the player character?

Unfortunately, I think this also led me to subconsciously be a little lazier than I should have been, under the false assumption that I was going to make great things anyway. If I had realized that given I was a simulation of an original version of me, I would have to perform the exact same actions and have the exact same thoughts original me did, including those about being a simulation, I better buckle up and sweat it out!

Notice your argument does not imply the following: I am either a simulation or the original, and I am far more likely to be a simulation as there can only be one original but possibly many simulations, so I should weigh my actions far more towards the latter. This line of reasoning is wrong because all simulations of me would be identical experience copies, and so it is not the quantity that decides the weight, but the number of equivalence classes: original me, and simulated me. At this point, the weights again become 0.5, one recovers your argument, and finds I should never have had such silly thoughts in the first place (even if they were true!).

Finding a good decision theory is hard. Previous attempts, such as Timeless Decision Theory, work, it seems, in providing a stable, effective decision theory, but are mathematically complicated. Simpler theories, like CDT or EDT, are much more intuitive, but have deep flaws. They fail at certain problems, and thus violate the maxim that rational agents should win. This makes them imperfect.

What's wrong with expected utility maximization - or finite approximations thereof?

Ok. I naively thought I could trip up this system by altering the probability of being a simulation or the payout, but I can't.

SCDT successfully 1 boxes on any box 2 payout larger than \$1000. SCDT successfully 1 boxes even if a single simulation is used to predict any number of actual Alices. (The scenario I worked through involved 10,000 duplicate Alices being predicted by a single Alice simulation.)

I'm thoroughly impressed by this decision theory.

This information is, for CDT, the fact that Omega has simulated you and figured out what you are going to do. Why does a CDT agent automatically assume that it is the "real" one, so to speak?

Looks like you are arguing for reflective consistency/equilibrium. How would you deal with one-shot counterfactual mugging? One-shot PD?