
There has been a lot of discussion on LW about finding better decision theories. Much of the motivation for the various new decision theories proposed here seems to be an effort to get over the fact that classical CDT gives the wrong answer in one-shot PDs, Newcomb-like problems and Parfit's Hitchhiker problem. While Gary Drescher has said that TDT is "more promising than any other decision theory I'm aware of", Eliezer gives a list of problems in which his theory currently gives the wrong answer (or, at least, it did a year ago). Adam Bell's recent sequence has talked about problems for CDT, and is no doubt about to move on to problems with EDT (in one of the comments, it was suggested that EDT is "wronger" than CDT).

In the Iterated Prisoner's Dilemma, it is relatively trivial to prove that no strategy is "optimal" in the sense that it gets the best possible pay-out against all opponents. The reasoning goes roughly like this: any strategy which ever cooperates does worse than it could have against, say, Always Defect. Any strategy which doesn't start off by cooperating does worse than it could have against, say, Grim. So, whatever strategy you choose, there is another strategy that would do better than you against some possible opponent. So no strategy is "optimal". Question: is it possible to prove similarly that there is no "optimal" Decision Theory? In other words - given a decision theory A, can you come up with some scenario in which it performs worse than at least one other decision theory? Than any other decision theory?
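The Iterated Prisoner's Dilemma argument can be checked with a small simulation. This is only a sketch: the payoffs (T=5, R=3, P=1, S=0) and the 10-round match length are conventional assumptions, since the post gives neither, and the function names are illustrative.

```python
from typing import Callable, List, Tuple

# Assumed payoffs (not given in the post): T=5, R=3, P=1, S=0.
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

# A strategy maps (my_history, their_history) to a move, "C" or "D".
Strategy = Callable[[List[str], List[str]], str]

def always_defect(mine: List[str], theirs: List[str]) -> str:
    return "D"

def always_cooperate(mine: List[str], theirs: List[str]) -> str:
    return "C"

def grim(mine: List[str], theirs: List[str]) -> str:
    # Cooperate until the opponent defects once, then defect forever.
    return "D" if "D" in theirs else "C"

def play(a: Strategy, b: Strategy, rounds: int = 10) -> Tuple[int, int]:
    """Play `rounds` rounds and return the total scores (score_a, score_b)."""
    hist_a: List[str] = []
    hist_b: List[str] = []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a, move_b = a(hist_a, hist_b), b(hist_b, hist_a)
        pay_a, pay_b = PAYOFF[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b

if __name__ == "__main__":
    candidates = [always_defect, always_cooperate, grim]
    for opponent in (always_defect, grim):
        for me in candidates:
            print(f"{me.__name__:>16} vs {opponent.__name__:<13}: {play(me, opponent)[0]}")
```

Under these assumptions, Always Defect scores best of the three against Always Defect but worst against Grim, so none of the candidates is best against both opponents.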

One initial try would be: Omega gives you two envelopes - the left envelope contains \$1 billion iff you don't implement decision theory A in deciding which envelope to choose. The right envelope contains \$1000 regardless.

Or, you might not like Omega being able to make decisions about you based entirely on your source code (or "ritual of cognition"). Then how about this: in order for two decision theories to sensibly be described as "different", there must be some scenario in which they perform a different action (let's call this Scenario 1). In Scenario 1, DT A makes decision A whereas DT B makes decision B. In Scenario 2, Omega offers you the following setup: here are two envelopes, you can pick exactly one of them. I've just simulated you in Scenario 1. If you chose decision B, there's \$1,000,000 in the left envelope. Otherwise it's empty. There's \$1000 in the right envelope regardless.

I'm not sure if there's some flaw in this reasoning (are there decision theories for which Omega offering such a deal is a logical impossibility? It seems unlikely: I don't see how your choice of algorithm could affect Omega's ability to talk about it). But I imagine that some version of this should work. In that case, it doesn't make sense to talk about one decision theory being "better" than another; we can only talk about decision theories being better than others for certain classes of problems.
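The two-scenario construction can be written down as a toy model. This is a sketch under strong simplifying assumptions: a "decision theory" is reduced to a function from a scenario label to an action, and `dt_a`, `dt_b`, `envelopes` and `best_payout` are hypothetical names.

```python
from typing import Callable, Dict

# Toy assumption: a decision theory is a function from a scenario label to an action.
DecisionTheory = Callable[[str], str]

def dt_a(scenario: str) -> str:
    # Hypothetical theory A: makes decision "A" in Scenario 1.
    return "A"

def dt_b(scenario: str) -> str:
    # Hypothetical theory B: makes decision "B" in Scenario 1.
    return "B"

def envelopes(theory: DecisionTheory) -> Dict[str, int]:
    """Omega simulates `theory` in Scenario 1, then fills the two envelopes."""
    left = 1_000_000 if theory("scenario_1") == "B" else 0
    return {"left": left, "right": 1_000}

def best_payout(theory: DecisionTheory) -> int:
    # Even with a perfect choice in Scenario 2, the agent cannot get more
    # than the fuller of the two envelopes Omega prepared for it.
    return max(envelopes(theory).values())

if __name__ == "__main__":
    print("A:", best_payout(dt_a))
    print("B:", best_payout(dt_b))
```

In this toy model theory A is capped at the 1,000 in the right envelope while theory B can collect 1,000,000, whatever either does once inside Scenario 2.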

I have no doubt that TDT is an improvement on CDT, but in order for this to even make sense, we'd have to have some way of thinking about what sort of problem we want our decision theory to solve. Presumably the answer is "the sort of problems which you're actually likely to face in the real world". Do we have a good formalism for what this means? I'm not suggesting that the people who discuss these questions haven't considered this issue, but I don't think I've ever seen it explicitly addressed. What exactly do we mean by a "better" decision theory?


Here's a simplified version of your second counterexample:

Omega appears and asks you which colour you like better, red or blue. If you chose the same colour that Omega happens to like, you get a million dollars, otherwise zero.

Obviously, your decision in this ridiculous scenario depends on your prior for meeting Omegas who like red vs. Omegas who like blue. Likewise, in your original counterexample, your action in scenario 1 should depend on your prior for encountering scenario 1 vs. scenario 2.

So yeah, this is a pretty big flaw in UDT that I pointed out some time ago on the workshop list, and then found out that Caspian nailed it even earlier in the comments to Nesov's original post on Counterfactual Mugging. The retort "just use priors" may or may not be satisfactory to you. It's certainly not completely satisfactory to me, so I'd like a decision theory that doesn't require anything beyond "local" descriptions of scenarios. Presumably, such a theory would win in Scenario 1 and lose in Scenario 2, which may or may not be what we want.

Which decision theory should we use? CDT? UDT? TDT? What exactly do we mean by a "better" decision theory?

To get some practice in answering this kind of question, let's look first at a simpler set of questions: Which play should I make in the game PSS? Paper? Stone? Scissors? What exactly do we mean by a better play in this game?

Bear with me on this. I think that a careful look at the process that game theorists went through in dealing with game-level questions may be very helpful in our current confusion about decision-theory-level questions.

The first obvious thing to notice about the PSS problem is that there is no universal "best" play in the game. Sometimes one play ("stone", say) works best; sometimes another play works better. It depends on what the other player does. So we make our first conceptual breakthrough. We realize we have been working on the wrong problem. It is not "which play produces the best results?". It is rather "which play produces the best expected results?" that we want to ask.

Well, we are still a bit puzzled by that new word "expected", so we hire consultants. One consultant, a Bayesian/MAXENT theorist, tells us that the appropriate expectation is that the other player will play each of "paper", "stone", and "scissors" equally often, and hence that all plays on our part are equally good. The second consultant, a scientist, actually goes out and observes the other player. He comes back with the report that out of 100 PSS games, the other player will play "paper" 35 times, "stone" 32 times, and "scissors" 33 times. So the scientist recommends the play "scissors" as the best play. Our MAXENT consultant has no objection. "That choice is no worse than any other", says he.
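The scientist's recommendation can be reproduced with a few lines of arithmetic. The sketch below assumes a payoff of +1 for a win, -1 for a loss and 0 for a draw (the comment does not specify stakes); the function names are illustrative.

```python
from typing import Dict

PLAYS = ("paper", "stone", "scissors")
BEATS = {"paper": "stone", "stone": "scissors", "scissors": "paper"}

def payoff(mine: str, theirs: str) -> int:
    # Assumed stakes: +1 win, -1 loss, 0 draw.
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1

def expected_payoffs(opponent_counts: Dict[str, int]) -> Dict[str, float]:
    """Expected payoff of each pure play against observed opponent frequencies."""
    total = sum(opponent_counts.values())
    return {
        mine: sum(payoff(mine, theirs) * n for theirs, n in opponent_counts.items()) / total
        for mine in PLAYS
    }

if __name__ == "__main__":
    observed = {"paper": 35, "stone": 32, "scissors": 33}  # the scientist's report
    maxent = {"paper": 1, "stone": 1, "scissors": 1}       # the MAXENT assumption
    print(expected_payoffs(observed))
    print(expected_payoffs(maxent))
```

Against the observed frequencies, "scissors" is the only play with a positive expected payoff (35 wins against 32 losses per 100 games), while under the uniform assumption every play has expected payoff zero, which is why the MAXENT consultant has no objection.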

So we adopt the strategy of always playing scissors, which works fine at first, but soon starts returning abysmal results. The MAXENT fellow is puzzled. "Do you think maybe the other guy found out about our strategy?" he asks. "Maybe he hired our scientist away from us. But how can we possibly keep our strategy secret if we use it more than once?" And this leads to our second conceptual breakthrough.

We realize that it is both impossible and unnecessary to keep our strategy secret (just as a cryptographer knows that it is difficult and unnecessary to keep the encryption algorithm secret). But it is both possible and essential to keep the plays secret until they are actually made (just as a cryptographer keeps keys secret). Hence, we must have mixed strategies, where the strategy is a probability distribution and a play is a one-point sample from that distribution.

Take a step back and think about this. Non-determinism of agents is an inevitable consequence of having multiple agents whose interests are not aligned (or more precisely, agents whose interests cannot be brought into alignment by a system of side payments). Lesson 1: Any decision theory intended to work in multi-agent situations must handle (i.e. model) non-determinism in other agents. Lesson 2: In many games, the best strategy is a mixed strategy.

Think some more. Agents whose interests are not aligned often should keep secrets from each other. Lesson 3: Decision theories must deal with secrecy. Lesson 4: Agents may lie to preserve secrets.

But how does game theory find the best mixed strategy? Here is where it gets weird. It turns out that, in some sense, it is not about "winning" at all. It is about equilibrium. Remember back when we were at the PSS stage where we thought that "Always play scissors" was a good strategy? What was wrong with this, of course, was that it induced the other player to switch his strategy toward "Always play stone" (assuming, of course, that he has a scientist on his consulting staff). And that shift on his part induces (assuming we have a scientist too) us to switch toward paper.

So, how is this motion brought to a halt? Well, there is one particular strategy you can choose that at least removes the motivation for the motion. There is one particular mixed strategy which makes your opponent not really care what he plays. And there is one particular mixed strategy that your opponent can play which makes you not really care what you play. So, if you both make each other indifferent, then neither of you has any particular incentive to stop making each other indifferent, so you both just stick to the strategy you are currently playing.
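Both the chase between pure strategies and the indifference property can be checked directly. This is a sketch assuming zero-sum stakes of +1 for a win, -1 for a loss and 0 for a draw; `best_response`, `pure` and `uniform` are illustrative names.

```python
from fractions import Fraction
from typing import Dict

PLAYS = ("paper", "stone", "scissors")
BEATS = {"paper": "stone", "stone": "scissors", "scissors": "paper"}

def payoff(mine: str, theirs: str) -> int:
    # Assumed stakes: +1 win, -1 loss, 0 draw.
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1

def expected(mine: str, opponent_mix: Dict[str, Fraction]) -> Fraction:
    """Expected payoff of a pure play against a mixed strategy."""
    return sum((p * payoff(mine, theirs) for theirs, p in opponent_mix.items()), Fraction(0))

def best_response(opponent_mix: Dict[str, Fraction]) -> str:
    return max(PLAYS, key=lambda mine: expected(mine, opponent_mix))

def pure(play: str) -> Dict[str, Fraction]:
    # A pure strategy written as a degenerate probability distribution.
    return {p: Fraction(int(p == play)) for p in PLAYS}

uniform = {p: Fraction(1, 3) for p in PLAYS}

if __name__ == "__main__":
    # The chase: each pure strategy invites a counter-strategy.
    play = "scissors"
    for _ in range(3):
        counter = best_response(pure(play))
        print(f"always {play} -> opponent switches to {counter}")
        play = counter
    # The resting point: against the uniform mix every play is equally good.
    print({p: expected(p, uniform) for p in PLAYS})
```

Against the uniform mix every pure play has expected payoff zero, so no switch improves the opponent's result; that indifference is what stops the chase.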

This is called Nash equilibrium. It also works on non-zero sum games where the two players' interests are not completely misaligned. The decision theory at the heart of Game Theory - the source of eight economics Nobel prizes so far - is not trying to "win". Instead, it is trying to stop the other player from squirming so much as he tries to win. Swear to God. That is the way it works.

Alright, in the last paragraph, I was leaning over backward to make it look weird. But the thing is, even though you no longer look like you are trying to win, you still actually do as well as possible, assuming both players are rational. Game theory works. It is the right decision theory for the kinds of decisions that fit into its model.

So, was this long parable useful in our current search for "the best decision theory"? I guess the answer to that must depend on exactly what you want a decision theory to accomplish. My intuition is that Lessons #1 through #4 above cannot be completely irrelevant. But I also think that there is a Lesson 5 that arises from the Nash equilibrium finale of this story. The lesson is: In any optimization problem with multi-party optimization dynamics, you have to look for the fixpoints.

There's probably no single-player decision theory that, if all players adopted it, would lead to Nash equilibrium play in all games. The reason is that many games have multiple Nash equilibria, and equilibrium selection (aka bargaining) is often "indeterminate": it requires you to go outside the game and look at the real-world situation that generated it.

Here on LW we know how to implement "optimal" agents, who cooperate with each other and share the wins "fairly" while punishing defectors, in only two cases: symmetric games (choosing the Pareto-best symmetric correlated equilibrium), and games with transferable utility (using the Shapley value). The general case of non-symmetric games with non-transferable utility is still open. I'm very skeptical that any single-player decision theory can solve it, and have voiced my skepticism many times.

It has been a while since I looked at Aumann's Handbook, and I don't have access to a copy now, but I seem to recall discussion of an NTU analog of the Shapley value. Ah, I also find it in Section 9.9 of Myerson's textbook. Perhaps the problem is they don't collectively voluntarily punish defectors quite as well as you would like them to. I'm also puzzled by your apparent restriction of correlated equilibria to symmetric games. You realize, of course, that symmetry is not a requirement for a correlated equilibrium in a two-person game?

It wasn't my intention, at least in this posting, to advocate standard Game Theory as the solution to the FAI decision theory question. I am not at all sure I understand what that question really is. All I am doing here is pointing out the analogy between the "best play" problem and the "best decision theory" metaproblem.

I'm also puzzled by your apparent restriction of correlated equilibria to symmetric games. You realize, of course, that symmetry is not a requirement for a correlated equilibrium in a two-person game?

Yes, I realize that. The problem lies elsewhere. When you pit two agents using the same "good" decision theory against each other in a non-symmetric game, some correlated play must result. But which one? Do you have a convention for selecting the "best" correlated equilibrium in an arbitrary non-symmetric game? Because your "good" algorithm (assuming it exists) will necessarily give rise to just such a convention.

About values for NTU games: according to my last impressions, the topic was a complete mess and there was no universally agreed-upon value. Unlike the TU case, there seems to be a whole zoo of competing NTU values with different axiomatic justifications. Maybe our attempts to codify "good" algorithms will someday cut through this mess, but I don't yet see how.

Do you have a convention for selecting the "best" correlated equilibrium in an arbitrary non-symmetric game?

What is wrong with the Nash bargaining solution (with threats)? Negotiating an acceptable joint equilibrium is a cooperative game. It is non-cooperatively enforceable because you limit yourself to only correlated equilibria rather than the full Pareto set of joint possibilities.

I must be missing something. You are allowing communication and (non-binding) arbitration, aren't you? And a jointly trusted source of random numbers.

Um, maybe it's me who's missing something. Does the Nash bargaining solution uniquely solve all games? How do you choose the "disagreement point" used for defining the solution, if the game has multiple noncooperative equilibria? Sorry if I'm asking a stupid question.

Nash's 1953 paper covers that, I think. Just about any game theory text should explain. Look in the index for "threat game". In fact, Googling on the string "Nash bargaining threat game" returns a host of promising-looking links.

Of course, when you go to extend this 2-person result to coalition games, it gets even more complicated. In effect, the Shapley value is a weighted average of values for each possible coalition structure, with the division of spoils and responsibilities within each coalition also being decided by bargaining. The thing is, I don't see any real justification for the usual convention of giving equal weights to each possible coalition. Some coalitions seem more natural to me than others - one most naturally joins the coalitions with which one communicates best, over which one has the most power to reward and punish, and which has the most power over oneself. But I'm not sure exactly how this fits into the math. Probably a Rubinstein-style answer could be worked out within the general framework of Nash's program.
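For reference, the equal-weights convention being questioned here is easiest to see in the permutation form of the transferable-utility Shapley value: each player's value is its marginal contribution averaged over all arrival orders, with every order weighted equally. The sketch below uses a hypothetical three-player game invented purely for illustration; it does not cover the NTU case or the threat-game refinements discussed above.

```python
from fractions import Fraction
from itertools import permutations
from typing import Callable, Dict, FrozenSet, Sequence

def shapley(players: Sequence[str],
            v: Callable[[FrozenSet[str]], int]) -> Dict[str, Fraction]:
    """TU Shapley value: average marginal contribution over all arrival orders."""
    orders = list(permutations(players))
    totals = {p: Fraction(0) for p in players}
    for order in orders:
        coalition: FrozenSet[str] = frozenset()
        for p in order:
            joined = coalition | {p}
            totals[p] += v(joined) - v(coalition)  # marginal contribution of p
            coalition = joined
    # Every order gets the same weight, 1 / n!.
    return {p: total / len(orders) for p, total in totals.items()}

def v(coalition: FrozenSet[str]) -> int:
    # Hypothetical game: a coalition is worth 1 only if it contains "a"
    # together with at least one other player.
    return 1 if "a" in coalition and len(coalition) >= 2 else 0

if __name__ == "__main__":
    print(shapley(["a", "b", "c"], v))
```

In this toy game the indispensable player "a" receives 2/3 and the other two 1/6 each. Weighting some arrival orders more heavily than others, as the comment suggests might be more natural, would amount to replacing the uniform `1 / len(orders)` factor.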

Well, sorry. You were right all along and I'm a complete idiot. For some reason my textbook failed to cover that, and I never stumbled on that anywhere else.

(reads paper, goes into a corner to think)

Does this paper have what you're looking for? I'm not in the office, so can't read it at the moment (and might not be able to anyway, as my university's subscriptions tend not to include lots of Science Direct journals), but it does at least seem to provide one plausible answer to your question.

(no idea if that link will work - the paper is Bargained-Correlated Equilibria by Tedeschi Piero)

Thanks a lot. RobinZ sent me the paper and I read it. The key part is indeed the definition of the disagreement point, and the reasoning used to justify it is plausible. The only sticky issue is that the disagreement point defined in the paper is unattainable; I'm not sure what to think about that, and not sure whether the disagreement point must be "fair" with respect to the players.

The "common priors" property used in the paper gave me the idea that optimal play can arise via Aumann agreement, which in turn can emerge from individually rational behavior! This is really interesting and I'll have to think about it.

The abstract looks interesting, but I can't access the paper because I'm a regular schmuck in Russia, not a student at a US university or something :-)

I have it. PM me with email address for PDF.

Done!

I don't think this counterexample is actually a counterexample. When you-simulation decides in Scenario 1, he has no knowledge of Scenario 2. Yes, if people respond in arbitrary and unexpected ways to your decisions, this sort of thing can easily be set up; but ultimately the best you can do is to maximize expected utility. If you lose due to Omega pulling such a move on you, that's due to your lack of knowledge and bad calibration as to his probable responses, not to a flaw in your decision theory. If you-simulation somehow knew what the result would be used for, he would choose taking that into account.

In the Iterated Prisoner's Dilemma, it is relatively trivial to prove that no strategy is "optimal" in the sense that it gets the best possible pay-out against all opponents. ... So no strategy is "optimal".

In order to reach this conclusion, you have to eliminate the possibility of making inferences about the other player's decision. When you factor in the probability of various other competing decision theories (which you can get either from that inference, or from a sensible prior like an Occamian one), this objection no longer applies, as there will be some set of strategies which is, on average, optimal.

Note that when you e.g. recognize the symmetry between your position and the other player's, that is an inference about the other's likely decision!

Or, you might not like Omega being able to make decisions about you based entirely on your source code (or "ritual of cognition"). Then how about this: in order for two decision theories to sensibly be described as "different", there must be some scenario in which they perform a different action (let's call this Scenario 1).

I'm not sure if there's some flaw in this reasoning ...

One flaw I see is this: in order to create a Scenario 1, you would have to arbitrarily reject from consideration some decision theories or otherwise predicate the result on choice-irrelevant details of the ritual of cognition.

The reason for this is that a decision theory can contain some element (as TDT, UDT, and folk decision theory) that recognizes the connection (in Drescher's terms, a subjunctive means-end link) between making decision A and getting less money. In order for that decision theory to nevertheless choose A, you would have to put some arbitrary restriction on its choices that unfairly disadvantages A against B from the beginning. This would make the comparison between the two irrelevant.

In order to make inferences about the other player's decision, you have to have a probability distribution for them, and then do a complicated integral.

The point is that there is no strictly optimal strategy, which means that your "generally optimal" strategy can do really abysmally against the wrong probability distribution. Call that probability distribution "Newcomb's problem" and the point that I see is that you might have an "optimal decision theory" which fails to be "rational."

If you know all of the rules, as in Newcomb's problem, then you can know how to react optimally. If you fail to incorporate your knowledge of Omega's capabilities, you're not acting optimally, and you can do better.

But you don't have absolute knowledge of Omega; you have a probability estimate of whether ve is, say, omniscient versus stalking you on the internet, or of whether ve even has a million dollars to put in the other box.

The sort of Newcomb- (or Kavka-) like problem you might expect to run into on the street hinges almost entirely on the probability that there's a million dollars in the box. So if you're trying to create a decision theory that gives you optimal action assuming the probability distributions of real life, I don't see how optimizing for a particular, uncommon problem (where other problems are more common and might be decided differently!) will help out with being rational on hostile hardware.

If we spend TOO MUCH time preparing for the least convenient possible world we may miss out on the real world.

On this point, I'm going to have to agree with what EY said here (which I repeated here).

In short: Omega's strategy and its consequences for you are not, in any sense, atypical. Omega is treating you based upon what you would do, given full (or approximate) knowledge of the situation. This is quite normal: people do in fact treat you differently based upon estimation of "what you would do", which is also known as your "character".

Your point would be valid if Omega were basing the reward profile on your genetics, or how you got to your decision, or some other strange factor. But here, Omega is someone who just bases its treatment of you on things that are normal to care about in normal problems.

You're just emphasizing the fact that you have full knowledge of the situation.

I currently believe that if I am ever in a position where I believe myself to be confronted with Newcomb's problem, no matter how convinced I am at that time, it will be a hoax in some way; for example, Omega has limited prediction capability or there isn't actually \$1 million in the box.

I'm not saying "you should two-box because the money is already in there"; I'm saying "maybe you should JUST take the \$1000 box because you've seen that money and if you don't think ve's lying you're probably hallucinating."

True: you will probably never be in the epistemic state in which you will justifiably believe you are in Newcomb's problem. Nevertheless, you will frequently be in probabilistic variants of the problem, and a sane decision theory that wins on those cases will have the implication that it should one-box when you take the limit of all variables as they go to what they need to be to make it the literal Newcomb's problem.

I got wrapped up in writing this comment and forgot about the larger context; my point is that it may be necessary (in the least convenient possible world) to choose a decision theory that does poorly on Newcomb's problem but better elsewhere, given that Newcomb's problem is unlikely to occur and similar-seeming but more common problems give better results with a different strategy.

So like the original post, I ask why Newcomb's problem seems to be (have been?) driving discussions of decision theory? Is it because this is the easiest place to make improvements, or because it's fun to think about?

I'm interested in what decision theory is the best under the true distribution of future histories that lie ahead.

(Unrealistic) hypothetical scenarios that make A win over B don't interest me, except that I'm confident we can manufacture them - after all, the decision theories are fixed in advance, and then we just make up some arbitrary rules under which they fail (differently from one another).

I'm interested in what decision theory is the best under the true distribution of future histories that lie ahead.

It appears you are not a Bayesian. There is no "true distribution". Probability is subjective. The distribution depends on what is known. What is known will change (some future histories will drop out of the "future history space"). Which, of course, is the problem with choosing decision theories based on performance over some problem set (or rather problem measure space).

I do agree with you though, about how silly the present search for a decision theory looks from the outside. I would be charitable and suggest that the "search" is a fiction imposed for the sake of effective pedagogy, but that would, I fear, be wishful thinking on my part.

I definitely know that we don't know the future, and further, that we don't know the true distribution. Nonetheless, that's still what I'm most interested in (I'll settle for approximations).

I've just simulated you in Scenario 1. If you chose decision B, there's \$1,000,000 in the left envelope. Otherwise it's empty. There's \$1000 in the right envelope regardless.

If you want to make every decision theory fall into the trap, other than the most arbitrary and insane, simply have the S2 basis be:

If, during Scenario 1, you recited the square ordinality primes (i.e. 1st prime, 4th prime, 9th prime, etc.) up to the 100th prime, the envelope will contain \$1 billion.
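To make the recitation concrete, here is a sketch that lists the primes whose ordinal position is a perfect square, up to the 100th prime. The helper names are illustrative, and trial division is just a simple choice for a list this short.

```python
from math import isqrt
from typing import List

def first_primes(n: int) -> List[int]:
    """Return the first n primes by trial division against smaller primes."""
    primes: List[int] = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes

def square_ordinality_primes(limit: int = 100) -> List[int]:
    """Primes at positions 1, 4, 9, ... (1-indexed) up to position `limit`."""
    primes = first_primes(limit)
    return [primes[k * k - 1] for k in range(1, isqrt(limit) + 1)]

if __name__ == "__main__":
    print(square_ordinality_primes())
```

The list has ten entries, running from 2 (the 1st prime) to 541 (the 100th), which is a sequence no ordinary decision theory has any reason to produce unprompted.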

What exactly do we mean by a "better" decision theory?

One which comes up with the optimal result on average, given the available information, even if (in some strands of reality) it fails to give the optimal result due to being in a particular scenario that its user was unaware of.

I.e. the best decision theory would still fail to get the best result in the "simulate you in an unrelated scenario and base your reward on something completely irrelevant" scenario.

It would also fail to get the best result in the "you can have a 20% chance of winning 2 million utilons OR an 80% chance of winning 0.2 utilons" gamble if the 20% chance failed to come up, and the 80% chance did.

Regarding your final paragraph, a game theorist would point out that with efficient insurance markets, and for a small fee, you would be able to cash in that lottery ticket in exchange for a surefire 400,000.16 utilons.
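The 400,000.16 figure is the expected value of the lottery in the parent comment, read as a single ticket that pays 2 million utilons with probability 0.2 and 0.2 utilons with probability 0.8 (the small fee is ignored here):

$$
\mathbb{E}[U] = 0.2 \times 2{,}000{,}000 + 0.8 \times 0.2 = 400{,}000 + 0.16 = 400{,}000.16
$$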

"the sort of problems which you're actually likely to face in the real world". Do we have a good formalism for what this means?

Cellular automata represent a nice abstract model of the world - many of them exhibit key features like locality, reversibility, massive parallelism, Occam's razor, universal computation, self-replication, the second law of thermodynamics, etc.

However - unlike the "real" world - we can model them exactly - and simulate and test in them freely. They make good universe-modelling material.

I know a lot of cellular automata are Turing complete, which makes them about as useful (in principle) as anything, but I fail to see what a game of Life teaches us about decision-making (aside from reminding us about the determinism of the overall system), or about what we'll run into in the real world. Does a glider gun one-box? Does it run into Omega more often than No-mega?

I'm just unsure how you mean for this to relate to the problem.

It is intelligent agents - not gliders - that make decisions.

Right, I just wouldn't expect trying to model decision making agents in cellular automata to be any more illuminating (and certainly not any easier) than more conventional (everyday level) ways of figuring out what the world's most frequent problems are. I understand they make good universe-modelers in theory, but I don't see them being useful here for the same reason I don't resort to QM for figuring out optimal horseshoe-tossing techniques. Too much work to get all the way back up to the everyday level.

The question was whether we have a formalism for which problems are most likely to come up.

Cellular automata represent an elegant theoretical model for very many questions about things that are likely to happen in spatialised, reversible, local universes - like our own.

The question, as you quoted it, was whether we have a "good" formalism for this.

I would define "good" in this context as something like "useful for solving the problem at hand". If you would define it simply as "elegant", then I suppose we weren't really disagreeing to begin with. But if you define it the same way I do, then perhaps you've just seen cellular automata do some way more impressive high-level things than I've seen them do.

Well, the ones in question are universal - and so can do all the same things that any other parallel universal system can do without very much stress.

Bentarm - you might be interested in my old post on The Limits of Moral Theory. There I discuss a related result in moral philosophy (due to Donald Regan): that there's no general way to specify what actions each individual ought to take, such that those agents who satisfy the theory (i.e. who do as they ought) thereby bring about the best (collectively possible) results. It's quite curious.

I have no doubt that TDT is an improvement on CDT, but in order for this to even make sense, we'd have to have some way of thinking about what sort of problem we want our decision theory to solve. Presumably the answer is "the sort of problems which you're actually likely to face in the real world".

If that's so, why do we spend so much time talking about Newcomb problems? Should we ban Omega from our decision theories?

Omega is relevant because AGIs might show each other their source code, at which point they gain the predictive powers, vis-a-vis each other, of Omega.

On the other hand, an AGI running CDT would self-modify to UDT/TDT if running UDT/TDT led to better outcomes, so maybe we can leave the decision theoretic work to our AGI.

The issue there is that a 'proof' of friendliness might rely on a lot of decision theory.

If you want to build a smart machine, decision theory seems sooo not the problem.

Deep Blue just maximised its expected success. That worked just fine for beating humans.

We have decision theories. The main problem is implementing approximations to them with limited spacetime.

IMO, this is probably all to do with craziness about provability - originating from paranoia.

Obsessions with the irrelevant are potentially damaging - due to the risks of caution.