Hello,

I've been reading through various Newcomblike Problems in order to get a better understanding of the differences between the Decision Theories. From what I can tell, each Decision Theory gets evaluated based on whether it is able to "win" in each of these thought experiments. Thus, there is an overarching assumption that each thought experiment has an objectively "right" and "wrong" answer, and the challenge of Decision Theory is to generate algorithms that guarantee the agent chooses the "right" answer.

However, I am having some trouble seeing how some of these problems have an objectively "winning" state. In Newcomb's Problem, obviously one can say that one-boxing "wins" because you get far more money than two-boxing, and these are the only two options available. Of course, even here there is some room for ambiguity, as Robert Nozick observed:

To almost everyone it is perfectly clear and obvious what should be done. The difficulty is that these people seem to divide almost evenly on the problem, with large numbers thinking that the opposing half is just being silly.

But other Newcomblike Problems leave me with some questions. Take, for example, the Smoking Lesion Problem. I am told that smoking is the winning decision here (as long as we suspend disbelief about the fact that smoking is harmful in the real world). But I'm not sure why that makes such a big difference. Yes, the problem states we would prefer to smoke if we could, but our preferences can come from many different dimensions, such as our understanding of the environment, not just a spontaneous inner desire. So when EDT says that you shouldn't smoke because it increases the probability of having a cancerous lesion, one could say that that information has shaped your preference. To use a different analogy, I may desire ice cream because it tastes good, but I may still prefer not to eat it out of my understanding of how it impacts my health and weight. So in other words, I'm not sure I could objectively say that a preference influenced by EDT is "not winning". Unlike Newcomb's Problem, there isn't a quantifiable value such as money to say one choice is objectively better than the other.

A second question comes from the Prisoner's Dilemma. There is a common notion that C,C is the "winning" outcome, and Decision Theory strives to generate algorithms that will guarantee a C,C result. But for each individual prisoner, C,C isn't the best payoff they can get. So hypothetically, there could be an undiscovered Decision Theory in which Prisoner A is tricked into cooperating, only for B to betray him, resulting in B achieving the optimal outcome of C,D. Wouldn't such a result objectively see B "winning", because he got a higher payoff than C,C?
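For concreteness, here is a quick sketch of the payoff structure I have in mind (the exact numbers are my own assumption; all that matters is the usual ordering where defecting against a cooperator pays best):

```python
# Standard one-shot Prisoner's Dilemma payoffs (assumed numbers,
# ordered so that defecting against a cooperator is the best outcome).
PAYOFFS = {
    # (A's move, B's move): (A's payoff, B's payoff)
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),  # A is tricked into cooperating, B defects
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

# B's payoff in the C,D outcome beats B's payoff in C,C, which is why
# I'm unsure we can call C,C objectively "winning" for B.
print(PAYOFFS[("C", "D")][1] > PAYOFFS[("C", "C")][1])  # True
```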

The third and most baffling example is Counterfactual Mugging. Just like in Newcomb's Problem, we have a quantifiable value of money to track which decision is best. However, in this case I understand the "winning" result is to pay the mugger, despite the fact that you become $100 poorer and gain nothing. I understand the general concept: an updateless system factors in the counterfactual timeline where you could have gotten $10,000. What I don't understand is the seeming inconsistency, since Newcomb's Problem defines "winning" as gaining the most money, which clearly doesn't apply here. How can we objectively say that refusing to pay the mugger is "not winning"?

If our definition of "winning" is being shaped by the branch of Decision Theory that we adopt, then I worry about falling into a kind of circular logic, because I thought that the point of Logical Decision Theory is to generate policies that guarantee a winning state.

4 Answers

Wei Dai

Aug 09, 2022


There is definitely not an "objective" way to define "winning" in general, at least not yet. It's more of an intuition pump that can help make it clearer in each example what the right thing to do is (so that we can judge whether a decision theory recommends doing the right thing). To add some context: prior to Eliezer's innovation of asking what the "winning" choice is, philosophers used to ask what the "rational" choice is, which led some to say that the "rational" choice in Newcomb's problem is to two-box, and to propose decision theories like CDT that two-box. Talking about "winning" makes it clearer or more intuitive that one-boxing is the right thing to do. But it's not a panacea, and often it's not entirely obvious what the "winning" choice is, or different people can disagree about it.

So hypothetically, there could be an undiscovered Decision Theory in which Prisoner A is tricked into cooperating, only for B to betray him, resulting in B achieving the optimal outcome of C,D. Wouldn’t such a result objectively see B “winning”, because he got a higher payoff than C,C?

Yeah, I would say an agent or decision theory that can trick other agents into the C,D outcome would be "more winning". From my perspective, we're not so much "striving to generate algorithms that will guarantee a C,C result" but instead aiming for a decision theory that exploits everyone who can be exploited and achieves C,C with everyone else (to the extent possible).

Could you please provide a simple explanation of your UDT?

JBlack

Aug 09, 2022


Yes, defining what "winning" means is indeed part of the problem of evaluating decision theories, and is largely what distinguishes the various decision theories from one another.

One way of defining "winning" is getting the most you can (in expectation) given the information you have at a given instant of time. In the Counterfactual Mugging situation where you are asked for $100, winning by this definition obviously means not paying. In the other situation, where the coin went your way, you have no choices to make and so automatically "win", because in that local situation there is no better action you can take.

Another way of defining "winning" is getting the most you can (in expectation) across all scenarios, weighted by their probabilities. In Counterfactual Mugging this means paying up, because the $100 loss in the half of the scenarios where you are asked to pay is greatly outweighed by the $10,000 gain in the other half.
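As a minimal sketch of that calculation (assuming, as in the usual statement, a fair coin, a perfect predictor, a $100 payment and a $10,000 prize):

```python
# Counterfactual Mugging, scored by the second definition of "winning":
# expected value over both coin outcomes, before you know which one occurred.
# Assumed numbers: fair coin, perfect predictor, pay $100 on tails,
# receive $10,000 on heads only if you are an agent who would have paid.
COST, PRIZE, P_HEADS = 100, 10_000, 0.5

def expected_value(pays: bool) -> float:
    heads_branch = PRIZE if pays else 0   # predictor only rewards would-be payers
    tails_branch = -COST if pays else 0
    return P_HEADS * heads_branch + (1 - P_HEADS) * tails_branch

print(expected_value(True))   # 4950.0 -- the paying policy wins by this definition
print(expected_value(False))  # 0.0
# By the first (local) definition, once you are in the tails branch,
# refusing is better: 0 > -100.
```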

Note that the scenario is always presented as "Omega asks you to pay up". While this is the only scenario in which you get to make a decision, it also biases perception of the problem by directing attention away from the equally prevalent scenarios in which Omega just turns up and (if you would have paid) gives you $10,000 or (otherwise) tells you that you get nothing.

Jiro

Aug 09, 2022


The smoking lesion problem depends entirely on why people with the smoking lesion smoke more often.

If the mechanism does not affect one's ability to reason and come to logical conclusions, then there is no problem: even though people who smoke are, in general, more likely to have the smoking lesion, people who smoke because they followed the reasoning telling them to smoke are not more likely to have it.

If the mechanism does affect one's ability to do reasoning, then it may not even be possible for someone to logically decide what to do. (Consider an edge case where the smoking lesion makes you 100% likely to smoke, but logical reasoning tells you not to.) Or it may only be possible to do so in a subset of cases.

(Of course, "people who did X because of Y" is a simplification and should be something like "the difference in propensities for Y between people who do X and people who don't.")
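A toy simulation of the first case may make this concrete (all numbers are assumed for illustration): the lesion drives the urge to smoke, but a small minority of people decide by explicit reasoning, which the lesion does not touch. Conditioning on "smokes" raises the probability of having the lesion; conditioning on "smokes because of the reasoning" does not.

```python
import random

# Toy model of the smoking lesion, first case: the lesion affects the urge
# to smoke but not one's reasoning. All probabilities are assumed numbers.
random.seed(0)
N = 200_000
P_LESION = 0.10      # base rate of the lesion
P_REASONER = 0.05    # fraction who decide by explicit reasoning (and choose to smoke)

smokers = smokers_with_lesion = 0
reasoned_smokers = reasoned_smokers_with_lesion = 0

for _ in range(N):
    lesion = random.random() < P_LESION
    reasoner = random.random() < P_REASONER
    if reasoner:
        smokes = True  # the reasoning says smoking gains utility here
    else:
        smokes = random.random() < (0.8 if lesion else 0.2)  # lesion drives the urge
    if smokes:
        smokers += 1
        smokers_with_lesion += lesion
        if reasoner:
            reasoned_smokers += 1
            reasoned_smokers_with_lesion += lesion

print("P(lesion)                       ~", P_LESION)
print("P(lesion | smokes)              ~", round(smokers_with_lesion / smokers, 3))
print("P(lesion | smokes by reasoning) ~", round(reasoned_smokers_with_lesion / reasoned_smokers, 3))
# The second number comes out around 0.27; the third stays near the 0.10 base rate.
```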

Thomas Larsen

Aug 09, 2022


Winning here corresponds to getting the most expected utility, as measured from the start of the problem. We assume we can measure utility with money, so we can e.g. put a dollar value on getting rescued. 

1) In the smoking lesion problem, the exact numbers matter less than the relationship between them: you are simply choosing whether to gain an amount of utility or to pass it up.

2) Defecting on a cooperator in a one-shot true prisoner's dilemma is the "best" outcome, so this is exactly right. See this story.

3) In the hitchhiker example, we might say that being stranded carries a value of -$1,000,000. The exact amount is usually not specified; the assumption is just that it is a large negative number.

Thus the outcomes are: 

  • We don't pay and are stranded in the desert, -$1,000,000
  • We don't pay and the driver took us back to the city, $0 
  • We would have paid and are stranded in the desert, -$1,000,000
  • We pay and the driver took us back to the city, -$100

Of course, the middle two bullet points are logically counter to the problem (assuming an open source setting), so the winning decision theories here lose only $100 as opposed to losing $1,000,000 (dying from being stranded in the desert). 
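A quick sketch of why (using the assumed -$1,000,000 figure and treating the driver as a perfect predictor, which is what rules out the middle two outcomes):

```python
# Parfit's Hitchhiker with the assumed numbers above and a perfect predictor.
STRANDED, RIDE_COST = -1_000_000, -100

def outcome(would_pay: bool) -> int:
    # The driver only takes you to the city if they predict you would pay,
    # so the two logically-inconsistent middle outcomes never happen.
    return RIDE_COST if would_pay else STRANDED

print(outcome(True))   # -100: the "winning" decision theories end up here
print(outcome(False))  # -1000000: stranded in the desert
```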

Edit: changed utility -> expected utility. 

I don't think this sorts out the fundamental fuzziness of what "winning" means -- what you need is a theory of counterfactuals.

Thomas Larsen

Beliefs here are weakly held; I want to become more right. I think defining winning as coming away with the most utility is a crisp measure of what makes a good decision theory. The theory of counterfactuals is, in my mind, what separates the decision theories themselves, and is therefore the core question/fuzziness in solving decision theory.

Changing your theory of counterfactuals alters the answer to the fundamental question: "when you change your action/policy, what parts of the world change with you?". It doesn't seem like there is a directly objective answer to this based on the mechanic -- should you change everything that's causally downstream of you? Or everything that's logically dependent on your decision? Or everything that's correlated with your decision? A priori these all seem basically reasonable to me, until we plug them into examples and see if they are dominated by other decision theories, as measured by expected utility.

(I think?) In examples like counterfactual mugging, the measuring stick is pretty clearly whichever decision theory gets more expected utility over the whole duration of the universe. It seems fine to lose utility in cases where you start in the middle of the scenario (operationalized by there being any sort of entanglements to outside the scenario). In my view, the fuzziness is in finding a well-defined way to achieve the goal of lots of expected utility, not in the goal itself.
Dagon

Over the course of the universe, the best decision theory is a consensus/multiple-evaluation theory: evaluate which part of the universe you're in, and the likelihood that you're in a causally-unusual scenario, and use the DT which gives the best outcome. How a predictor works when your meta-DT gives different answers based on whether you've been predicted, I don't know. Like a lot of adversarial(-ish) situations, the side with the most predictive power wins.
5 comments

In acausal PD, (C, C) stands for bargaining, and bargaining in a different game could be something more complicated than carrying out (C, C). Even in PD itself, bargaining could select a different point on the Pareto frontier, a mixed outcome with some probability between (C, C) and (C, D), or between (C, C) and (D, C). So with acausal coordination, PD should play out in three stages: (1) players establish willingness to bargain, which is represented by playing (C, C) in acausal PD (but not yet making moves in actual PD), (2) players run the bargaining algorithm, which, let's say, selects the point 0.8*(C, C) + 0.2*(D, C), and (3) a shared random number samples, let's say, (D, C), so the first player plays D and the second plays C.
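A minimal sketch of stage (3), assuming the bargaining in stage (2) has already output the mixture and both players can see the same random number (e.g. a shared seed or a public coin flip):

```python
import random

# Assumed output of stage (2): the agreed point on the Pareto frontier,
# here the mixture 0.8*(C, C) + 0.2*(D, C).
MIXTURE = [(0.8, ("C", "C")), (0.2, ("D", "C"))]

def stage_three(shared_seed: int) -> tuple:
    """Both players run this with the same seed, so they sample the same
    joint outcome and each plays their own component of it."""
    r = random.Random(shared_seed).random()
    cumulative = 0.0
    for prob, joint_moves in MIXTURE:
        cumulative += prob
        if r < cumulative:
            return joint_moves
    return MIXTURE[-1][1]

first_move, second_move = stage_three(shared_seed=42)
print("first player plays", first_move, "| second player plays", second_move)
```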

I have a similar confusion. I thought the definition of winning is objective (and frequentist): after a large number of identically set up experiments, the winning decision is the one that gains the most value. In Newcomb's it's one-boxing; in the twin prisoner's dilemma it's cooperating; in other PDs it depends on the details of your opponent and on your knowledge of them; in counterfactual mugging it depends on the details of how trustworthy the mugger is, whom it chooses to pay or charge, etc. -- the problem is underspecified as presented. If you have an "unfair" Omega who punishes a specific DT agent, the winning strategy is to be the Omega-favored agent.

There is no need for counterfactuals, by the way: just calculate which strategy nets the highest EV. Just like with Newcomb's, in some counterfactual mugging setups only the agents who pay when they lose get a chance to win. If you are the type of agent who doesn't pay, the CFM predictor will not give you a chance to win. This is like a lottery, except that you pay after losing, which does not matter if the predictor knows what you would do. Not paying when you lose is equivalent to not buying a lottery ticket whose expected winnings exceed the ticket price. I don't know if this counts as decision theory, probably not.
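To illustrate the "large number of identically set up experiments" framing, here is a small simulation (assumed numbers: fair coin, perfect predictor, $100 payment, $10,000 prize):

```python
import random

# Repeated Counterfactual Mugging with a perfect predictor (assumed amounts:
# pay $100 when the coin goes against you, receive $10,000 when it goes your
# way -- but only if you are the kind of agent who pays).
random.seed(1)
TRIALS = 100_000

def average_winnings(agent_pays: bool) -> float:
    total = 0
    for _ in range(TRIALS):
        if random.random() < 0.5:                 # coin goes your way
            total += 10_000 if agent_pays else 0  # predictor only rewards payers
        else:                                     # coin goes against you
            total -= 100 if agent_pays else 0
    return total / TRIALS

print("paying agent:  ", average_winnings(True))   # about 4950 per experiment
print("refusing agent:", average_winnings(False))  # 0.0
```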

C,C is second-best: you prefer D,C, and Nash says D,D is all you should expect. C,C is definitely better than C,D or D,D, so in the special case of symmetrical decisions, it's winning. It bugs me as much as you that this part gets glossed over so often.

Paying off in Counterfactual Mugging is a win, in a universe where that sort of thing happens. You really do want to be correctly predicted to pay off, and enjoy the $10K in those cases where the coin goes your way.

C,C is second-best: you prefer D,C, and Nash says D,D is all you should expect. C,C is definitely better than C,D or D,D, so in the special case of symmetrical decisions, it's winning. It bugs me as much as you that this part gets glossed over so often.

I see what you mean; it works as long as both sides have roughly similar behavior.

Paying off in Counterfactual Mugging is a win, in a universe where that sort of thing happens. You really do want to be correctly predicted to pay off, and enjoy the $10K in those cases where the coin goes your way.

For me, this would make intuitive sense if there were something in the problem that implied that Omega does this on a regular basis, analogous to the Iterated Prisoner's Dilemma. But as long as the problem is worded as a one-shot, once-in-a-lifetime scenario, it comes off as though the $10,000 is purely fictitious.

one-shot, once-in-a-lifetime scenario

It's less than that: you don't know that you are real and not the hypothetical. If you are the hypothetical, paying up is useful for the real one.

This means that even if you are the real one (which you don't know), you should pay up, or else the hypothetical you wouldn't pay up either. Winning behavior/policy is the best map from what you observe/know to decisions, and some (or all) of those observations/knowledge never occur, or could never occur.