Hello,

I've been reading through various Newcomblike Problems in order to get a better understanding of the differences between the Decision Theories. From what I can tell, each Decision Theory gets evaluated based on whether it is able to "win" in each of these thought experiments. Thus, there is an overarching assumption that each thought experiment has an objectively "right" and "wrong" answer, and the challenge of Decision Theory is to generate algorithms that guarantee the agent chooses the "right" answer.

However, I am having some trouble seeing how some of these problems have an objectively "winning" state. In Newcomb's Problem, obviously one can say that one-boxing "wins" because you get far more money than two-boxing, and these are the only two options available. Of course, even here there is some room for ambiguity, as Robert Nozick observed:

To almost everyone it is perfectly clear and obvious what should be done. The difficulty is that these people seem to divide almost evenly on the problem, with large numbers thinking that the opposing half is just being silly.

But other Newcomblike Problems leave me with some questions. Take, for example, the Smoking Lesion Problem. I am told that smoking is the winning decision here (as long as we suspend disbelief about the fact that smoking is harmful in the real world). But I'm not sure why that makes such a big difference. Yes, the problem states we would prefer to smoke if we could, but our preferences can come from many different dimensions, such as our understanding of the environment, not just a spontaneous inner desire. So when EDT says that you shouldn't smoke because it increases the probability of having a cancerous lesion, one could say that that information has shaped your preference. To use a different analogy, I may desire ice cream because it tastes good, but I may still prefer not to eat it out of my understanding of how it impacts my health and weight. So in other words, I'm not sure I could objectively say that a preference influenced by EDT is "not winning". Unlike Newcomb's Problem, there isn't a quantifiable value such as money to say one choice is objectively better than the other.

A second question comes from the Prisoner's Dilemma. There is a common notion that C,C is the "winning" outcome, and Decision Theory strives to generate algorithms that will guarantee a C,C result. But for each individual prisoner, C,C isn't the best payoff they can get. So hypothetically, there could be an undiscovered Decision Theory in which Prisoner A is tricked into cooperating, only for B to betray him, resulting in B achieving the optimal outcome of C,D. Wouldn't such a result objectively see B "winning", because he got a higher payoff than C,C?
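For concreteness, here is a quick sketch of the payoff structure I have in mind (the exact numbers are my own assumption; all that matters is the usual ordering where defecting against a cooperator pays best):

```python
# Standard one-shot Prisoner's Dilemma payoffs (assumed numbers,
# ordered so that defecting against a cooperator is the best outcome).
PAYOFFS = {
    # (A's move, B's move): (A's payoff, B's payoff)
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),  # A is tricked into cooperating, B defects
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

# B's payoff in the C,D outcome beats B's payoff in C,C, which is why
# I'm unsure we can call C,C objectively "winning" for B.
print(PAYOFFS[("C", "D")][1] > PAYOFFS[("C", "C")][1])  # True
```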

The third and most baffling example is Counterfactual Mugging. Just like in Newcomb's Problem, we have a quantifiable value of money to track which decision is best. However, in this case I understand the "winning" result is to pay the mugger, despite the fact that you become $100 poorer and gain nothing. I understand the general concept: an updateless system factors in the counterfactual timeline where you could have gotten $10,000. What I don't understand is the seeming inconsistency, since Newcomb's Problem defines "winning" as gaining the most money, which clearly doesn't apply here. How can we objectively say that refusing to pay the mugger is "not winning"?

If our definition of "winning" is being shaped by the branch of Decision Theory that we adopt, then I worry about falling into a kind of circular logic, because I thought that the point of Logical Decision Theory is to generate policies that guarantee a winning state.

4 Answers

Wei Dai

Aug 09, 2022


There is definitely not an "objective" way to define "winning" in general, at least not yet. It's more of an intuition pump that can help make it clearer in each example what the right thing to do is (so that we can judge whether a decision theory recommends doing the right thing). To add some context: prior to Eliezer's innovation of asking what the "winning" choice is, philosophers used to ask what the "rational" choice is, which led some to say that the "rational" choice in Newcomb's problem is to two-box, and to propose decision theories like CDT that two-box. Talking about "winning" makes it clearer or more intuitive that one-boxing is the right thing to do. But it's not a panacea, and often it's not entirely obvious what the "winning" choice is, or different people can disagree about it.

So hypothetically, there could be an undiscovered Decision Theory in which Prisoner A is tricked into cooperating, only for B to betray him, resulting in B achieving the optimal outcome of C,D. Wouldn’t such a result objectively see B “winning”, because he got a higher payoff than C,C?

Yeah, I would say an agent or decision theory that can trick other agents into the C,D outcome would be "more winning". From my perspective, we're not so much "striving to generate algorithms that will guarantee a C,C result" but instead aiming for a decision theory that exploits everyone who can be exploited and achieves C,C with everyone else (to the extent possible).

Could you please provide a simple explanation of your UDT?

JBlack

Aug 09, 2022


Yes, defining what "winning" means is indeed part of the problem of evaluating decision theories, and is largely what distinguishes the various decision theories from one another.

One way of defining "winning" is getting the most you can (in expectation) given the information you have at a given instant of time. In the Counterfactual Mugging situation where you are asked for $100, winning by this definition obviously means not paying. In the other situation, where the coin went your way, you have no choices to make and so automatically "win", because in that local situation there is no better action you can take.

Another way of defining "winning" is getting the most you can (in expectation) across all scenarios, weighted by their probabilities. In Counterfactual Mugging this means paying up, because the $100 loss in the half of the scenarios where you are asked to pay is greatly outweighed by the $10,000 gain in the other half.
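As a minimal sketch of that calculation (assuming, as in the usual statement, a fair coin, a perfect predictor, a $100 payment and a $10,000 prize):

```python
# Counterfactual Mugging, scored by the second definition of "winning":
# expected value over both coin outcomes, before you know which one occurred.
# Assumed numbers: fair coin, perfect predictor, pay $100 on tails,
# receive $10,000 on heads only if you are an agent who would have paid.
COST, PRIZE, P_HEADS = 100, 10_000, 0.5

def expected_value(pays: bool) -> float:
    heads_branch = PRIZE if pays else 0   # predictor only rewards would-be payers
    tails_branch = -COST if pays else 0
    return P_HEADS * heads_branch + (1 - P_HEADS) * tails_branch

print(expected_value(True))   # 4950.0 -- the paying policy wins by this definition
print(expected_value(False))  # 0.0
# By the first (local) definition, once you are in the tails branch,
# refusing is better: 0 > -100.
```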

Note that the scenario is always presented as "Omega asks you to pay up". While this is the only scenario in which you get to make a decision, it also biases perception of the problem by directing attention away from the equally prevalent scenarios in which Omega just turns up and (if you would have paid) gives you $10,000 or (otherwise) tells you that you get nothing.

Jiro

Aug 09, 2022


The smoking lesion problem depends entirely on why people with the smoking lesion smoke more often.

If the mechanism does not affect one's ability to reason and come to logical conclusions, then there is no problem: even though people who smoke are, in general, more likely to have the smoking lesion, people who smoke because they followed the reasoning telling them to smoke are not more likely to have it.

If the mechanism does affect one's ability to do reasoning, then it may not even be possible for someone to logically decide what to do. (Consider an edge case where the smoking lesion makes you 100% likely to smoke, but logical reasoning tells you not to.) Or it may only be possible to do so in a subset of cases.

(Of course, "people who did X because of Y" is a simplification and should be something like "the difference in propensities for Y between people who do X and people who don't.")
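A toy simulation of the first case may make this concrete (all numbers are assumed for illustration): the lesion drives the urge to smoke, but a small minority of people decide by explicit reasoning, which the lesion does not touch. Conditioning on "smokes" raises the probability of having the lesion; conditioning on "smokes because of the reasoning" does not.

```python
import random

# Toy model of the smoking lesion, first case: the lesion affects the urge
# to smoke but not one's reasoning. All probabilities are assumed numbers.
random.seed(0)
N = 200_000
P_LESION = 0.10      # base rate of the lesion
P_REASONER = 0.05    # fraction who decide by explicit reasoning (and choose to smoke)

smokers = smokers_with_lesion = 0
reasoned_smokers = reasoned_smokers_with_lesion = 0

for _ in range(N):
    lesion = random.random() < P_LESION
    reasoner = random.random() < P_REASONER
    if reasoner:
        smokes = True  # the reasoning says smoking gains utility here
    else:
        smokes = random.random() < (0.8 if lesion else 0.2)  # lesion drives the urge
    if smokes:
        smokers += 1
        smokers_with_lesion += lesion
        if reasoner:
            reasoned_smokers += 1
            reasoned_smokers_with_lesion += lesion

print("P(lesion)                       ~", P_LESION)
print("P(lesion | smokes)              ~", round(smokers_with_lesion / smokers, 3))
print("P(lesion | smokes by reasoning) ~", round(reasoned_smokers_with_lesion / reasoned_smokers, 3))
# The second number comes out around 0.27; the third stays near the 0.10 base rate.
```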

Thomas Larsen

Aug 09, 2022


Winning here corresponds to getting the most expected utility, as measured from the start of the problem. We assume we can measure utility with money, so we can e.g. put a dollar value on getting rescued. 

1) In the smoking lesion problem, the exact numbers matter less than the relationship between them: you are simply choosing whether to gain an amount of utility or to pass it up.

2) Defecting on a cooperator in a one-shot true prisoner's dilemma is the "best" outcome, so this is exactly right. See this story.

3) In the hitchhiker example, we might say that being stranded carries a value of -$1,000,000. The exact amount is usually not specified; the assumption is just that it is a large negative number.

Thus the outcomes are: 

  • We don't pay and are stranded in the desert, -$1,000,000
  • We don't pay and the driver took us back to the city, $0 
  • We would have paid and are stranded in the desert, -$1,000,000
  • We pay and the driver took us back to the city, -$100

Of course, the middle two bullet points are logically counter to the problem (assuming an open source setting), so the winning decision theories here lose only $100 as opposed to losing $1,000,000 (dying from being stranded in the desert). 
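A quick sketch of why (using the assumed -$1,000,000 figure and treating the driver as a perfect predictor, which is what rules out the middle two outcomes):

```python
# Parfit's Hitchhiker with the assumed numbers above and a perfect predictor.
STRANDED, RIDE_COST = -1_000_000, -100

def outcome(would_pay: bool) -> int:
    # The driver only takes you to the city if they predict you would pay,
    # so the two logically-inconsistent middle outcomes never happen.
    return RIDE_COST if would_pay else STRANDED

print(outcome(True))   # -100: the "winning" decision theories end up here
print(outcome(False))  # -1000000: stranded in the desert
```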

Edit: changed utility -> expected utility. 

I don't think this sorts out the fundamental fuzziness of what "winning" means -- what you need is a theory of counterfactuals.

Thomas Larsen

Beliefs here are weakly held; I want to become more right. I think defining winning as coming away with the most utility is a crisp measure of what makes a good decision theory. The theory of counterfactuals is, in my mind, what separates the decision theories themselves, and is therefore the core question/fuzziness in solving decision theory.

Changing your theory of counterfactuals alters the answer to the fundamental question: "when you change your action/policy, what parts of the world change with you?". It doesn't seem like there is a directly objective answer to this based on the mechanic -- should you change everything that's causally downstream of you? Or everything that's logically dependent on your decision? Or everything that's correlated with your decision? A priori these all seem basically reasonable to me, until we plug them into examples and see if they are dominated by other decision theories, as measured by expected utility.

(I think?) In examples like counterfactual mugging, the measuring stick is pretty clearly whichever decision theory gets more expected utility over the whole duration of the universe. It seems fine to lose utility in cases where you start in the middle of the scenario (operationalized by there being any sort of entanglements to outside the scenario). In my view, the fuzziness is in finding a well-defined way to achieve the goal of lots of expected utility, not in the goal itself.
Dagon

Over the course of the universe, the best decision theory is a consensus/multiple-evaluation theory: evaluate which part of the universe you're in, and the likelihood that you're in a causally-unusual scenario, and use the DT which gives the best outcome. How a predictor works when your meta-DT gives different answers based on whether you've been predicted, I don't know. Like a lot of adversarial(-ish) situations, the side with the most predictive power wins.
5 comments

In acausal PD, (C, C) stands for bargaining, and bargaining in a different game could be something more complicated than carrying out (C, C). Even in PD itself, bargaining could select a different point on the Pareto frontier, a mixed outcome with some probability between (C, C) and (C, D), or between (C, C) and (D, C). So with acausal coordination, PD should play out in three stages: (1) players establish willingness to bargain, which is represented by playing (C, C) in acausal PD (but not yet making moves in actual PD), (2) players run the bargaining algorithm, which, let's say, selects the point 0.8*(C, C) + 0.2*(D, C), and (3) a shared random number samples, let's say, (D, C), so the first player plays D and the second plays C.
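A minimal sketch of stage (3), assuming the bargaining in stage (2) has already output the mixture and both players can see the same random number (e.g. a shared seed or a public coin flip):

```python
import random

# Assumed output of stage (2): the agreed point on the Pareto frontier,
# here the mixture 0.8*(C, C) + 0.2*(D, C).
MIXTURE = [(0.8, ("C", "C")), (0.2, ("D", "C"))]

def stage_three(shared_seed: int) -> tuple:
    """Both players run this with the same seed, so they sample the same
    joint outcome and each plays their own component of it."""
    r = random.Random(shared_seed).random()
    cumulative = 0.0
    for prob, joint_moves in MIXTURE:
        cumulative += prob
        if r < cumulative:
            return joint_moves
    return MIXTURE[-1][1]

first_move, second_move = stage_three(shared_seed=42)
print("first player plays", first_move, "| second player plays", second_move)
```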

I have a similar confusion. I thought the definition of winning is objective (and frequentist): after a large number of identically set up experiments, the winning decision is the one that gains the most value. In Newcomb's it's one-boxing; in the twin prisoner's dilemma it's cooperating; in other PDs it depends on the details of your opponent and on your knowledge of them; in counterfactual mugging it depends on the details of how trustworthy the mugger is, whom it chooses to pay or charge, etc. -- the problem is underspecified as presented. If you have an "unfair" Omega who punishes a specific DT agent, the winning strategy is to be the Omega-favored agent.

There is no need for counterfactuals, by the way: just calculate which strategy nets the highest EV. Just like with Newcomb's, in some counterfactual mugging setups only the agents who pay when they lose get a chance to win. If you are the type of agent who doesn't pay, the CFM predictor will not give you a chance to win. This is like a lottery, except that you pay after losing, which does not matter if the predictor knows what you would do. Not paying when you lose is equivalent to not buying a lottery ticket whose expected winnings exceed the ticket price. I don't know if this counts as decision theory, probably not.
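To illustrate the "large number of identically set up experiments" framing, here is a small simulation (assumed numbers: fair coin, perfect predictor, $100 payment, $10,000 prize):

```python
import random

# Repeated Counterfactual Mugging with a perfect predictor (assumed amounts:
# pay $100 when the coin goes against you, receive $10,000 when it goes your
# way -- but only if you are the kind of agent who pays).
random.seed(1)
TRIALS = 100_000

def average_winnings(agent_pays: bool) -> float:
    total = 0
    for _ in range(TRIALS):
        if random.random() < 0.5:                 # coin goes your way
            total += 10_000 if agent_pays else 0  # predictor only rewards payers
        else:                                     # coin goes against you
            total -= 100 if agent_pays else 0
    return total / TRIALS

print("paying agent:  ", average_winnings(True))   # about 4950 per experiment
print("refusing agent:", average_winnings(False))  # 0.0
```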

C,C is second-best: you prefer D,C, and Nash says D,D is all you should expect. C,C is definitely better than C,D or D,D, so in the special case of symmetrical decisions, it's winning. It bugs me as much as you that this part gets glossed over so often.

Paying off in Counterfactual Mugging is a win, in a universe where that sort of thing happens. You really do want to be correctly predicted to pay off, and enjoy the $10K in those cases where the coin goes your way.

C,C is second-best: you prefer D,C, and Nash says D,D is all you should expect. C,C is definitely better than C,D or D,D, so in the special case of symmetrical decisions, it's winning. It bugs me as much as you that this part gets glossed over so often.

I see what you mean; it works as long as both sides have roughly similar behavior.

Paying off in Counterfactual Mugging is a win, in a universe where that sort of thing happens. You really do want to be correctly predicted to pay off, and enjoy the $10K in those cases where the coin goes your way.

For me, this would make intuitive sense if there were something in the problem that implied that Omega does this on a regular basis, analogous to the Iterated Prisoner's Dilemma. But as long as the problem is worded as a one-shot, once-in-a-lifetime scenario, it comes off as though the $10,000 is purely fictitious.

one-shot, once-in-a-lifetime scenario

It's less than that: you don't know that you are real and not the hypothetical. If you are the hypothetical, paying up is useful for the real one.

This means that even if you are the real one (which you don't know), you should pay up, or else the hypothetical you wouldn't pay up either. Winning behavior/policy is the best map from what you observe/know to decisions, and some (or all) of those observations/knowledge never occur, or could never occur.