Should VS Would and Newcomb's Paradox

by dadadarren | 3 min read | 3rd Jul 2021 | 36 comments


Anthropics, Newcomb's Problem, Rationality, AI

What One Should Do VS What One Would Do

When talking about how a decision is made, there are two approaches. Perhaps the more typical one is to reason as the decision-maker. From this first-person perspective, we consider all possible actions, imagine their respective outcomes, then choose the one deemed optimal according to some objective. This seems to fit what is usually meant by decision-making: what one should do.

Alternatively, we can take an outsider perspective (or a God's eye view if you prefer) and analyze the decision-maker itself by reductionism. For example, this can be done by physically studying the decision-maker's brain and building a model to deduce its output. In contrast to the first type of analysis, this is about what one would do.

Conflicting Approaches

My position is that perspective is a primitive starting point of reasoning. Therefore the two approaches above, each based on a different perspective, must not be mixed; mixing them is also the cause of anthropic paradoxes. However, even if you do not agree with that, it is still very reasonable to question the compatibility of the two approaches.

For starters, they are based on very different premises. The first-person approach considers the decision-maker as the agent, while the outsider approach considers the decision-maker part of the environment. One regards it as the analyzer, the other as the analyzed.

The first-person approach assumes the decision-maker could select different choices. There are alternative actions I can take that will lead to different outcomes. In comparison, the outsider approach regards the decision-maker as nothing more than a complex machine. There is no sense in talking about alternatives as the decision is simply an output of the said machine.

Mixing the two would lead to some troubling problems. For example, the first-person approach makes decisions by evaluating the respective outcomes of all choices. We can use the outsider approach to reductively analyze the decision-maker and deduce its output. So the choices other than the one ultimately taken simply never happen. Using this result in the first-person analysis would produce a contradiction: what is the outcome of an action if that action is not taken? It runs into the principle of explosion, making the evaluation impossible. (See action-counterfactuals for a related discussion.)

However, we never seem to encounter the above problem in real life. That is due to two reasons. First, once we have carried out the outsider analysis and deduced the output, we typically won't conduct the first-person analysis. Second, even if we wish to also conduct the first-person analysis, we will ignore the outsider approach's conclusions. I.e., we simply evaluate the outcome of all choices without minding which one is the deduced output, as if the outsider analysis never happened. This means that in practice, at least in obviously conflicting situations like this, we intuitively know not to mix the two approaches.

Newcomb's Paradox

Newcomb's paradox is problematic because it does mix the two. Its formulation takes the outsider's approach: Omega would analyze the decision-maker like a machine to deduce its output. Yet it wants us to take the first-person approach to answer it: what should you do when facing the two boxes? In short, the question is inconsistent. What should have been asked is "how to design a machine (the decision-maker) for this situation", which would also be answered using the outsider approach.

Designing such a machine is uncontroversial: make it take one box only. So Omega would study it and put a million dollars in for the machine to grab. The contradiction only happens if we take the first-person approach: imagine ourselves in the shoes of the decision-maker. When the two boxes with predetermined content are right in front of me, obviously I should take them both. Doing anything different would require some serious and dubious modifications to the common notion of causality. Yet the very act of taking the first-person perspective is inconsistent with the question's setup. For there is no sense in contemplating what I should do when what I would do is part of the problem.



Yes! It's interesting how the concepts of agency and choice seem so natural and ingrained for us, humans, that we are often tempted to think that they describe reality deeper than they really do. We seem to see agents, preferences, goals, and utilities everywhere, but what if these concepts are not particularly relevant for the actual mechanism of decision-making even from the first-person view?

What if much of the feeling of choice and agency is actually a social adaptation, a storytelling and explanatory device that allows us to communicate and cooperate with other humans (and, perhaps more peculiarly, with our future selves)? While it feels like we are making a choice due to reasons, there are numerous experiments that point to the explanatory, after-the-fact role of reasoning in decision-making. Yes, abstract reasoning also allows us to model conceptually a great many things, but those models just serve as additional data inputs to the true hidden mechanism of decision-making.

It shouldn't be surprising then if for other minds [such as AGI], not having this primal adaptation to being modeled by others, decision-making would feel nothing like choice or agency. We already see it in our simpler AIs: choice and reasoning mean nothing to a health diagnostic system, it simply computes. It is only for us humans to feel like we understand "why" it made a particular choice that we have to add an explanatory module that gives us "reasons" but is completely unnecessary for the decision-making itself!


I talked about it a few years back. If you think of the world as a deterministic or non-deterministic evolution of some initial conditions, you can potentially separate small parts of it as "agents" and study what they would internally consider a better or worse outcome, which is the "should", vs what they actually do, which is the "would". You don't have to internally run a God's eye view algorithm, some agents can still notice themselves "make decisions" (the internal feeling as an artifact of the agency algorithm evolving in time), while understanding that this is only a feeling, and the reality is nothing more than learning what kind of an agent you are, for example whether you one-box or two-box. Or maybe it's what you mean by an outside view.

Notice, however, that the view you have re-discovered is anti-memetic: it contradicts the extremely strong "free choice" output of a subroutine that converts multiple potential maps of the observed world into an action that has a certain degree of optimality, and so is almost instantly internally rejected in most cases. In fact, most agents pretend to find a loophole, often under the guise of compatibilism, that lets them claim that choices matter and that they can affect the outcome by making them, not just passively watch themselves think and act and discover what they would actually do.

If you think of the world as a deterministic or non-deterministic evolution of some initial conditions,

In fact, most agents pretend to find a loophole [..] that lets them claim that choices matter

If the world actually is non-deterministic, their choices actually could matter.

"Matter" was a poor choice of words (hah). But no, there is no difference between determinism and non-determinism in terms of how free the choices are. Unless you are willing to concede that your choice is determined by the projection postulate, or by which Everett branch "you" end up in.

If "free" merely means "free of determinism", then an undetermined choice is a free choice, and a determined choice is not.

The projection postulate does nothing without some pre-existing state, so why attribute all the choice to it?

I think your actual objection concerns the ability to combine volition (intention, etc) with freedom.

Not necessarily. Non-determinism (that the future is not completely defined by the past) doesn't have anything to do with choice. A stone doesn't make choices even if the future is intrinsically unpredictable. The question here is why anyone would think that humans are qualitatively different from stones.

Is a computer qualitatively different from a stone? Computers can make choices, in some sense.

I don't think computers have any more free will [free choice] than stones. Do you?

How are you defining free will?

I don't think it can be meaningfully defined. How could you define free choice so that a human would have it, but a complicated mechanical contraption of stones wouldn't?

Why would you want to?

You'd have to draw the line somewhere so it would have any meaning at all. What's the point of the concept if anything can be interpreted as such? What do you mean when you say "free choice" or "choice"?

I define freedom in the libertarian sense, freedom in the compatibilist sense, and so on, separately, rather than trying to find a single true definition.

An agent with desires could be said to lack or have compatibilist free will inasmuch as it is able to act on its desires unimpededly. That could include an AI.

An agent with the ability to make undetermined choices could be said to have libertarian free will. That could include an AI, too.

So I don't see the problem with a "complicated contrivance" having free will.

Both definitions have their issues.

"able to act on its desires unimpededly" has 2 problems. First, it is clearly describing the "agent's" (also not a well-defined category, but let's leave it at that) experience, e.g. desires, not something objective from an outside view. Second, "unimpededly" is also intrinsically vague. Is my desire to fly impeded? Is an addict's desire to quit? (If the answer is "no" to both, what would even count as impediment?) But, I guess, it is fine if we agree that "compatibilist free will" is just a feature of subjective experience.

"ability to make undetermined choices" relies on the ambiguous concept of "choice", but also would be surprisingly abundant in a truly probabilistic world. We'd have to attribute "libertarian free will" to a radioactive isotope that's "choosing" when to decay, or to any otherwise deterministic system that relies on such isotope. I don't think that agrees with intuition of those who find this concept meaningful.

Both definitions have their issues.

All definitions have issues.

We can decide issues of compatibilist free will, up to a point, because it's the same thing as acting under your own volition in the legal sense.

“ability to make undetermined choices” relies on the ambiguous concept of “choice”, but also would be surprisingly abundant in a truly probabilistic world

That would depend on the nature of choice. If the ability to make choices isn't common, then widespread indeterminism would not lead to widespread undetermined choices.

If "free" merely means "free of determinism" ,then an undetermined choice is a free choice, and a determined choice is not.

I think your actual objection concerns the ability to combine volition, intention, or control with freedom.


My underlying motive is quite different. For example, I do not think that metaphysics is against free will. My root position is that thinking from a given perspective, i.e. each of our own first-person perspectives, is the foundation of reasoning. From there we imagine thinking as other people or things. Some part of reasoning remains unchanged under such perspective switches, which gives the idea of objectivity. Nonetheless, reasoning is ultimately based on perspectives, so perspective (and self) shall be regarded as primitive.

This is why I think the Copenhagen-type interpretations got it right: the existence of an observer that is physically unexplainable by reductionism is to be expected. And free will is something experienced by the first-person, by whatever is regarded as the observer. So free will is not something inside physics' domain.

I also noticed you discussed the absent-minded driver. I do not consider "the probability that here is X" a valid concept. "Here" is defined by perspective; using it as if its meaning were apparent means reasoning from the perspective of the driver at that moment. So "here" is primitive, and there is no way to analyze the probability. In general, self-locating probabilities in anthropics are invalid.

My position is similar to dualistic agency. If we want to analyze the decision-maker, then we use physical reductionism. If we want to reason as the first-person, then free will must be a premise in decision-making. The point is, we should not switch perspectives halfway and mix the two analyses, which Newcomb's paradox does. 

I definitely agree with the last paragraph, stick with one perspective. To the predictor you are an algorithm that either one-boxes or not. There is nothing more to design.

I agree with you on self-locating probabilities not being a useful concept for making optimal decisions. However, in the absent-minded driver problem turning with the probability 2/3 to optimize your payout is not talking about a self-locating probability. Not sure if that is what you meant.

I don't understand the point about the Copenhagen-type interpretation at all...

As for the free will, metaphysics is definitely not against it, physics is. The feeling of free will is a human cognitive artifact, not anything reducible or emergent. But it doesn't seem useful to argue this point.

Turning with a probability of 2/3 is not a self-locating probability. It is a valid decision. What is not valid is asking, at an intersection, "what is the probability that here is X?" This is a self-locating probability. It needs the first-person perspective to make sense of "here", while also needing a god's eye view to treat the location as unknown, i.e. it mixes two perspectives. We can't assign a value to it and then make a decision based on that.

If we consider perspective as fundamental in reasoning then physics cannot be regarded as the description of an objective reality, rather it is the description of how the world interacts with a perspective center. So physics not describing the observer itself is to be expected. Yet free will (and subjective experience in general) are only relevant to the self. So physics cannot be against free will as it is not something within its domain of study.

That is all assuming perspective is the fundamental form of reasoning. If we consider objective reasoning as fundamental, then physics as the description of the objective reality is the foundation of any perspective experiences such as free will. And it would be right to say free will is not compatible with physics.

One view considers reasoning as the first-person as the foundation; the other considers reasoning objectively as the foundation.

There is a problem, but it is not quite an inconsistency. The problem is the assumption that Omega is a perfect predictor. That is: for the system Omega + Agent + everything else, Omega can always find a fixed point such that Omega's prediction is always correct for the state of the system some time after making that prediction and subsequent evolution of the system based on that. Even in a completely deterministic universe, this is asking too much. Some systems just don't have fixed points.

The problem becomes much more reasonable when you consider an almost perfect predictor. It ruins the purity of the original question, but the space of possibilities becomes much broader.

It becomes more like a bluffing game. Omega can observe tells with superhuman acuity, you don't know what those tells are, and certainly don't know how to mask them. If your psychology was such that you'd take both boxes, you were already showing signs of it before you went into the room. Note that this is not supernatural, retro-causal, nor full of anthropic paradoxes but something people already do. Omega is just very much better at it than other humans are.

In this viewpoint, the obvious solution is to be so good at hiding your tells that Omega thinks you're a one-boxer while actually being a two-boxer. But you don't know how to do that, nobody else who tried succeeded, and you have to think you've got a 99.9% chance of success for it to be more worthwhile than obviously taking one box.
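The 99.9% figure can be checked with a minimal sketch. It assumes the standard payoffs (1000 dollars in the transparent box, 1 million in the opaque one) and that an honest one-boxer reliably walks away with the million; the function names are made up for illustration.

```python
# Assumed standard Newcomb payoffs; not stated explicitly in the comment above.
SMALL = 1_000        # transparent box
BIG = 1_000_000      # opaque box, filled only if Omega expects one-boxing


def bluff_expected_payout(p_fool: float) -> float:
    """Expected payout of two-boxing while trying to look like a one-boxer.

    p_fool is the (hypothetical) chance that Omega is fooled.
    """
    # Fooled: both boxes pay out. Not fooled: only the small box does.
    return p_fool * (BIG + SMALL) + (1 - p_fool) * SMALL


def break_even_probability() -> float:
    """Chance of fooling Omega at which bluffing matches plain one-boxing."""
    # Solve SMALL + p * BIG == BIG for p.
    return (BIG - SMALL) / BIG


if __name__ == "__main__":
    print(break_even_probability())
    print(bluff_expected_payout(0.99) < BIG)
```

Under these assumptions, even a 99% chance of fooling Omega leaves the bluffer with a lower expected payout than simply taking one box.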

Indeed starting with an imperfect predictor helps. Classic CDT implicitly assumes that you are not a typical subject, but one of those who can go toe-to-toe with Omega. In the limit of 100% accuracy the space of such subjects is empty, but CDT insists on acting as if you are one anyway.

I don't think Omega being a perfect predictor is essential to the paradox. Assume you are playing this game with me. Say my prediction is only 51% correct. I will fill an envelope according to the prescribed rule. I read you, then give you the envelope (box B). After you put it in your pocket, I put 1000 dollars on the table. Do you suggest that not taking the 1000 dollars will make you richer? If you think you should take the 1000 in this case, then how good would I need to be for you to give that up? (Somewhere between 51% and 99.9%, I presume.) I do not see a good reason for this cutoff.

I think the underlying rationale for one-boxing is to deny first-person decision-making in that particular situation, e.g. by not conducting the causal analysis when facing the 1000 dollars. That is your strategy: commit to taking one box only, let Omega read you, and stick to that decision.

"After you put it in your pocket I put 1000 dollars on the table. Do you suggest not taking the 1000 dollar will make you richer?"

Unlike the Omega problem, this is way too underspecified to make a sensible answer. It depends upon the details of how you get your 51% success rate.

Do you always predict they're going to take two boxes, and only 51% of people actually did? Then obviously I will have $1000 instead of $0, and always be richer.

Or maybe you just get these feelings about people sometimes. In later carefully controlled tests it turns out that you get this feeling for about 2% of "easily read" people, you're right about 90% of the time in both directions for them, and it isn't correlated with how certain they themselves are about taking the money. This is more definite than any real-world situation will ever be, but illustrates the principle. In this scenario, if I'm in the 98% then your "prediction" is uncorrelated with my eventual intent, and I will be $1000 richer if I take the money.

Otherwise, I'm in the 2%. If I choose to take the money, there's a 90% chance that showed up in some outward signs before you gave me the envelope, and I get $1000. There's a 10% chance that it didn't, and I get $1001000, for an expected payout of $101000. Note that this is an expected payout because I don't know what is in the envelope. If I choose not to take the money, the same calculation gives a $900000 expected payout.

Since I don't know whether I'm easily read or not, I'm staking a 98% chance of a $1000 gain against a 2% chance of a roughly $800000 loss. This is a bad idea, and on balance loses me money.
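The arithmetic in this scenario can be laid out as a short sketch. The 2% and 90% figures come from the comment; the payoffs are the standard ones, and the function and variable names are hypothetical.

```python
SMALL = 1_000          # money on the table
BIG = 1_000_000        # envelope content when the reader expects you to leave the money

P_EASILY_READ = 0.02   # share of people the reader gets a feeling about
ACCURACY = 0.90        # accuracy on those people, in both directions


def payout_if_easily_read(take_money: bool) -> float:
    """Expected payout for someone in the easily read 2%."""
    if take_money:
        # Read correctly: empty envelope. Misread: full envelope plus the cash.
        return ACCURACY * SMALL + (1 - ACCURACY) * (BIG + SMALL)
    # Read correctly: full envelope. Misread: nothing at all.
    return ACCURACY * BIG + (1 - ACCURACY) * 0


take = payout_if_easily_read(True)
leave = payout_if_easily_read(False)

# For the other 98% the prediction is uncorrelated, so taking simply adds SMALL.
# Net expected change from taking the money without knowing which group you are in:
net_gain_from_taking = (1 - P_EASILY_READ) * SMALL - P_EASILY_READ * (leave - take)

if __name__ == "__main__":
    print(round(take), round(leave), round(net_gain_from_taking))
```

With these inputs the expected payouts come to about 101,000 for taking and 900,000 for leaving, and the net expected change from taking is negative (about 15,000 dollars), which is the "on balance loses me money" conclusion.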

Well, in my defense, you didn't specify how Omega is 99.9% accurate either. But that does not matter. Let me change the question to fit your framework.

I get this feeling for some "easily read" people. I am about 51% right in both directions for them, and it isn't correlated with how certain they themselves are about taking the money. Now, suppose you are one of the "easily read" people and you know it. After putting the envelope in your pocket, would you also take the 1000 dollars on the table? Will rejecting it make you richer?

No, I wouldn't take the money on the table in this case.

I'm easily read, so I already gave off signs of what my decision would turn out to be. You're not very good at picking them up, but enough that if people in my position take the money then there's a 49% chance that the envelope contains a million dollars. If they don't, then there's a 51% chance that it does.

I'm not going to take $1000 if it is associated with a 2% reduction in the chance of me having a $1000000 in the envelope. On average, that would make me poorer. In the strict local causal sense I would be richer taking the money, but that reasoning is subject to Simpson's paradox: action Take appears better than Leave for both cases Million and None in the envelope, but is worse when the cases are combined because the weights are not independent. Even a very weak correlation is enough because the pay-offs are so disparate.
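A minimal sketch of this comment's calculation follows. It takes the 51% accuracy from the thread and the standard payoffs, and it weights the envelope's content by the action, which is exactly the step the other side of this exchange disputes. All names are illustrative.

```python
SMALL = 1_000
BIG = 1_000_000
ACCURACY = 0.51   # reader is right 51% of the time, in both directions


def conditional_payout(take_money: bool, envelope_full: bool) -> int:
    """Payout with the envelope's content held fixed."""
    return (BIG if envelope_full else 0) + (SMALL if take_money else 0)


def p_envelope_full(take_money: bool) -> float:
    """Chance the envelope is full, given what the agent ends up doing."""
    # Takers are read correctly (empty envelope) with probability ACCURACY.
    return 1 - ACCURACY if take_money else ACCURACY


def expected_payout(take_money: bool) -> float:
    p = p_envelope_full(take_money)
    return (p * conditional_payout(take_money, True)
            + (1 - p) * conditional_payout(take_money, False))


def break_even_accuracy() -> float:
    """Accuracy above which Leave has the higher expected payout."""
    # Leave beats Take when (2a - 1) * BIG > SMALL.
    return 0.5 + SMALL / (2 * BIG)


# Within each envelope state, Take beats Leave by exactly SMALL ...
for full in (True, False):
    assert conditional_payout(True, full) - conditional_payout(False, full) == SMALL

# ... yet once the states are weighted by the action, Leave comes out ahead.
if __name__ == "__main__":
    print(round(expected_payout(True)), round(expected_payout(False)))
    print(break_even_accuracy())
```

On this way of weighting the cases, Take comes to about 491,000 and Leave to about 510,000, and the accuracy needed for Leave to win is only slightly above one half because the two pay-offs differ by a factor of a thousand.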

I guess that is our disagreement. I would say not taking the money requires some serious modification to causal analysis (e.g. retro-causality). You think no such modification is needed, because the matter is fully resolved by Simpson's paradox.

From this first-person perspective, we consider all possible actions, imagine their respective outcomes, then choose the one deemed optimal according to some objective.


When the two boxes with predetermined content are right in front of me, obviously I should take them both.

These seem to conflict. What do you expect the outcome to be if you take one box? What if you take two?

With the two boxes with predetermined content right in front of me, two-boxing makes me 1000 dollars richer than one-boxing.

From an outsider's view, making the decision-maker one-box will make it 999,000 dollars richer than making it two-box.

I think both are correct. Mixing the two analyses together is not.
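The two figures above can be reproduced with a minimal sketch. The names are hypothetical, and the outsider calculation assumes Omega reads the design perfectly (the infallible-predictor version of the problem).

```python
SMALL = 1_000        # transparent box
BIG = 1_000_000      # opaque box when Omega expects one-boxing


def payout(one_box: bool, opaque_full: bool) -> int:
    """Money received for a given action and a given content of the opaque box."""
    opaque = BIG if opaque_full else 0
    return opaque if one_box else opaque + SMALL


def first_person_gain_from_two_boxing(opaque_full: bool) -> int:
    """First-person view: contents are already fixed, only the action varies."""
    return payout(False, opaque_full) - payout(True, opaque_full)


def outsider_gain_from_one_boxing_design() -> int:
    """Outsider view: the content of the box follows the kind of machine built.

    Assumes a perfectly accurate Omega.
    """
    return payout(True, True) - payout(False, False)


if __name__ == "__main__":
    print(first_person_gain_from_two_boxing(True))
    print(first_person_gain_from_two_boxing(False))
    print(outsider_gain_from_one_boxing_design())
```

The first function never lets the action influence the content, while the second never separates them; each answer follows from its own premise about what is held fixed.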

Wait.  Will you and the outsider observe the same results?  It's hard not to think that one of the two people is simply incorrect in their prediction.  They cannot both be correct.

They will observe the same result. Say the result is the opaque box is empty.

From a first-person perspective, if I had chosen this box only then I would have gone empty-handed.

From an outsider's perspective, making a one-boxing decision-maker would cause the box to be filled with 1 million dollars.

This "disagreement" is due to the two having different starting points of reasoning. In anthropics, the same reason leads to robust perspectivism, i.e. two people sharing all their information can give different answers to the same probability question.

I'm still missing something (which from this observer's standpoint feels like disagreeing).  Let me restate and tell me which statement is wrong.  You are in front of two boxes which Omega has prepared based on prediction of your decision.  There is an observer watching you.  You and the observer both assign very high probability that Omega has predicted correctly.  Despite this, you believe you can two-box and both boxes will be filled, and the observer believes that if you two-box, only the smaller amount will be filled.

Fast-forward to after you've opened both boxes.  The second box was empty.  The observer feels vindicated.  You feel your prediction was correct, even though it doesn't match the reality you find yourself in.

I think you're just wrong.  You got less by two-boxing, so your prediction was incorrect.

Alternate fast-forward to after you open only one box, and found $1M.  I think you were wrong in expecting to find it empty.


I understand anthropic reasoning is difficult - both in understanding precisely what future experience is being predicted, and in enumerating the counterfactuals that let you apply evidence.  Neither of those is relevant to Newcomb's problem, since the probability of objective outcomes is given.

“ You are in front of two boxes ..... you believe you can two-box and both boxes will be filled”

No. That is not the first-person decision. I do not think that if I choose to two-box both will be filled. I think the two boxes' contents are predetermined. Whatever I choose can no longer change what is already inside. Two-boxing is better because it gives me 1000 dollars more. So my decision is right regardless of whether the second box is empty or not.

Outsiders and the first-person give different counterfactuals even when facing the same outcome. Say the outcome is two-boxing and the second box is empty. The outsider would think the counterfactual is to make the machine (decision-maker) always one-box, so the second box is filled. The first-person would think the counterfactual is that I had chosen only the second box, which is empty.

Facing the same outcome while giving different counterfactuals is the same reason for perspective disagreement in anthropics. 

One more try - I'm misunderstanding your use of words "correct" and/or "choose".  I understand difficulties and disagreements in anthropics for not-resolved probabilities with branching experience-measurement points.  But I don't see how it applies AFTER the results are known in a fairly linear choice.

Does your first-person decision include any actual decision? Do you think there are two universes you might find yourself in (or some other definition of "possible")? If you think the boxes' contents are determined, AND you think Omega predicted correctly, does this not imply that your "decision" is predetermined as well?

I totally get the "all choice is illusion" perspective, but I don't see much reason to call it "anthropic reasoning".

I think I am starting to see where our disagreement lies. You agree with the "all choices are illusions" view. On that view, there is no point in thinking about "how should I decide". We can discuss what kind of decision-maker would benefit most in this situation, which is the "outsider perspective". Obviously, one-boxing decision-makers are going to be better off.

The controversy is if we reason as the first-person when facing the two boxes. Regardless of the content of the opaque box, two-boxing should give me 1000 dollars more. The causal analysis is quite straightforward. This seems to be a contradiction with the first paragraph. 

What I am suggesting is that the two lines of reasoning are parallel to each other. They are based on different premises. The "god's eye view" treats the decision-maker as an ordinary part of the environment, like a machine. Whereas the first-person analysis treats the self as something unique: a primitively identified irreducible perspective center, i.e. THE agent, as opposed to part of the environment. (Similar to how a dualist agent considers itself.) Here free will is a premise. I think they are both correct, yet because they are based on different perspectives (thus different premises) they cannot be mixed together. (Kind of like how deductions from different axiomatic systems cannot be mixed.) So from a first-person perspective, I cannot take into consideration how Omega has analyzed me (like a machine) and thus filled the box. For the same reason, from a god's eye view, we cannot imagine being the decision-maker himself, facing the two boxes and choosing.

If I understand correctly, what you have in mind is that those two approaches must be put together to arrive at a complete solution. Then the conflict must be resolved somehow. It is done by letting the god's eye view dominate over the first-person approach. This makes sense because, after all, treating oneself as special does not seem objective. Yet that would deny free will, which could call all causal decision-making processes into question. Also, this brings us to a metaphysical debate about which is more fundamental: reasoning from a first-person perspective or reasoning objectively?

I bring up anthropics because I think this is the exact same reason which leads to the paradoxes in that field, mixing reasoning from different perspectives. If you do not agree with treating perspectives as premises and keeping two approaches separate then there is indeed little connection between that and Newcomb's paradox. 

Neither analysis is correct. Both are incomplete. The outsider's viewpoint is less wrong.

The first person argument presented here finds a local maximum and stops there. Yes, if they ignore Omega's abilities and climb the two-box hill then they can get 1000 dollars more. No mention of whether they're on the right hill, and no analysis of whether this is reasonable given their knowledge of Omega.

The outsider's view as stated fails to account for a possibly limited ability of the decision-maker to choose what sort of decision-maker they can be. Omega knows (in this infallible predictor version), but the decision-maker might not. Or they might know but be powerless to update (not all possible agents have even local "free will").