# 0

Consider Newcomb's problem.

Let 'general' be the claim that Omega is always right.

Let 'instance' be the claim that Omega is right about a particular prediction.

Assume you, the player, are not told the rules of the game until after Omega has made its prediction.

Consider 2 variants of Newcomb's problem.

1. Omega is a perfect predictor.  In this variant, you assign a prior of 1 to P(general).  You are then obligated to believe that Omega has correctly predicted your action.  In this case Eliezer's conclusion is correct, and you should one-box.  It's still unclear whether you have free will, and hence have any choice in what you do next, but you can't lose by one-boxing.

But you can't assign a prior of 1 to P(general), because you're a Bayesian.  You derive your prior for P(general) from the (finite) empirical data.  Say you begin with a prior of 0.5 before considering any observations.  Then you observe all of Omega's N predictions, and each time, Omega gets it right, and you update:

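$$P(\text{general} \mid \text{instance}) = \frac{P(\text{instance} \mid \text{general})\,P(\text{general})}{P(\text{instance})} = \frac{P(\text{general})}{P(\text{instance})}$$

where the second step uses P(instance | general) = 1.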

Omega would need to make an infinite number of correct predictions before you could assign a prior of 1 to P(general).  So this case is theoretically impossible, and should not be considered.
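Here is a minimal sketch of that updating process. It uses exact fractions so that rounding cannot hide the result. Beyond what the text states, it assumes a single rival hypothesis under which Omega is right on each prediction independently with probability `q`. The values `prior = 1/2` and `q = 1/2` are arbitrary illustrations.

```python
from fractions import Fraction


def posterior_general(prior: Fraction, q: Fraction, n_correct: int) -> Fraction:
    """P(general | n correct predictions), by repeated Bayesian updating.

    Assumption (not from the post): if 'general' is false, each prediction
    is independently correct with probability q.
    """
    p = prior
    for _ in range(n_correct):
        # P(instance) = P(instance | general) P(general) + P(instance | not general) P(not general)
        p_instance = 1 * p + q * (1 - p)
        # P(instance | general) = 1, so the posterior is P(general) / P(instance).
        p = p / p_instance
    return p


prior = Fraction(1, 2)
q = Fraction(1, 2)
for n in (1, 10, 100):
    post = posterior_general(prior, q, n)
    print(n, float(post), post < 1)
```

Under these assumptions, the posterior after 10 correct predictions is exactly 1024/1025. It climbs toward 1 but stays strictly below it for any finite record.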

2. Omega is a "nearly perfect" predictor.  You assign P(general) a value very, very close to 1.  You must, however, do the math and try to compare the expected payoffs, at least in an order-of-magnitude way, and not just use verbal reasoning as if we were medieval scholastics.

The argument for two-boxing is that your action now can't affect what Omega did in the past.  That is, we are using a model which includes not just P(instance | general), but also the interaction of your action, the contents of the boxes, and the claim that Omega cannot violate causality.  P ( P(\$1M box is empty | you one-box) = P(\$1M box is empty | you two-box) ) >= P(Omega cannot violate causality), and that needs to be entered into the computation.

Numerically, two-boxers claim that the high probability they assign to our understanding of causality being basically correct more than cancels out the high probability of Omega being correct.
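The sketch below makes that order-of-magnitude comparison concrete. It rests on assumptions the text does not state. With probability `p_causal`, the box contents are fixed independently of your present choice (the two-boxer's model). Otherwise, the contents match your actual choice with probability `accuracy`. `p_full` is a hypothetical placeholder for the chance that the \$1M box is already full under the causal model, and it cancels out of the difference.

```python
BIG = 1_000_000  # contents of the opaque box if Omega filled it
SMALL = 1_000    # contents of the transparent box


def expected_payoffs(p_causal: float, accuracy: float, p_full: float = 0.5):
    """Return (EV of one-boxing, EV of two-boxing) under a two-model mixture.

    Causal model (prob p_causal): the big box is full with prob p_full
    regardless of your choice.
    Non-causal model (prob 1 - p_causal): the box contents track your
    actual choice with probability `accuracy`.
    """
    p_noncausal = 1 - p_causal
    one_box = p_causal * p_full * BIG + p_noncausal * accuracy * BIG
    two_box = (p_causal * (p_full * BIG + SMALL)
               + p_noncausal * ((1 - accuracy) * BIG + SMALL))
    return one_box, two_box


for p_causal in (0.9, 0.999, 0.9999):
    one, two = expected_payoffs(p_causal, accuracy=0.999999)
    print(f"P(causal)={p_causal}: one-box {one:,.0f}, two-box {two:,.0f}")
```

Under this model, the difference works out to `(1 - p_causal) * (2 * accuracy - 1) * BIG - SMALL`. Against a near-perfect Omega, one-boxing therefore wins only if the non-causal model gets more than roughly SMALL/BIG = 0.001 of your credence.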

The argument for one-boxing is that you aren't entirely sure you understand physics, but you know Omega has a really good track record--so good that it is more likely that your understanding of physics is false than that you can falsify Omega's prediction.  This is a strict reliance on empirical observations as opposed to abstract reason: count up how often Omega has been right and compute a prior.

However, if we're going to be strict empiricists, we should double down on that, and set our prior on P(cannot violate causality) strictly empirically--based on all observations regarding whether or not things in the present can affect things in the past.

In principle, this includes every particle interaction in our observable universe. The effective number is lower than that, since a large number of interactions could probably occur in which the future affects the past without our noticing. But the number of observations any one person has made in which events in the future seem to have failed to affect events in the present is certainly very large. The accumulated wisdom of the entire human race on the issue must provide more bits in favor of the hypothesis that causality can't be violated than the comparatively paltry number of observations of Omega's predictions provides for Omega's infallibility, unless Omega is very busy indeed. And even if Omega has somehow made enough predictions, most of them are as inaccessible to you as observations of the laws of causality working on the dark side of the moon. You, personally, cannot have observed Omega make more correct predictions than the number of events you have observed in which the future failed to affect the present.

You could compute a new payoff matrix that made it rational to one-box, but the ratio between the payoffs would need to be many orders of magnitude higher.  You'd have to compute it in utilons rather than dollars, because the utility of dollars doesn't scale linearly.  And that means you'd run into the problem that humans have some upper bound on utility--they aren't cognitively complex enough to achieve utility levels 10^10 times greater than "won \$1,000".  So it still might not be rational to one-box, because the utility payoff under the one box might need to be larger than you, as a human, could experience.
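For an order-of-magnitude check of that claim, here is a worked version under simplifying assumptions not in the text. $S$ and $B$ are the utilities (not dollars) of the small and large boxes. $\varepsilon$ is the probability you assign to the box contents depending on your present choice, and in that case Omega is assumed perfectly accurate. Otherwise, the contents are fixed, and two-boxing simply gains $S$.

$$
\begin{aligned}
\mathbb{E}[U \mid \text{one-box}] - \mathbb{E}[U \mid \text{two-box}]
  &= (1-\varepsilon)(-S) + \varepsilon\,(B - S) \\
  &= \varepsilon B - S
\end{aligned}
$$

So one-boxing has the higher expectation only when $B/S > 1/\varepsilon$. For example, $\varepsilon = 10^{-10}$ would require $B/S > 10^{10}$.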

## Pre-commitment

The case in which you get to think about what to do before Omega studies you and makes its decision is more complicated, because your probability calculation then also depends on what you think you would have done before Omega made its decision.  This only affects the partition of your probability calculation in which Omega can alter the past, however, so numerically it doesn't make a big difference.

The trick here is that most statements of Newcomb's problem are ambiguous as to whether you are told the rules before Omega studies you, and as to which decision they're asking you about when they ask if you one-box or two-box.  Are they asking about what you pre-commit to, or what you eventually do?  These decisions are separate, but not isolatable.

As long as we focus on the single decision at the point of action, then the analysis above (modified as just mentioned) still follows.  If we ask what the player should plan to do before Omega makes its decision, then the question is just whether you have a good enough poker face to fool Omega.  Here it takes no causality violation for Omega to fill the boxes in accordance with your plans, so that factor does not enter in, and you should plan to one-box.

If you are a deterministic AI, that implies that you will one-box.  If you're a GOFAI built according to the old-fashioned symbolic logic AI designs talked about on LW (which, BTW, don't work), it implies you will probably one-box even if you're not deterministic, as otherwise you would need to be inconsistent, which is not allowed with GOFAI architectures.  If you're a human, you'd theoretically be better off if you could suddenly see things differently when it's time to choose boxes, but that's not psychologically plausible.  In no case is there a paradox, or any real difficulty to the decision to one-box.

## Iterated Games

Everything changes with iterated interactions.  It's useful to develop a reputation for one-boxing, because this may convince people that you will keep your word even when it seems disadvantageous to you.  It's useful to convince people that you would one-box, and it's even beneficial, in certain respects, to spread the false belief in the Bayesian community that Bayesians should one-box.

Read Eliezer's post carefully, and I think you'll agree that the reasoning Eliezer gives for one-boxing is not that it is the rational solution to a one-off game--it's that it's a winning policy to be the kind of person who one-boxes.  That's not an argument that the payoff matrix of an instantaneous decision favors one-boxing; it's an argument for a LessWrongian morality.  It's the same basic argument as that honoring commitments is a good long-term strategy.  But the way Eliezer stated it has given many people the false impression that one-boxing is actually the rational choice in an instantaneous one-shot game (and that's the only interpretation which would make it interesting).

The one-boxing argument is so appealing because it offers a solution to difficult coordination problems.  It makes it appear that rational altruism and a rational utopia are within our reach.

But this is wishful thinking, not math, and I believe that the social norm of doing the math is even more important than a social norm of one-boxing.


You're using words like "reputation", and understand how having a reputation for one-boxing is preferable, when we're discussing the level where Omega has access to the source code of your brain and can just tell whether you'll one-box or not, as a matter of calculation.

So the source-code of your brain just needs to decide whether it'll be a source-code that will be one-boxing or not. This isn't really about "precommitment" for that one specific scenario. Omega doesn't need to know whether you have precommitted or not, and Omega isn't putting money in the boxes based on whether you have precommitted or not. It's putting money in based on the decision you'll arrive at, even if you yourself don't know the decision yet.

You can't make the decision in advance, because you may not know the exact parameters of the decision you'll be asked to make (one-boxing & two-boxing are just examples of one particular type of decision). You can decide however whether you're the sort of person who accepts their decisions can be deterministically predicted in advance with sufficient certainty, or whether you'll be claiming that other people predicting your choice must be a violation of causality (it's not).

> So the source-code of your brain just needs to decide whether it'll be a source-code that will be one-boxing or not.

First, in the classic Newcomb problem, meeting Omega is a surprise to you. You don't get to precommit to deciding one way or the other because you had no idea such a situation would arise: you just get to decide now.

> You can decide however whether you're the sort of person who accepts their decisions can be deterministically predicted in advance with sufficient certainty, or whether you'll be claiming that other people predicting your choice must be a violation of causality (it's not).

Why would you make such a decision if you don't expect to meet Omega and don't care much about philosophical head-scratchers?

And, by the way, predicting your choice is not a violation of causality, but believing that your choice (of the boxes, not of the source code) affects what's in the boxes is.

Second, you are assuming that the brain is free to reconfigure and rewrite its software, which is clearly not true of humans or of any existing agent.

> The argument for one-boxing is that you aren't entirely sure you understand physics, but you know Omega has a really good track record--so good that it is more likely that your understanding of physics is false than that you can falsify Omega's prediction. This is a strict reliance on empirical observations as opposed to abstract reason: count up how often Omega has been right and compute a prior.

Isn't it that you aren't entirely sure that you understand psychology, or that you do understand psychology well enough to think that you're predictable? My understanding is that many people have run Newcomb's Problem-style experiments at philosophy departments (or other places) and have a sufficiently high accuracy that it makes sense to one-box at such events, even against fallible human predictors.

I can believe that it would make sense to commit ahead of time to one-box at such an event. Doing so would affect your behavior in a way that the predictor might pick up on.

Hmm. Thinking about this convinces me that there's a big problem here in how we talk about the problem, because if we allow people who already knew about Newcomb's Problem to play, there are really 4 possible actions, not 2:

- intended to one-box, one-boxed
- intended to one-box, two-boxed
- intended to two-box, one-boxed
- intended to two-box, two-boxed
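Here is a small sketch of the payoffs for those four combinations. It rests on a hypothetical assumption: Omega works like a perfect lie detector, filling the \$1M box exactly when it reads a sincere intention to one-box, and it cannot see a later change of mind. The standard \$1,000 and \$1,000,000 amounts are assumed.

```python
from itertools import product

SMALL, BIG = 1_000, 1_000_000


def payoff(intended: str, action: str) -> int:
    """Payoff when Omega fills the big box iff you *intended* to one-box.

    Hypothetical assumption: Omega reads intentions perfectly but cannot
    see a later change of mind.
    """
    big_box = BIG if intended == "one" else 0
    return big_box if action == "one" else big_box + SMALL


for intended, action in product(("one", "two"), repeat=2):
    print(f"intended {intended}-box, {action}-boxed: ${payoff(intended, action):,}")
```

Under that assumption, the big box depends only on the intention, so the later action still adds \$1,000 for two-boxing. Whether you can hold one intention and act on the other is the poker-face question.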

I don't know if the usual statement of Newcomb's problem specifies whether the subject learns the rules of the game before or after the predictor makes a prediction. It seems to me that's a critical factor. If the subject is told the rules of the game before the predictor observes the subject and makes a prediction, then we're just saying Omega is a very good lie detector, and the problem is not even about decision theory, but about psychology: Do you have a good enough poker face to lie to Omega? If not, pre-commit to one-box.

We shouldn't ask, "Should you two-box?", but, "Should you two-box now, given how you would have acted earlier?" The various probabilities in the present depend on what you thought in the past. Under the proposition that Omega is perfect at predicting, the person inclined to 2-box should still 2-box, 'coz that \$1M probably ain't there.

So Newcomb's problem isn't a paradox. If we're talking just about the final decision, the one made by a subject after Omega's prediction, then the subject should probably two-box (as argued in the post). If we're talking about two decisions, one before and one after the box-opening, then all we're asking is whether you can convince Omega that you're going to one-box if you aren't. Then it would not be terribly hard to say that a predictor might be so good (say, an Amazing Kreskin-level cold reader of humans, or a predictor facing an AI) that your only hope is to precommit to one-box.

I don't think this gets Parfit's Hitchhiker right. You need a decision theory that, when safely returned to the city, pays the rescuer even though they have no external obligation to do so. Otherwise they won't have rescued you.

I don't think that what you need has any bearing on what reality has actually given you. Nor can we talk about different decision theories here--as long as we are talking about maximizing expected utility, we have our decision theory; that is enough specification to answer the Newcomb one-shot question. We can only arrive at a different outcome by stating the problem differently, or by sneaking in different metaphysics, or by just doing bad logic (in this case, usually allowing contradictory beliefs about free will in different parts of the analysis).

Your comment implies you're talking about policy, which must be modelled as an iterated game. I don't deny that one-boxing is good in the iterated game.

My concern in this post is that there's been a lack of distinction in the community between "one-boxing is the best policy" and "one-boxing is the best decision at one point in time in a decision-theoretic analysis, which assumes complete freedom of choice at that moment." This lack of distinction has led many people into wishful or magical rather than rational thinking.

> I don't think that what you need has any bearing on what reality has actually given you.

As far as I can tell, I would pay Parfit's Hitchhiker because of intuitions that were rewarded by natural selection. It would be nice to have a formalization that agrees with those intuitions.

> or by sneaking in different metaphysics

This seems wrong to me, if you're explicitly declaring different metaphysics (if you mean the thing by metaphysics that I think you mean). If I view myself as a function that generates an output based on inputs, and my decision-making procedure being the search for the best such function (for maximizing utility), then this could be considered as different metaphysics from trying to cause the most increase in utility for myself by making decisions, but it's not obvious that the latter leads to better decisions.

> P ( P(\$1M box is empty | you one-box) = P(\$1M box is empty | you two-box) ) >= P(Omega cannot violate causality)

For what it's worth: this is incorrect. If P(Omega cannot violate causality) = 1, that still wouldn't mean P ( P(\$1M box is empty | you one-box) = P(\$1M box is empty | you two-box) ) = 1. If P(Omega cannot violate causality) = 1, then P(\$1M box is empty | you one-box) < P(\$1M box is empty | you two-box), because Omega and the agent are subjunctively dependent on the agent's decision procedure. (Unless, of course, the agent doesn't believe there is subjunctive dependence going on.) Subjunctive dependence doesn't violate causality.

"Omega" is philosophical shorthand for "please accept this part of the thought experiment as a premise". Newcomb's problem isn't supposed to be realistic; it's supposed to isolate a corner case in reasoning and let us consider it apart from everything else. While it's true that in reality you can't assign probability 1 to Omega being a perfect predictor, the thought experiment nevertheless asks you to do so anyway, because otherwise the underlying issue would be too obscured by irrelevant details to solve philosophically.

If you rule out probabilities of 1, what do you assign to the probability that Omega is cheating, and somehow gimmicking the boxes to change the contents the instant you indicate your choice, before the contents are revealed?

Presumably the mechanisms of "correct prediction" are irrelevant, and once your expectation that this instance will be predicted correctly gets above million-to-one, you one-box.

> The argument for one-boxing is that you aren't entirely sure you understand physics, but you know Omega has a really good track record--so good that it is more likely that your understanding of physics is false than that you can falsify Omega's prediction.  This is a strict reliance on empirical observations as opposed to abstract reason: count up how often Omega has been right and compute a prior.

Eh? I'm pretty sure I'd still be a one-boxer if I completely understood physics. I one-box because Omega and I are subjunctively dependent on my decision procedure.
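Here is a toy illustration of that kind of subjunctive dependence. It rests on hypothetical assumptions: the agent is a deterministic function, and Omega predicts by running a copy of it before filling the boxes. The names `one_boxer`, `two_boxer`, and `play` are illustrative only. Omega's simulation runs strictly before the choice, so nothing flows backward in time. Even so, the box contents match the choice, because both are computed from the same function.

```python
from typing import Callable

SMALL, BIG = 1_000, 1_000_000
Agent = Callable[[], str]  # a deterministic decision procedure returning "one" or "two"


def one_boxer() -> str:
    return "one"


def two_boxer() -> str:
    return "two"


def play(agent: Agent) -> int:
    # Step 1 (the past): Omega simulates the agent and fills the boxes.
    prediction = agent()
    big_box = BIG if prediction == "one" else 0
    # Step 2 (the present): the agent chooses; the box contents are already fixed.
    choice = agent()
    return big_box if choice == "one" else big_box + SMALL


for agent in (one_boxer, two_boxer):
    print(agent.__name__, play(agent))
```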

ike

What part of physics implies someone cannot scan your brain and simulate inputs so as to perfectly predict your actions?

The part of physics that implies someone cannot scan your brain and simulate inputs so as to perfectly predict your actions is quantum mechanics. But I don't think invoking it is the best response to your question. Though it does make me wonder how Eliezer reconciles his thoughts on one-boxing with his many-worlds interpretation of QM. Doesn't many-worlds imply that every game with Omega creates worlds in which Omega is wrong?

If they can perfectly predict your actions, then you have no choice, so talking about which choice to make is meaningless. If you believe you should one-box if Omega can perfectly predict your actions, but two-box otherwise, then you are better off trying to two-box: in that case, you've already agreed that you should two-box if Omega can't perfectly predict your actions. If Omega can, you won't be able to two-box unless Omega already predicted that you would, so it won't hurt to try to two-box.

> If they can perfectly predict your actions, then you have no choice, so talking about which choice to make is meaningless.

No, it just makes you deterministic. You still have a choice to make, as you don't know what Omega predicted (until you make your choice).

If you find an Omega, then you are in an environment where Omega is possible. Perhaps we are all simulated and QM is optional. Maybe we have easily enough determinism in our brains that Omega can make predictions, much as quantum mechanics ought to in some sense prevent predicting where a cannonball will fly but in practice does not. Perhaps it's a hypothetical where we're AI to begin with so deterministic behavior is just to be expected.

ike

> If they can perfectly predict your actions, then you have no choice, so talking about which choice to make is meaningless.

This was argued against in the Sequences and in general, doesn't seem to be a strong argument. It seems perfectly compatible to believe your actions follow deterministically and still talk about decision theory - all the functional decision theory stuff is assuming a deterministic decision process, I think.

Re QM: sometimes I've seen it stipulated that the world in which the scenario happens is deterministic. It's entirely possible that the amount of noise generated by QM isn't enough to affect your choice (apart from a very unlikely "your brain has a couple of bits changed randomly in exactly the right way to change your choice", but that should be so many orders of magnitude unlikely that it doesn't matter in any expected utility calculation).

> This was argued against in the Sequences and in general, doesn't seem to be a strong argument. It seems perfectly compatible to believe your actions follow deterministically and still talk about decision theory - all the functional decision theory stuff is assuming a deterministic decision process, I think.

It is compatible to believe your actions follow deterministically and still talk about decision theory. It is not compatible to believe your actions follow deterministically, and still talk about decision theory from a first-person point of view, as if you could by force of will violate your programming.

To ask what choice a deterministic entity should make presupposes both that it does, and does not, have choice. Presupposing a contradiction means STOP, your reasoning has crashed and you can prove any conclusion if you continue.

> It is not compatible to believe your actions follow deterministically, and still talk about decision theory from a first-person point of view,

So it's the pronouns that matter? If I keep using "Aris Katsaris" rather than "I", that makes a difference to whether the person I'm talking about makes decisions that can be deterministically predicted?

Whether someone can predict your decisions has ZERO relevance to whether you are the one making the decisions or not. This sort of confusion, where people think that "free will" means "being unpredictable", is nonsensical - it's the very opposite. For the decisions to be yours, they must be theoretically predictable, arising from the contents of your brain. Adding in randomness and unpredictability, e.g. by using dice or coinflips, reduces the ownership of the decisions and hence the free will.

This is old and tired territory.

Old and tired, maybe, but clearly there is not much consensus yet (even if, ahem, some people consider it to be as clear as day).

Note that who makes the decision is a matter of control and has nothing to do with freedom. A calculator controls its display, and so the "decision" to output 4 in response to 2+2 is its own, in a way. But applying decision theory to a calculator is nonsensical, and there is no free choice involved.

ike

Have you read http://lesswrong.com/lw/rb/possibility_and_couldness/ and the related posts and have some disagreement with them?

I just now read that one post. It isn't clear how you think it's relevant. I'm guessing you think that it implies that positing free will is invalid.

You don't have to believe in free will to incorporate it into a model of how humans act. We're all nominalists here; we don't believe that the concepts in our theories actually exist somewhere in Form-space.

When someone asks the question, "Should you one-box?", they're using a model which uses the concept of free will. You can't object to that by saying "You don't really have free will." You can object that it is the wrong model to use for this problem, but then you have to spell out why, and what model you want to use instead, and what question you actually want to ask, since it can't be that one.

People in the LW community don't usually do that. I see sloppy statements claiming that humans "should" one-box, based on a presumption that they have no free will. That's making a claim within a paradigm while rejecting the paradigm. It makes no sense.

Consider what Eliezer says about coin flips:

> We've previously discussed how probability is in the mind. If you are uncertain about whether a classical coin has landed heads or tails, that is a fact about your state of mind, not a property of the coin. The coin itself is either heads or tails. But people forget this, and think that coin.probability == 0.5, which is the Mind Projection Fallacy: treating properties of the mind as if they were properties of the external world.

The mind projection fallacy is treating the word "probability" not in a nominalist way, but in a philosophical realist way, as if probabilities were things existing in the world. Probabilities are subjective. You don't project them onto the external world. That doesn't make "coin.probability == 0.5" a "false" statement. It correctly specifies the distribution of possibilities given the information available within the mind making the probability assessment. I think that is what Eliezer is trying to say there.

"Free will" is a useful theoretical construct in a similar way. It may not be a thing in the world, but it is a model for talking about how we make decisions. We can only model our own brains; you can't fully simulate your own brain within your own brain; you can't demand that we use the territory as our map.

ike

It's not just the one post, it's the whole sequence of related posts.

It's hard for me to summarize it all and do it justice, but it disagrees with the way you're framing this. I would suggest you read some of that sequence and/or some of the decision theory papers for a defense of "should" notions being used even when believing in a deterministic world, which you reject. I don't really want to argue the whole thing from scratch, but that is where our disagreement would lie.

Let's say I build my Omega by using a perfect predictor plus a source of noise that's uncorrelated with the prediction. It seems weird that you'd deterministically two-box against such an Omega, even though you deterministically one-box against a perfect predictor. Are you sure you did the math right?

> It seems weird that you'd deterministically two-box against such an Omega

Even in the case when the random noise dominates and the signal is imperceptibly small?

I think the more relevant case is when the random noise is imperceptibly small. Of course you two-box if it's basically random.

So, at one point in my misspent youth I played with the idea of building an experimental Omega and looked into the subject in some detail.

Martin Gardner's writeup on this back in 1973, reprinted in *The Night Is Large*, explained that the core idea still works if Omega can only predict with 90% accuracy.

Your choice of ONE box pays nothing if you're predicted (incorrectly) to two-box, and pays \$1M if you're predicted correctly, which happens 90% of the time, for a total EV of \$900,000 (= 0.1 × \$0 + 0.9 × \$1,000,000).

Your choice of TWO boxes pays \$1k if you're predicted (correctly) to two-box, and pays \$1,001,000 if you're predicted to one-box, for a total EV of \$101k (= 0.9 × \$1,000 + 0.1 × \$1,001,000 = \$900 + \$100,100).

So in a normal game with 90% Omega accuracy, the expected advantage of one-boxing over two-boxing is \$799k.

Also, by adjusting the game's payouts, we could hypothetically make any amount of genuine human predictability (even just a reliable 51% accuracy) enough to motivate one-boxing.
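Here is a minimal sketch that reproduces the arithmetic above and solves for the break-even accuracy. It assumes the standard \$1,000 and \$1,000,000 amounts and a single accuracy figure that applies equally to one-boxers and two-boxers, which is a simplification. The helper names are hypothetical.

```python
SMALL, BIG = 1_000, 1_000_000


def ev_one_box(accuracy: float, big: int = BIG) -> float:
    # Correct prediction -> big box is full; incorrect -> it is empty.
    return accuracy * big


def ev_two_box(accuracy: float, small: int = SMALL, big: int = BIG) -> float:
    # Correct prediction -> big box is empty; incorrect -> both boxes pay.
    return accuracy * small + (1 - accuracy) * (big + small)


def break_even_accuracy(small: int = SMALL, big: int = BIG) -> float:
    # Solve accuracy * big = accuracy * small + (1 - accuracy) * (big + small).
    return (big + small) / (2 * big)


a = 0.9
print(ev_one_box(a), ev_two_box(a), ev_one_box(a) - ev_two_box(a))
print(break_even_accuracy())
```

Under these amounts, the break-even accuracy is $(B+S)/(2B) = 0.5005$, where $B$ and $S$ are the large and small amounts. A reliable 51% predictor therefore already clears the bar, and the threshold moves toward 50% as the large prize grows relative to the small one.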

The super simplistic conceptual question here is the distinction between two kinds of sincerity. One kind of sincerity is assessed at the time of the promise. The other kind of sincerity is assessed retrospectively by seeing whether the promise was upheld.

Then the standard version of the game tries to put a wedge between these concepts by supposing that maybe an initially sincere promise might be violated by the intervention of something like "free will", and it tries to make this seem slightly more magical (more of a far mode question?) by imagining that the promise was never even uttered, but rather the promise was stolen from the person by the magical mind reading "Omega" entity before the promise was ever even imagined by the person as being possible to make.

One thing that seems clear to me is that if one-boxing is profitable but not certain, then you might wish you could have done something in the past that would make it clear that you'll one-box, so that you land in the part of Omega's calculations where the prediction is easy, rather than being one of the edge cases where Omega really has to work for its Brier score.

On the other hand, the setup is also (probably purposefully) quite fishy. The promise that "you made" is originally implicit, and depending on your understanding of the game, maybe extremely abstract. Omega doesn't just tell you what it predicted. If you take one box, get nothing, and complain, then Omega will probably try to twist it around and blame you for its failed prediction. If it all works, then you seem to be getting free money, and why is anyone handing out free money?

The whole thing just "feels like the setup for a scam". Like you one box, get a million, then in your glow of positive trust you give some money to their charitable cause. Then it turns out the charitable cause was fake. Then it turns out the million dollars was counterfeit but your donation was real. Sucker!

And yet... you know, parents actually are pretty good at knowing when their kids are telling the truth or lying. And parents really do give their kids a free lunch. And it isn't really a scam, it is just normal life as a mortal human being.

But also in the end, for someone to look their parents in the eyes and promise to be home before 10PM and really mean it for reals at the time of the promise, and then be given the car keys, and then come home at 1AM... that also happens. And wouldn't it be great to just blame that on "free will" and "the 10% of the time that Omega's predictions fail"?

Looping this back around to the larger AGI question, it seems like what we're basically hoping for is to learn how to become a flawless Omega (or at least build some software that can do this job) at least for the restricted case of an AGI that we can give the car keys without fear that after it has the car keys it will play the "free will" card and grind us all up into fuel paste after promising not to.