The following may well be the most controversial dilemma in the history of decision theory:

    A superintelligence from another galaxy, whom we shall call Omega, comes to Earth and sets about playing a strange little game.  In this game, Omega selects a human being, sets down two boxes in front of them, and flies away.

    Box A is transparent and contains a thousand dollars.
    Box B is opaque, and contains either a million dollars, or nothing.

    You can take both boxes, or take only box B.

    And the twist is that Omega has put a million dollars in box B iff Omega has predicted that you will take only box B.

    Omega has been correct on each of 100 observed occasions so far - everyone who took both boxes has found box B empty and received only a thousand dollars; everyone who took only box B has found B containing a million dollars.  (We assume that box A vanishes in a puff of smoke if you take only box B; no one else can take box A afterward.)

    Before you make your choice, Omega has flown off and moved on to its next game.  Box B is already empty or already full.

    Omega drops two boxes on the ground in front of you and flies off.

    Do you take both boxes, or only box B?

    And the standard philosophical conversation runs thusly:

    One-boxer:  "I take only box B, of course.  I'd rather have a million than a thousand."

    Two-boxer:  "Omega has already left.  Either box B is already full or already empty.  If box B is already empty, then taking both boxes nets me $1000, taking only box B nets me $0.  If box B is already full, then taking both boxes nets $1,001,000, taking only box B nets $1,000,000.  In either case I do better by taking both boxes, and worse by leaving a thousand dollars on the table - so I will be rational, and take both boxes."

    One-boxer:  "If you're so rational, why ain'cha rich?"

    Two-boxer:  "It's not my fault Omega chooses to reward only people with irrational dispositions, but it's already too late for me to do anything about that."
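
The two positions in this exchange can be laid side by side in a few lines of Python. This is a minimal sketch, not part of the original argument: the dollar amounts come from the problem statement, while the function names and the single `accuracy` parameter (the chance that Omega's prediction matches the actual choice) are assumptions introduced here for illustration. Note that `expected_payoff` conditions the box contents on the action taken, which is exactly the step the two-boxer rejects.

```python
# Payoffs in dollars, indexed by (state of box B, action taken).
PAYOFF = {
    ("full", "one-box"): 1_000_000,
    ("full", "two-box"): 1_001_000,
    ("empty", "one-box"): 0,
    ("empty", "two-box"): 1_000,
}


def dominates(action, other):
    """True if `action` pays strictly more than `other` in every fixed box state."""
    return all(PAYOFF[(state, action)] > PAYOFF[(state, other)]
               for state in ("full", "empty"))


def expected_payoff(action, accuracy):
    """Expected dollars if Omega's prediction matches the action with probability `accuracy`."""
    # Box B is full exactly when Omega predicted one-boxing.
    p_full = accuracy if action == "one-box" else 1.0 - accuracy
    return (p_full * PAYOFF[("full", action)]
            + (1.0 - p_full) * PAYOFF[("empty", action)])
```

Here `dominates("two-box", "one-box")` is true, which is the two-boxer's argument, while `expected_payoff("one-box", 0.99)` comes out far larger than `expected_payoff("two-box", 0.99)`, which is the one-boxer's.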

    There is a large literature on the topic of Newcomblike problems - especially if you consider the Prisoner's Dilemma as a special case, which it is generally held to be.  "Paradoxes of Rationality and Cooperation" is an edited volume that includes Newcomb's original essay.  For those who read only online material, this PhD thesis summarizes the major standard positions.

    I'm not going to go into the whole literature, but the dominant consensus in modern decision theory is that one should two-box, and Omega is just rewarding agents with irrational dispositions.  This dominant view goes by the name of "causal decision theory".

    As you know, the primary reason I'm blogging is that I am an incredibly slow writer when I try to work in any other format.  So I'm not going to try to present my own analysis here.  Way too long a story, even by my standards.

    But it is agreed even among causal decision theorists that if you have the power to precommit yourself to take one box, in Newcomb's Problem, then you should do so.  If you can precommit yourself before Omega examines you, then you are directly causing box B to be filled.

    Now in my field - which, in case you have forgotten, is self-modifying AI - this works out to saying that if you build an AI that two-boxes on Newcomb's Problem, it will self-modify to one-box on Newcomb's Problem, if the AI considers in advance that it might face such a situation.  Agents with free access to their own source code have access to a cheap method of precommitment.
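
As a rough illustration of precommitment by self-modification, the sketch below models Omega as simply running the agent's current decision procedure before filling the box. That modeling choice, along with the `Agent` class, `self_modify`, and `play`, is an assumption made for this example and not a claim about how Omega or any real AI works.

```python
from typing import Callable

Policy = Callable[[], str]  # returns "one-box" or "two-box"


def two_boxer() -> str:
    return "two-box"


def one_boxer() -> str:
    return "one-box"


class Agent:
    def __init__(self, policy: Policy):
        self.policy = policy

    def self_modify(self, new_policy: Policy) -> None:
        # Hypothetical cheap precommitment: overwrite the decision procedure.
        self.policy = new_policy

    def choose(self) -> str:
        return self.policy()


def play(agent: Agent) -> int:
    """Return the agent's winnings in dollars for one round of the game."""
    # Assumption: Omega predicts by running the agent's current policy.
    box_b = 1_000_000 if agent.policy() == "one-box" else 0
    # Omega has left; the box contents are fixed before the agent chooses.
    choice = agent.choose()
    return box_b if choice == "one-box" else box_b + 1_000
```

An agent built with `two_boxer` walks away with $1,000. If it calls `self_modify(one_boxer)` before `play` is run, it walks away with $1,000,000; the modification has to happen before Omega's examination to make any difference.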

    What if you expect that you might, in general, face a Newcomblike problem, without knowing the exact form of the problem?  Then you would have to modify yourself into a sort of agent whose disposition was such that it would generally receive high rewards on Newcomblike problems.

    But what does an agent with a disposition generally-well-suited to Newcomblike problems look like?  Can this be formally specified?

    Yes, but when I tried to write it up, I realized that I was starting to write a small book.  And it wasn't the most important book I had to write, so I shelved it.  My slow writing speed really is the bane of my existence.  The theory I worked out seems, to me, to have many nice properties besides being well-suited to Newcomblike problems.  It would make a nice PhD thesis, if I could get someone to accept it as my PhD thesis.  But that's pretty much what it would take to make me unshelve the project.  Otherwise I can't justify the time expenditure, not at the speed I currently write books.

    I say all this, because there's a common attitude that "Verbal arguments for one-boxing are easy to come by, what's hard is developing a good decision theory that one-boxes" - coherent math which one-boxes on Newcomb's Problem without producing absurd results elsewhere.  So I do understand that, and I did set out to develop such a theory, but my writing speed on big papers is so slow that I can't publish it.  Believe it or not, it's true.

    Nonetheless, I would like to present some of my motivations on Newcomb's Problem - the reasons I felt impelled to seek a new theory - because they illustrate my source-attitudes toward rationality.  Even if I can't present the theory that these motivations motivate...

    First, foremost, fundamentally, above all else:

    Rational agents should WIN.

    Don't mistake me, and think that I'm talking about the Hollywood Rationality stereotype that rationalists should be selfish or shortsighted.  If your utility function has a term in it for others, then win their happiness.  If your utility function has a term in it for a million years hence, then win the eon.

    But at any rate, WIN.  Don't lose reasonably, WIN.

    Now there are defenders of causal decision theory who argue that the two-boxers are doing their best to win, and cannot help it if they have been cursed by a Predictor who favors irrationalists.  I will talk about this defense in a moment.  But first, I want to draw a distinction between causal decision theorists who believe that two-boxers are genuinely doing their best to win; versus someone who thinks that two-boxing is the reasonable or the rational thing to do, but that the reasonable move just happens to predictably lose, in this case.  There are a lot of people out there who think that rationality predictably loses on various problems - that, too, is part of the Hollywood Rationality stereotype, that Kirk is predictably superior to Spock.

    Next, let's turn to the charge that Omega favors irrationalists.  I can conceive of a superbeing who rewards only people born with a particular gene, regardless of their choices.  I can conceive of a superbeing who rewards people whose brains inscribe the particular algorithm of "Describe your options in English and choose the last option when ordered alphabetically," but who does not reward anyone who chooses the same option for a different reason.  But Omega rewards people who choose to take only box B, regardless of which algorithm they use to arrive at this decision, and this is why I don't buy the charge that Omega is rewarding the irrational.  Omega doesn't care whether or not you follow some particular ritual of cognition; Omega only cares about your predicted decision.

    We can choose whatever reasoning algorithm we like, and will be rewarded or punished only according to that algorithm's choices, with no other dependency - Omega just cares where we go, not how we got there.

    It is precisely the notion that Nature does not care about our algorithm, which frees us up to pursue the winning Way - without attachment to any particular ritual of cognition, apart from our belief that it wins.  Every rule is up for grabs, except the rule of winning.

    As Miyamoto Musashi said - it's really worth repeating:

    "You can win with a long weapon, and yet you can also win with a short weapon.  In short, the Way of the Ichi school is the spirit of winning, whatever the weapon and whatever its size."

    (Another example:  It was argued by McGee that we must adopt bounded utility functions or be subject to "Dutch books" over infinite times.  But:  The utility function is not up for grabs.  I love life without limit or upper bound:  There is no finite amount of life lived N where I would prefer an 80.0001% probability of living N years to a 0.0001% chance of living a googolplex years and an 80% chance of living forever.  This is a sufficient condition to imply that my utility function is unbounded.  So I just have to figure out how to optimize for that morality.  You can't tell me, first, that above all I must conform to a particular ritual of cognition, and then that, if I conform to that ritual, I must change my morality to avoid being Dutch-booked.  Toss out the losing ritual; don't change the definition of winning.  That's like deciding to prefer $1000 to $1,000,000 so that Newcomb's Problem doesn't make your preferred ritual of cognition look bad.)

    "But," says the causal decision theorist, "to take only one box, you must somehow believe that your choice can affect whether box B is empty or full - and that's unreasonable!  Omega has already left!  It's physically impossible!"

    Unreasonable?  I am a rationalist: what do I care about being unreasonable?  I don't have to conform to a particular ritual of cognition.  I don't have to take only box B because I believe my choice affects the box, even though Omega has already left.  I can just... take only box B.

    I do have a proposed alternative ritual of cognition which computes this decision, which this margin is too small to contain; but I shouldn't need to show this to you.  The point is not to have an elegant theory of winning - the point is to win; elegance is a side effect.

    Or to look at it another way:  Rather than starting with a concept of what is the reasonable decision, and then asking whether "reasonable" agents leave with a lot of money, start by looking at the agents who leave with a lot of money, develop a theory of which agents tend to leave with the most money, and from this theory, try to figure out what is "reasonable".  "Reasonable" may just refer to decisions in conformance with our current ritual of cognition - what else would determine whether something seems "reasonable" or not?
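
One way to make "look at the agents who leave with a lot of money" concrete is a small simulation. The sketch below is only an illustration under stated assumptions: each agent has a fixed disposition, Omega's prediction matches that disposition with probability `accuracy`, and the default of 0.99 is an arbitrary choice for the example. The function names are hypothetical.

```python
import random


def simulate(disposition, accuracy, trials, rng):
    """Average winnings for agents with a fixed disposition facing a noisy predictor."""
    total = 0
    for _ in range(trials):
        correct = rng.random() < accuracy
        if correct:
            predicted = disposition
        else:
            predicted = "two-box" if disposition == "one-box" else "one-box"
        # Box B is filled according to the prediction, not the eventual choice.
        box_b = 1_000_000 if predicted == "one-box" else 0
        total += box_b if disposition == "one-box" else box_b + 1_000
    return total / trials


def rank_dispositions(accuracy=0.99, trials=10_000, seed=0):
    """Return (disposition, average winnings) pairs, richest first."""
    rng = random.Random(seed)
    results = {d: simulate(d, accuracy, trials, rng)
               for d in ("one-box", "two-box")}
    return sorted(results.items(), key=lambda kv: kv[1], reverse=True)
```

With these defaults the one-boxing disposition heads the ranking by a wide margin. The simulation says nothing about which ritual of cognition produced the disposition; it only records who ends up holding the money.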

    From James Joyce (no relation), Foundations of Causal Decision Theory:

    Rachel has a perfectly good answer to the "Why ain't you rich?" question.  "I am not rich," she will say, "because I am not the kind of person the psychologist thinks will refuse the money.  I'm just not like you, Irene.  Given that I know that I am the type who takes the money, and given that the psychologist knows that I am this type, it was reasonable of me to think that the $1,000,000 was not in my account.  The $1,000 was the most I was going to get no matter what I did.  So the only reasonable thing for me to do was to take it."

    Irene may want to press the point here by asking, "But don't you wish you were like me, Rachel?  Don't you wish that you were the refusing type?"  There is a tendency to think that Rachel, a committed causal decision theorist, must answer this question in the negative, which seems obviously wrong (given that being like Irene would have made her rich).  This is not the case.  Rachel can and should admit that she does wish she were more like Irene.  "It would have been better for me," she might concede, "had I been the refusing type."  At this point Irene will exclaim, "You've admitted it!  It wasn't so smart to take the money after all."  Unfortunately for Irene, her conclusion does not follow from Rachel's premise.  Rachel will patiently explain that wishing to be a refuser in a Newcomb problem is not inconsistent with thinking that one should take the $1,000 whatever type one is.  When Rachel wishes she was Irene's type she is wishing for Irene's options, not sanctioning her choice.

    It is, I would say, a general principle of rationality - indeed, part of how I define rationality - that you never end up envying someone else's mere choices.  You might envy someone their genes, if Omega rewards genes, or if the genes give you a generally happier disposition.  But Rachel, above, envies Irene her choice, and only her choice, irrespective of what algorithm Irene used to make it.  Rachel wishes just that she had a disposition to choose differently.

    You shouldn't claim to be more rational than someone and simultaneously envy them their choice - only their choice.  Just do the act you envy.

    I keep trying to say that rationality is the winning-Way, but causal decision theorists insist that taking both boxes is what really wins, because you can't possibly do better by leaving $1000 on the table... even though the single-boxers leave the experiment with more money.  Be careful of this sort of argument, any time you find yourself defining the "winner" as someone other than the agent who is currently smiling from on top of a giant heap of utility.

    Yes, there are various thought experiments in which some agents start out with an advantage - but if the task is to, say, decide whether to jump off a cliff, you want to be careful not to define cliff-refraining agents as having an unfair prior advantage over cliff-jumping agents, by virtue of their unfair refusal to jump off cliffs.  At this point you have covertly redefined "winning" as conformance to a particular ritual of cognition.  Pay attention to the money!

    Or here's another way of looking at it:  Faced with Newcomb's Problem, would you want to look really hard for a reason to believe that it was perfectly reasonable and rational to take only box B; because, if such a line of argument existed, you would take only box B and find it full of money?  Would you spend an extra hour thinking it through, if you were confident that, at the end of the hour, you would be able to convince yourself that box B was the rational choice?  This too is a rather odd position to be in.  Ordinarily, the work of rationality goes into figuring out which choice is the best - not finding a reason to believe that a particular choice is the best.

    Maybe it's too easy to say that you "ought to" two-box on Newcomb's Problem, that this is the "reasonable" thing to do, so long as the money isn't actually in front of you.  Maybe you're just numb to philosophical dilemmas, at this point.  What if your daughter had a 90% fatal disease, and box A contained a serum with a 20% chance of curing her, and box B might contain a serum with a 95% chance of curing her?  What if there was an asteroid rushing toward Earth, and box A contained an asteroid deflector that worked 10% of the time, and box B might contain an asteroid deflector that worked 100% of the time?

    Would you, at that point, find yourself tempted to make an unreasonable choice?

    If the stake in box B was something you could not leave behind?  Something overwhelmingly more important to you than being reasonable?  If you absolutely had to win - really win, not just be defined as winning?

    Would you wish with all your power that the "reasonable" decision was to take only box B?

    Then maybe it's time to update your definition of reasonableness.

    Alleged rationalists should not find themselves envying the mere decisions of alleged nonrationalists, because your decision can be whatever you like.  When you find yourself in a position like this, you shouldn't chide the other person for failing to conform to your concepts of reasonableness.  You should realize you got the Way wrong.

    So, too, if you ever find yourself keeping separate track of the "reasonable" belief, versus the belief that seems likely to be actually true.  Either you have misunderstood reasonableness, or your second intuition is just wrong.

    Now one can't simultaneously define "rationality" as the winning Way, and define "rationality" as Bayesian probability theory and decision theory.  But it is the argument that I am putting forth, and the moral of my advice to Trust In Bayes, that the laws governing winning have indeed proven to be math.  If it ever turns out that Bayes fails - receives systematically lower rewards on some problem, relative to a superior alternative, in virtue of its mere decisions - then Bayes has to go out the window.  "Rationality" is just the label I use for my beliefs about the winning Way - the Way of the agent smiling from on top of the giant heap of utility.  Currently, that label refers to Bayescraft.

    I realize that this is not a knockdown criticism of causal decision theory - that would take the actual book and/or PhD thesis - but I hope it illustrates some of my underlying attitude toward this notion of "rationality".

    You shouldn't find yourself distinguishing the winning choice from the reasonable choice.  Nor should you find yourself distinguishing the reasonable belief from the belief that is most likely to be true.

    That is why I use the word "rational" to denote my beliefs about accuracy and winning - not to denote verbal reasoning, or strategies which yield certain success, or that which is logically provable, or that which is publicly demonstrable, or that which is reasonable.

    As Miyamoto Musashi said:

    "The primary thing when you take a sword in your hands is your intention to cut the enemy, whatever the means. Whenever you parry, hit, spring, strike or touch the enemy's cutting sword, you must cut the enemy in the same movement. It is essential to attain this. If you think only of hitting, springing, striking or touching the enemy, you will not be able actually to cut him."


    Either box B is already full or already empty.

    I'm not going to go into the whole literature, but the dominant consensus in modern decision theory is that one should two-box, and Omega is just rewarding agents with irrational dispositions. This dominant view goes by the name of "causal decision theory".

    I suppose causal decision theory assumes causality only works in one temporal direction. Confronted with a predictor that was right 100 out of 100 times, I would think it very likely that backward-in-time causation exists, and take only B. I assume this would, as you say, produce absurd results elsewhere.

    Decisions aren't physical.

    The above statement is at least hard to defend. Your decisions are physical and occur inside of you... So these two-boxers are using the wrong model amongst these two (see the drawings....)

    If you are a part of physics, so is your decision, so it must account for the correlation between your thought processes and the superintelligence. Once it accounts for that, you decide to one box, because you understood the entanglement of the computation done by omega and the physical process going inside your skull.

    If the entanglement is there, you are not looking at it from the outside, you are inside the process.

    Our minds have this quirk that makes us think there are two moments, you decide, and then you cheat, you get to decide again. But if you are only allowed to decide once, which is the case, you are rational by one-boxing.

    I think you capture the essence of the solution, here.
    1Motor Vehicle
    Is it possible for someone to explain why, if your decision is a part of physics, your decision must account for the correlation between thought processes and the superintelligence? 
    From what I understand, to be a "Rational Agent" in game theory means someone who maximises their utility function (and not the one you ascribe to them). To say Omega is rewarding irrational agents isn't necessarily fair, since payoffs aren't always about the money. Lottery tickets are a good example of this. What if my utility function says the worst outcome is living the rest of my life with regrets that I didn't one box? Then I can one box and still be a completely rational agent.

    You're complicating the problem too much by bringing in issues like regret. Assume for sake of argument that Newcomb's problem is to maximize the amount of money you receive. Don't think about extraneous utility issues.

    Fair point. There are too many hidden variables already without me explicitly adding more. If Newcomb's problem is to maximise money received (with no regard for what is seen as reasonable), the "Why ain't you rich?" argument seems like a fairly compelling one, doesn't it? Winning the money is all that matters. I just realised that all I've really done is paraphrase the original post. Curse you, source monitoring error!
    Lottery tickets exploit a completely different failure of rationality, that being our difficulties with small probabilities and big numbers, and our problems dealing with scale more generally. (ETA: The fantasies commonly cited in the context of lotteries' "true value" are a symptom of this failure.) It's not hard to come up with a game-theoretic agent that maximizes its payoffs against that kind of math. Second-guessing other agents' models is considerably harder. I haven't given much thought to this particular problem for a while, but my impression is that Newcomb exposes an exploit in simpler decision theories that's related to that kind of recursive modeling: naively, if you trust Omega's judgment of your psychology, you pick the one-box option, and if you don't, you pick up both boxes. Omega's track record gives us an excellent reason to trust its judgment from a probabilistic perspective, but it's trickier to come up with an algorithm that stabilizes on that solution without immediately trying to outdo itself.
    So for my own clarification, if I buy a lottery ticket with a perfect knowledge of how probable it is my ticket will win, does this make me irrational?
    Well, I fail to see any need for backward-in-time causation to get the prediction right 100 out of 100 times. As far as I understand, similar experiments have been performed in practice and homo sapiens are quite split in two groups 'one-boxers' and 'two-boxers' who generally have strong preferences towards one or other due to whatever differences in their education, logic experience, genetics, reasoning style or whatever factors that are somewhat stable specific to that individual. Having perfect predictive power (or even the possibility of it existing) is implied and suggested, but it's not really given, it's not really necessary, and IMHO it's not possible and not useful to use this 'perfect predictive power' in any reasoning here. From the given data in the situation (100 out of 100 that you saw), you know that Omega is a super-intelligent sorter who somehow manages to achieve 99.5% or better accuracy in sorting people into one-boxers and two-boxers. This accuracy seems also higher than the accuracy of most (all?) people in self-evaluation, i.e., as in many other decision scenarios, there is a significant difference in what people believe they would decide in situation X, and what they actually decide if it happens. [citation might be needed, but I don't have one at the moment, I do recall reading papers about such experiments]. The 'everybody is a perfect logician/rationalist and behaves as such' assumption often doesn't hold up in real life even for self-described perfect rationalists who make strong conscious effort to do so. In effect, data suggests that probably Omega knows your traits and decision chances (taking into account you taking into account all this) better than you do - it's simply smarter than homo sapiens. Assuming that this is really so, it's better for you to choose option B. Assuming that this is not so, and you believe that you can out-analyze Omega's perception of yourself, then you should choose the opposite of whatever Omega would t
    So what you're saying is that the only reason this problem is a problem is because the problem hasn't been defined narrowly enough. You don't know what Omega is capable of, so you don't know which choice to make. So there is no way to logically solve the problem (with the goal of maximizing utility) without additional information. Here's what I'd do: I'd pick up B, open it, and take A iff I found it empty. That way, Omega's decision of what to put in the box would have to incorporate the variable of what Omega put in the box, causing an infinite regress which will use all cpu cycles until the process is terminated. Although that'll probably result in the AI picking an easier victim to torment and not even giving me a measly thousand dollars.
    Okay... so since you already know, in advance of getting the boxes, that that's what you'd know, Omega can deduce that. So you open Box B, find it empty, and then take Box A. Enjoy your $1000. Omega doesn't need to infinite loop that one; he knows that you're the kind of person who'd try for Box A too.
    No, putting $1 million in box B works too. Origin64 opens box B, takes the money, and doesn't take box A. It's like "This sentence is true." - whatever Omega does makes the prediction valid.
    Which means you might end up with either amount of money, since you don't really know enough about Omega, instead of just the one box winnings. So you should still just one box?
    Not how Omega looks at it. By definition, Omega looks ahead, sees a branch in which you would go for Box A, and puts nothing in Box B. There's no cheating Omega... just like you can't think "I'm going to one-box, but then open Box A after I've pocketed the million" there's no "I'm going to open Box B first, and decide whether or not to open Box A afterward". Unless Omega is quite sure that you have precommitted to never opening Box A ever, Box B contains nothing; the strategy of leaving Box A as a possibility if Box B doesn't pan out is a two-box strategy, and Omega doesn't allow it.
    Well, this isn't quite true. What Omega cares about is whether you will open Box A. From Omega's perspective it makes no difference whether you've precommitted to never opening it, or whether you've made no such precommitment but it turns out you won't open it for other reasons.
    Assuming that Omega's "prediction" is in good faith, and that we can't "break" him as a predictor as a side effect of exploiting causality loops etc. in order to win.
    I'm not sure I understood that, but if I did, then yes, assuming that Omega is as described in the thought experiment. Of course, if Omega has other properties (for example, is an unreliable predictor) other things follow.
    If you look in box B before deciding whether to choose box A, then you can force Omega to be wrong. That sounds like so much fun that I might choose it over the $1000.
    That's the popular understanding (or lack thereof) here and among philosophers in general. Philosophers just don't get math. If the decision theory is called causal but doesn't itself make any references to physics, then that's a slightly misleading name. I've written on that before. The math doesn't go "hey hey, the theory is named causal therefore you can't treat 2 robot arms controlled by 2 control computers that run one function on one state, the same as 2 robot arms controlled by 1 computer". Confused sloppy philosophers do. Also, the best case is to be predicted to 1-box but 2-box in reality. If the prediction works by backwards causality, well then causal decision theory one-boxes. If the prediction works by simulation, the causal decision theory can either have a world model where both the value inside the predictor and the value inside the actual robot are represented by the same action A, and 1-box, or it can have uncertainty as to whether the world outside of it is normal reality or the predictor's simulator, where it will again one-box (assuming it cares about the real money even if it is inside the predictor, which it would if it needs money to pay for e.g. its child's education). It will also 1-box in the simulator and 2-box in reality if it can tell those apart.
    I'm confused. Causal decision theory was invented or formalised almost entirely by philosophers. It takes the 'causal' in its name from its reliance on inductive logic and inference. It doesn't make sense to claim that philosophers are being sloppy about the word 'causal' here, and claiming that causal decision theory will accept backwards causality and one-box is patently false unless you mean something other than what the symbol 'causal decision theory' refers to when you say 'causal decision theory'.
    Firstly, the notion that actions should be chosen based on their consequences, taking the actions as the cause of the consequences, was definitely not invented by philosophers. Secondly, logical causality is not identical to physical causality (the latter depends on specific laws of physics). Thirdly, not all philosophers are sloppy; some are very sloppy, some are less sloppy. Fourth, anything that was not put in mathematical form to be manipulated using formal methods is not formalized. When you formalize stuff you end up stripping the notion of self unless it is explicitly included as part of the formalism, stripping the notion of the time where the math is working unless it is explicitly included as part of the formalism, and so on, ending up without the problem. Maybe you are correct; it is better to let the symbol 'causal decision theory' refer to confused philosophy. Then we would need some extra symbol for how agents implementable using mathematics actually decide (and how robots that predict outcomes of their actions on a world model actually work), which is very, very similar to 'causal decision theory' sans all the human preconditions of what self is.
    I notice I actually agree with you - if we did try, using mathematics, to implement agents who decide and predict in the manner you describe, we'd find it incorrect to describe these agents as causal decision theory agents. In fact, I also expect we'd find ourselves disillusioned with CDT in general, and if philosophers brought it up, we'd direct them to instead engage with the much more interesting agents we've mathematically formalised.
    Well, each philosopher's understanding of CDT seems to differ from the others'. The notion that actions should be chosen based on consequences - as expressed in the formula here - is perfectly fine, albeit incredibly trivial. One can formalize that all the way into an agent; I have written such agents myself. We still need a symbol to describe this type of agent. But philosophers go from this to "my actions should be chosen based on consequences", and it is all about the true meaning of self and falls within the purview of your conundrums of philosophy. Having 1 computer control 2 robot arms wired in parallel, versus having 2 computers running the exact same software as before, controlling 2 robot arms: there's no difference for software engineering; it's a minor detail that has been entirely abstracted from the software. There is a difference for philosophizing, though, because you can't collapse logical consequences and physical causality into one thing in the latter case. edit: anyhow, to summarize my point: In terms of agents actually formalized in software, one-boxing is only a matter of implementing the predictor into the world model somehow, either as a second servo controlled by the same control variables, or as an uncertain world state outside the senses (in the unseen there's either the real world or a simulator that affects the real world via the hand of the predictor). No conceptual problems whatsoever. edit: Good analogy: the 'twin paradox' in special relativity. There's only a paradox if nobody has done the math right.
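
The comment's point about world models can be sketched in code. This is one possible reading of it, with hypothetical function names: the same consequence-maximizing agent is run against two world models. In the first the predictor's output is driven by the same control variable as the action (the "second servo"); in the second the prediction is treated as a fixed fact with some prior probability, independent of the action.

```python
ACTIONS = ("one-box", "two-box")


def payoff(action, prediction):
    """Dollars received given the action taken and what the predictor predicted."""
    box_b = 1_000_000 if prediction == "one-box" else 0
    return box_b if action == "one-box" else box_b + 1_000


def best_action_linked():
    # World model A: prediction and action share one control variable,
    # so evaluating an action also sets the prediction to that action.
    return max(ACTIONS, key=lambda a: payoff(a, a))


def best_action_independent(p_predicted_one_box):
    # World model B: the prediction is a fixed fact, independent of the action,
    # with prior probability `p_predicted_one_box` of being "one-box".
    def expected(a):
        return (p_predicted_one_box * payoff(a, "one-box")
                + (1 - p_predicted_one_box) * payoff(a, "two-box"))
    return max(ACTIONS, key=expected)
```

Under the linked model the maximizer one-boxes. Under the independent model it two-boxes for every prior, since taking box A adds $1,000 whatever box B holds. The decision rule is identical in both cases; only the world model differs.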
    @Nick_Tarleton Agreed, the problem immediately reminded me of "retroactive preparation" and time-loop logic. It is not really the same reasoning, but it has the same "turn causality on its head" aspect. If I don't have proof of the reliability of Omega's predictions, I find myself less likely to be "unreasonable" when the stakes are higher (that is, I'm more likely to two-box if it's about saving the world). I find it highly unlikely that an entity wandering across worlds can predict my actions to this level of detail, as it seems way harder than traveling through space or teleporting money. I might risk a net loss of $1,000 to figure it out (much like I'd be willing to spend $1,000 to interact with such a space-traveling, stuff-teleporting entity), but not a loss of a thousand lives. In the game as the article describes it, I would only one-box if "the loss of what box A contains and nothing in B" was an acceptable outcome. I would be increasingly likely to one-box as the probability of the AI being actually able to predict my actions in advance increases.
    The thing is, this 'modern decision theory', rather than being some sort of central pillar as you'd assume from the name, is mostly philosophers "struggling in the periphery to try to tell us something", as Feynman once said about philosophers of science. When it comes to any actual software which does something, this everyday notion of 'causality' proves to be a very slippery concept. This Rude Goldberg machine - like model of the world, where you push a domino and it pushes another domino, and the chain goes to your reward, that's just very approximate physics that people tend to use to make decisions, it's not fundamental, and interesting models of decision making are generally set up to learn that from observed data (which of course makes it impossible to do lazy philosophy involving various verbal hypotheticals where the observations that would lead the agent to believe the problem set up are not specified).

    People seem to have pretty strong opinions about Newcomb's Problem. I don't have any trouble believing that a superintelligence could scan you and predict your reaction with 99.5% accuracy.

    I mean, a superintelligence would have no trouble at all predicting that I would one-box... even if I hadn't encountered the problem before, I suspect.

    Ultimately you either interpret "superintelligence" as being sufficient to predict your reaction with significant accuracy, or not. If not, the problem is just a straightforward probability question, as explained here, and becomes uninteresting.

    Otherwise, if you interpret "superintelligence" as being sufficient to predict your reaction with significant accuracy (especially a high accuracy like >99.5%), the words of this sentence...

    And the twist is that Omega has put a million dollars in box B iff Omega has predicted that you will take only box B.

    ...simply mean "One-box to win, with high confidence."

    Summary: After disambiguating "superintelligence" (making the belief that Omega is a superintelligence pay rent), Newcomb's problem turns into either a straightforward probability question or a fairly simple issue of rearranging the words in equivalent ways to make the winning answer readily apparent.

    If you won't explicitly state your analysis, maybe we can try 20 questions?

    I have suspected that supposed "paradoxes" of evidential decision theory occur because not all the evidence was considered. For example, the fact that you are using evidential decision theory to make the decision.


    Hmm, changed my mind, should have thought more before writing... the EDT virus has early symptoms of causing people to use EDT before progressing to terrible illness and death. It seems EDT would then recommend not using EDT.

    I one-box, without a moment's thought.

    The "rationalist" says "Omega has already left. How could you think that your decision now affects what's in the box? You're basing your decision on the illusion that you have free will, when in fact you have no such thing."

    To which I respond "How does that make this different from any other decision I'll make today?"


    I think the two-box person is confused about what it is to be rational. It does not mean "make a fancy argument"; it means start with the facts, abstract from them, and reason about your abstractions.

    In this case if you start with the facts you see that 100% of people who take only box B win big, so rationally, you do the same. Why would anyone be surprised that reason divorced from facts gives the wrong answer?

    Precisely. I've been reading a lot about the Monty Hall problem recently, and I feel that it's a relevant conundrum. The confused rationalist will say: but my choice CANNOT cause a linear entanglement, the reward is predecided. But the functional rationalist will see that agents who one-box (or switch doors, in the case of Monty Hall) consistently win. It is demonstrably a more effective strategy. You work with the facts and evidence available to you and abstract out from there, regardless of how counter-intuitive the resulting strategy becomes.

    This dilemma seems like it can be reduced to:

    1. If you take both boxes, you will get $1000.
    2. If you only take box B, you will get $1M.

    Which is a rather easy decision.

    There's a seemingly-impossible but vital premise, namely, that your action was already known before you acted. Even if this is completely impossible, it's a premise, so there's no point arguing it.

    Another way of thinking of it is that, when someone says, "The boxes are already there, so your decision cannot affect what's in them," he is wrong. It has been assumed that your decision does affect what's in them, so the fact that you cannot imagine how that is possible is wholly irrelevant.

    In short, I don't understand how this is controversial when the decider has all the information that was provided.

    Actually, we don't know that our decision affects the contents of Box B. In fact, we're told that it contains a million dollars if-and-only-if Omega predicts we will only take Box B. It is possible that we could pick Box B even though Omega predicted we would take both boxes. Omega has only been observed to predict correctly 100 times. And if we are sufficiently doubtful whether Omega would predict that we would take only Box B, it would be rational to take both boxes. Only if we're somewhat confident of Omega's prediction can we confidently one-box and rationally expect it to contain a million dollars.
    51% confidence would suffice.
    * Two-box expected value: 0.51 × $1K + 0.49 × $1.001M = $491,000
    * One-box expected value: 0.51 × $1M + 0.49 × $0 = $510,000
    You're saying that we live in a universe where Newcomb's problem is impossible because the future doesn't affect the past. I'll re-phrase this problem in such a way that it seems plausible in our universe: I've got really nice scanning software. I scan your brain down to the molecule, and make a virtual representation of it on a computer. I run virtual-you in my software, and give virtual-you Newcomb's problem. Virtual-you answers, and I arrange my boxes according to that answer. I come back to real-you. You've got no idea what's going on. I explain the scenario to you and I give you Newcomb's problem. How do you answer? This particular instance of the problem does have an obvious, relatively uncomplicated solution: Lbh unir ab jnl bs xabjvat jurgure lbh ner rkcrevrapvat gur cneg bs gur fvzhyngvba, be gur cneg bs gur syrfu-naq-oybbq irefvba. Fvapr lbh xabj gung obgu jvyy npg vqragvpnyyl, bar-obkvat vf gur fhcrevbe bcgvba. If for any reason you suspect that the Predictor can reach a sufficient level of accuracy to justify one-boxing, you one-box. It doesn't matter what sort of universe you are in.
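    The arithmetic above can be checked with a short sketch. It assumes, as the comment does, a risk-neutral chooser and a predictor that is right with the same probability whichever choice you make; the function and constant names are hypothetical.

```python
# Payoffs from the problem statement: box A holds $1,000,
# box B holds $1,000,000 or nothing.
SMALL = 1_000
BIG = 1_000_000

def expected_values(accuracy):
    """Return (one_box_ev, two_box_ev) for a predictor that is correct
    with probability `accuracy`, regardless of which choice is made."""
    # One-boxing: B is full only when the predictor was right.
    one_box = accuracy * BIG
    # Two-boxing: B is empty when the predictor was right, full otherwise.
    two_box = accuracy * SMALL + (1 - accuracy) * (SMALL + BIG)
    return one_box, two_box

one_box, two_box = expected_values(0.51)
print(f"one-box: ${one_box:,.0f}  two-box: ${two_box:,.0f}")
```

    Raising `accuracy` widens the gap in favour of one-boxing; at exactly 0.5 two-boxing comes out ahead by $1,000.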
    Not that I disagree with the one-boxing conclusion, but this formulation requires physically reducible free will (which has recently been brought back into discussion). It would also require knowing the position and momentum of a lot of particles to arbitrary precision, which is provably impossible.
    We don't need a perfect simulation for the purposes of this problem in the abstract - we just need a situation such that the problem-solver assigns better-than-chance predicting power to the Predictor, and a sufficiently high utility differential between winning and losing. The "perfect whole brain simulation" is an extreme case which keeps things intuitively clear. I'd argue that any form of simulation which performs better than chance follows the same logic. The only way to escape the conclusion via simulation is if you know something that Omega doesn't - for example, you might have some secret external factor modify your "source code" and alter your decision after Omega has finished examining you. Beating Omega essentially means that you need to keep your brain-state in such a form that Omega can't deduce that you'll two-box. As Psychohistorian3 pointed out, the power that you've assigned to Omega predicting accurately is built into the problem. Your estimate of the probability that you will succeed in deception via the aforementioned method or any other is fixed by the problem. In the real world, you are free to assign whatever probability you want to your ability to deceive Omega's predictive mechanisms, which is why this problem is counterintuitive.
    Eliezer Yudkowsky
    Also: You can't simultaneously claim that any rational being ought to two-box, this being the obvious and overdetermined answer, and also claim that it's impossible for anyone to figure out that you're going to two-box.
    Right, any predictor with at least a 50.05% accuracy is worth one-boxing upon (well, maybe a higher percentage for those with concave functions in money). A predictor with sufficiently high accuracy that it's worth one-boxing isn't unrealistic or counterintuitive at all in itself, but it seems (to me at least) that many people reach the right answer for the wrong reason: the "you don't know whether you're real or a simulation" argument. Realistically, while backwards causality isn't feasible, neither is precise mind duplication. The decision to one-box can be rationally reached without those reasons: you choose to be the kind of person to (predictably) one-box, and as a consequence of that, you actually do one-box.
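    The 50.05% figure can be derived by setting the two expected values equal. A minimal sketch, assuming a risk-neutral chooser and a predictor equally likely to err in either direction (the function name is hypothetical):

```python
from fractions import Fraction

SMALL = 1_000       # contents of box A
BIG = 1_000_000     # contents of box B when full

def break_even_accuracy(small=SMALL, big=BIG):
    """Accuracy p at which one-boxing and two-boxing have equal expected value.

    One-boxing EV:  p * big
    Two-boxing EV:  p * small + (1 - p) * (small + big)
    Equating the two and solving for p gives (small + big) / (2 * big).
    """
    return Fraction(small + big, 2 * big)

p = break_even_accuracy()
print(p, float(p))  # 1001/2000 0.5005
```

    Any accuracy above this threshold favours one-boxing in expectation; a concave utility function in money would push the threshold somewhat higher, as the comment notes.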
    Oh, that's fair. I was thinking of "you don't know whether you're real or a simulation" as an intuitive way to prove the case for all "conscious" simulations. It doesn't have to be perfect - you could just as easily be an inaccurate simulation, with no way to know that you are a simulation and no way to know that you are inaccurate with respect to an original. I was trying to get people to generalize downwards from the extreme intuitive example: even with decreasing accuracy, as the simulation becomes so rough as to lose "consciousness" and "personhood", the argument keeps holding.
    Yeah, the argument would hold just as much with an inaccurate simulation as with an accurate one. The point I was trying to make wasn't so much that the simulation isn't going to be accurate enough, but that a simulation argument shouldn't be a prerequisite to one-boxing. If the experiment were performed with human predictors (let's say a psychologist who predicts correctly 75% of the time), one-boxing would still be rational despite knowing you're not a simulation. I think LW relies on computationalism as a substitute for actually being reflectively consistent in problems such as these.
    The trouble with real-world examples is that we start introducing knowledge into the problem that we wouldn't ideally have. The psychologist's 75% success rate doesn't necessarily apply to you - in the real world you can make a different estimate than the one that is given. If you're an actor or a poker player, you'll have a much different estimate of how things are going to work out. Psychologists are just messier versions of brain scanners - the fundamental premise is that they are trying to access your source code. And what's more - suppose the predictions weren't made by accessing your source code? The direction of causality does matter. If Omega can predict the future, the causal lines flow backwards from your choice to Omega's past move. If Omega is scanning your brain, the causal lines go from your brain-state to Omega's decision. If there are no causal lines between your brain/actions and Omega's choice, you always two-box. Real-world example: what if I replaced your psychologist with a sociologist, who predicted you with above-chance accuracy using only your demographic factors? In this scenario, you ought to two-box. If you disagree, let me know and I can explain myself. In the real world, you don't know to what extent your psychologist is using sociology (or some other factor outside your control). People can't always articulate why, but their intuition (correctly) begins to make them deviate from the given success% estimate as more of these real-world variables get introduced.
    True, the 75% would merely be a past history (and I am in fact a poker player). Indeed, if the factors used were entirely or mostly comprised of factors beyond my control (and I knew this), I would two-box. However, two-boxing is not necessarily optimal just because you do not know the mechanics of the predictor's methods. In the limited predictor problem, the predictor doesn't use simulations/scanners of any sort but instead uses logic, and yet one-boxers still win.
    Agreed. To add on to this: It's worth pointing out that Newcomb's problem always takes the form of Simpson's paradox. The one-boxers beat the two-boxers as a whole, but among agents predicted to one-box, the two-boxers win, and among agents predicted to two-box, the two-boxers win. The only reason to one-box is when your actions (which include both the final decision and the thoughts leading up to it) affect Omega's prediction. The general rule is: "Try to make Omega think you're one-boxing, but two-box whenever possible." It's just that in Newcomb's problem proper, fulfilling the first imperative requires actually one-boxing.
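    The reversal described here can be made concrete with a small sketch. The population counts below are purely hypothetical (a predictor that is right 90% of the time on 1,000 agents); only the payoffs come from the problem statement.

```python
def payoff(choice, prediction):
    """Dollars received given the agent's choice and Omega's prediction."""
    box_b = 1_000_000 if prediction == "one" else 0
    return box_b if choice == "one" else box_b + 1_000

# Hypothetical population: (choice, prediction) -> number of agents.
counts = {
    ("one", "one"): 450, ("one", "two"): 50,
    ("two", "one"): 50,  ("two", "two"): 450,
}

def average(choice, prediction=None):
    """Mean payoff for agents making `choice`, optionally within one prediction group."""
    rows = [(payoff(c, p), n) for (c, p), n in counts.items()
            if c == choice and (prediction is None or p == prediction)]
    return sum(v * n for v, n in rows) / sum(n for _, n in rows)

print("overall:      ", average("one"), "vs", average("two"))
print("predicted one:", average("one", "one"), "vs", average("two", "one"))
print("predicted two:", average("one", "two"), "vs", average("two", "two"))
```

    Within each prediction group the two-boxers are ahead by exactly $1,000, yet the one-boxers are far ahead overall, because choice and prediction are strongly correlated.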
    So you would never one-box unless the simulator did some sort of scan/simulation upon your brain? But it's better to one-box and be derivable as the kind of person to (probably) one-box than to two-box and be derivable as the kind of person to (probably) two-box. Your final decision never affects the actual arrangement of the boxes, but its causes do.
    I'd one-box when Omega had sufficient access to my source-code. It doesn't have to be through scanning - Omega might just be a great face-reading psychologist. We're in agreement. As we discussed, this only applies insofar as you can control the factors that lead you to be classified as a one-boxer or a two-boxer. You can alter neither demographic information nor past behavior. But when (and only when) one-boxing causes you to be derived as a one-boxer, you should obviously one-box. Well, that's true for this universe. I just assume we're playing in any given universe, some of which include Omegas who can tell the future (which implies bidirectional causality) - since Psychohistorian3 started out with that sort of thought when I first commented.
    Ok, so we do agree that it can be rational to one-box when predicted by a human (if they predict based upon factors you control such as your facial cues). This may have been a misunderstanding between us then, because I thought you were defending the computationalist view that you should only one-box if you might be an alternate you used in the prediction.
    yes, we do agree on that.
    Assuming that you have no information other than the base rate, and that it's equally likely to be wrong either way.
    An alternate solution which results in even more winning is to cerqvpg gung V znl or va fhpu n fvghngvba va gur shgher. Unir n ubbqyhz cebzvfr gung vs V'z rire va n arjpbzoyvxr fvghngvba gung ur jvyy guerngra gb oernx zl yrtf vs V qba'g 2-obk. Cnl gur ubbqyhz $500 gb frpher uvf cebzvfr. Gura pbzcyrgryl sbetrg nobhg gur jubyr neenatrzrag naq orpbzr n bar-obkre. Fpnaavat fbsgjner jvyy cerqvpg gung V 1-obk, ohg VEY V'z tbvat gb 2-obk gb nibvq zl yrtf trggvat oebxra.
    Marion Z.
    But you've perfectly forgotten about the hoodlum, so you will in fact one box. Or, does the hoodlum somehow show up and threaten you in the moment between the scanner filling the boxes and you making your decision? That seems to add an element of delay and environmental modification that I don't think exists in the original problem, unless I'm misinterpreting.  Also, I feel like by analyzing your brain to some arbitrarily precise standard, the scanner could see 3 things:  You are (or were at some point in the past) likely to think of this solution, you are/were likely to actually go through with this solution, and the hoodlum's threat would, in fact, cause you to two-box, letting the scanner predict that you will two-box.
    Your decision doesn't affect what's in the boxes, but your decision procedure does, and that already exists when the question's being assigned. It may or may not be possible to derive your decision from the decision procedure you're using in the general case -- I haven't actually done the reduction, but at first glance it looks cognate to some problems that I know are undecidable -- but it's clearly possible in some cases, and it's at least not completely absurd to imagine an Omega with a very high success rate. As best I can tell, most of the confusion here comes from a conception of free will that decouples the decision from the procedure leading to it.
    Yeah, agreed. I often describe this as NP being more about what kind of person I am than it is about what decision I make, but I like your phrasing better.

    I'd love to say I'd find some way of picking randomly just to piss Omega off, but I'd probably just one-box it. A million bucks is a lot of money.

    Ramana Kumar
    Would that make you a supersuperintelligence? Since I presume by "picking randomly" you mean randomly to Omega, in other words Omega cannot find and process enough information to predict you well. Otherwise what does "picking randomly" mean?
    The definition of Omega as something that can predict your actions leads it to have some weird powers. You could pick a box based on the outcome of a quantum event with a 50% chance; then Omega would have to vanish in a puff of physical implausibility.
    Ramana Kumar
    What's wrong with Omega predicting a "quantum event"? "50% chance" is not an objective statement, and it may well be that Omega can predict quantum events. (If not, can you explain why not, or refer me to an explanation?)
    From Wikipedia: "In the formalism of quantum mechanics, the state of a system at a given time is described by a complex wave function (sometimes referred to as orbitals in the case of atomic electrons), and more generally, elements of a complex vector space. This abstract mathematical object allows for the calculation of probabilities of outcomes of concrete experiments." This is the best formalism we have for predicting things at this scale and it only spits out probabilities. I would be surprised if something did a lot better!
    Ramana Kumar
    As I understand it, probabilities are observed because there are observers in two different amplitude blobs of configuration space (to use the language of the quantum physics sequence) but "the one we are in" appears to be random to us. And mathematically I think quantum mechanics is the same under this view in which there is no "inherent, physical" randomness (so it would still be the best formalism we have for predicting things). Could you say what "physical randomness" could be if we don't allow reference to quantum mechanics? (i.e. is that the only example? and more to the point, does the notion make any sense?)
    You seem to have transitioned to another argument here... please clarify what this has to do with Omega and its ability to predict your actions.
    Ramana Kumar
    The new argument is about whether there might be inherently unpredictable things. If not, then your picking a box based on the outcome of a "quantum event" shouldn't make Omega any less physically plausible.
    What I didn't understand is why you removed quantum experiments from the discussion. I believe it is very plausible to have something that is physically unpredictable, as long as the thing doing the predicting is bound by the same laws as what you are trying to predict.

    Consider a world made of reversible binary gates with the same number of inputs as outputs (that is, every input has a unique output, and vice versa). We want to predict one complex gate. Not a problem: just clone all the inputs and copy the gate. However, you have to do that using only reversible binary gates.

    Let's start with cloning the bits. "In" is the bit you are trying to copy without modifying it, so that you can predict what effect it will have on the rest of the system. You need a minimum of two outputs, so you need another input, B. You get to create the gate in order to copy the bit and predict the system. The ideal truth table looks something like

    In | B | Out | Copy
    0  | 0 | 0   | 0
    0  | 1 | 0   | 0
    1  | 0 | 1   | 1
    1  | 1 | 1   | 1

    This violates our reversibility assumption. The best copier we could make is

    In | B | Out | Copy
    0  | 0 | 0   | 0
    0  | 1 | 1   | 0
    1  | 0 | 0   | 1
    1  | 1 | 1   | 1

    This copies precisely, but mucks up the output, making our copy useless for prediction. If you could control B, or knew the value of B, then you could correct the output. But as I have shown here, finding out the value of a bit is non-trivial. The best we could do would be to find sources of bits with statistically predictable properties, then use them for duplicating other bits.

    The world is expected to be reversible, and the no-cloning theorem applies to reality, which I think is stricter than my example. However, I hope I have shown how a simple lawful universe can be hard to predict by something inside it. In short, stop thinking of yourself (and Omega) as an observer outside physics that does not interact with the world. Copying is disturbing.
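    The two truth tables in this comment can be checked mechanically. A minimal sketch, assuming "reversible" means that the four input pairs map to four distinct output pairs; the function names are hypothetical.

```python
from itertools import product

def is_reversible(gate):
    """A two-bit gate is reversible iff distinct inputs give distinct outputs."""
    outputs = [gate(inp, b) for inp, b in product((0, 1), repeat=2)]
    return len(set(outputs)) == len(outputs)

def ideal_copier(inp, b):
    # First table: (Out, Copy) both equal In, whatever B was.
    return (inp, inp)

def best_copier(inp, b):
    # Second table: Copy equals In, but Out is overwritten with B.
    return (b, inp)

print(is_reversible(ideal_copier))  # False: the value of B is erased
print(is_reversible(best_copier))   # True: a bijection on the four input pairs
```

    The ideal copier fails because two different inputs (B = 0 and B = 1) collapse to the same output, which is exactly the information loss a reversible world forbids.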
    Even though I do not have time to reflect on the attempted proof and even though the attempted proof is best described as a stab at a sketch of a proof and even though this "reversible logic gates" approach to a proof probably cannot be turned into an actual proof and even though Nick Tarleton just explained why the "one box or two box depending on an inherently unpredictable event" strategy is not particularly relevant to Newcomb's, I voted this up and I congratulate the author (whpearson) because it is an attempt at an original proof of something very cool (namely, limits to an agent's ability to learn about its environment) and IMHO probably relevant to the Friendliness project. More proofs and informed stabs at proofs, please!
    I suspect Omega would know you were going to do that, and would be able to put the box in a superposition dependent on the same quantum event, so that in the branches where you 1-box, box B contains $1million, and where you 2-box it's empty.
    Exactly what I was thinking.

    It's often stipulated that if Omega predicts you'll use some randomizer it can't predict, it'll punish you by acting as if it predicted two-boxing.

    (And the most favourable plausible outcome for randomizing would be scaling the payoff appropriately to the probability assigned.)
    Newcomb's problem doesn't specify how Omega chooses the 'customers'. It's a quite realistic possibility that it simply has not offered the choice to anyone who would use a randomizer, and cherry-picked only the people who have at least 99.9% 'prediction strength'.

    It's a great puzzle. I guess this thread will degenerate into arguments pro and con. I used to think I'd take one box, but I read Joyce's book and that changed my mind.

    For the take-one-boxers:

    Do you believe, as you sit there with the two boxes in front of you, that their contents are fixed? That there is a "fact of the matter" as to whether box B is empty or not? Or is box B in a sort of intermediate state, halfway between empty and full? If so, do you generally consider that things momentarily out of sight may literally change their physical sta... (read more)

    Na-na-na-na-na-na, I am so sorry you only got $1000!

    Me, I'm gonna replace my macbook pro, buy an apartment and a car and take a two week vacation in the Bahamas, and put the rest in savings!


    Point: arguments don't matter, winning does.

    Oops. I had replied to this until I saw its parent was nearly 3 years old. So as I don't (quite) waste the typing:

    Do you believe, as you sit there with the two boxes in front of you, that their contents are fixed?

    Yes.

    That there is a "fact of the matter" as to whether box B is empty or not?

    Yes.

    Or is box B in a sort of intermediate state, halfway between empty and full?

    No.

    If so, do you generally consider that things momentarily out of sight may literally change their physical states into something indeterminate?

    No.

    Do you picture box B literally becoming empty and full as you change your opinion back and forth?

    If not, if you think box B is definitely either full or empty and there is no unusual physical state describing the contents of that box, then would you agree that nothing you do now can change the contents of the box?

    Yes.

    And if so, then taking the additional box cannot reduce what you get in box B.

    No, it can't. (But it already did.)

    If I take both boxes how much money do I get? $1,000

    If I take one box how much money do I get? $10,000,000 (or whatever it was instantiated to.)

    It seems that my questions were more useful than yours. Perhaps Joyce befuddled you? It could be that he missed something. (Apart from counter-factual $9,999,000.)

    I responded to all your questions with the answers you intended to make the point that I don't believe those responses are at all incompatible with making the decision that earns you lots and lots of money.

    To quote E.T. Jaynes:

    "This example shows also that the major premise, “If A then B” expresses B only as a logical consequence of A; and not necessarily a causal physical consequence, which could be effective only at a later time. The rain at 10 AM is not the physical cause of the clouds at 9:45 AM. Nevertheless, the proper logical connection is not in the uncertain causal direction (clouds ⇒ rain), but rather (rain ⇒ clouds) which is certain, although noncausal. We emphasize at the outset that we are concerned here with logical connections, because some discussions and applications of inference have fallen into serious error through failure to see the distinction between logical implication and physical causation. The distinction is analyzed in some depth by H. A. Simon and N. Rescher (1966), who note that all attempts to interpret implication as expressing physical causation founder on the lack of contraposition expressed by the second syllogism (1–2). That is, if we tried to interpret the major premise as “A is the physical cause of B,” then we would hardly be able to accept that “not-B is the physical cause of not-A.” In Chapter 3 we shall see that attempts to interpret plausible inferences in terms of physical causation fare no better."

    @Hal Finney:

    Certainly the box is either full or empty. But the only way to get the money in the hidden box is to precommit to taking only that one box. Not pretend to precommit, really precommit. If you try to take the $1,000, well then I guess you really hadn't precommitted after all. I might vacillate, I might even be unable to make such a rigid precommitment with myself (though I suspect I can), but it seems hard to argue that taking only one box is not the correct choice.

    I'm not entirely certain that acting rationally in this situation doesn't require an element of doublethink, but that's a topic for another post.

    I would be interested to know if your opinion would change if the "predictions" of the super-being were wrong 0.5% of the time, and some small number of people ended up with the $1,001,000 and some ended up with nothing. Would you still one-box it?

    If a bunch of people have played the game already, then you can calculate the average payoff for a 1-boxer and that of a 2-boxer and pick the best one.

    I suppose I might still be missing something, but this still seems to me just a simple example of time inconsistency, where you'd like to commit ahead of time to something that later you'd like to violate if you could. You want to commit to taking the one box, but you also want to take the two boxes later if you could. A more familiar example is that we'd like to commit ahead of time to spending effort to punish people who hurt us, but after they hurt us we'd rather avoid spending that effort as the harm is already done.

    If I know that the situation has resolved itself in a manner consistent with the hypothesis that Omega has successfully predicted people's actions many times over, I have a high expectation that it will do so again.

    In that case, what I will find in the boxes is not independent of my choice, but dependent on it. By choosing to take two boxes, I cause there to be only $1,000 there. By choosing to take only one, I cause there to be $1,000,000. I can create either condition by choosing one way or another. If I can select between the possibilities, I prefer... (read more)

    Prediction <-> our choice, if we use the 100/100 record as equivalent with complete predictive accuracy.

    The "weird thing going on here" is that one value is set (that's what "he has already flown away" does), yet we are being told that we can change the other value. You see these reactions:

    1) No, we can't toggle the other value, actually. Choice is not really in the premise, or is breaking the premise.
    2) We can toggle the choice value, and it will set the predictive value accordingly. The prior value of the prediction does not exist or is not relevant.

    We have already equated "B wins" with "prediction value = B" wlog. If we furthermore have equated "choice value = B" with "prediction value = B" wlog, we have two permissible arrays of values: all A, or all B. Now our knowledge is restricted to choice value. We can choose A or B. Since the "hidden" values are known to be identical to the visible value, we should pick the visible value in accordance with what we want for a given other value.

    Other thoughts:

    - Locally, it appears that you cannot "miss out" because within a value set, your choice value is the only possible one in identity with the other values.
    - This is a strange problem, because generally paradox provokes these kinds of responses. In this case, however, fixing a value does not cause a contradiction both ways. If you accept the premise and my premises above, there should be no threat of complications from Omega or anything else.
    - If 1 and 2 really are the only reactions, and 2 -> onebox, any twoboxers must believe 1. But this is absurd. So whence the twoboxers?

    I don't know the literature around Newcomb's problem very well, so excuse me if this is stupid. BUT: why not just reason as follows:

    1. If the superintelligence can predict your action, one of the following two things must be the case:

    a) the state of affairs whether you pick the box or not is already absolutely determined (i.e. we live in a fatalistic universe, at least with respect to your box-picking)

    b) your box picking is not determined, but it has backwards causal force, i.e. something is moving backwards through time.

    If a), then practical reason is ... (read more)


    Once we can model the probabilities of the various outcomes in a noncontroversial fashion, the specific choice to make depends on the utility of the various outcomes. $1,001,000 might be only marginally better than $1,000,000 -- or that extra $1,000 could have some significant extra utility.

    If we assume that Omega almost never makes a mistake and we allow the chooser to use true randomization (perhaps by using quantum physics) in making his choice, then Omega must make his decision in part through seeing into the future. In this case the chooser should obviously pick just B.

    Hanson: I suppose I might still be missing something, but this still seems to me just a simple example of time inconsistency

    In my motivations and in my decision theory, dynamic inconsistency is Always Wrong. Among other things, it always implies an agent unstable under reflection.

    A more familiar example is that we'd like to commit ahead of time to spending effort to punish people who hurt us, but after they hurt us we'd rather avoid spending that effort as the harm is already done.

    But a self-modifying agent would modify to not rather avoid it.

    Gowder: If... (read more)

    I don't see why this needs to be so drawn out.

    I know the rules of the game. I also know that Omega is super intelligent, namely, Omega will accurately predict my action. Since Omega knows that I know this, and since I know that he knows I know this, I can rationally take box B, content in my knowledge that Omega has predicted my action correctly.

    I don't think it's necessary to precommit to any ideas, since Omega knows that I'll be able to rationally deduce the winning action given the premise.

    We don't even need a superintelligence. We can probably predict on the basis of personality type a person's decision in this problem with an 80% accuracy, which is already sufficient that a rational person would choose only box B.

    The possibility of time inconsistency is very well established among game theorists, and is considered a problem of the game one is playing, rather than a failure to analyze the game well. So it seems you are disagreeing with most all game theorists in economics as well as most decision theorists in philosophy. Maybe perhaps they are right and you are wrong?

    The interesting thing about this game is that Omega has magical super-powers that allow him to know whether or not you will back out on your commitment ahead of time, and so you can make your commitment credible by not being going to back out on your commitment. If that makes any sense.

    Robin, remember I have to build a damn AI out of this theory, at some point. A self-modifying AI that begins anticipating dynamic inconsistency - that is, a conflict of preference with its own future self - will not stay in such a state for very long... did the game theorists and economists work a standard answer for what happens after that?

    If you like, you can think of me as defining the word "rationality" to refer to a different meaning - but I don't really have the option of using the standard theory, here, at least not for longer than 50 milliseconds.

    If there's some nonobvious way I could be wrong about this point, which seems to me quite straightforward, do let me know.

    In reality, either I am going to take one box or two. So when the two-boxer says, "If I take one box, I'll get amount x," and "If I take two boxes, I'll get amount x+1000," one of these statements is objectively counterfactual. Let's suppose he is going to in fact take both boxes. Then his second statement is factual and his first statement counterfactual. Then his two statements are:

    1)Although I am not in fact going to take only one box, were I to take only one box, I would get amount x, namely the amount that would be in the box.

    2)I am in ... (read more)

    Eliezer: whether or not a fixed future poses a problem for morality is a hotly disputed question which even I don't want to touch. Fortunately, this problem is one that is pretty much wholly orthogonal to morality. :-)

    But I feel like in the present problem the fixed future issue is a key to dissolving the problem. So, assume the box decision is fixed. It need not be the case that the stress is fixed too. If the stress isn't fixed, then it can't be relevant to the box decision (the box is fixed regardless of your decision between stress and no-stress).... (read more)

    Paul, being fixed or not fixed has nothing to do with it. Suppose I program a deterministic AI to play the game (the AI picks a box.)

    The deterministic AI knows that it is deterministic, and it knows that I know too, since I programmed it. So I also know whether it will take one or both boxes, and it knows that I know this.

    At first, of course, it doesn't know itself whether it will take one or both boxes, since it hasn't completed running its code yet. So it says to itself, "Either I will take only one box or both boxes. If I take only one box, the pro... (read more)

    I practice historical European swordsmanship, and those Musashi quotes have a certain resonance to me*. Here is another (modern) saying common in my group:

    If it's stupid, but it works, then it ain't stupid.

    * You previously asked why you couldn't find similar quotes from European sources - I believe this is mainly a language barrier: The English were not nearly the swordsmen that the French, Italians, Spanish, and Germans were (though they were pretty mean with their fists). You should be able to find many quotes in those other languages.

    Eliezer, I don't read the main thrust of your post as being about Newcomb's problem per se. Having distinguished between 'rationality as means' to whatever end you choose, and 'rationality as a way of discriminating between ends', can we agree that the whole specks / torture debate was something of a red herring ? Red herring, because it was a discussion on using rationality to discriminate between ends, without having first defined one's meta-objectives, or, if one's meta-objectives involved hedonism, establishing the rules for performing math over subje... (read more)

    Unknown: your last question highlights the problem with your reasoning. It's idle to ask whether I'd go and jump off a cliff if I found my future were determined. What does that question even mean?

    Put a different way, why should we ask an "ought" question about events that are determined? If A will do X whether or not it is the case that a rational person will do X, why do we care whether or not it is the case that a rational person will do X? I submit that we care about rationality because we believe it'll give us traction on our problem of ... (read more)

    Paul, it sounds like you didn't understand. A chess playing computer program is completely deterministic, and yet it has to consider alternatives in order to make its move. So also we could be deterministic and we would still have to consider all the possibilities and their benefits before making a move.

    So it makes sense to ask whether you would jump off a cliff if you found out that the future is determined. You would find out that the future is determined without knowing exactly which future is determined, just like the chess program, and so you would ha... (read more)

    I do understand. My point is that we ought not to care whether we're going to consider all the possibilities and benefits.

    Oh, but you say, our caring about our consideration process is a determined part of the causal chain leading to our consideration process, and thus to the outcome.

    Oh, but I say, we ought not to care* about that caring. Again, recurse as needed. Nothing you can say about the fact that a cognition is in the causal chain leading to a state of affairs counts as a point against the claim that we ought not to care about whether or not we have that cognition if it's unavoidable.

    The paradox is designed to give your decision the practical effect of causing Box B to contain the money or not, without actually labeling this effect "causation." But I think that if Box B acts as though its contents are caused by your choice, then you should treat it as though they were. So I don't think the puzzle is really something deep; rather, it is a word game about what it means to cause something.

    Perhaps it would be useful to think about how Omega might be doing its prediction. For example, it might have the ability to travel into the f... (read more)

    I have two arguments for going for Box B. First, for a scientist it's not unusual that every rational argument (=theory) predicts that only two-boxing makes sense. Still, if the experiment again and again refutes that, it's obviously the theory that's wrong and there's obviously something more to reality than that which fueled the theories. Actually, we even see dilemmas like Newcomb's in the contextuality of quantum measurements. Measurement tops rationality or theory, every time. That's why science is successful and philosophy is not.

    Second, there's no q... (read more)

    Paul, if we were determined, what would you mean when you say that "we ought not to care"? Do you mean to say that the outcome would be better if we didn't care? The fact that the caring is part of the causal chain does have something to do with this: the outcome may be determined by whether or not we care. So if you consider one outcome better than another (only one really possible, but both possible as far as you know), then either "caring" or "not caring" might be preferable, depending on which one would lead to each outcome.

    Eliezer, if a smart creature modifies itself in order to gain strategic advantages from committing itself to future actions, it must think it could better achieve its goals by doing so. If so, why should we be concerned, if those goals do not conflict with our goals?

    I think Anonymous, Unknown and Eliezer have been very helpful so far. Following on from them, here is my take:

    There are many ways Omega could be doing the prediction/placement and it may well matter exactly how the problem is set up. For example, you might be deterministic and he is precalculating your choice (much like we might be able to do with an insect or computer program), or he might be using a quantum suicide method, (quantum) randomizing whether the million goes in and then destroying the world iff you pick the wrong option (This will lead to us ... (read more)

    Be careful of this sort of argument, any time you find yourself defining the "winner" as someone other than the agent who is currently smiling from on top of a giant heap.

    This made me laugh. Well said!

    There's only one question about this scenario for me - is it possible for a sufficiently intelligent being to fully, fully model an individual human brain? If so, (and I think it's tough to argue 'no' unless you think there's a serious glass ceiling for intelligence) choose box B. If you try and second-guess (or, hell, googolth-guess) Omega, you're ... (read more)

    How does the box know? I could open B with the intent of opening only B or I could open B with the intent of then opening A. Perhaps Omega has locked the boxes such that they only open when you shout your choice to the sky. That would beat my preferred strategy of opening B before deciding which to choose. I open boxes without choosing to take them all the time.

    Are our common notions about boxes catching us here? In my experience, opening a box rarely makes nearby objects disintegrate. It is physically impossible to "leave $1000 on the table,"... (read more)

    Eliezer, if a smart creature modifies itself in order to gain strategic advantages from committing itself to future actions, it must think it could better achieve its goals by doing so. If so, why should we be concerned, if those goals do not conflict with our goals?

    Well, there's a number of answers I could give to this:

    *) After you've spent some time working in the framework of a decision theory where dynamic inconsistencies naturally Don't Happen - not because there's an extra clause forbidding them, but because the simple foundations just don't give rise t... (read more)

    So it seems you are disagreeing with most all game theorists in economics as well as most decision theorists in philosophy. Maybe perhaps they are right and you are wrong?

    Maybe perhaps we are right and they are wrong?

    The issue is to be decided, not by referring to perceived status or expertise, but by looking at who has the better arguments. Only when we cannot evaluate the arguments does making an educated guess based on perceived expertise become appropriate.

    Again: how much do we want to bet that Eliezer won't admit that he's wrong in this case? Do we have someone willing to wager another 10 credibility units?

    Caledonian: you can stop talking about wagering credibility units now, we all know you don't have funds for the smallest stake.

    Ben Jones: if we assume that Omega is perfectly simulating the human mind, then when we are choosing between B and A+B, we don't know whether we are in reality or simulation. In reality, our choice does not affect the million, but in the simulation this will. So we should reason "I'd better take only box B, because if this is the simulation then that will change whether or not I get the million in reality".

    There is a big difference between having time inconsistent preferences, and time inconsistent strategies because of the strategic incentives of the game you are playing. Trying to find a set of preferences that avoids all strategic conflicts between your different actions seems a fool's errand.

    What we have here is an inability to recognize that causality no longer flows only from 'past' to 'future'.

    If we're given a box that could contain $1,000 or nothing, we calculate the expected value of the superposition of these two possibilities. We don't actually expect that there's a superposition within the box - we simply adopt a technique to help compensate for what we do not know. From our ignorant perspective, either case could be real, although in actuality either the box has the money or it does not.

    This is similar. The amount of money in the b... (read more)

    How about simply multiplying? Treat Omega as a fair coin toss. 50% of a million is half-a-million, and that's vastly bigger than a thousand. You can ignore the question of whether Omega has filled the box, in deciding that the uncertain box is more important. So much more important, that the chance of gaining an extra 1000 isn't worth the bother of trying to beat the puzzle. You just grab the important box.


    After you've spent some time working in the framework of a decision theory where dynamic inconsistencies naturally Don't Happen - not because there's an extra clause forbidding them, but because the simple foundations just don't give rise to them - then an intertemporal preference reversal starts looking like just another preference reversal.

    ... Roughly, self-modifying capability in a classical causal decision theorist doesn't fix the problem that gives rise to the intertemporal preference reversals, it just makes one temporal self win out over all the oth... (read more)

    There is a big difference between having time inconsistent preferences, and time inconsistent strategies because of the strategic incentives of the game you are playing.

    I can see why a human would have time-inconsistent strategies - because of inconsistent preferences between their past and future self, hyperbolic discounting functions, that sort of thing. I am quite at a loss to understand why an agent with a constant, external utility function should experience inconsistent strategies under any circumstance, regardless of strategic incentives. Expected... (read more)

    The entire issue of causal versus inferential decision theory, and of the seemingly magical powers of the chooser in the Newcomb problem, are serious distractions here, as Eliezer has the same issue in an ordinary commitment situation, e.g., punishment. I suggest starting this conversation over from such an ordinary simple example.

    Let me restate: Two boxes appear. If you touch box A, the contents of box B are vaporized. If you attempt to open box B, box A and its contents are vaporized. Contents as previously specified. We could probably build these now.

    Experimentally, how do we distinguish this from the description in the main thread? Why are we taking Omega seriously when if the discussion dealt with the number of angels dancing on the head of pin the derision would be palpable? The experimental data point to taking box B. Even if Omega is observed delivering the boxes, and making the specified claims regarding their contents, why are these claims taken on faith as being an accurate description of the problem?

    Let's take Bayes seriously.

    Some time ago there was a posting about something like "If all you knew was that the past 5 mornings the sun rose, what probability would you assign to the sun rising the next morning?" It came out to something like 5/6 or 4/5 or so.

    But of course that's not all we know, and so we'd get different numbers.

    Now what's given here is that Omega has been correct on a hundred occasions so far. If that's all we know, we should estimate the probability of him being right next time at about 99%. So if you're a one-boxer your exp... (read more)
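    One way to arrive at a figure like "about 99%" is Laplace's rule of succession. The sketch below assumes a uniform prior over Omega's unknown accuracy and treats the 100 observed games as independent trials; both are assumptions made for illustration, not something the problem statement guarantees, and the function name is hypothetical.

```python
from fractions import Fraction

def rule_of_succession(successes: int, trials: int) -> Fraction:
    """Posterior probability that the next trial succeeds.

    Assumes a uniform prior on the unknown success rate and
    independent trials (Laplace's rule of succession).
    """
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    return Fraction(successes + 1, trials + 2)

# Omega has been right in 100 out of 100 observed games.
omega_next = rule_of_succession(100, 100)
print(omega_next, float(omega_next))
```

    Under these assumptions the estimate is 101/102, a little over 99%. A different prior would give a different number, which is the point of "that's not all we know".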

    Eliezer, I have a question about this: "There is no finite amount of life lived N where I would prefer a 80.0001% probability of living N years to an 0.0001% chance of living a googolplex years and an 80% chance of living forever. This is a sufficient condition to imply that my utility function is unbounded."

    I can see that this preference implies an unbounded utility function, given that a longer life has a greater utility. However, simply stated in that way, most people might agree with the preference. But consider this gamble instead:

    A: Live 5... (read more)

    If this was the only chance you ever get to determine your lifespan - then choose B. In the real world, it would probably be a better idea to discard both options and use your natural lifespan to search for alternative paths to immortality.
    I disagree, not surprisingly, since I was the author of the comment to which you are responding. I would choose A, and I think anyone sensible would choose A. There's not much one can say here in the way of argument, but it is obvious to me that choosing B here is following your ideals off a cliff. Especially since I can add a few hundred 9s there, and by your argument you should still choose B.

    they would just insist that there is an important difference between deciding to take only box B at 7:00am vs 7:10am, if Omega chooses at 7:05am

    But that's exactly what strategic inconsistency is about. Even if you had decided to take only box B at 7:00am, by 7:06am a rational agent will just change his mind and choose to take both boxes. Omega knows this, hence it will put nothing into box B. The only way out is if the AI self-commits to take only box B in a way that's verifiable by Omega.

    When the stakes are high enough I one-box, while gritting my teeth. Otherwise, I'm more interested in demonstrating my "rationality" (Eliezer has convinced me to use those quotes).

    Perhaps we could just specify an agent that uses reverse causation in only particular situations, as it seems that humans are capable of doing.

    Paul G, almost certainly, right? Still, as you say, it has little bearing on one's answer to the question.

    In fact, not true, it does. Is there anything to stop myself making a mental pact with all my simulation buddies (and 'myself', whoever he be) to go for Box B?

    In arguing for the single box, Yudkowsky has made an assumption that I disagree with: at the very end, he changes the stakes and declares that your choice should still be the same.

    My way of looking at it is similar to what Hendrik Boom has said. You have a choice between betting on Omega being right and betting on Omega being wrong.

    A = Contents of box A

    B = What may be in box B (if it isn't empty)

    A is yours, in the sense that you can take it and do whatever you want with it. One thing you can do with A is pay it for a chance to win B if Omega is right. Y... (read more)

    IMO there's less to Newcomb's paradox than meets the eye. It's basically "A future-predicting being who controls the set of choices could make rational choices look silly by making sure they had bad outcomes". OK, yes, he could. Surprised?

    What I think makes it seem paradoxical is that the paradox both assures us that Omega controls the outcome perfectly, and cues us that this isn't so ("He's already left" etc). Once you settle what it's really saying either way, the rest follows.

    Yes, this is really an issue of whether your choice causes Omega's action or not. The only way for Omega to be a perfect predictor is for your choice to actually cause Omega's action. (For example, Omega 'sees the future' and acts based on your choice). If your choice causes Omega's action, then choosing B is the rational decision, as it causes the box to have the million.

    If your choice does not cause Omega's action, then choosing both boxes is the winning approach. In this case, Omega is merely giving big awards to some people and small awards to ot... (read more)

    the dominant consensus in modern decision theory is that one should two-box...there's a common attitude that "Verbal arguments for one-boxing are easy to come by, what's hard is developing a good decision theory that one-boxes"

    Those are contrary positions, right?

    Robin Hanson:
    Punishment is ordinary, but Newcomb's problem is simple! You can't have both.

    The advantage of an ordinary situation like punishment is that game theorists can't deny the fact on the ground that governments exist, but they can claim it's because we're all irrational, which doesn't leave many directions to go in.


    I agree that "rationality" should be the thing that makes you win but the Newcomb paradox seems kind of contrived.

    If there is a more powerful entity throwing good utilities at normally dumb decisions and bad utilities at normally good decisions then you can make any dumb thing look genius because you are under different rules than the world we live in at present.

    I would ask Alpha for help and do what he tells me to do. Alpha is an AI that is also never wrong when it comes to predicting the future, just like Omega. Alpha would examine Omega and ... (read more)

    To me, the decision is very easy. Omega obviously possesses more prescience about my box-taking decision than I do myself. He's been able to guess correctly in the past, so I'd see no reason to doubt him with myself. With that in mind, the obvious choice is to take box B.

    If Omega is so nearly always correct, then determinism is shown to exist (at least to some extent). That being the case, causality would be nothing but an illusion. So I'd see no problem with it working in "reverse".

    Fascinating. A few days after I read this, it struck me that a form of Newcomb's Problem actually occurs in real life--voting in a large election. Here's what I mean.

    Say you're sitting at home pondering whether to vote. If you decide to stay home, you benefit by avoiding the minor inconvenience of driving and standing in line. (Like gaining $1000.) If you decide to vote, you'll fail to avoid the inconvenience, meanwhile you know your individual vote almost certainly won't make a statistical difference in getting your candidate elected. (Which would be like... (read more)

    A very good point. I'm the type to stay home from the polls. But I'd also one-box..... hm. I think it may have to do with the very weak correlation between my choice to vote and the choice of those of a similar mind to me to vote as opposed to the very strong correlation between my choice to one-box and Omega's choice to put $1,000,000 in box B.
    Rational agents defect against a bunch of irrational fools who are mostly choosing for signalling purposes and who may well vote for the other guy even if they cooperate.

    "If it ever turns out that Bayes fails - receives systematically lower rewards on some problem, relative to a superior alternative, in virtue of its mere decisions - then Bayes has to go out the window."

    What exactly do you mean by mere decisions? I can construct problems where agents that use few computational resources win. Bayesian agents by your own admission have to use energy to get in mutual information with the environment (a state I am still suspicious of), so they have to use energy, meaning they lose.

    The premise is that a rational agent would start out convinced that this story about the alien that knows in advance what they'll decide appears to be false.

    The Kolmogorov complexity of the story about the alien is very large because we have to hypothesize some mechanism by which it can extrapolate the contents of minds. Even if I saw the alien land a million times and watched the box-picking connect with the box contents as they're supposed to, it is simpler to assume that the boxes are some stage magic trick, or even that they are an exception to the u... (read more)

    It is not possible for an agent to make a rational choice between 1 or 2 boxes if the agent and Omega can both be simulated by Turing machines. Proof: Omega predicts the agent's decision by simulating it. This requires Omega to have greater algorithmic complexity than the agent (including the nonzero complexity of the compiler or interpreter). But a rational choice by the agent requires that it simulate Omega, which requires that the agent have greater algorithmic complexity instead.

    In other words, the agent X, with complexity K(X), must model Omega whi... (read more)

    Um, AIXI is not computable. Relatedly, K(AIXI) is undefined, as AIXI is not a finite object. Also, A can simulate B, even when K(B)>K(A). For example, one could easily define a computer program which, given sufficient computing resources, simulates all Turing machines on all inputs. This must obviously include those with much higher Kolmogorov complexity. Yes, you run into issues of two Turing machines/agents/whatever simulating each other. (You could also get this from the recursion theorem.) What happens then? Simple: neither simulation ever halts.
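    The "simulate all Turing machines" construction is usually called dovetailing. The sketch below illustrates only the scheduling idea: the `machine` generator is a hypothetical stand-in for "the n-th Turing machine" (even-numbered ones halt after n steps, odd-numbered ones loop forever), not a real machine enumeration. A short scheduler can still make progress on every machine, including ones far more complicated than itself, because it never waits for any single one to finish.

```python
def machine(n):
    """Hypothetical stand-in for the n-th machine.

    Even n: runs for n steps, then halts. Odd n: never halts.
    Each `yield` represents one step of computation.
    """
    steps = 0
    while n % 2 == 1 or steps < n:
        steps += 1
        yield

def dovetail(max_rounds):
    """Interleave all machines: in round r, start machine r,
    then advance every machine that is still running by one step.
    Returns the indices of machines that halted, in halting order."""
    running = {}
    halted = []
    for round_no in range(max_rounds):
        running[round_no] = machine(round_no)
        for idx in list(running):
            try:
                next(running[idx])
            except StopIteration:
                # The generator finished, i.e. the machine halted.
                halted.append(idx)
                del running[idx]
    return halted

print(dovetail(10))
```

    With an unbounded number of rounds, every machine that halts is eventually reported, and the non-halting ones simply keep consuming steps. This sketch does not model two agents simulating each other; in that case, as noted above, neither simulation halts.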
    Not so. I don't need to simulate a hungry tiger in order to stay safely (and rationally) away from it, even though I don't know the exact methods by which its brain will identify me as a tasty treat. If you think that one can't "rationally" stay away from hungry tigers, then we're using the word "rationally" vastly differently.

    Okay, maybe I am stupid, maybe I am unfamiliar with all the literature on the problem, maybe my English sucks, but I fail to understand the following:
    Is the agent aware of the fact that one boxers get 1 000 000 at the moment Omega "scans" him and presents the boxes?


    Is agent told about this after Omega "has left"?


    Is agent unaware of the fact that Omega rewards one-boxers at all?
    P.S.: Also, as most "decision paradoxes", this one will have different solutions depending on the context (is the agent a starving child in Africa, or a "megacorp" CEO)

    I'm a convinced two-boxer, but I'll try to put my argument without any bias. It seems to me the way this problem has been put has been an attempt to rig it for the one-boxers. When we talk about "precommitment" it is suggested the subject has advance knowledge of Omega and what is to happen. The way I thought the paradox worked was that Omega would scan/analyze a person and make its prediction, all before the person ever heard of the dilemma. Therefore, a person has no way to develop an intention of being a one-boxer or a two-boxer t... (read more)

    The key point you've missed in your analysis, however, is that Omega is almost always correct in his predictions. It doesn't matter how Omega does it - that is a separate problem. You don't have enough information about his process of prediction to make any rational judgment about it except for the fact that it is a very, very good process. Brain scans, reversed causality, time travel, none of those ideas matter. In the paradox as originally posed, all you have are guesses about how he may have done it, and you would be an utter fool to give higher weight to those guesses than to the fact that Omega is always right. If the observations (that Omega is always right) disagree with theory (that Omega cannot possibly be right), it is the theory that is wrong, every time. Thus the rational agent should, in this situation, give extremely low weight to his understanding of the way the universe works, since it is obviously flawed (the existence of a perfect predictor proves this).

    The question really comes down to a 100% chance of getting $1000 plus a nearly 0% chance of getting $1.001 million, vs a nearly 100% chance of getting $1 million. What really blows my mind about making the 2-box choice is that you can significantly reduce Omega's ability to predict the outcome, and unless you are absolutely desperate for that $1000* the 2-box choice doesn't become superior until Omega is only roughly 50% accurate (at 50.05% the outcome equalizes). Only then do you expect to get more money, on average, by choosing both boxes. In other words, if you think Omega is doing anything but flipping a coin to determine the contents of box B, you are better off choosing box B.

    *I could see the value of $1000 rising significantly if, for example, a man is holding a gun to your head and will kill you in two minutes if you don't give him $1000. In this case, any uncertainty about Omega's abilities is overshadowed by the certainty of the $1000. This inverts if the man with the gun is demanding more
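    The break-even accuracy can be checked directly. This is a minimal sketch assuming a risk-neutral chooser (utility linear in dollars) and a single accuracy p that is the same whether you one-box or two-box; the function names are illustrative.

```python
from fractions import Fraction

BOX_A = 1_000        # always in the transparent box
BOX_B = 1_000_000    # in the opaque box iff Omega predicted one-boxing

def ev_one_box(p):
    """Expected payoff of taking only box B when Omega is right with probability p."""
    return p * BOX_B

def ev_two_box(p):
    """Expected payoff of taking both boxes: box A for sure,
    plus box B only when Omega wrongly predicted one-boxing."""
    return BOX_A + (1 - p) * BOX_B

def break_even_accuracy():
    """Solve p*B = A + (1-p)*B for p, giving p = (A + B) / (2B)."""
    return Fraction(BOX_A + BOX_B, 2 * BOX_B)

p_star = break_even_accuracy()
print(p_star, float(p_star))                 # 1001/2000, i.e. 50.05%
print(ev_one_box(Fraction(4, 5)), ev_two_box(Fraction(4, 5)))
```

    At p = 1001/2000 both choices are worth $500,500 in expectation; above that, one-boxing is ahead. For an 80% accurate predictor the comparison is $800,000 against $201,000. A chooser whose utility is not linear in money, as in the gun example, would have a different threshold.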

    If the alien is able to predict your decision, it follows that your decision is a function of your state at the time the alien analyzes you. Then, there is no meaningful question of "what should you do?" Either you are in a universe in which you are disposed to choose the one box AND the alien has placed the million dollars, or you are in a universe in which you are disposed to take both boxes AND the alien has placed nothing. If the former, you will have the subjective experience of "deciding to take the one box", which is itself a det... (read more)