The following may well be the most controversial dilemma in the history of decision theory:
A superintelligence from another galaxy, whom we shall call Omega, comes to Earth and sets about playing a strange little game. In this game, Omega selects a human being, sets down two boxes in front of them, and flies away.
Box A is transparent and contains a thousand dollars.
Box B is opaque, and contains either a million dollars, or nothing.

You can take both boxes, or take only box B.
And the twist is that Omega has put a million dollars in box B iff Omega has predicted that you will take only box B.
Omega has been correct on each of 100 observed occasions so far - everyone who took both boxes has found box B empty and received only a thousand dollars; everyone who took only box B has found B containing a million dollars. (We assume that box A vanishes in a puff of smoke if you take only box B; no one else can take box A afterward.)
Before you make your choice, Omega has flown off and moved on to its next game. Box B is already empty or already full.
Omega drops two boxes on the ground in front of you and flies off.
Do you take both boxes, or only box B?
And the standard philosophical conversation runs thusly:
One-boxer: "I take only box B, of course. I'd rather have a million than a thousand."
Two-boxer: "Omega has already left. Either box B is already full or already empty. If box B is already empty, then taking both boxes nets me $1000, taking only box B nets me $0. If box B is already full, then taking both boxes nets $1,001,000, taking only box B nets $1,000,000. In either case I do better by taking both boxes, and worse by leaving a thousand dollars on the table - so I will be rational, and take both boxes."
One-boxer: "If you're so rational, why ain'cha rich?"
Two-boxer: "It's not my fault Omega chooses to reward only people with irrational dispositions, but it's already too late for me to do anything about that."
There is a large literature on the topic of Newcomblike problems - especially if you consider the Prisoner's Dilemma as a special case, which it is generally held to be. "Paradoxes of Rationality and Cooperation" is an edited volume that includes Newcomb's original essay. For those who read only online material, this PhD thesis summarizes the major standard positions.
I'm not going to go into the whole literature, but the dominant consensus in modern decision theory is that one should two-box, and Omega is just rewarding agents with irrational dispositions. This dominant view goes by the name of "causal decision theory".
As you know, the primary reason I'm blogging is that I am an incredibly slow writer when I try to work in any other format. So I'm not going to try to present my own analysis here. Way too long a story, even by my standards.
But it is agreed even among causal decision theorists that if you have the power to precommit yourself to take one box, in Newcomb's Problem, then you should do so. If you can precommit yourself before Omega examines you, then you are directly causing box B to be filled.
Now in my field - which, in case you have forgotten, is self-modifying AI - this works out to saying that if you build an AI that two-boxes on Newcomb's Problem, it will self-modify to one-box on Newcomb's Problem, if the AI considers in advance that it might face such a situation. Agents with free access to their own source code have access to a cheap method of precommitment.
What if you expect that you might, in general, face a Newcomblike problem, without knowing the exact form of the problem? Then you would have to modify yourself into a sort of agent whose disposition was such that it would generally receive high rewards on Newcomblike problems.
But what does an agent with a disposition generally-well-suited to Newcomblike problems look like? Can this be formally specified?
Yes, but when I tried to write it up, I realized that I was starting to write a small book. And it wasn't the most important book I had to write, so I shelved it. My slow writing speed really is the bane of my existence. The theory I worked out seems, to me, to have many nice properties besides being well-suited to Newcomblike problems. It would make a nice PhD thesis, if I could get someone to accept it as my PhD thesis. But that's pretty much what it would take to make me unshelve the project. Otherwise I can't justify the time expenditure, not at the speed I currently write books.
I say all this, because there's a common attitude that "Verbal arguments for one-boxing are easy to come by, what's hard is developing a good decision theory that one-boxes" - coherent math which one-boxes on Newcomb's Problem without producing absurd results elsewhere. So I do understand that, and I did set out to develop such a theory, but my writing speed on big papers is so slow that I can't publish it. Believe it or not, it's true.
Nonetheless, I would like to present some of my motivations on Newcomb's Problem - the reasons I felt impelled to seek a new theory - because they illustrate my source-attitudes toward rationality. Even if I can't present the theory that these motivations motivate...
First, foremost, fundamentally, above all else:
Rational agents should WIN.
Don't mistake me, and think that I'm talking about the Hollywood Rationality stereotype that rationalists should be selfish or shortsighted. If your utility function has a term in it for others, then win their happiness. If your utility function has a term in it for a million years hence, then win the eon.
But at any rate, WIN. Don't lose reasonably, WIN.
Now there are defenders of causal decision theory who argue that the two-boxers are doing their best to win, and cannot help it if they have been cursed by a Predictor who favors irrationalists. I will talk about this defense in a moment. But first, I want to draw a distinction between causal decision theorists who believe that two-boxers are genuinely doing their best to win; versus someone who thinks that two-boxing is the reasonable or the rational thing to do, but that the reasonable move just happens to predictably lose, in this case. There are a lot of people out there who think that rationality predictably loses on various problems - that, too, is part of the Hollywood Rationality stereotype, that Kirk is predictably superior to Spock.
Next, let's turn to the charge that Omega favors irrationalists. I can conceive of a superbeing who rewards only people born with a particular gene, regardless of their choices. I can conceive of a superbeing who rewards people whose brains inscribe the particular algorithm of "Describe your options in English and choose the last option when ordered alphabetically," but who does not reward anyone who chooses the same option for a different reason. But Omega rewards people who choose to take only box B, regardless of which algorithm they use to arrive at this decision, and this is why I don't buy the charge that Omega is rewarding the irrational. Omega doesn't care whether or not you follow some particular ritual of cognition; Omega only cares about your predicted decision.
We can choose whatever reasoning algorithm we like, and will be rewarded or punished only according to that algorithm's choices, with no other dependency - Omega just cares where we go, not how we got there.
It is precisely the notion that Nature does not care about our algorithm, which frees us up to pursue the winning Way - without attachment to any particular ritual of cognition, apart from our belief that it wins. Every rule is up for grabs, except the rule of winning.
As Miyamoto Musashi said - it's really worth repeating:
"You can win with a long weapon, and yet you can also win with a short weapon. In short, the Way of the Ichi school is the spirit of winning, whatever the weapon and whatever its size."
(Another example: It was argued by McGee that we must adopt bounded utility functions or be subject to "Dutch books" over infinite times. But: The utility function is not up for grabs. I love life without limit or upper bound: There is no finite amount of life lived N where I would prefer an 80.0001% probability of living N years to a 0.0001% chance of living a googolplex years and an 80% chance of living forever. This is a sufficient condition to imply that my utility function is unbounded. So I just have to figure out how to optimize for that morality. You can't tell me, first, that above all I must conform to a particular ritual of cognition, and then that, if I conform to that ritual, I must change my morality to avoid being Dutch-booked. Toss out the losing ritual; don't change the definition of winning. That's like deciding to prefer $1000 to $1,000,000 so that Newcomb's Problem doesn't make your preferred ritual of cognition look bad.)
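(For readers who want that inference spelled out, a rough sketch: it needs the extra premise that the utility of living forever can be no greater than the limit of the utilities of ever-longer finite lives. Write $u(N)$ for the utility of living $N$ years, $G$ for a googolplex, and $u_\infty$ for the utility of living forever. The stated preference says that for every finite $N$, $0.800001\,u(N) < 10^{-6}\,u(G) + 0.8\,u_\infty$. If $u$ were bounded with $u_\infty = \lim_{N\to\infty} u(N)$, letting $N \to \infty$ yields $0.800001\,u_\infty \le 10^{-6}\,u(G) + 0.8\,u_\infty$, hence $u(G) \ge u_\infty$, contradicting the premise that strictly more life is strictly better. So either $u$ is unbounded, or an infinite life must be valued above the limit of every finite one.)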
"But," says the causal decision theorist, "to take only one box, you must somehow believe that your choice can affect whether box B is empty or full - and that's unreasonable! Omega has already left! It's physically impossible!"
Unreasonable? I am a rationalist: what do I care about being unreasonable? I don't have to conform to a particular ritual of cognition. I don't have to take only box B because I believe my choice affects the box, even though Omega has already left. I can just... take only box B.
I do have a proposed alternative ritual of cognition which computes this decision, which this margin is too small to contain; but I shouldn't need to show this to you. The point is not to have an elegant theory of winning - the point is to win; elegance is a side effect.
Or to look at it another way: Rather than starting with a concept of what is the reasonable decision, and then asking whether "reasonable" agents leave with a lot of money, start by looking at the agents who leave with a lot of money, develop a theory of which agents tend to leave with the most money, and from this theory, try to figure out what is "reasonable". "Reasonable" may just refer to decisions in conformance with our current ritual of cognition - what else would determine whether something seems "reasonable" or not?
From James Joyce (no relation), Foundations of Causal Decision Theory:
Rachel has a perfectly good answer to the "Why ain't you rich?" question. "I am not rich," she will say, "because I am not the kind of person the psychologist thinks will refuse the money. I'm just not like you, Irene. Given that I know that I am the type who takes the money, and given that the psychologist knows that I am this type, it was reasonable of me to think that the $1,000,000 was not in my account. The $1,000 was the most I was going to get no matter what I did. So the only reasonable thing for me to do was to take it."
Irene may want to press the point here by asking, "But don't you wish you were like me, Rachel? Don't you wish that you were the refusing type?" There is a tendency to think that Rachel, a committed causal decision theorist, must answer this question in the negative, which seems obviously wrong (given that being like Irene would have made her rich). This is not the case. Rachel can and should admit that she does wish she were more like Irene. "It would have been better for me," she might concede, "had I been the refusing type." At this point Irene will exclaim, "You've admitted it! It wasn't so smart to take the money after all." Unfortunately for Irene, her conclusion does not follow from Rachel's premise. Rachel will patiently explain that wishing to be a refuser in a Newcomb problem is not inconsistent with thinking that one should take the $1,000 whatever type one is. When Rachel wishes she was Irene's type she is wishing for Irene's options, not sanctioning her choice.
It is, I would say, a general principle of rationality - indeed, part of how I define rationality - that you never end up envying someone else's mere choices. You might envy someone their genes, if Omega rewards genes, or if the genes give you a generally happier disposition. But Rachel, above, envies Irene her choice, and only her choice, irrespective of what algorithm Irene used to make it. Rachel wishes just that she had a disposition to choose differently.
You shouldn't claim to be more rational than someone and simultaneously envy them their choice - only their choice. Just do the act you envy.
I keep trying to say that rationality is the winning-Way, but causal decision theorists insist that taking both boxes is what really wins, because you can't possibly do better by leaving $1000 on the table... even though the single-boxers leave the experiment with more money. Be careful of this sort of argument, any time you find yourself defining the "winner" as someone other than the agent who is currently smiling from on top of a giant heap of utility.
Yes, there are various thought experiments in which some agents start out with an advantage - but if the task is to, say, decide whether to jump off a cliff, you want to be careful not to define cliff-refraining agents as having an unfair prior advantage over cliff-jumping agents, by virtue of their unfair refusal to jump off cliffs. At this point you have covertly redefined "winning" as conformance to a particular ritual of cognition. Pay attention to the money!
Or here's another way of looking at it: Faced with Newcomb's Problem, would you want to look really hard for a reason to believe that it was perfectly reasonable and rational to take only box B; because, if such a line of argument existed, you would take only box B and find it full of money? Would you spend an extra hour thinking it through, if you were confident that, at the end of the hour, you would be able to convince yourself that box B was the rational choice? This too is a rather odd position to be in. Ordinarily, the work of rationality goes into figuring out which choice is the best - not finding a reason to believe that a particular choice is the best.
Maybe it's too easy to say that you "ought to" two-box on Newcomb's Problem, that this is the "reasonable" thing to do, so long as the money isn't actually in front of you. Maybe you're just numb to philosophical dilemmas, at this point. What if your daughter had a 90% fatal disease, and box A contained a serum with a 20% chance of curing her, and box B might contain a serum with a 95% chance of curing her? What if there was an asteroid rushing toward Earth, and box A contained an asteroid deflector that worked 10% of the time, and box B might contain an asteroid deflector that worked 100% of the time?
Would you, at that point, find yourself tempted to make an unreasonable choice?
If the stake in box B was something you could not leave behind? Something overwhelmingly more important to you than being reasonable? If you absolutely had to win - really win, not just be defined as winning?
Would you wish with all your power that the "reasonable" decision was to take only box B?
Then maybe it's time to update your definition of reasonableness.
Alleged rationalists should not find themselves envying the mere decisions of alleged nonrationalists, because your decision can be whatever you like. When you find yourself in a position like this, you shouldn't chide the other person for failing to conform to your concepts of reasonableness. You should realize you got the Way wrong.
So, too, if you ever find yourself keeping separate track of the "reasonable" belief, versus the belief that seems likely to be actually true. Either you have misunderstood reasonableness, or your second intuition is just wrong.
Now one can't simultaneously define "rationality" as the winning Way, and define "rationality" as Bayesian probability theory and decision theory. But it is the argument that I am putting forth, and the moral of my advice to Trust In Bayes, that the laws governing winning have indeed proven to be math. If it ever turns out that Bayes fails - receives systematically lower rewards on some problem, relative to a superior alternative, in virtue of its mere decisions - then Bayes has to go out the window. "Rationality" is just the label I use for my beliefs about the winning Way - the Way of the agent smiling from on top of the giant heap of utility. Currently, that label refers to Bayescraft.
I realize that this is not a knockdown criticism of causal decision theory - that would take the actual book and/or PhD thesis - but I hope it illustrates some of my underlying attitude toward this notion of "rationality".
You shouldn't find yourself distinguishing the winning choice from the reasonable choice. Nor should you find yourself distinguishing the reasonable belief from the belief that is most likely to be true.
That is why I use the word "rational" to denote my beliefs about accuracy and winning - not to denote verbal reasoning, or strategies which yield certain success, or that which is logically provable, or that which is publicly demonstrable, or that which is reasonable.
As Miyamoto Musashi said:
"The primary thing when you take a sword in your hands is your intention to cut the enemy, whatever the means. Whenever you parry, hit, spring, strike or touch the enemy's cutting sword, you must cut the enemy in the same movement. It is essential to attain this. If you think only of hitting, springing, striking or touching the enemy, you will not be able actually to cut him."
Either box B is already full or already empty.
I'm not going to go into the whole literature, but the dominant consensus in modern decision theory is that one should two-box, and Omega is just rewarding agents with irrational dispositions. This dominant view goes by the name of "causal decision theory".
I suppose causal decision theory assumes causality only works in one temporal direction. Confronted with a predictor that was right 100 out of 100 times, I would think it very likely that backward-in-time causation exists, and take only B. I assume this would, as you say, produce absurd results elsewhere.
Decisions aren't physical.
The above statement is at least hard to defend. Your decisions are physical and occur inside of you... So these two-boxers are using the wrong one of the two models (see the drawings: http://lesswrong.com/lw/r0/thou_art_physics/).
If you are a part of physics, so is your decision, so it must account for the correlation between your thought processes and the superintelligence. Once it accounts for that, you decide to one-box, because you understand the entanglement between the computation done by Omega and the physical process going on inside your skull.
If the entanglement is there, you are not looking at it from the outside, you are inside the process.
Our minds have this quirk that makes us think there are two moments: you decide, and then you cheat and get to decide again. But if you are only allowed to decide once, which is the case, the rational decision is to one-box.
You're complicating the problem too much by bringing in issues like regret. Assume for sake of argument that Newcomb's problem is to maximize the amount of money you receive. Don't think about extraneous utility issues.
People seem to have pretty strong opinions about Newcomb's Problem. I don't have any trouble believing that a superintelligence could scan you and predict your reaction with 99.5% accuracy.
I mean, a superintelligence would have no trouble at all predicting that I would one-box... even if I hadn't encountered the problem before, I suspect.
Ultimately you either interpret "superintelligence" as being sufficient to predict your reaction with significant accuracy, or not. If not, the problem is just a straightforward probability question, as explained here, and becomes uninteresting.
Otherwise, if you interpret "superintelligence" as being sufficient to predict your reaction with significant accuracy (especially a high accuracy like >99.5%), the words of this sentence...
...simply mean "One-box to win, with high confidence."
Summary: After disambiguating "superintelligence" (making the belief that Omega is a superintelligence pay rent), Newcomb's problem turns into either a straightforward probability question or a fairly simple issue of rearranging the words in equivalent ways to make the winning answer readily apparent.
If you won't explicitly state your analysis, maybe we can try 20 questions?
I have suspected that supposed "paradoxes" of evidential decision theory occur because not all the evidence was considered. For example, the fact that you are using evidential decision theory to make the decision.
Agree/disagree?
Hmm, changed my mind, should have thought more before writing... the EDT virus has early symptoms of causing people to use EDT before progressing to terrible illness and death. It seems EDT would then recommend not using EDT.
I one-box, without a moment's thought.
The "rationalist" says "Omega has already left. How could you think that your decision now affects what's in the box? You're basing your decision on the illusion that you have free will, when in fact you have no such thing."
To which I respond "How does that make this different from any other decision I'll make today?"
I think the two-boxer is confused about what it is to be rational: it does not mean "make a fancy argument," it means start with the facts, abstract from them, and reason about your abstractions.
In this case if you start with the facts you see that 100% of people who take only box B win big, so rationally, you do the same. Why would anyone be surprised that reason divorced from facts gives the wrong answer?
This dilemma seems like it can be reduced to:
There's a seemingly-impossible but vital premise, namely, that your action was already known before you acted. Even if this is completely impossible, it's a premise, so there's no point arguing it.
Another way of thinking of it is that, when someone says, "The boxes are already there, so your decision cannot affect what's in them," he is wrong. It has been assumed that your decision does affect what's in them, so the fact that you cannot imagine how that is possible is wholly irrelevant.
In short, I don't understand how this is controversial when the decider has all the information that was provided.
I'd love to say I'd find some way of picking randomly just to piss Omega off, but I'd probably just one-box it. A million bucks is a lot of money.
It's often stipulated that if Omega predicts you'll use some randomizer it can't predict, it'll punish you by acting as if it predicted two-boxing.
It's a great puzzle. I guess this thread will degenerate into arguments pro and con. I used to think I'd take one box, but I read Joyce's book and that changed my mind.
For the take-one-boxers:
Do you believe, as you sit there with the two boxes in front of you, that their contents are fixed? That there is a "fact of the matter" as to whether box B is empty or not? Or is box B in a sort of intermediate state, halfway between empty and full? If so, do you generally consider that things momentarily out of sight may literally change their physical sta... (read more)
Na-na-na-na-na-na, I am so sorry you only got $1000!
Me, I'm gonna replace my macbook pro, buy an apartment and a car and take a two week vacation in the Bahamas, and put the rest in savings!
Suckah!
Point: arguments don't matter, winning does.
Oops. I had replied to this before I saw its parent was nearly 3 years old. So that I don't (quite) waste the typing:
Yes.
Yes.
No.
No.
Yes.
No, it can't. (But it already did.)
If I take both boxes how much money do I get? $1,000
If I take one box how much money do I get? $10,000,000 (or whatever it was instantiated to.)
It seems that my questions were more useful than yours. Perhaps Joyce b... (read more)
To quote E.T. Jaynes:
"This example shows also that the major premise, “If A then B” expresses B only as a logical consequence of A; and not necessarily a causal physical consequence, which could be effective only at a later time. The rain at 10 AM is not the physical cause of the clouds at 9:45 AM. Nevertheless, the proper logical connection is not in the uncertain causal direction (clouds =⇒ rain), but rather (rain =⇒ clouds) which is certain, although noncausal. We emphasize at the outset that we are concerned here with logical connections, because some discussions and applications of inference have fallen into serious error through failure to see the distinction between logical implication and physical causation. The distinction is analyzed in some depth by H. A. Simon and N. Rescher (1966), who note that all attempts to interpret implication as expressing physical causation founder on the lack of contraposition expressed by the second syllogism (1–2). That is, if we tried to interpret the major premise as “A is the physical cause of B,” then we would hardly be able to accept that “not-B is the physical cause of not-A.” In Chapter 3 we shall see that attempts to interpret plausible inferences in terms of physical causation fare no better."
@Hal Finney:
Certainly the box is either full or empty. But the only way to get the money in the hidden box is to precommit to taking only that one box. Not pretend to precommit, really precommit. If you try to take the $1,000, well then I guess you really hadn't precommitted after all. I might vacillate, I might even be unable to make such a rigid precommitment with myself (though I suspect I am), but it seems hard to argue that taking only one box is not the correct choice.
I'm not entirely certain that acting rationally in this situation doesn't require an element of doublethink, but that's a topic for another post.
I would be interested to know if your opinion would change if the "predictions" of the super-being were wrong .5% of the time, and some small number of people ended up with the $1,001,000 and some ended up with nothing. Would you still one-box it?
I suppose I might still be missing something, but this still seems to me just a simple example of time inconsistency, where you'd like to commit ahead of time to something that later you'd like to violate if you could. You want to commit to taking the one box, but you also want to take the two boxes later if you could. A more familiar example is that we'd like to commit ahead of time to spending effort to punish people who hurt us, but after they hurt us we'd rather avoid spending that effort as the harm is already done.
If I know that the situation has resolved itself in a manner consistent with the hypothesis that Omega has successfully predicted people's actions many times over, I have a high expectation that it will do so again.
In that case, what I will find in the boxes is not independent of my choice, but dependent on it. By choosing to take two boxes, I cause there to be only $1,000 there. By choosing to take only one, I cause there to be $1,000,000. I can create either condition by choosing one way or another. If I can select between the possibilities, I prefer... (read more)
I don't know the literature around Newcomb's problem very well, so excuse me if this is stupid. BUT: why not just reason as follows:
a) the state of affairs whether you pick the box or not is already absolutely determined (i.e. we live in a fatalistic universe, at least with respect to your box-picking)
b) your box picking is not determined, but it has backwards causal force, i.e. something is moving backwards through time.
If a), then practical reason is ... (read more)
Laura,
Once we can model the probabilities of the various outcomes in a noncontroversial fashion, the specific choice to make depends on the utility of the various outcomes. $1,001,000 might be only marginally better than $1,000,000 -- or that extra $1,000 could have some significant extra utility.
If we assume that Omega almost never makes a mistake and we allow the chooser to use true randomization (perhaps by using quantum physics) in making his choice, then Omega must make his decision in part through seeing into the future. In this case the chooser should obviously pick just B.
Hanson: I suppose I might still be missing something, but this still seems to me just a simple example of time inconsistency
In my motivations and in my decision theory, dynamic inconsistency is Always Wrong. Among other things, it always implies an agent unstable under reflection.
A more familiar example is that we'd like to commit ahead of time to spending effort to punish people who hurt us, but after they hurt us we'd rather avoid spending that effort as the harm is already done.
But a self-modifying agent would modify itself so as not to prefer avoiding it.
Gowder: If... (read more)
I don't see why this needs to be so drawn out.
I know the rules of the game. I also know that Omega is super intelligent, namely, Omega will accurately predict my action. Since Omega knows that I know this, and since I know that he knows I know this, I can rationally take box B, content in my knowledge that Omega has predicted my action correctly.
I don't think it's necessary to precommit to any ideas, since Omega knows that I'll be able to rationally deduce the winning action given the premise.
We don't even need a superintelligence. We can probably predict on the basis of personality type a person's decision in this problem with an 80% accuracy, which is already sufficient that a rational person would choose only box B.
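(A quick check of that claim, treating the 80% figure as an accuracy that holds regardless of how you decide: one-boxing yields $p \cdot \$1{,}000{,}000$ in expectation and two-boxing yields $(1-p) \cdot \$1{,}000{,}000 + \$1{,}000$, so one-boxing wins whenever $p > 0.5005$, and 80% clears that bar easily.)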
The possibility of time inconsistency is very well established among game theorists, and is considered a problem of the game one is playing, rather than a failure to analyze the game well. So it seems you are disagreeing with most all game theorists in economics as well as most decision theorists in philosophy. Maybe perhaps they are right and you are wrong?
The interesting thing about this game is that Omega has magical super-powers that allow him to know whether or not you will back out on your commitment ahead of time, and so you can make your commitment credible by not being going to back out on your commitment. If that makes any sense.
Robin, remember I have to build a damn AI out of this theory, at some point. A self-modifying AI that begins anticipating dynamic inconsistency - that is, a conflict of preference with its own future self - will not stay in such a state for very long... did the game theorists and economists work out a standard answer for what happens after that?
If you like, you can think of me as defining the word "rationality" to refer to a different meaning - but I don't really have the option of using the standard theory, here, at least not for longer than 50 milliseconds.
If there's some nonobvious way I could be wrong about this point, which seems to me quite straightforward, do let me know.
In reality, either I am going to take one box or two. So when the two-boxer says, "If I take one box, I'll get amount x," and "If I take two boxes, I'll get amount x+1000," one of these statements is objectively counterfactual. Let's suppose he is going to in fact take both boxes. Then his second statement is factual and his first statement counterfactual. Then his two statements are:
1) Although I am not in fact going to take only one box, were I to take only one box, I would get amount x, namely the amount that would be in the box.
2) I am in ... (read more)
Eliezer: whether or not a fixed future poses a problem for morality is a hotly disputed question which even I don't want to touch. Fortunately, this problem is one that is pretty much wholly orthogonal to morality. :-)
But I feel like in the present problem the fixed future issue is a key to dissolving the problem. So, assume the box decision is fixed. It need not be the case that the stress is fixed too. If the stress isn't fixed, then it can't be relevant to the box decision (the box is fixed regardless of your decision between stress and no-stress).... (read more)
Paul, being fixed or not fixed has nothing to do with it. Suppose I program a deterministic AI to play the game (the AI picks a box.)
The deterministic AI knows that it is deterministic, and it knows that I know too, since I programmed it. So I also know whether it will take one or both boxes, and it knows that I know this.
At first, of course, it doesn't know itself whether it will take one or both boxes, since it hasn't completed running its code yet. So it says to itself, "Either I will take only one box or both boxes. If I take only one box, the pro... (read more)
I practice historical European swordsmanship, and those Musashi quotes have a certain resonance to me. Here is another (modern) saying common in my group:
If it's stupid, but it works, then it ain't stupid.
Eliezer, I don't read the main thrust of your post as being about Newcomb's problem per se. Having distinguished between 'rationality as means' to whatever end you choose, and 'rationality as a way of discriminating between ends', can we agree that the whole specks / torture debate was something of a red herring? Red herring, because it was a discussion on using rationality to discriminate between ends, without having first defined one's meta-objectives, or, if one's meta-objectives involved hedonism, establishing the rules for performing math over subje... (read more)
Unknown: your last question highlights the problem with your reasoning. It's idle to ask whether I'd go and jump off a cliff if I found my future were determined. What does that question even mean?
Put a different way, why should we ask an "ought" question about events that are determined? If A will do X whether or not it is the case that a rational person will do X, why do we care whether or not it is the case that a rational person will do X? I submit that we care about rationality because we believe it'll give us traction on our problem of ... (read more)
Paul, it sounds like you didn't understand. A chess playing computer program is completely deterministic, and yet it has to consider alternatives in order to make its move. So also we could be deterministic and we would still have to consider all the possibilities and their benefits before making a move.
So it makes sense to ask whether you would jump off a cliff if you found out that the future is determined. You would find out that the future is determined without knowing exactly which future is determined, just like the chess program, and so you would ha... (read more)
I do understand. My point is that we ought not to care whether we're going to consider all the possibilities and benefits.
Oh, but you say, our caring about our consideration process is a determined part of the causal chain leading to our consideration process, and thus to the outcome.
Oh, but I say, we ought not to care about that caring. Again, recurse as needed. Nothing you can say about the fact that a cognition is in the causal chain leading to a state of affairs counts as a point against the claim that we ought not to care about whether or not we have that cognition if it's unavoidable.
The paradox is designed to give your decision the practical effect of causing Box B to contain the money or not, without actually labeling this effect "causation." But I think that if Box B acts as though its contents are caused by your choice, then you should treat it as though they were. So I don't think the puzzle is really something deep; rather, it is a word game about what it means to cause something.
Perhaps it would be useful to think about how Omega might be doing its prediction. For example, it might have the ability to travel into the f... (read more)
I have two arguments for going for Box B. First, for a scientist it's not unusual that every rational argument (=theory) predicts that only two-boxing makes sense. Still, if the experiment again and again refutes that, it's obviously the theory that's wrong and there's obviously something more to reality than that which fueled the theories. Actually, we even see dilemmas like Newcomb's in the contextuality of quantum measurements. Measurement tops rationality or theory, every time. That's why science is successful and philosophy is not.
Second, there's no q... (read more)
Paul, if we were determined, what would you mean when you say that "we ought not to care"? Do you mean to say that the outcome would be better if we didn't care? The fact that the caring is part of the causal chain does have something to do with this: the outcome may be determined by whether or not we care. So if you consider one outcome better than another (only one really possible, but both possible as far as you know), then either "caring" or "not caring" might be preferable, depending on which one would lead to each outcome.
Eliezer, if a smart creature modifies itself in order to gain strategic advantages from committing itself to future actions, it must think it could better achieve its goals by doing so. If so, why should we be concerned, if those goals do not conflict with our goals?
I think Anonymous, Unknown and Eliezer have been very helpful so far. Following on from them, here is my take:
There are many ways Omega could be doing the prediction/placement and it may well matter exactly how the problem is set up. For example, you might be deterministic and he is precalculating your choice (much like we might be able to do with an insect or computer program), or he might be using a quantum suicide method, (quantum) randomizing whether the million goes in and then destroying the world iff you pick the wrong option (This will lead to us ... (read more)
Be careful of this sort of argument, any time you find yourself defining the "winner" as someone other than the agent who is currently smiling from on top of a giant heap.
This made me laugh. Well said!
There's only one question about this scenario for me - is it possible for a sufficiently intelligent being to fully, fully model an individual human brain? If so, (and I think it's tough to argue 'no' unless you think there's a serious glass ceiling for intelligence) choose box B. If you try and second-guess (or, hell, googolth-guess) Omega, you're ... (read more)
How does the box know? I could open B with the intent of opening only B or I could open B with the intent of then opening A. Perhaps Omega has locked the boxes such that they only open when you shout your choice to the sky. That would beat my preferred strategy of opening B before deciding which to choose. I open boxes without choosing to take them all the time.
Are our common notions about boxes catching us here? In my experience, opening a box rarely makes nearby objects disintegrate. It is physically impossible to "leave $1000 on the table,"... (read more)
Eliezer, if a smart creature modifies itself in order to gain strategic advantages from committing itself to future actions, it must think it could better achieve its goals by doing so. If so, why should we be concerned, if those goals do not conflict with our goals?
Well, there's a number of answers I could give to this:
*) After you've spent some time working in the framework of a decision theory where dynamic inconsistencies naturally Don't Happen - not because there's an extra clause forbidding them, but because the simple foundations just don't give rise t... (read more)
Maybe perhaps we are right and they are wrong?
The issue is to be decided, not by referring to perceived status or expertise, but by looking at who has the better arguments. Only when we cannot evaluate the arguments does making an educated guess based on perceived expertise become appropriate.
Again: how much do we want to bet that Eliezer won't admit that he's wrong in this case? Do we have someone willing to wager another 10 credibility units?
Caledonian: you can stop talking about wagering credibility units now, we all know you don't have funds for the smallest stake.
Ben Jones: if we assume that Omega is perfectly simulating the human mind, then when we are choosing between B and A+B, we don't know whether we are in reality or in the simulation. In reality, our choice does not affect the million, but in the simulation it will. So we should reason "I'd better take only box B, because if this is the simulation then that will change whether or not I get the million in reality".
There is a big difference between having time inconsistent preferences, and time inconsistent strategies because of the strategic incentives of the game you are playing. Trying to find a set of preferences that avoids all strategic conflicts between your different actions seems a fool's errand.
What we have here is an inability to recognize that causality no longer flows only from 'past' to 'future'.
If we're given a box that could contain $1,000 or nothing, we calculate the expected value of the superposition of these two possibilities. We don't actually expect that there's a superposition within the box - we simply adopt a technique to help compensate for what we do not know. From our ignorant perspective, either case could be real, although in actuality either the box has the money or it does not.
This is similar. The amount of money in the b... (read more)
How about simply multiplying? Treat Omega as a fair coin toss. 50% of a million is half-a-million, and that's vastly bigger than a thousand. You can ignore the question of whether omega has filled the box, in deciding that the uncertain box is more important. So much more important, that the chance of gaining an extra 1000 isn't worth the bother of trying to beat the puzzle. You just grab the important box.
After you've spent some time working in the framework of a decision theory where dynamic inconsistencies naturally Don't Happen - not because there's an extra clause forbidding them, but because the simple foundations just don't give rise to them - then an intertemporal preference reversal starts looking like just another preference reversal.
... Roughly, self-modifying capability in a classical causal decision theorist doesn't fix the problem that gives rise to the intertemporal preference reversals, it just makes one temporal self win out over all the oth... (read more)
There is a big difference between having time inconsistent preferences, and time inconsistent strategies because of the strategic incentives of the game you are playing.
I can see why a human would have time-inconsistent strategies - because of inconsistent preferences between their past and future self, hyperbolic discounting functions, that sort of thing. I am quite at a loss to understand why an agent with a constant, external utility function should experience inconsistent strategies under any circumstance, regardless of strategic incentives. Expected... (read more)
The entire issue of causal versus inferential decision theory, and of the seemingly magical powers of the chooser in the Newcomb problem, are serious distractions here, as Eliezer has the same issue in an ordinary commitment situation, e.g., punishment. I suggest starting this conversation over from such an ordinary simple example.
Let me restate: Two boxes appear. If you touch box A, the contents of box B are vaporized. If you attempt to open box B, box A and its contents are vaporized. Contents as previously specified. We could probably build these now.
Experimentally, how do we distinguish this from the description in the main thread? Why are we taking Omega seriously when if the discussion dealt with the number of angels dancing on the head of pin the derision would be palpable? The experimental data point to taking box B. Even if Omega is observed delivering the boxes, and making the specified claims regarding their contents, why are these claims taken on faith as being an accurate description of the problem?
Let's take Bayes seriously.
Some time ago there was a posting about something like: "If all you knew was that the sun rose on each of the past 5 mornings, what probability would you assign to the sun rising next morning?" It came out to something like 5/6 or 4/5 or so.
But of course that's not all we know, and so we'd get different numbers.
Now what's given here is that Omega has been correct on a hundred occasions so far. If that's all we know, we should estimate the probability of him being right next time at about 99%. So if you're a one-boxer your exp... (read more)
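(That ~99% looks like Laplace's rule of succession; spelling out the arithmetic under the "that's all we know" assumption: $P(\text{right next time}) = (100+1)/(100+2) \approx 0.99$, so one-boxing expects roughly $0.99 \times \$1{,}000{,}000 \approx \$990{,}000$, while two-boxing expects roughly $0.01 \times \$1{,}000{,}000 + \$1{,}000 \approx \$11{,}000$.)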
Eliezer, I have a question about this: "There is no finite amount of life lived N where I would prefer an 80.0001% probability of living N years to a 0.0001% chance of living a googolplex years and an 80% chance of living forever. This is a sufficient condition to imply that my utility function is unbounded."
I can see that this preference implies an unbounded utility function, given that a longer life has a greater utility. However, simply stated in that way, most people might agree with the preference. But consider this gamble instead:
A: Live 5... (read more)
they would just insist that there is an important difference between deciding to take only box B at 7:00am vs 7:10am, if Omega chooses at 7:05am
But that's exactly what strategic inconsistency is about. Even if you had decided to take only box B at 7:00am, by 7:06am a rational agent will just change his mind and choose to take both boxes. Omega knows this, hence it will put nothing into box B. The only way out is if the AI self-commits to take only box B in a way that's verifiable by Omega.
When the stakes are high enough I one-box, while gritting my teeth. Otherwise, I'm more interested in demonstrating my "rationality" (Eliezer has convinced me to use those quotes).
Perhaps we could just specify an agent that uses reverse causation in only particular situations, as it seems that humans are capable of doing.
Paul G, almost certainly, right? Still, as you say, it has little bearing on one's answer to the question.
In fact, not true, it does. Is there anything to stop myself making a mental pact with all my simulation buddies (and 'myself', whoever he be) to go for Box B?
In arguing for the single box, Yudkowsky has made an assumption that I disagree with: at the very end, he changes the stakes and declares that your choice should still be the same.
My way of looking at it is similar to what Hendrik Boom has said. You have a choice between betting on Omega being right and betting on Omega being wrong.
A = Contents of box A
B = What may be in box B (if it isn't empty)
A is yours, in the sense that you can take it and do whatever you want with it. One thing you can do with A is pay it for a chance to win B if Omega is right. Y... (read more)
IMO there's less to Newcomb's paradox than meets the eye. It's basically "A future-predicting being who controls the set of choices could make rational choices look silly by making sure they had bad outcomes". OK, yes, he could. Surprised?
What I think makes it seem paradoxical is that the paradox both assures us that Omega controls the outcome perfectly, and cues us that this isn't so ("He's already left" etc). Once you settle what it's really saying either way, the rest follows.
Yes, this is really an issue of whether your choice causes Omega's action or not. The only way for Omega to be a perfect predictor is for your choice to actually cause Omega's action. (For example, Omega 'sees the future' and acts based on your choice). If your choice causes Omega's action, then choosing B is the rational decision, as it causes the box to have the million.
If your choice does not cause Omega's action, then choosing both boxes is the winning approach. in this case, Omega is merely giving big awards to some people and small awards to ot... (read more)
the dominant consensus in modern decision theory is that one should two-box...there's a common attitude that "Verbal arguments for one-boxing are easy to come by, what's hard is developing a good decision theory that one-boxes"
Those are contrary positions, right?
Robin Hanson:
Punishment is ordinary, but Newcomb's problem is simple! You can't have both.
The advantage of an ordinary situation like punishment is that game theorists can't deny the fact on the ground that governments exist, but they can claim it's because we're all irrational, which doesn't leave many directions to go in.
I agree that "rationality" should be the thing that makes you win but the Newcomb paradox seems kind of contrived.
If there is a more powerful entity throwing good utilities at normally dumb decisions and bad utilities at normally good decisions then you can make any dumb thing look genius because you are under different rules than the world we live in at present.
I would ask Alpha for help and do what he tells me to do. Alpha is an AI that is also never wrong when it comes to predicting the future, just like Omega. Alpha would examine omega and ... (read more)
To me, the decision is very easy. Omega obviously possesses more prescience about my box-taking decision than I do myself. He's been able to guess correctly in the past, so I'd see no reason to doubt him in my case. With that in mind, the obvious choice is to take box B.
If Omega is so nearly always correct, then determinism is shown to exist (at least to some extent). That being the case, causality would be nothing but an illusion. So I'd see no problem with it working in "reverse".
Fascinating. A few days after I read this, it struck me that a form of Newcomb's Problem actually occurs in real life--voting in a large election. Here's what I mean.
Say you're sitting at home pondering whether to vote. If you decide to stay home, you benefit by avoiding the minor inconvenience of driving and standing in line. (Like gaining $1000.) If you decide to vote, you'll fail to avoid the inconvenience, meanwhile you know your individual vote almost certainly won't make a statistical difference in getting your candidate elected. (Which would be like... (read more)
"If it ever turns out that Bayes fails - receives systematically lower rewards on some problem, relative to a superior alternative, in virtue of its mere decisions - then Bayes has to go out the window."
What exactly do you mean by mere decisions? I can construct problems where agents that use few computational resources win. Bayesian agents by your own admission have to use energy to get into mutual information with the environment (a state I am still suspicious of), so they have to use energy, meaning they lose.
The premise is that a rational agent would start out convinced that this story about an alien that knows in advance what they'll decide is false.
The Kolmogorov complexity of the story about the alien is very large because we have to hypothesize some mechanism by which it can extrapolate the contents of minds. Even if I saw the alien land a million times and watched the box-picking connect with the box contents as they're supposed to, it is simpler to assume that the boxes are some stage magic trick, or even that they are an exception to the u... (read more)
It is not possible for an agent to make a rational choice between 1 or 2 boxes if the agent and Omega can both be simulated by Turing machines. Proof: Omega predicts the agent's decision by simulating it. This requires Omega to have greater algorithmic complexity than the agent (including the nonzero complexity of the compiler or interpreter). But a rational choice by the agent requires that it simulate Omega, which requires that the agent have greater algorithmic complexity instead.
In other words, the agent X, with complexity K(X), must model Omega whi... (read more)
Okay, maybe I am stupid, maybe I am unfamiliar with all the literature on the problem, maybe my English sucks, but I fail to understand the following:
-
Is the agent aware of the fact that one-boxers get $1,000,000 at the moment Omega "scans" him and presents the boxes?
OR
Is agent told about this after Omega "has left"?
OR
Is agent unaware of the fact that Omega rewards one-boxers at all?
-
P.S.: Also, as with most "decision paradoxes", this one will have different solutions depending on the context (is the agent a starving child in Africa, or a "megacorp" CEO)
I'm a convinced two-boxer, but I'll try to put my argument without any bias. It seems to me the way this problem has been put has been an attempt to rig it for the one-boxers. When we talk about "precommitment" it is suggested the subject has advance knowledge of Omega and what is to happen. The way I thought the paradox worked, was that Omega would scan/analyze a person and make its prediction, all before the person ever heard of the dilemma. Therefore, a person has no way to develop an intention of being a one-boxer or a two-boxer t... (read more)
If the alien is able to predict your decision, it follows that your decision is a function of your state at the time the alien analyzes you. Then, there is no meaningful question of "what should you do?" Either you are in a universe in which you are disposed to choose the one box AND the alien has placed the million dollars, or you are in a universe in which you are disposed to take both boxes AND the alien has placed nothing. If the former, you will have the subjective experience of "deciding to take the one box", which is itself a det... (read more)
Yes, but when I tried to write it up, I realized that I was starting to write a small book. And it wasn't the most important book I had to write, so I shelved it. My slow writing speed really is the bane of my existence. The theory I worked out seems, to me, to have many nice properties besides being well-suited to Newcomblike problems. It would make a nice PhD thesis, if I could get someone to accept it as my PhD thesis. But that's pretty much what it would take to make me unshelve the project. Otherwise I can't justify the time expenditure, not at ... (read more)
Isn't this the exact opposite argument from the one that was made in Dust Specks vs 50 Years of Torture?
Correct me if I'm wrong, but the argument in this post seems to be "Don't cling to a supposedly-perfect 'causal decision theory' if it would make you lose gracefully, take the action that makes you WIN."
And the argument for preferring 50 Years of Torture over 3^^^3 Dust Specks is that "The moral theory is perfect. It must be clung to, even when the result is a major loss."
How can both of these be true?
(And yes, I am defining "pr... (read more)
One belated point, some people seem to think that Omega's successful prediction is virtually impossible and that the experiment is a purely fanciful speculation. However it seems to me entirely plausible that having you fill out a questionnaire while being brain scanned might well bring this situation into practicality in the near future. The questions, if filled out correctly, could characterize your personality type with enough accuracy to give a very strong prediction about what you will do. And if you lie, in the future that might be detected with a br... (read more)
Somehow I'd never thought of this as a rationalist's dilemma, but rather a determinism vs free will illustration. I still see it that way. You cannot both believe you have a choice AND that Omega has perfect prediction.
The only "rational" (in all senses of the word) response I support is: shut up and multiply. Estimate the chance that he has predicted wrong, and if that gives you +expected value, take both boxes. I phrase this as advice, but in fact I mean it as prediction of rational behavior.
If you really want to impress an inspector who can see your internal state, by altering your utility function to conform to their wishes, then one strategy would be to create a trusted external "brain surgeon" agent with the keys to your utility function to change it back again after your utility function has been inspected - and then forget all about the existence of the surgeon.
The inspector will be able to see the lock on your utility function - but those are pretty standard issue.
As a rationalist, it might be worthwhile to take the one box just so those Omega know-it-alls will be wrong for once.
If random number generators not determinable by Omega exist, generate one bit of entropy. If not, take the million bucks. Quantum randomness anyone?
Given how many times Eliezer has linked to it, it's a little surprising that nobody seems to have picked up on this yet, but the paragraph about the utility function not being up for grabs seems to have a pretty serious technical flaw:
Let p = 80% and let q be one in a million. I'm pretty... (read more)
Benja, the notion is that "live forever" does not have any finite utility, since it is bounded below by a series of finite lifetimes whose utility increases without bound.
thinks -- Okay, so if I understand you correctly now, the essential thing I was missing that you meant to imply was that the utility of living forever must necessarily be equal to (cannot be larger than) the limit of the utilities of living a finite number of years. Then, if u(live forever) is finite, p times the difference between u(live forever) and u(live n years) must become arbitrarily small, and thus, eventually smaller than q times the difference between u(live n years) and u(live googolplex years). You then arrive at a contradiction, from which you... (read more)
There are two ways of thinking about the problem.
1. You see the problem as a decision theorist, and see a conflict between the expected utility recommendation and the dominance principle. People who have seen the problem this way have been led into various forms of causal decision theory.
2. You see the problem as a game theorist, and are trying to figure out the predictor's utility function, what points are focal and why. People who have seen the problem this way have been led into various discussions of tacit coordination.
Newcomb's scenario is a paradox, ... (read more)
Re: First, foremost, fundamentally, above all else: Rational agents should WIN.
When Deep Blue beat Garry Kasparov, did that prove that Garry Kasparov was "irrational"?
It seems as though it would be unreasonable to expect even highly rational agents to win - if pitted against superior competition. Rational agents can lose in other ways as well - e.g. by not having access to useful information.
Since there are plenty of ways in which rational agents can lose, "winning" seems unlikely to be part of a reasonable definition of rationality.
I think I've solved it.
I'm a little late to this, and given the amount of time people smarter than myself have spent thinking about this it seems naive even to myself to think that I have found a solution to this problem. That being said, try as I might, I can't find a good counter argument to this line of reasoning. Here goes...
The human brain's function is still mostly a black box to us, but the demonstrated predictive power of this alien is strong evidence that this is not the case with him. If he really can predict human decisions, then the mere fact ... (read more)
Cross-posting from Less Wrong, I think there's a generalized Russell's Paradox problem with this theory of rationality:
Eliezer, why didn't you answer the question I asked at the beginning of the comment section of this post?
The 'delayed choice' experiments of Wheeler & others appear to show a causality that goes backward in time. So, I would take just Box B.
I would use a true quantum random generator. 51% of the time I would take only one box. Otherwise I would take two boxes. Thus Omega has to guess that I will only take one box, but I have a 49% chance of taking home another $1,000. My expected winnings will be $1,000,490 and I am per Eliezer's definition more rational than he.
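For reference, the arithmetic behind that figure (assuming, as the comment does, that Omega sees the 51% one-box probability and fills box B):

$$E = 0.51 \times \$1{,}000{,}000 + 0.49 \times \$1{,}001{,}000 = \$1{,}000{,}490.$$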
I'm a bit nervous, this is my first comment here, and I feel quite out of my league.
Regarding the "free will" aspect, can one game the system? My rational choice would be to sit right there, arms crossed, and choose no box. Instead, having thus disproved Omega's infallibility, I'd wait for Omega to come back around, and try to weasel some knowledge out of her.
Rationally, the intelligence that could model mine and predict my likely action (yet fail to predict my inaction enough to not bother with me in the first place) is an intelligence I'd like... (read more)
I've come around to the majority viewpoint on the alien/Omega problem. It seems to be easier to think about when you pin it down a bit more mathematically.
Let's suppose the alien determines the probability of me one-boxing is p. For the sake of simplicity, let's assume he then puts the $1M into one of the boxes with this probability p. (In theory he could do it whenever p exceeded some threshold, but this just complicates the math.)
Therefore, once I encounter the situation, there are two possible states:
a) with probability p there is 1M in one box, and 1k... (read more)
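A minimal worked version of the expected returns under this model (my own completion of the truncated setup, not necessarily where the comment was headed):

$$E[\text{two-box}] = p \cdot \$1{,}000{,}000 + \$1{,}000, \qquad E[\text{one-box}] = p \cdot \$1{,}000{,}000.$$

Conditional on a fixed p, two-boxing comes out $1,000 ahead; the whole dispute is over whether p can really be treated as fixed independently of the disposition that generates your choice.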
Wait a second, the following bounded utility function can explain the quoted preferences:
Benja Fallenstein gave an alternative formulation that does imply an unbounded utility function:
But these preferences are pretty counter-intuitive to me. If U(live n years) is unbounded, then the above must hold for any nonzero p, q, and with "googolplex" replaced by any finite number. For example, let p = 1/3^^^3, q = .8, n = 3^^^3, and replace "googolplex" with "0". Would you really be willing to give up .8 probability of 3^^^3 years of life for a 1/3^^^3 chance at a longer (but still finite) one? And that's true no matter how many up-arrows we add to these numbers?
Eliezer, would you be willing to bet all of your assets and future earnings against $1 of my money, that we can do an infinite amount of computation before the universe ends or becomes incapable of supporting life?
Your answer ought to be yes, if your preferences are what you state. If it turns out that we can do an infinite amount of computation before the universe ends, then this bet increases your money by $1, which allows you to increase your chance of having an infinite lifetime by some small but non-zero probability. If it turns out that our universe can't do an infinite amount of computation, you lose a lot, but the loss of expected utility is still tiny compared to what you gain.
So, is it a bet?
Also, why do you suspect that answering "No" would enable someone to demonstrate circular / inconsistent preferences on your part?
Actually, I think I can hazard a guess to that one. I think the idea would be "the simpler the mathematical structure, the more often it'd show up as a substructure in other mathematical structures"
For instance, if you are building large random graphs, you'd expect to see some specific pattern of, say, 7 vertices and 18 edges show up as subgraphs more often than, say, some specific pattern of 100 vertices and 2475 edges.
There's a sense in which "reality fluid" could be distributed evenly which would lead to this. If every entire mathematical structure got an equal amount of reality stuff, then small structures would benefit from the reality juice granted to the larger structures that they happen to also exist as substructures of.
EDIT: blargh, corrected big graph edge count. meant to represent half a complete graph.
I really don't see what the problem is. Clearly, the being has "read your mind" and knows what you will do. If you are of the opinion to take both boxes, he knows that from his mind scan, and you are playing right into his hands.
Obviously, your decision cannot affect the outcome because it's already been decided what's in the box, but your BRAIN affected what he put in the box.
It's like me handing you an opaque box and telling you there is $1 million in it if and only if you go and commit murder. Then, you open the box and find it empty. I then o... (read more)
I one-box, but not because I haven't considered the two-box issue.
I one-box because it's a win-win in the larger context. Either I walk off with a million dollars, OR I become the first person to outthink Omega and provide new data to those who are following Omega's exploits.
Even without thinking outside the problem, Omega is a game-breaker. We do not, in the problem as stated, have any information on Omega other than that they are superintelligent and may be able to act outside of causality. Or else Omega is simply a superduperpredictor, to the point wher... (read more)
My solution to the problem of the two boxes:
Flip a coin. If heads, both A & B. If tails, only B. (If the superintelligence can predict a coin flip, make it a radioactive decay or something. Eat quantum, Hal.)
In all seriousness, this is a very odd problem (I love it!). Of course two boxes is the rational solution - it's not as if post-facto cogitation is going to change anything. But the problem statement seems to imply that it is actually impossible for me to choose the choice I don't choose, i.e., choice is actually impossible.
Something is absurd here. I suspect it's the idea that my choice is totally predictable. There can be a random element to my choice if I so choose, which kills Omega's plan.
I'm not reading 127 comments, but as a newcomer who's been invited to read this page, along with barely a dozen others, as an introduction, I don't want to leave this unanswered, even though what I have to say has probably already been said.
First of all, the answer to Newcomb's Problem depends a lot on precisely what the problem is. I have seen versions that posit time travel, and therefore backwards causality. In that case, it's quite reasonable to take only one box, because your decision to do so does have a causal effect on the amount in Box B. Presu... (read more)
You are disposed to take two boxes. Omega can tell. (Perhaps by reading your comment. Heck, I can tell by reading your comment, and I'm not even a superintelligence.) Omega will therefore not put a million dollars in Box B if it sets you a Newcomb's problem, because its decision to do so depends on whether you are disposed to take both boxes or not, and you are.
I am disposed to take one box. Omega can tell. (Perhaps by reading this comment. I bet you can tell by reading my comment, and I also bet that you're not a superintelligence.) Omega will therefore put a million dollars in Box B if it sets me a Newcomb's problem, because its decision to do so depends on whether I am disposed to take both boxes or not, and I'm not.
If we both get pairs of boxes to choose from, I will get a million dollars. You will get a thousand dollars. I will be monetarily better off than you.
But wait! You can fix this. All you have to do is be disposed to take just Box B. You can do this right now; there's no reason to wait until Omega turns up. Omega does not care why you are so disposed, only that you are so disposed. You can mutter to yourself all you like about how silly the problem is; as long as you wander off with just B under your arm, it will tend to be the case that you end the day a millionaire.
Omega lets me decide to take only one box after meeting Omega, when I have already updated on the fact that Omega exists, and so I have much better knowledge about which sort of god I'm likely to encounter. Upsilon treats me on the basis of a guess I would subjunctively make without knowledge of Upsilon. It is therefore not surprising that I tend to do much better with Omega than with Upsilon, because the relevant choices being made by me are being made with much better knowledge. To put it another way, when Omega offers me a Newcomb's Problem, I will condition my choice on the known existence of Omega, and all the Upsilon-like gods will tend to cancel out into Pascal's Wagers. If I run into an Upsilon-like god, then, I am not overly worried about my poor performance - it's like running into the Christian God, you're screwed, but so what, you won't actually run into one. Even the best rational agents cannot perform well on this sort of subjunctive hypothesis without much better knowledge while making the relevant choices than you are offering them. For every rational agent who performs well with respect to Upsilon there is one who performs poorly with respect to anti-Upsilon.... (read more)
It doesn't have to be even remotely close to that good for the scenario to work. I'd bet a sufficiently good human psychologist could take Omega's role and get it 90%+ right if he tests and interviews the people extensively first (without them knowing the purpose) and gets to exclude people he is unsure about. A superintelligent being should be far, far better at this.
You yourself claim to know what you would do in the boxing experiment, and you are an agent limited by conventional physics. There is no physical law that forbids another agent from knowing you as well as (or even better than) you know yourself.
You'll have to explain why you think 99.99% (or whatever) is not good enough; a 0.01% chance to win $1,000 shouldn't make up for a 99.99% chance of losing $999,000.
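To make that comparison concrete (a sketch using the standard payoffs and a 99.99% accuracy that applies to whichever choice you make):

$$E[\text{one-box}] = 0.9999 \times \$1{,}000{,}000 = \$999{,}900, \qquad E[\text{two-box}] = 0.9999 \times \$1{,}000 + 0.0001 \times \$1{,}001{,}000 = \$1{,}100,$$

so one-boxing comes out ahead by roughly $998,800 in expectation.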
Assume Omega has a probability X of correctly predicting your decision:
If you choose to two-box: with probability X Omega predicted it (box B is empty) and you get $1,000; with probability (1-X) box B is full and you get $1,001,000.
If you choose to take box B only: with probability X box B is full and you get $1,000,000; with probability (1-X) it is empty and you get $0.
Your expected utilities for two-boxing and one-boxing are (respectively):
E2 = 1,000X + 1,001,000(1 - X)
E1 = 1,000,000X
For E2 > E1, we must have 1000X + 1,001,000 - 1,001,000X - 1,000,000X > 0, or 1,001,000 > 2,000,000X, or
X < 0.5005
So as long as Omega can maintain a greater than 50.05% accuracy, you should expect to earn more money by one-boxing. Since the solution seems so simple, and since I'm a total novice at decision theory, it's possible I'm missing something here, so please let me know.
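A minimal sketch (mine, not the commenter's; names are arbitrary) that just evaluates those two formulas and shows the break-even accuracy:

```python
# Expected payoff of each strategy as a function of Omega's accuracy X,
# using the payoffs from the comment above.
def expected_values(X):
    e_two_box = 1_000 * X + 1_001_000 * (1 - X)   # Omega right -> $1,000; Omega wrong -> $1,001,000
    e_one_box = 1_000_000 * X                     # Omega right -> $1,000,000; Omega wrong -> $0
    return e_one_box, e_two_box

for X in (0.5, 0.5005, 0.51, 0.99):
    e1, e2 = expected_values(X)
    print(f"X={X}: one-box={e1:,.1f}  two-box={e2:,.1f}")
# Break-even is X = 1,001,000 / 2,000,000 = 0.5005; above that accuracy, one-boxing wins.
```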
There is a good chance I am missing something here, but from an economic perspective this seems trivial:
P(Om) is the probability the person assigns Omega of being able to accurately predict their decision ahead of time.
A. P(Om) × $1M is the expected return from opening one box.
B. (1 − P(Om)) × $1M + $1,000 is the expected return of opening both boxes (the probability that Omega was wrong times the million, plus the thousand).
Since P(Om) is dependent on people's individual belief about Omega's ability to predict their actions it is not surprising different peop... (read more)
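Carrying that comparison one step further (my own arithmetic on the two expressions above), A exceeds B exactly when

$$P(\mathrm{Om}) \times \$1\mathrm{M} > (1 - P(\mathrm{Om})) \times \$1\mathrm{M} + \$1{,}000, \quad\text{i.e.}\quad P(\mathrm{Om}) > 0.5005,$$

which matches the break-even accuracy derived a few comments up.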
Re: "Do you take both boxes, or only box B?"
It would sure be nice to get hold of some more data about the "100 observed occasions so far". If Omega only visits two-boxers - or tries to minimise his outgoings - it would be good to know that. Such information might well be accessible - if we have enough information about Omega to be convinced of his existence in the first place.
What this is really saying is “if something impossible (according to your current theory of the world) actually happens, then rather than insisting it’s impossible and ignoring it, you should revise your theory to say that’s possible”. In this case, the impossible thing is reverse causality; since we are told of evidence that reverse causality has happened in the form of 100 successful previous experiments, we must revise our theory to accept that reverse causality actually can happen. This would lead us to the conclusion that we should take one box. Alter... (read more)
The link to that thesis doesn't seem to work for me.
A quick google turned up one that does
You know, I honestly don't even understand why this is a point of debate. One boxing and taking box B (and being the kind of person who will predictably do that) seem so obviously like the rational strategy that it shouldn't even require explanation.
And not obvious in the same way most people think the Monty Hall problem (game show, three doors, goats behind two, sports-car behind one, ya know?) seems 'obvious' at first.
In the case of the Monty Hall problem, you play with it, and the cracks start to show up, and you dig down to the surprising truth.
In this case, I don't see how anyone could see any cracks in the first place.
Am I missing something here?
Mr Eliezer, I think you've missed a few points here. However, I've probably missed more. I apologise for errors in advance.
An analogy occurs to me about "regret of rationality."
Sometimes you hear complaints about the Geneva Convention during wartime. "We have to restrain ourselves, but our enemies fight dirty. They're at an advantage because they don't have our scruples!" Now, if you replied, "So are you advocating scrapping the Geneva Convention?" you might get the response "No way. It's a good set of rules, on balance." And I don't think this is an incoherent position: he approves of the rule, but regrets the harm it causes in thi... (read more)
"Verbal arguments for one-boxing are easy to come by, what's hard is developing a good decision theory that one-boxes"
First, the problem needs a couple ambiguities resolved, so we'll use three assumptions: A) You are making this decision based on a deterministic, rational philosophy (no randomization, external factors, etc. can be used to make your decision on the box) B) Omega is in fact infallible C) Getting more money is the goal (i.e. we are excluding decision-makers which would prefer to get less money, and other such absurdities)
Changing an... (read more)
A way of thinking of this "paradox" that I've found helpful is to see the two-boxer as imagining more outcomes than there actually are. For a payoff matrix of this scenario, the two-boxer would draw four possible outcomes: $0, $1000, $1000000, and $1001000, and would try for $1000 or $1001000. But if Omega is a perfect predictor, then the two that involve it making a mistake ($0 and $1001000) are very unlikely. The one-boxer sees only the two plausible options and goes for $1000000.
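Laid out explicitly (a sketch of the matrix that comment describes; rows are your choice, columns are Omega's prediction):

                      Omega predicted one-box    Omega predicted two-box
    Take only box B   $1,000,000                 $0
    Take both boxes   $1,001,000                 $1,000

With a (near-)perfect predictor, only the diagonal cells ($1,000,000 and $1,000) are reachable in practice.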
It took me a week to think about it. Then I read all the comments, and thought about it some more. And now I think I have this "problem" well in hand. I also think that, incidentally, I arrived at Eliezer's answer as well, though since he never spelled it out I can't be sure.
To be clear - a lot of people have said that the decision depends on the problem parameters, so I'll explain just what it is I'm solving. See, Eliezer wants our decision theory to WIN. That implies that we have all the relevant information - we can think of a lot of situation... (read more)
I wanted to consider some truly silly solution. But since taking only box A is out (and I can't find a good reason for choosing box A, other than a vague argument based in irrationality along the lines that I'd rather not know if omniscience exists…), I came up with this instead. I won't apologize for all the math-economics, but it might get dense.
Omega has been correct 100 times before, right? Fully intending to take both boxes, I’ll go to each of the 100 other people. There’re 4 categories of people. Let’s assume they aren’t bound by psychology and th... (read more)
1) I would one-box. Here's where I think the standard two-boxer argument breaks down. It's the idea of making a decision. The two-boxer idea is that once the boxes have been fixed the course of action that makes the most money is taking both boxes. Unless there is reverse causality going on here, I don't think that anyone disputes this. If at that moment you could make a choice totally independently of everything leading up to that point you would two-box. Unfortunately, the very existence of Omega implies that such a feat is impossible.
2) A mildly s... (read more)
Actually I take it back. I think that what I would do depends on what I know of how Omega functions (exactly what evidence led me to believe that he was good at predicting this).
Omega #1: (and I think this one is the most plausible) You are given a multiple choice personality test (not knowing what's about to happen). You are then told that you are in a Newcomb situation and that Omega's prediction is based on your test answers (maybe they'll even show you Omega's code after the test is over). Here I'll two-box. If I am punished I am not being punish... (read more)
"the dominant consensus in modern decision theory is that one should two-box...there's a common attitude that verbal arguments for one-boxing are easy to come by, what's hard is developing a good decision theory that one-boxes"
This may be more a statement about the relevance and utility of decision theory itself as a field (or lack thereof) than the difficulty of the problem, but it is at least philosophically intriguing.
From a physical and computational perspective, there is no paradox, and one need not invoke backwards causality, 'pre-commitmen... (read more)
Upon reading this, I immediately went,
"Well, General Relativity includes solutions that have closed timelike curves, and I certainly am not in any position to rule out the possibility of communication by such. So I have no actual reason to rule out the possibility that which strategy I choose will, after I make my decision, be communicated to Omega in my past and then the boxes filled accordingly. So I better one-box in order to choose the closed timelike loop where Omega fills the box."
I understand, looking at Wikipedia, that in Nozick's formu... (read more)
The "no backwards causality" argument seems like a case of conflation of correlation and causation. Your decision doesn't retroactively cause Omega to fill the boxes in a certain way; some prior state of the world causes your thought processes and Omega's prediction, and the correlation is exactly or almost exactly 1.
EDIT: Correlation coefficients don't work like that, but whatever. You get what I mean.
The "no backwards causality" argument seems like a case of conflation of correlation and causation. Your decision doesn't retroactively cause Omega to fill the boxes in a certain way; some prior state of the world causes your thought processes and Omega's prediction, and the correlation is exactly or almost exactly 1.
The original description of the problem doesn't mention if you know of Omega's strategy for deciding what to place in box B, or their success history in predicting this outcome - which is obviously a very important factor.
If you know these things, then the only rational choice, obviously and by a huge margin, is to pick only box B.
If you don't know anything other than box B may or may not contain a million dollars, and you have no reasons to believe that it's unlikely, like in the lottery, then the only rational decision is to take both. This also seems to... (read more)
You are betting a positive extra payout of $1,000 against a net loss of $999,000 that there are no Black Swans[1] at all in this situation.
Given that you already have 100 points of evidence that taking Box A makes Box B empty (added to the evidence that Omega is more intelligent than you), I'd say that's a Bad Bet to make.
Given the amount of uncertainty in the world, choosing Box B instead of trying to "beat the system" seems like the rational step to me.
Edit I've given the Math in a comment below to show how to calculate when to make either dec... (read more)
What would Newcomb's problem look like in the physical world, taking quantum physics into account? Specifically, would Omega need to know quantum physics in order to predict my decision on "to one box or not to one box"?
To simplify the picture, imagine that Omega has a variable with it that can be either in the state A+B or B and which is expected to correlate with my decision and therefore serves to "predict" me. Omega runs some physical process to arrive at the contents of this variable. I'm assuming that "to predict" means &... (read more)
...
...
It seems to me that if all that is true, and you want to build a Friendly AI, then the rational thing to do he... (read more)
You said:
but you also said:
I can envision several possibilities:
Would you like to clarify?
Causal decision theorists self-modify to one-box on Newcomb's Problem with Omegas that looked at their source code after the self-modification took place; i.e., if the causal decision theorist self-modifies at 7am, it will self-modify to one-box with Omegas that looked at the code after 7am and two-box otherwise. This is not only ugly but also has worse implications for e.g. meeting an alien AI who wants to cooperate with you, or worse, an alien AI that is trying to blackmail you.
Bad decision theories don't necessarily self-repair correctly.
And in general, every time you throw up your hands in the air and say, "I don't know how to solve this problem, nor do I understand the exact structure of the calculation my computer program will perform in the course of solving this problem, nor can I state a mathematically precise meta-question, but I'm going to rely on the AI solving it for me 'cause it's supposed to be super-smart," you may very possibly be about to screw up really damned hard. I mean, that's what Eliezer-1999 thought you could say about "morality".
Sorry if this has already been addressed. I didn't take the time to read all 300 comments.
It seems to me that if there were an omniscient Omega, the world would be deterministic, and you wouldn't have free will. You have the illusion of choice, but your choice is already known by Omega. Hence, try (futilely) to make your illusory choice that of a one-boxer.
Personally, I don't believe in determinism or the concept of Omega. This is a nice thought experiment though.
I don't grasp why this problem seems so hard and convoluted. Of course you have to one-box, if you two-box you'll lose for sure. From my perspective two-boxing is irrational...
If Omega can flawlessly predict the future, this confirms a deterministic world at the atomic scale. To be a perfect predictor Omega would also need to have a perfect model of my brain at every stage of making my "decision" - thus Omega can see the future and perfectly predict whether or not I'm gonna two-box.
If my brain is wired up in such a way as to choose two-box... (read more)
I'm kind of surprised at how complicated everyone is making this, because to me the Bayesian answer jumped out as soon as I finished reading your definition of the problem, even before the first "argument" between one and two boxers. And it's about five sentences long:
Don't choose an amount of money. Choose an expected amount of money--the dollar value multiplied by its probability. One-box gets you >(1,000,000*.99). Two-box gets you <(1,000*1+1,000,000*.01). One-box has superior expected returns. Probability theory doesn't usually encounte... (read more)
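Evaluating those two expressions (using the 99% accuracy figure from the comment):

$$1{,}000{,}000 \times 0.99 = 990{,}000 \qquad\text{vs.}\qquad 1{,}000 \times 1 + 1{,}000{,}000 \times 0.01 = 11{,}000.$$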
I would take box B, because it would be empty.
I see your general point, but it seems like the solution to the Omega example is trivial if Omega is assumed to be able to predict accurately most of the time:
(letting C = Omega predicted correctly; let's assume for simplicity that Omega's fallibility is the same for false positives and false negatives)
The expected returns are 1,000,000 P(C) for one-boxing and 1,000 + 1,000,000 (1 - P(C)) for two-boxing. Setting these equal to find the equilibrium point:
1,000,000 P(C) = 1,000 + 1,000,000 (1 - P(C))
2,000,000 P(C) = 1,001,000
P(C) = 0.5005
It certainly seems like a simple resolution exists...
As a rationalist, there should only ever be one choice you make. It should be the ideal choice. If you are a perfectly rational person, you will only ever make the ideal choice. You are, at the very least, deterministic. If you can make the ideal choice, so can someone else. That means, if someone knows your exact situation (trivial in the Newcomb paradox, as the super intelligent agent is causing your situation) then they can predict exactly what you will do, even without being perfectly rational themse... (read more)
Well, for me there are two possible hypotheses for that:
The boxes are not what they seem. For example, box B contains nano-machinery that detects whether you one-box, creates the money if you do, and then self-destructs.
Omega is smart enough to be able to predict whether I'll one-box or two-box (he scanned my brain, ran it in a simulation, and saw what I would do... I hope he didn't turn off the simulation afterwards, or he would have killed "me" then!).
In both cases, I should one-box. So I'll one-box. I don't really get the ra... (read more)
It's strange. I perfectly agree with the argument here about rationality - the rationality I want is the rationality that wins, not the rationality that is more reasonable. This agrees with my privileging truth as a guide which is useful, not one which necessarily makes the best predictions. But in other points on the site, it always seems that correspondence is privileged over value.
As for Newcombs paradox, I suggest writing out all the relevant propositions a la Jaynes, with non-zero probabilities for all propositions. Make it a real problem, not an ideali... (read more)
An amusing n=3 survey of mathematics undergrads at Trinity Cambridge:
1) Refused to answer. 2) It depends on how reliable Omega is/but you can't (shouldn't) really quantify ethics anyway/this situation is unreasonable. 3) Obviously 2 box, one boxing is insane.
3 said he would program an AI to one box. And when I pointed out that his brain was built of quarks just like the AI he responded that in that case free will didn't exist and choice was impossible.
Upvoted for this sentence:
"If it ever turns out that Bayes fails - receives systematically lower rewards on some problem, relative to a superior alternative, in virtue of its mere decisions - then Bayes has to go out the window."
This is such an important concept.
I will say this declaratively: The correct choice is to take only box two. If you disagree, check your premises.
"But it is agreed even among causal decision theorists that if you have the power to precommit yourself to take one box, in Newcomb's Problem, then you should do so. If y... (read more)
Yes, but like falsifiability, dangerous. This also goes for 'rationalists win', too.
'We' (Bayesians) face the Duhem-Quine thesis with a vengeance: we have often found situations where Bayes failed. And then we rescued it (we think) by either coming up with novel theses (TDT) or carefully analyzing the problem or a related problem and saying that is the real answer and so Bayes works after all (Jaynes again and again). Have we corrected ourselves or just added epicycles and special pleading? Should we just have tossed Bayes out the window at that point except in the limited areas we already proved it to be optimal or useful?
This can't really be answered.
I think it is important to make a distinction between what our choice is now, while we are here, sitting at a computer screen, unconfronted by Omega, and our choice when actually confronted by Omega. When actually confronted by Omega, your choice has been determined. Take both boxes, take all the money. Right now, sitting in your comfy chair? Take the million-dollar box. In the comfy chair, the contra-factual nature of the experiment basically gives you an Outcome Pump. So take the million-dollar box, because if you take the million-dollar box, it's full of a million dollars. But when it actually happens, the situation is different. You aren't in your comfy chair anymore.
I guess my cognition just breaks down over the idea of Omega. To me, Newcomb's problem seems akin to a theological argument. Either we are talking about a purely theoretical idea that is meant to illustrate abstract decision theory, in which case I don't care how many boxes I take, because it has no bearing on anything tied to reality, or we are actually talking about the real universe, in which case I take both boxes because I don't believe in alien superintelligences capable of foreseeing my choices any more than I believe in an anthropomorphic deity.
If in 35 AD you were told that there were only 100 people who had seen Jesus dead and entombed and then had seen him alive afterwards, and that there were no people who had seen him dead and entombed who had seen his dead body afterwards, would you believe he had been resurrected?
In Newcomb's problem as stated, we are told 100 people have gotten the predicted answer. Then no matter how low a prior we put on a superintelligent alien being able to predict what we would do, we should accept this as proof.
This seems like a pretty symmetric question t... (read more)
Really? A PhD? Seriously?
If Omega said "You shall only take Box B or I will smite thee" and then proceeded to smite 100 infidels who dared to two-box, the rational choice would be obvious (especially if the smiting happened after Omega left).
Is this really difficult to show mathematically?