Follow-up to: Normative uncertainty in Newcomb's problem

Philosophers and atheists break for two-boxing; theists and Less Wrong break for one-boxing
Personally, I would one-box on Newcomb's Problem. Conditional on one-boxing for lawful reasons, one-boxing earns $1,000,000, while two-boxing, conditional on two-boxing for lawful reasons, would deliver only a thousand. But this seems to be firmly a minority view in philosophy, and numerous heuristics about expert opinion suggest that I should re-examine the view.

In the PhilPapers survey, Philosophy undergraduates start off divided roughly evenly between one-boxing and two-boxing:

Newcomb's problem: one box or two boxes?

Other 142 / 217 (65.4%)
Accept or lean toward: one box 40 / 217 (18.4%)
Accept or lean toward: two boxes 35 / 217 (16.1%)

But philosophy faculty, who have learned more (less likely to have no opinion), and been subject to further selection, break in favor of two-boxing:

Newcomb's problem: one box or two boxes?

Other 441 / 931 (47.4%)
Accept or lean toward: two boxes 292 / 931 (31.4%)
Accept or lean toward: one box 198 / 931 (21.3%)

Specialists in decision theory (who are also more atheistic, more compatibilist about free will, and more physicalist than faculty in general) are even more convinced:

Newcomb's problem: one box or two boxes?

Accept or lean toward: two boxes 19 / 31 (61.3%)
Accept or lean toward: one box 8 / 31 (25.8%)
Other 4 / 31 (12.9%)

Looking at the correlates of answers about Newcomb's problem, two-boxers are more likely to believe in physicalism about consciousness, atheism about religion, and other positions generally popular around here (which are also usually, but not always, in the direction of philosophical opinion). Zooming in on one correlate, most theists with an opinion are one-boxers, while atheists break for two-boxing:

Newcomb's problem: two boxes (correlation: 0.125)

            one box            two boxes
Atheists    28.6% (145/506)    48.8% (247/506)
Theists     40.8% (40/98)      31.6% (31/98)

Response pairs: 655   p-value: 0.001

Less Wrong breaks overwhelmingly for one-boxing in survey answers for 2012:

One-box: 726, 61.4%
Two-box: 78, 6.6%
Not sure: 53, 4.5%
Don't understand: 86, 7.3%
No answer: 240, 20.3%

When I elicited LW confidence levels in a poll, a majority indicated 99%+ confidence in one-boxing, and 77% of respondents indicated 80%+ confidence.

What's going on?

I would like to understand what is driving this difference of opinion. My poll was a (weak) test of the hypothesis that Less Wrongers were more likely to account for uncertainty about decision theory: since on the standard Newcomb's problem one-boxers get $1,000,000, while two-boxers get $1,000, even a modest credence in the correct theory recommending one-boxing could justify the action of one-boxing.
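
The threshold implied by this argument can be worked out in a few lines. The following is a minimal sketch, not a settled method: it assumes the dollar stakes under each decision theory are directly comparable and can simply be weighted by credence, which is itself a contested move. The function names are hypothetical.

```python
# Sketch: choosing under uncertainty about which decision theory is correct.
# Assumption (not from the post): the dollar stakes each theory assigns are
# directly comparable, so they can be weighted by credence.

ONE_BOX_PRIZE = 1_000_000  # what one-boxers get on the standard problem
TWO_BOX_PRIZE = 1_000      # what two-boxers get on the standard problem


def credence_weighted_gain(credence_one_boxing: float) -> float:
    """Expected dollar advantage of one-boxing over two-boxing.

    If the one-boxing theory is right, one-boxing gains the difference
    between the two prizes. If the two-boxing theory is right, one-boxing
    forgoes the guaranteed $1,000.
    """
    gain_if_one_boxing_right = ONE_BOX_PRIZE - TWO_BOX_PRIZE
    loss_if_two_boxing_right = TWO_BOX_PRIZE
    return (credence_one_boxing * gain_if_one_boxing_right
            - (1 - credence_one_boxing) * loss_if_two_boxing_right)


def break_even_credence() -> float:
    """Credence in the one-boxing theory at which both actions tie."""
    return TWO_BOX_PRIZE / ONE_BOX_PRIZE
```

Under these assumptions the break-even credence is one in a thousand, which is the sense in which "even a modest credence" would suffice.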

If new graduate students read the computer science literature on program equilibrium, including some local contributions like Robust Cooperation in the Prisoner's Dilemma and A Comparison of Decision Algorithms on Newcomblike Problems, I would guess they would tend to shift more towards one-boxing. Thinking about what sort of decision algorithms it is rational to program, or what decision algorithms would prosper over numerous one-shot Prisoner's Dilemmas with visible source code, could also shift intuitions. A number of philosophers I have spoken with have indicated that frameworks like the use of causal models with nodes for logical uncertainty are meaningful contributions to thinking about decision theory. However, I doubt that for those with opinions, the balance would swing from almost 3:1 for two-boxing to 9:1 for one-boxing, even concentrating on new decision theory graduate students.
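
To make the "visible source code" setting concrete, here is a toy sketch of a one-shot Prisoner's Dilemma between programs that can read each other's source. It is only an illustration: the two "programs" are hypothetical strings, the interpreter recognizes nothing else, and the payoff numbers are an assumed standard ordering rather than anything taken from the papers mentioned above.

```python
from typing import Tuple

# Hypothetical program texts; a real setup would pass actual source code.
CLIQUE_SRC = "cooperate iff opponent_source == my_source"
DEFECT_SRC = "always defect"

# (my move, their move) -> my payoff, with an assumed standard PD ordering.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


def run(source: str, opponent_source: str) -> str:
    """Interpret a program's source text and return its move."""
    if source == CLIQUE_SRC:
        # Cooperate only with an exact copy of itself.
        return "C" if opponent_source == source else "D"
    if source == DEFECT_SRC:
        return "D"
    raise ValueError(f"unknown program: {source!r}")


def play(src_a: str, src_b: str) -> Tuple[int, int]:
    """Each program sees the other's source, then both move at once."""
    move_a = run(src_a, src_b)
    move_b = run(src_b, src_a)
    return PAYOFF[(move_a, move_b)], PAYOFF[(move_b, move_a)]
```

In this toy, two copies of the clique program cooperate, and switching to unconditional defection against a clique program lowers the deviator's own payoff, which is the kind of result that might shift intuitions.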

On the other hand, there may be an effect of unbalanced presentation to non-experts. Less Wrong is on average less philosophically sophisticated than professional philosophers. Since philosophical training is associated with a shift towards two-boxing, some of the difference in opinion could reflect a difference in training. Then, postings on decision theory have almost all either argued for or assumed one-boxing as the correct response on Newcomb's problem. It might be that if academic decision theorists were making arguments for two-boxing here, or if there was a reduction in pro one-boxing social pressure, there would be a shift in Less Wrong opinion towards two-boxing.

Less Wrongers, what's going on here? What are the relative causal roles of these and other factors in this divergence?

ETA: The SEP article on Causal Decision Theory.


What's going on is that Eliezer Yudkowsky has argued forcefully for one-boxing, in terms of his "way of winning" thing, which, after reading the other stuff he wrote about that (like the "nameless virtue"), probably created a "why aren't you winning" alarm bell in people's heads.

Most philosophers haven't been introduced to the problem by Eliezer Yudkowsky.

To me, Newcomb's problem seemed like a contrived trick to punish CDT, and it seemed that any other decision theory was just as likely to run into some other strange scenario to punish it, until I started thinking about AIs that could simulate you accurately, something else that differentiates LessWrong from professional philosophers.

When I realized the only criterion by which a "best decision theory" could be crowned was winning in as many realistic scenarios as possible, and stopped caring that "acausal control" sounded like an oxymoron, and that there could potentially be Newcomblike problems to face in real life, and that there were decision theories that could win on Newcomb's problem without bungling the smoker's lesion problem, and read this:

What if your daughter had a 90% fatal disease, and box A contained a serum with a 20% chance of curing her, and box B might contain a serum with a 95% chance of curing her?

that convinced me to one-box.


About atheists vs theists and undergrads vs philosophers, I think two-boxing is a position that preys on your self-image as a rationalist. It feels like you are getting punished for being rational, like you are losing not because of your choice, but because of who you are (I would say your choice is embedded in who you are, so there is no difference). One-boxing feels like magical thinking. Atheists and philosophers have stronger self-images as rationalists. Most haven't grokked this:

How can you improve your conception of rationality? Not by saying to yourself, “It is my duty to be rational.” By this you only enshrine your mistaken conception. Perhaps your conception of rationality is that it is rational to believe the words of the Great Teacher, and the Great Teacher says, “The sky is green,” and you look up at the sky and see blue. If you think: “It may look like the sky is blue, but rationality is to believe the words of the Great Teacher,” you lose a chance to discover your mistake.

Will's link has an Asimov quote that supports the "self-image vs right answer" idea, at least for Asimov:

I would, without hesitation, take both boxes . . . I am myself a determinist, but it is perfectly clear to me that any human being worthy of being considered a human being (including most certainly myself) would prefer free will, if such a thing could exist. . . Now, then, suppose you take both boxes and it turns out (as it almost certainly will) that God has foreseen this and placed nothing in the second box. You will then, at least, have expressed your willingness to gamble on his nonomniscience and on your own free will and will have willingly given up a million dollars for the sake of that willingness -- itself a snap of the finger in the face of the Almighty and a vote, however futile, for free will. . . And, of course, if God has muffed and left a million dollars in the box, then not only will you have gained that million, but far more important you will have demonstrated God's nonomniscience.

Seems like Asimov isn't taking the stakes seriously enough. Maybe we should replace "a million dollars" with "your daughter here gets to live."

And only coincidentally signalling that his status is worth more than a million dollars.

But losing the million dollars also shoves in your face your ultimate predictability.

Voluntarily taking a loss in order to insult yourself doesn't seem rational to me.

Plus, that's not a form of free will I even care about. I like that my insides obey laws. I'm not fond of the massive privacy violation, but that'd be there or not regardless of my choice.

Adding to your story, it's not just Eliezer Yudkowsky's introduction to Newcomb's problem. It's the entire Bayesian / Less Wrong mindset. Here, Eliezer wrote: I felt something similar when I was reading through the sequences. Everything "clicked" for me - it just made sense. I couldn't imagine thinking another way. Same with Newcomb's problem. I wasn't introduced to it by Eliezer, but I still thought one-boxing was obvious; it works. Many Less Wrongers that have stuck around probably have had a similar experience; the Bayesian standpoint seems intuitive. Eliezer's support certainly helps to propagate one-boxing, but LessWrongers seem to be a self-selecting group.
It also helps that most Bayesian decision algorithms actually take on the arg max_a U(a)*P(a) reasoning of Evidential Decision Theory, which means that whenever you invoke your self-image as a capital-B Bayesian you are semi-consciously invoking Evidential Decision Theory, which does actually get the right answer, even if it messes up on other problems. (Commenting because I got here while looking for citations for my WIP post about another way to handle Newcomb-like problems.)
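
A small sketch may help show why evidential and causal reasoning come apart here. It reads the shorthand above as the usual evidential rule (pick the action with the highest expected utility after conditioning on that action), which is an interpretation rather than something the comment spells out. The `accuracy` and `p_box_full` parameters are assumptions introduced for illustration.

```python
def edt_values(accuracy, big=1_000_000, small=1_000):
    """Evidential values: the prediction is conditioned on the act taken.

    `accuracy` is the assumed probability that the predictor is right.
    """
    return {
        "one-box": accuracy * big,
        "two-box": (1 - accuracy) * big + small,
    }


def cdt_values(p_box_full, big=1_000_000, small=1_000):
    """Causal values: the opaque box is full with the same probability
    `p_box_full` whichever act is chosen, since the act cannot change it."""
    return {
        "one-box": p_box_full * big,
        "two-box": p_box_full * big + small,
    }


def best(values):
    """Return the action with the highest expected value."""
    return max(values, key=values.get)
```

With a reliable predictor the evidential calculation favors one-boxing, while the causal calculation favors two-boxing for every value of `p_box_full`, because two-boxing always adds the small prize.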
It may well be the strength of argument. It could also be the lead of a very influential/respected figure and the power of groupthink. In my experience, 2 forums with similar mission statements ('political debating sites', say, or 'atheist sites') often end up having distinct positions on all sorts of things that most of their posters converge around. The same is true of any group, although if 'theists' was a genuine survey of theists the world over it's at least a far more representative group. It would be very interesting to add a control group in some way: confront someone with the issue who was of a typical LessWrong demographic but hadn't read anything about Newcomb on Less Wrong for instance. If it's not just a quality of this sort of group-think, my best guess is that it's to do with the greater practical focus (or at least theoretical belief in practical focus!) on LessWrong. I suspect most people automatically parse this sort of philosophical question as 'what is more abstractly logical' whereas people on here probably parse it more as 'what should I do to win'. But I think these sort of 'our inherent group qualities' explanations are almost always locatable but often unnecessary in light of group-think.
David Wolpert of the "No Free Lunch Theorem" was one of my favorite researchers back in the 90s. If I remember it right, part of the No Free Lunch Theorem for generalizers was that for any world where your generalizer worked, there would be another world where it didn't. The issue was the fit of your generalizer to the universe you were in. Has anyone actually written out the Bayesian updating for Newcomb? It should take quite a lot of evidence for me to give up on causality as is. As it turns out, looking at the Newcomb's Paradox Wikipedia page, Wolpert was on the job for this problem, pointing out "It is straightforward to prove that the two strategies for which boxes to choose make mutually inconsistent assumptions for the underlying Bayes net". Yes, that's about my feeling. A hypothetical is constructed which contradicts something for which we have great evidence. Choosing to overturn old conclusions on the basis of new evidence is a matter of the probabilities you've assigned to the different and contradictory theories. Really nothing to see here. Hypothesizing strong evidence that contradicts something you've assigned high probability to naturally feels confusing. Of course it does.
Your past, Omega-observed self can cause both Omega's prediction and your future choice without violating causality. What you're objecting to is your being predictable.
My past self is not the cause of my future choices, it is one of many distal causes for my future choices. Similarly, it is not the cause of Omega's prediction. The direct cause of my future choice is my future self and his future situation, where Omega is going to rig the future situation so that my future self is screwed if he makes the usual causal analysis. Predictable is fine. People predict my behavior all the time, and in general, it's a good thing for both of us. As far as Omega goes, I object to his toying with inferior beings. We could probably rig up something to the same effect with dogs, using their biases and limitations against them so that we can predict their choices, and arrange it so that if they did the normally right thing to do, they always get screwed. I think that would be a rather malicious and sadistic thing to do to a dog, as I consider the same done to me. As far as this "paradox" goes, I object to the smuggled recursion, which is just another game of "everything I say is a lie". I similarly object to other "super rationality" ploys. I also object to the lack of explicit bayesian update analysis. Talky talky is what keeps a paradox going. Serious analysis makes one's assumptions explicit.
The obvious difference between these hypotheticals is that you're smart enough to figure out the right thing to do in this novel situation.
It's also worth mentioning that in the initial wording of the problem, unlike Eliezer's wording, it was just stated that the predictor is "almost certain" to have predicted correctly about which boxes you are going to take, and it was also specifically stated that there is no reverse causality going on (that what you actually decide to do now has no effect on what is in the boxes.)
For some reason this expression makes me think of the Princess Bride (she's only Mostly Dead).
Are you implying there exist decision theories that are less likely to run into strange scenarios that punish them? I would think that Omegas could choose to give any agent prejudicial treatment.
We have to determine what counts as "unfair". Newcomb's problem looks unfair because your decision seems to change the past. I have seen another Newcomb-like problem that was (I believe) genuinely unfair, because depending on their decision theory, the agents were not in the same epistemic state.
Here is what I think is a "fair" problem. It's when
1. the initial epistemic state of the agent is independent of its source code;
2. given the decisions of the agent, the end result is independent of its source code;
3. if there are intermediary steps, then given the decisions of the agent up to any given point, its epistemic state and any intermediate result accessible to the agent at that point are independent of its source code.
If we think of the agent as a program, I think we can equate "decision" with the agent's output. It's harder however to equate "epistemic state" with its input: recall Omega saying "Here are the 2 usual boxes. I have submitted this very problem in a simulation to TDT. If it one boxed, box B has the million. If it two boxed, box B is empty". So, if you're TDT, this problem is equivalent to the old Newcomb problem, where oneBox <=> $$$. But any other agent could 2 box, and get the million and the bonus. (Also, "TDT" could be replaced by a source code listing that the agent would recognize as its own.)
Anyway, I believe there's a good chance a decision theory exists such that it gets the best results out of any "fair" problem. Though now that I think of it, condition 2 may be a sufficient criterion for "fairness", for the problem above violates it: if TDT two-boxes, it does not get the million. Well except it does not two box, so my counter-factual doesn't really mean anything…
It still seems to me that you can't have a BestDecisionAgent. Suppose agents are black boxes -- Omegas can simulate agents at will, but not view their source code. An Omega goes around offering agents a choice between:
* $1, or
* $100 if the Omega thinks the agent acts differently than BestDecisionAgent in a simulated rationality test, otherwise $2 if the agent acts like BestDecisionAgent in the rationality test.
Does this test meet your criteria for a fair test? If not, why not?
I think I have left a loophole. In your example, Omega is analysing the agent by analysing its outputs in unrelated, and most of all, unspecified problems. I think the end result should only depend on the output of the agent on the problem at hand. Here's a possibly real life variation. Instead of simulating the agent, you throw a number of problems at it beforehand, without telling it it will be related to a future problem. Like, throw an exam at a human student (with a real stake at the end, such as grades). Then, later you submit the student to the following problem: Sounds like something like that could "reasonably" happen in real life. But I don't think it's "fair" either, if only because being discriminated for being capable of taking good decisions is so unexpected.
Omega gives you a choice of either $1 or $X, where X is either 2 or 100? It seems like you must have meant something else, but I can't figure it out.
Yes, that's what I mean. I'd like to know what, if anything, is wrong with this argument that no decision theory can be optimal. Suppose that there were a computable decision theory T that was at least as good as all other theories. In any fair problem, no other decision theory could recommend actions with better expected outcomes than the expected outcomes of T's recommended actions.
1. We can construct a computable agent, BestDecisionAgent, using theory T.
2. For any fair problem, no computable agent can perform better (on average) than BestDecisionAgent.
3. Call the problem presented in the grandfather post the Prejudiced Omega Problem. In the Prejudiced Omega Problem, BestDecisionAgent will almost assuredly collect $2.
4. In the Prejudiced Omega Problem, another agent can almost assuredly collect $100.
5. The Prejudiced Omega Problem does not involve an Omega inspecting the source code of the agent.
6. The Prejudiced Omega Problem, like Newcomb's problem, is fair.
7. Contradiction.
I'm not asserting this argument is correct -- I just want to know where people disagree with it. Qiaochu_Yuan's post is related.
Let BestDecisionAgent choose the $1 with probability p. Then the various outcomes are:
Simulation's choice | Our choice  | Payoff
$1                  | $1          | $1
$1                  | $2 or $100  | $100
$2 or $100          | $1          | $1
$2 or $100          | $2 or $100  | $2
And so p should be chosen to maximise p^2 + 100p(1-p) + p(1-p) + 2(1-p)^2. This is equal to the quadratic -98p^2 + 97p + 2, which Wolfram Alpha says is maximised by p = 97/196, for an expected payoff of ~$26. If we are not BestDecisionAgent, and so are allowed to choose separately, we aim to maximise pq + 100p(1-q) + q(1-p) + 2(1-p)(1-q), which simplifies to -98pq + 98p - q + 2, which is maximized by q = 0, for a payoff of ~$50.5. This surprises me, I was expecting to get p = q. So (3) and (4) are not quite right, but the result is similar. I suspect BestDecisionAgent should be able to pick p such that p = q is the best option for any agent, at the cost of reducing the value it gets. ETA: Of course you can do this just by setting p = 0, which is what you assume. Which, actually, means that (3) and (4) contradict each other: if BestDecisionAgent always picks the $2 over the $1, then the best any agent can do is $2. (Incidentally, how do you format tables properly in comments?)
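
The arithmetic in this comment can be checked with exact fractions. This is only a verification sketch of the two payoff expressions as the comment states them; the function names are hypothetical, and `p` and `q` follow the comment's usage (the simulated BestDecisionAgent's and our own probability of taking the $1).

```python
from fractions import Fraction


def same_agent_payoff(p):
    """Expected payoff when we and the simulation use the same p."""
    return p * p + 100 * p * (1 - p) + p * (1 - p) + 2 * (1 - p) ** 2


def separate_payoff(p, q):
    """Expected payoff when the simulation uses p and we use q."""
    return (p * q + 100 * p * (1 - q)
            + q * (1 - p) + 2 * (1 - p) * (1 - q))


# Vertex of the quadratic -98p^2 + 97p + 2.
p_star = Fraction(97, 196)

print(float(same_agent_payoff(p_star)))   # roughly 26
print(float(separate_payoff(p_star, 0)))  # 50.5
```

Using `Fraction` avoids rounding noise, so the values can be compared exactly against the closed forms quoted in the comment.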
The Omega chooses payoff of $2 vs. $100 based off of a separate test that can differentiate between BestDecisionAgent and some other agent. If we are BestDecisionAgent, the Omega will know this and we will be offered at most a $2 payoff. But some other agent will be different from BestDecisionAgent in a way that the Omega detects and cares about. That agent can decide between $1 and $100. Since another agent can perform better than BestDecisionAgent, BestDecisionAgent cannot be optimal.
Ah, ok. In that case though, the other agent wins at this game at the expense of failing at some other game. Depending on what types of games the agent is likely to encounter, this agent's effectiveness may or may not actually be better than BestDecisionAgent. So we could possibly have an optimal decision agent in the sense that no change to its algorithm could increase its expected lifetime utility, but not to the extent of not failing in any game.
The problem of cooperation and simulation is one that happens in reality, right now, even though simulation accuracy is far far lower. The problem of Omega is a reductio of this but I think it's plausible that entities can approach omega-level abilities of prediction even if they'll never actually get that accurate.

I've been reading a little of the philosophical literature on decision theory lately, and at least some two-boxers have an intuition that I hadn't thought about before that Newcomb's problem is "unfair." That is, for a wide range of pairs of decision theories X and Y, you could imagine a problem which essentially takes the form "Omega punishes agents who use decision theory X and rewards agents who use decision theory Y," and this is not a "fair" test of the relative merits of the two decision theories.

The idea that rationalists should win, in this context, has a specific name: it's called the Why Ain'cha Rich defense, and I think what I've said above is the intuition powering counterarguments to it.

I'm a little more sympathetic to this objection than I was before delving into the literature. A complete counterargument to it should at least attempt to define what fair means and argue that Newcomb is in fact a fair problem. (This seems related to the issue of defining what a fair opponent is in modal combat.)

TDT's reply to this is a bit more specific.

Informally: Since Omega represents a setup which rewards agents who make a certain decision X, and reality doesn't care why or by what exact algorithm you arrive at X so long as you arrive at X, the problem is fair. Unfair would be "We'll examine your source code and punish you iff you're a CDT agent, but we won't punish another agent who two-boxes as the output of a different algorithm even though your two algorithms had the same output." The problem should not care whether you arrive at your decisions by maximizing expected utility or by picking the first option in English alphabetical order, so long as you arrive at the same decision either way.

More formally: TDT corresponds to maximizing on the class of problems whose payoff is determined by 'the sort of decision you make in the world that you actually encounter, having the algorithm that you do'. CDT corresponds to maximizing over a fair problem class consisting of scenarios whose payoff is determined only by your physical act, and would be a good strategy in the real world if no other agent ever had an algorithm similar to yours (you must be the only CDT-agent in the u...

I'd just like to say that this comparison of CDT, TDT, and UDT was a very good explanation of the differences. Thanks for that.
Agreed. Found the distinction between TDT and UDT especially clear here.
This explanation makes UDT seem strictly more powerful than TDT (if UDT can handle Parfit's Hitchhiker and TDT can't). If that's the case, then is there a point in still focusing on developing TDT? Is it meant as just a stepping stone to an even better decision theory (possibly UDT itself) down the line? Or do you believe UDT's advantages to be counterbalanced by disadvantages?
Eliezer Yudkowsky (10y, 9 points):
UDT doesn't handle non-base-level maximization vantage points (previously "epistemic vantage points") for blackmail - you can blackmail a UDT agent because it assumes your strategy is fixed, and doesn't realize you're only blackmailing it because you're simulating it being blackmailable. As currently formulated UDT is also non-naturalistic and assumes the universe is divided into a not-you environment and a UDT algorithm in a Cartesian bubble, which is something TDT is supposed to be better at (though we don't actually have good fill-in for the general-logical-consequence algorithm TDT is supposed to call). I expect the ultimate theory to look more like "TDT modded to handle UDT's class of problems and blackmail and anything else we end up throwing at it" than "UDT modded to be naturalistic and etc", but I could be wrong - others have different intuitions about this.

As currently formulated UDT is also non-naturalistic and assumes the universe is divided into a not-you environment and a UDT algorithm in a Cartesian bubble, which is something TDT is supposed to be better at (though we don't actually have good fill-in for the general-logical-consequence algorithm TDT is supposed to call).

UDT was designed to move away from the kind of Cartesian dualism as represented in AIXI. I don't understand where it's assuming its own Cartesian bubble. Can you explain?

Eliezer Yudkowsky (10y, 0 points):
The version I saw involved a Universe computation which accepts an Agent function and then computes itself, with the Agent making its choices based on its belief about the Universe? That seemed to me like a pretty clean split.
No, the version we've been discussing for the last several years involves an argumentless Universe function that contains the argumentless Agent function as a part. Agent knows the source code of Agent (via quining) and the source code of Universe, but does not a priori know which part of the Universe is the Agent. The code of Universe might be mixed up so it's hard to pick out copies of Agent. Then Agent tries to prove logical statements of the form "if Agent returns a certain value, then Universe returns a certain value". As you can see, that automatically takes into account the logical correlates of Agent as well.

I find it rather disappointing that the UDT people and the TDT people have seemingly not been communicating very efficiently with each other in the last few years...

Wei Dai (10y, 6 points):
I think what has happened is that most of the LW people working on decision theory in the past few years have been working with different variations on UDT, while Eliezer hasn't participated much in the discussions due to being preoccupied with other projects. It seems understandable that he saw some ideas that somebody was playing with, and thought that everyone was assuming something similar.
Yes. And now, MIRI is planning a decision theory workshop (for September) so that some of this can be hashed out.
I honestly thought we'd been communicating. Posting all our work on LW and all that. Eliezer's comment surprised me. Still not sure how to react...
UDT can be modeled with a Universe computation that takes no arguments.
Wei Dai (10y, 4 points):
I think you must have been looking at someone else's idea. None of the versions of UDT that I've proposed are like this. See my original UDT post for the basic setup, which all of my subsequent proposals share.
Eliezer Yudkowsky (10y, 1 point):
"The answer is, we can view the physical universe as a program that runs S as a subroutine, or more generally, view it as a mathematical object which has S embedded within it." A big computation with embedded discrete copies of S seems to me like a different concept from doing logical updates on a big graph with causal and logical nodes, some of which may correlate to you even if they are not exact copies of you.

The sentence you quoted was just trying to explain how "physical consequences" might be interpreted as "logical consequences" and therefore dealt with within the UDT framework (which doesn't natively have a concept of "physical consequences"). It wasn't meant to suggest that UDT only works if there are discrete copies of S in the universe.

In that same post I also wrote, "A more general class of consequences might be called logical consequences. Consider a program P’ that doesn’t call S, but a different subroutine S’ that’s logically equivalent to S. In other words, S’ always produces the same output as S when given the same input. Due to the logical relationship between S and S’, your choice of output for S must also affect the subsequent execution of P’. Another example of a logical relationship is an S' which always returns the first bit of the output of S when given the same input, or one that returns the same output as S on some subset of inputs."

I guess I didn't explicitly write about parts of the universe that are "correlate to you" as opposed to having more exact logical relationships with you, but given how UDT is supposed to work, it was meant to just handle them naturally. At least I don't see why it wouldn't do so as well as TDT (assuming it had access to your "general-logical-consequence algorithm" which I'm guessing is the same thing as my "math intuition module").

FWIW, as far as I can remember I've always understood this the same way as Wei and cousin_it. (cousin_it was talking about the later logic-based work rather than Wei's original post, but that part of the idea is common between the two systems.) If the universe is a Game of Life automaton initialized with some simple configuration which, when run with unlimited resources and for a very long time, eventually by evolution and natural selection produces a structure that is logically equivalent to the agent's source code, that's sufficient for falling under the purview of the logic-based versions of UDT, and Wei's informal (underspecified) probabilistic version would not even require equivalence. There's nothing Cartesian about UDT.
I'm not so sure about this one... It seems that UDT would be deciding "If blackmailed, pay or don't pay" without knowing whether it actually will be blackmailed yet. Assuming it knows the payoffs the other agent receives, it would reason "If I pay if blackmailed...I get blackmailed, whereas if I don't pay if blackmailed...I don't get blackmailed. I therefore should never pay if blackmailed", unless there's something I'm missing.
I share the intuition that Newcomb's problem might be "unfair" (not a meaningful problem / not worth trying to win at), and have generally found LW/MIRI discussions of decision theory more enlightening when they dealt with other scenarios (like AIs exchanging source code) rather than Newcomb. One way to frame the "unfairness" issue: if you knew in advance that you would encounter something like Newcomb's problem, then it would clearly be beneficial to adopt a decision-making algorithm that (predictably) one-boxes. (Even CDT supports this, if you apply CDT to the decision of what algorithm to adopt, and have the option of adopting an algorithm that binds your future decision.) But why choose to optimize your decision-making algorithm for the possibility that you might encounter something like Newcomb's problem? The answer to the question "What algorithm should I adopt?" depends on what decision problems I am likely to face - why is it a priority to prepare for Newcomb-like problems? Well-defined games (like modal combat) seem to give more traction on this question than a fanciful thought experiment like Newcomb, although perhaps I just haven't read the right pro-one-boxing rejoinder.

You may not expect to encounter Newcomb's problems, but you might expect to encounter prisoner's dilemmas, and CDT recommends defecting on these.

Those who look for evidence will be thrown into the pits of hell, where they will weep and gnash their teeth forever and ever. Amen. And those who have faith will sit in glory in His presence for all time. Hallelujah!
You might be interested in reading TDT chapter 5 "Is Decision-Dependency Fair" if you haven't already.
Are people who think in terms of fairness aware of the connection to the prisoners' dilemma? (which is there from the very beginning, right?)
I think David Lewis was the first to observe this connection, in 1979, 10 years after Nozick's publication of the problem. But the prisoner's dilemma is only Newcomb-like if the two prisoners are psychological twins, i.e. if they use the same decision procedure and know this about each other. One might object that this is just as unfair as Newcomb's problem. But the objection that Newcomb's is unfair isn't to be confused with the objection that it's unrealistic. I think everybody working on the problem accepts that Newcomb-like situations are practically possible. Unfairness is a different issue.
Nozick mentioned PD. I've always heard it asserted that Newcomb started with PD (eg, here). Oddly, Nozick does not give Newcomb's first name. He talks about the problem on page 1, but waits to page 10 to say that it is the problem in the title of the paper. Someone building a decision theory can equally well say that Newcomb's problem and someone who threatens to duplicate them and make them play PD against their duplicate are equally unfair, but that's not the connection.
Ah, good to know, thanks.
Hmm, that is an interesting objection. Would you be willing to sketch out (or point me to) a response to it?
Well, that depends. It could turn out to be the case that, in reality, for some fixed definition of fair, the universe is unfair. If that were the case, I think at least some of the philosophers who study decision theory would maintain a distinction between ideal rational behavior, whatever that means, and the behavior that, in the universe, consistently results in the highest payoffs. But Eliezer / MIRI is solely interested in the latter. So it depends on what your priorities are.
Well, if this is right... Then we should be able to come up with a Newcomb like problem that specifically punishes TDT agents (off the top of my head, Omega gives an additional 10 million to any agent not using TDT at the end of the box exercise). And if we can come up with such a problem, and EY/MIRI can't respond by calling foul (for the reasons you give), then getting richer on Newcomb isn't a reason to accept TDT.

The "practical" question is whether you in fact expect there to be things in the universe that specifically punish TDT agents. Omega in Newcomb's problem is doing something that plausibly is very general, namely attempting to predict the behavior of other agents: this is plausibly a general thing that agents in the universe do, as opposed to specifically punishing TDT agents.

TDT also isn't perfect; Eliezer has examples of (presumably, in his eyes, fair) problems where it gives the wrong answer (although I haven't worked through them myself).

This seems to be the claim under dispute, and the question of fairness should be distinguished from the claim that Omega is doing something realistic or unrealistic. I think we agree that Newcomb-like situations are practically possible. But it may be that my unfair game is practically possible too, and that in principle no decision theory can come out maximizing utility in every practically possible game. One response might be to say Newcomb's problem is more unfair than the problem of simply choosing between two boxes containing different amounts of money, because Newcomb's distribution of utility makes mention of the decision. Newcomb's is unfair because it goes meta on the decider. My TDT punishing game is much more unfair than Newcomb's because it goes one 'meta' level up from there, making mention of the decision theories. You could argue that even if no decision theory can maximise in every arbitrarily unfair game, there are degrees of unfairness related to the degree to which the problem 'goes meta'. We should just prefer the decision theory that can maximise at the highest level of unfairness. This could probably be supported by the observation that while all these unfair games are practically possible, the more unfair a game is the less likely we are to encounter it outside of a philosophy paper. You could probably come up with a formalization of unfairness, though it might be tricky to argue that it's relevantly exhaustive and linear. EDIT: (Just a note, you could argue all this without actually granting that my unfair game is practically possible, or that Newcomb's problem is unfair, since the two-boxer will provide those premises.)
A theory that is incapable of dealing with agents that make decisions based on the projected reactions of other players, is worthless in the real world.
However, an agent that makes decisions based on the fact that it perfectly predicts the reactions of other players does not exist in the real world.
Newcomb does not require a perfect predictor.
I know that the numbers in the canonical case work out to a required accuracy of .5005, within noise of random.
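For anyone who wants to check the .5005 figure, here is a minimal sketch of the arithmetic. It assumes the canonical amounts ($1,000,000 in the opaque box, $1,000 in the transparent one) and a predictor that is right with the same probability whichever way you choose; the function names are made up for illustration.

```python
from fractions import Fraction

def expected_values(accuracy, big, small):
    """Expected payoff of each choice, given the predictor's accuracy."""
    # One-boxing pays `big` only when the predictor correctly foresaw it.
    one_box = accuracy * big
    # Two-boxing always pays `small`, plus `big` when the predictor was wrong.
    two_box = (1 - accuracy) * big + small
    return one_box, two_box

def break_even_accuracy(big, small):
    """Accuracy at which the two expected payoffs are equal.

    Solving accuracy * big == (1 - accuracy) * big + small
    gives accuracy == (big + small) / (2 * big).
    """
    return Fraction(big + small, 2 * big)

canonical = break_even_accuracy(1_000_000, 1_000)
print(float(canonical))  # 0.5005
```

Any accuracy above that threshold makes one-boxing the higher expected-value choice under these assumptions. Shrinking the gap between the two amounts raises the threshold: with 5,000 and 1,000 the same formula gives 0.6.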
TDT does in fact sketch a fairly detailed model of "what sort of situation is 'fair' for the purpose of this paper", and it explicitly excludes referring to the specific theory that the agent implements. Note that Newcomb did not set out to deliberately punish TDT (which would be hard, considering Newcomb predates TDT), so your variation shouldn't either.
I think an easy way to judge between fair and unfair problems is whether you need to label the decision theory. Without a little label saying "TDT" or "CDT", Omega can still punish two-boxers based on the outcome (factual or counterfactual) of their decision theory, regardless of what decision theory they used. How do you penalize TDT, without actually having to say "I'll penalize TDT", based solely on the expected results of the decision theory?
You penalise based on the counterfactual outcome: if they were in Newcomb's problem, this person would choose one box.
Typically by withholding information about the actual payoffs that will be experienced. eg. Tell the agents they are playing Newcomb's problem but don't mention that all millionaires are going to be murdered...
That's a good question. Here's a definition of "fair" aimed at UDT-type thought experiments: The agent has to know what thought experiment they are in as background knowledge, so the universe can only predict their counterfactual actions in situations that are in that thought experiment, and where the agent still has the knowledge of being in the thought experiment. This disallows my anti-oneboxer setup here: (because the predictor is predicting what decision would be made if the agent knew they were in Newcomb's problem, not what decision would be made if the agent knew they were in the anti-oneboxer experiment) but still allows Newcomb's problem, including the transparent box variation, and Parfit's Hitchhiker. I don't think much argument is required to show Newcomb's problem is fair by this definition, the argument would be about deciding to use this definition of fair, rather than one that favours CDT, or one that favours EDT.
Oops. Yes.

The most charitable interpretation would just be that there happened to be a convincing technical theory which said you should two-box, because it took an even more technical theory to explain why you should one-box and this was not constructed, along with the rest of the edifice to explain what one-boxing means in terms of epistemic models, concepts of instrumental rationality, the relation to traditional philosophy's 'free will problem', etcetera. In other words, they simply bad-lucked onto an edifice of persuasive, technical, but ultimately incorrect argument.

We could guess other motives for people to two-box, like memetic pressure for partial counterintuitiveness, but why go to that effort now? Better TDT writeups are on the way, and eventually we'll get to see what the field says about the improved TDT writeups. If it's important to know what other hidden motives might be at work, we'll have a better idea after we negate the usually-stated motive of, "The only good technical theory we have says you should two-box." Perhaps the field will experience a large conversion once presented with a good enough writeup and then we'll know there weren't any other significant motives.

Do you have an ETA on that? All my HPMoR anticipations combined don't equal my desire to see this published and discussed.

August. (I'm writing one.)

This reply confused me at first because it seems to be answering a different (ie. inverted) question to the one asked by the post.
Eliezer Yudkowsky:
One-boxing is normal and does not call out for an explanation. :)
People who aren't crazy in a world that is mad? That certainly calls out for an explanation. In case it is reproducible!
I guess we need a charitable interpretation of "People are crazy, the world is mad"-- people are very much crazier than they theoretically could be (insert discussion of free will). I believe that people do very much more good (defined as life support for people) than harm, based on an argument from principles. If people didn't pour more negentropy into the human race than they take out, entropy would guarantee that the human race would cease to exist. The good that people do for themselves is included in the calculation.
What is the definition of TDT? Google wasn't helpful.
Eliezer Yudkowsky:
Timeless decision theory. UDT = Updateless decision theory.
FWIW, when I first read about the problem I took two-boxing to be the obviously correct answer (I wasn't a compatibilist back then), and I didn't change my mind until I read Less Wrong.
Anecdotal evidence amongst people I've questioned falls into two main categories. The 1st is the failure to think the problem through formally. Many simply focus on the fact that whatever is in the box remains in the box. The 2nd is some variation of failure to accept the premise of an accurate prediction of their choice. This is actually counterintuitive to most people, and for others it is very hard to even casually contemplate a reality in which they can be perfectly predicted (and therefore, in their minds, have no 'free will / soul'). Many conversations simply devolve into 'Omega can't actually make such an accurate prediction about my choice', or 'I'd normally 2 box so I'm not getting my million anyhow'.

Anecdotally, there are two probability games that convinced me to one-box: The Monty Hall game and playing against the rock-paper-scissors bot at the NY Times.

The RPS bot is a good real world example of how it is theoretically possible to have an AI (or "Omega") who accurately predicts my decisions. The RPS bot predicted my decision about 2 out of 3 times, so I don't see any conceptual reason why an even better designed robot/AI couldn't beat me 999/1000 times at RPS. I tried really hard to outsmart the RPS bot and even still I lost more than I won. It was only when I randomized my choices using a hashing algorithm of sorts that I started to win.
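To make the idea of a move predictor concrete, here is a toy sketch. It is not the NYT bot's algorithm, which isn't described here; it just assumes one simple technique (a first-order Markov model that counts which move tends to follow which) and all names in it are hypothetical.

```python
from collections import Counter, defaultdict

# BEATS[x] is the move that beats x.
BEATS = {"rock": "paper", "paper": "scissors", "scissors": "rock"}

class MarkovPredictor:
    """Guesses the opponent's next move from what followed their last move before."""

    def __init__(self):
        self.transitions = defaultdict(Counter)  # previous move -> counts of next move
        self.last_move = None

    def predict(self):
        counts = self.transitions.get(self.last_move)
        if not counts:
            return "rock"  # arbitrary guess until there is data for this situation
        return counts.most_common(1)[0][0]

    def choose(self):
        # Play whatever beats the predicted move.
        return BEATS[self.predict()]

    def observe(self, move):
        if self.last_move is not None:
            self.transitions[self.last_move][move] += 1
        self.last_move = move

def count_bot_wins(opponent_moves):
    """Number of rounds the bot wins against a fixed sequence of opponent moves."""
    bot = MarkovPredictor()
    wins = 0
    for move in opponent_moves:
        if bot.choose() == BEATS[move]:
            wins += 1
        bot.observe(move)
    return wins

# An opponent with an obvious habit is exploited almost immediately.
habitual = ["rock", "paper", "scissors"] * 10
print(count_bot_wins(habitual), "wins out of", len(habitual))
```

A model like this has nothing to latch onto if the opponent's moves are independent of their history, which fits the observation that randomizing was what finally worked.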

The only reason I knew about the RPS game at the NYT was due to participation on Less Wrong, so maybe anecdotes like mine are the reason for the link. I also don't have any emotional attachment to the idea of free will.

So, I should take this as evidence that you're a robot whereas I have authentic, unpredictable free will? In 20 rounds just now, I came out slightly ahead (5 wins, 4 losses, 11 ties).
Yes, because it's impossible for AI to get better than a rudimentary program running on the NYT server.
I committed to sharing results beforehand: First twenty versus Veteran mode: +8 -6 =6. Second twenty: +8 -8 =4. I spent about five seconds thinking between moves. I love RPS, I could easily get addicted to this... ETA: I decided to play ten rounds where I thought really hard about it. I got +4 -0 =6. ETA2: Okay, I'll play twenty rounds thinking less than a second per move...: +7 -8 =5.
11 ties vs 9 non-ties? How odd.

(Relevant. (hint hint commenters you should read this before speculating about the origins of theists' intuitions about newcomb's problem))

Interesting. I didn't know Asimov was a two-boxer.

If you are actually wondering, most Lesswrongers one-box because Eliezer promotes one-boxing. That's it.

This should be taken seriously as a hypothesis. However, it can be broken down a bit:

1. LW readers one-box more because they are more likely to have read strong arguments in favor of one-boxing — namely Eliezer's — than most philosophers are.
2. LW readers one-box more because LW disproportionately attracts or retains people who already had a predilection for one-boxing, because people like to affiliate with those who will confirm their beliefs.
3. LW readers one-box more because they are guessing the teacher's password (or, more generally, parroting a "charismatic leader" or "high-status individual") by copying Eliezer's ideas.

To these I'll add some variants:

4. LW readers one-box more than most atheists because for many atheists, two-boxing is a way of saying that they are serious about their atheism, by denying Omega's godlike predictive ability; but LWers distinguish godlike AI from supernatural gods due to greater familiarity with Singularity ideas (or science fiction).
5. LW readers one-box to identify as (meta)contrarians among atheists / materialists.
6. LW readers one-box because they have absorbed the tribal belief that one-boxing makes you a better person.

The hypothesis that we don't dare take seriously I may as well explicitly state:

7. LW readers one-box more because one-boxing is the right answer.

Good breakdown. #7 is not an explanation unless coupled with a hypothesis on why LW readers are more adept than mainstream philosophers and decision theorists at spotting the right answer on this problem. Unless one claims that LWers just have a generally higher IQ (implausible) an explanation for this would probably go back to #1 or something like it. Personally, I think the answer is a combination of #1, #2, #3. I'm not sure about the relative roles played by each of them (which have a decreasing level of "rationality") but here is an analogy: Suppose you know that there is a controversy between two views A and B in philosophy (or economics, or psychology, or another area which is not a hard science), that University X has in its department a leading proponent of theory A, and that a bunch of theorists have clustered around her. It is surely not surprising that there are more A proponents among this group than among the general discipline. As possible explanations, the same factors apply in this general case: we could hypothesize that philosophers in X are exposed to unusually strong arguments for A, or that B-proponents disproportionately go to other universities, or that philosophers in X are slavishly following their leader. I contend that the question about LW is no different in essence from this general one, and that whatever view about the interplay of sociology, memetic theory and rationality you have as your explanation of "many A-ers at X" also should apply for "many 1-boxers at LW".
I suspect 4 is pretty strong. I can't distinguish 1 and 2, but 3 doesn't seem right. People disagree all the time with little regard to whom they're disagreeing with.
In 1, new LW readers start at the base rate for one-boxing, but some two-boxers switch after reading good arguments in favor of one-boxing, which other folks have not read. In 2, new LW readers start at the base rate for one-boxing, but two-boxers are less likely to stick around. 4 seems to explain Asimov's two-boxing; his view seems to be an attempt to counterfactually stick up for free will. 6 seems like 3; it's just attributing the "conversion" to the community's influence at large, rather than to Eliezer's specifically. (Neither 6 nor 3 assumes the arguments here are good ones, which 1 does.)
How do you know?
Outside view it's the most plausible hypothesis (start by asking "who brought this question to the attention of LWers in the first place"...). It explains why LWers hold various other views as well (e.g. regarding many-worlds).
Oh, I agree it's plausible. But Tenoke didn't say "My guess is"; s/he stated, with what reads to me like great confidence and would-be authority, that that is the reason, and I'd like to know whether there's any justification for that beyond "seems like a plausible guess".
Yes, it is just a plausible guess..

I just recently really worked through this, and I'm a firm one-boxer. After a few discussions with two-boxer people, I came to understand why: I consider myself predictable and deterministic. Two-boxers do not.

For me, the idea that Omega can predict my behaviour accurately is pretty much a no-brainer. I already think it possible to upload into digital form and make multiple copies of myself (which are all simultaneously "me"), and running bulk numbers of predictions using simulations seems perfectly reasonable. Two-boxers, on the other hand,...

I consider myself quite predictable and deterministic, and I'm a two boxer.
Yes, it's clear that the correct strategy in advance, if you thought you were going to encounter Newcomb's problems, is to precommit to one-boxing (but as I mentioned in my comments, at least some two-boxers maintain a distinction between "ideal rational behavior" and "the behavior which, in reality, gives you the highest payoff"). Scott Aaronson goes even further: if people are running simulations of you, then you may have some anthropic uncertainty about whether you're the original or a simulation, so deciding to one-box may in fact cause the simulation to one-box if you yourself are the simulation! You can restore something of the original problem by asking what you should do if you were "dropped into" a Newcomb's problem without having the chance to make precommitments.
For me I think part of the reason I'm so very very quick to commit to one boxing is the low improvement in outcomes from 2 boxing as the problem was presented on the wiki. The wiki lists 1,000 vs 1,000,000. If I was sitting across from Derren Brown or a similarly skilled street magician I'd say there's much more than a 1 in a thousand chance that he'd predict that I'd one box. If the problem was stated with a lesser difference, say 1,000 vs 5,000, I might 2 box, in part because a certain payoff is worth more to me than an uncertain one even if the expected return on the gamble is marginally higher.
I don't see how precommitment is relevant, whether you are "real" or a simulation. Omega knows what you will do even if you don't, so why bother precommitting?
Precommitment isn't relevant to Omega, but it is relevant to the person making the decision. It's basically a way of 'agreeing to cooperate' with possible simulations of yourself, in an environment where there's perhaps not as much on the line and it's easier to think rationally about the problem.
What I have never understood is why precommitment to a specific solution is necessary, either as a way of 'agreeing to cooperate' with possible simulations (supposing I posit simulations being involved), or more generally as a way of ensuring that I behave as an instantiation of the decision procedure that maximizes expected value. There are three relevant propositions: A: Predictor predicts I one-box iff I one-box B: Predictor predicts I two-box iff I two-box C: Predictor puts more money in box B than box A iff Predictor predicts I one-box If I am confident that (A and B and C) then my highest-EV strategy is to one-box. If I am the sort of agent who reliably picks the highest-EV strategy (which around here we call a "rational" agent), then I one-box. If A and C are true, then Predictor puts more money in box B. None of that requires any precommitment to figure out. What does precommitment have to do with any of this?
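The A/B/C argument above can be spelled out as a short enumeration. This is only a sketch under stated assumptions: the canonical amounts ($1,000,000 and $1,000) are filled in for illustration, since the comment only says box B gets "more money", and the function names are hypothetical.

```python
from itertools import product

BIG, SMALL = 1_000_000, 1_000  # assumed canonical amounts, not given in the comment
CHOICES = ["one-box", "two-box"]

def payoff(prediction, action):
    # Proposition C: box B is filled only if one-boxing was predicted.
    box_b = BIG if prediction == "one-box" else 0
    # Box A always holds SMALL; a two-boxer takes both boxes.
    return box_b if action == "one-box" else box_b + SMALL

def consistent_worlds():
    # Propositions A and B together: the prediction always matches the action.
    return [(p, a) for p, a in product(CHOICES, repeat=2) if p == a]

def best_action():
    outcomes = {action: payoff(prediction, action)
                for prediction, action in consistent_worlds()}
    return max(outcomes, key=outcomes.get), outcomes

choice, outcomes = best_action()
print(choice, outcomes)
```

Once A and B rule out the two mismatched worlds, only two outcomes remain, and picking the larger one takes no step that looks like precommitment, which is the point the comment is making.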
I don't believe that anyone in this chain said that it was 'necessary', and for a strictly rational agent, I don't believe it is. However, I am a person, and am not strictly rational. For me, my mental architecture is such that it relies on caching and precomputed decisions, and decisions made under stress may not be the same as those made in contemplative peace and quiet. Precomputation and precommitment is a way of improving the odds that I will make a particular decision under stress.
I agree that humans aren't strictly rational, and that decisions under stress are less likely to be rational, and that precommitted/rehearsed answers are more likely to arise under stress.
Isn't that a fully general counterargument against doing anything whatsoever in the absence of free will?
Do you mean "a fully general argument against precommitting when dealing with perfect predictors"? I don't see how free will is relevant here, however it is defined.
Person A: I'm about to fight Omega. I hear he's a perfect predictor, but I think if I bulk up enough, I can overwhelm him with strength anyway. He's actually quite weak. Person B: I don't see how strength is relevant. Omega knows what you will do even if you don't, so why bother getting stronger?
Feel free to make your point more explicit. What does this example mean to you?
Saying that Omega already knows what you will do doesn't solve the problem of figuring out what to do. If you don't precommit to one-boxing, your simulation might not one-box, and that would be bad. If you precommit to one-boxing and honor that precommitment, your simulation will one-box, and that is better.
I understand that precommitment can be a good thing in some situations, but I doubt that Newcomb is one of them. There is no way my simulation will do anything different from me if the predictor is perfect. I don't need to precommit to one-box. I can just one-box when the time comes. There is no difference in the outcome.
I don't understand how that's different from precommitting to one-box.
To me the difference is saying that one-boxing maximizes utility vs promising to one-box. In the first case there is no decision made or even guaranteed to be made when the time comes. I might even be thinking that I'd two-box, but change my mind at the last instant.
For the record, when I first really considered the problem, my reasoning was still very similar. It ran approximately as follows: "The more strongly I am able to convince myself to one-box, the higher the probability that any simulations of me would also have one-boxed. Since I am currently able to strongly convince myself to one-box without prior exposure to the problem, it is extremely likely that my simulations would also one-box, therefore it is in our best interests to one-box." Note that I did not run estimated probabilties and tradeoffs based on the sizes of the reward, error probability of Omega, and confidence in my ability to one-box reliably. I am certain that there are combinations of those parameters which would make two-boxing better than one, but I did not do the math.

Theists one-box because they tend to be more willing to accept backwards causation at face value (just guessing, but that's what I'd expect to find). Undergraduates surprise me; I would have expected that more of them would two-box, but I may be overestimating the exposure most of them have to formal decision theory. LW readers one-box because we believe rationalists should win, and because of EY's twelfth virtue of rationality; with every action, aim to cut, not merely to be "rational" or "Bayesian" or any other label. Most arguments fo...


Is the Predictor omniscient or making a prediction?

A tangent: when I worked at a teen homeless shelter there would sometimes be a choice for clients to get a little something now or more later. Now won every time, later never. Anything close to a bird in hand was valued more than a billion ultra birds not in the hand. A lifetime of being betrayed by adults, or poor future skills, or both and more might be why that happened. Two boxes without any doubt for those guys. As Predictors they would always predict two boxes and be right.

He makes a statement about the future which, when evaluated, is true. What's the difference between accurate predictions and omniscience? On that tangent: WTF? Who creates a system in which they can offer either some help now, or significantly more later, unless they are malicious or running an experiment?

He makes a statement about the future which, when evaluated, is true. What's the difference between accurate predictions and omniscience?

So when I look at the source code of a program and state "this program will throw a NullPointerException when executed" or "this program will go into endless loop" or "this program will print out 'Hello World'" I'm being omniscient?

Look, I'm not discussing Omega or Newcomb here. Did you just call ME omniscient because in real life I can predict the outcome of simple programs?

You are wrong. There can be a power failure at least one time when that program runs, and you have not identified when those will be.

You are nitpicking. Fine, let's say that Omega is likewise incapable of detecting whether you'll have a heart-attack or be eaten by a pterodactyl. He just knows whether your mind is set on one-boxing or two-boxing.

Did this just remove all your objections about "omniscience" and Newcomb's box, since Omega has now been established to not know if you'll be eaten by a pterodactyl before choosing a box? If so, I suggest we make Omega being incapable of determining death-by-pterodactyl a permanent feature of Omega's character.


WTF? Who creates a system in which they can offer either some help now, or significantly more later, unless they are malicious or running an experiment?

Situations where you can get something now or something better later but not both come up all the time as consequences of growth, investment, logistics, or even just basic availability issues. I expect it would usually make more sense to do this analysis yourself and only offer the option that does more long-term good, but if clients' needs differ and you don't have a good way of estimating, it may make sense to allow them to choose.

Not that it's much of an offer if you can reliably predict the way the vast majority of them will go.

If you can make the offer right now, you don't have capital tied up in growth, investment, or logistics. Particularly since what you have available now doesn't cover the current need - all of it will be taken by somebody.
If the Predictor is accurate or omniscient, then the game is rigged and it becomes a different problem. If the Predictor is making guesses then box predicting and box selecting are both interesting to figure out. Or you live in a system nobody in particular created (capitalism) and work at a social service with limited resources with a clientele who have no background experience with adults who can be trusted. An employer telling them "work now and I'll pay you later" is not convincing, while a peanut butter sandwich right now is.
How about "Here's a sandwich, if you work for me there I will give you another one at lunchtime and money at the end of the day." It's the case where the immediate reward has to be so much smaller than the delayed reward but still be mutually exclusive that confuses me, not the discounting due to lack of trust.
What does always choosing some now over more later have to do with Newcomb's problem?
Simply stating that box B either contains a million dollars or nothing will make people see the million dollars as more distant than the guaranteed thousand in box A, I imagine. That the probabilities reduce that distance to negligible matters only if the person updates appropriately on that information.

People suck at predicting their actions. I suspect that in a real-life situation even philosophers would one-box. For example, suppose a two-boxer sees a street magician predict people's behavior and consistently punish two-boxers (in some suitable version, like a card trick). Odds are, he will one-box, especially if the punishment for correctly predicted two-boxing is harsh enough. It would be an interesting psychological experiment, if someone could get the funding.

If a philosopher sees a street magician making "predictions" he should be rational enough to see that the street magician isn't engaging in prediction but is cheating.
This is also a perfectly reasonable explanation for Omega's success rate.
If by cheating you mean "removes the reward after you made your choice to two-box", this can be mitigated by having a third party, like the philosopher's friend, write down what's in the boxes before the philosopher in question makes his decision. I imagine that in such a situation the number of two-boxers would go down dramatically.
If you really remove the option to cheat then how will the street magician be able to accurately predict whether the philosopher two-boxes? There are people who might learn to classify people as one-boxers or two-boxers with a high degree of accuracy if they had months of practice, but that's not a skillset that your average street magician possesses. Finding a suitable person and then training the person to have that skillset is probably a bigger issue than securing the necessary funds for a trial.
So... if I point out a reasonably easy way to verifiably implement this without cheating, would you agree with my original premise?
I still believe that it will be incredibly hard to set up an experiment that makes an atheist philosopher or an average Lesswrong participant think that the predictions are genuine predictions. I don't know how philosophers would react. I one-box the question when posed on a theoretical level. If you would put me in a practical situation against a person who's doing genuine prediction I might try to do some form of occlumency to hide that I'm two-boxing. This is a bit like playing Werewolf. In the round with people with NLP training in which I'm playing Werewolf there are a bunch of people who are good enough to read a bunch of people when they play Werewolf with "normal" people. On the other hand in that round with NLP people nearly everyone has good control of his own state and doesn't let other people read them. The last time I played one girl afterwards told me that I was very authentic even when playing a fake role. I'm not exactly sure about the occlumency skills of the average philosophy major but I would guess that there are many philosophy majors who believe that they themselves have decent occlumency skills. As a sidenote any good attempt at finding out whether someone is one-boxing or two-boxing might change whether he's one-boxing or two-boxing.
Unless I've misunderstood this, it isn't an adversarial game. You're not trying to trick the predictor if you're one-boxing. If anything, you want the predictor to know that with as much certainty as possible. Wearing your heart on your sleeve is good for you.
He's trying to trick the predictor into thinking that he's going to one-box, but then to actually two-box.
I see now. The first description I came across of this had a huge difference between boxes A and B, on the order of 1,000 vs 1,000,000. At that level there doesn't seem much point even intending to 2 box; better to let the predictor have his good record as a predictor while I get the million. An improvement of an extra 1,000 just isn't convincing. Though restated with a smaller difference, like 2,000 in one box and 1,000 in the other, the choice of 2 boxing for 3,000 vs 2,000 is more appealing.
The easiest way is to have the result be determined by the decision; the magician arranges the scenario such that the money is under box A IFF you select only box A. That is only cheating if you can catch him. The details of how that is done are a trade secret, I'm afraid.
Whether or not it's cheating doesn't depend on whether you catch him. A smart person will think "The magician is cheating" when faced with a street magician even if he doesn't get the exact trick. I don't know exactly how David Copperfield flies around but I don't think that he really can levitate.
Why doesn't a smart person think that Omega is cheating? What's the difference between the observations one has of Omega and the observations one has of the street magician? By the way, if I think of Omega as equivalent to the street magician, I change to a consistent one-boxer from a much more complicated position.
Because Omega has, by definition, the ability to predict. Street magicians, on the other hand, are in the deception business. That means that a smart person has different priors about the two classes.
The expected observations are identical in either case, right?
Yes, as long as we can only observe the end result. Priors matter when you have incomplete knowledge and must guess the principle that led to a particular result.
Believing that a particular principle led to an observed result helps us predict that result in the future, whenever the principle we believe in is relevant. If we believe that the street magician is cheating, but he claims to be predicting, is each case in which we see the prediction and result match evidence that he is predicting or evidence that he is cheating? Is it evidence that when our turn comes up, we should one-box, or is it evidence that the players before us are colluding with the magician? If we believe that Omega is a perfect predictor, does that change the direction in which the evidence points? Is it just that we have a much higher prior that everybody we see is colluding with the magician (or that the magician is cheating in some other way) than that everybody is colluding with Omega, or that Omega is cheating? Suppose that the magician is known to be playing with house money, and is getting paid based on how accurately rewards are allocated to contestants (leaving the question open as to whether he is cheating or predicting, but keeping the payoff matrix the same). Is the reasoning for one-boxing for the magician identical to the reasoning for one-boxing for Omega, or is there some key difference that I'm missing?
If a magician is cheating, then there is a direct causal link between the subject choosing to one-box and the money being in the box. Causality matters for philosophers who analyse Newcomb's problem.
So the magician can only cheat in worlds where causal links happen?
I don't know whether one can meaningfully speak about decision theory for a world without causal links. If your actions don't cause anything how can one decision be better than another?
So, if the magician is cheating there is a causal link between the decision and the contents of the box, and if he isn't there is still a causal link. How is that a difference?
If I'm wet because it rains, there is a causal link between the two. If I kick a ball and the ball moves, there is a causal link between me kicking the ball and the ball moving. How's that a difference?
Did you kick the ball because it was raining, or are you wet because you kicked the ball?
Really? I feel like I would be more inclined to two-box in the real life scenario. There will be two physical boxes in front of me that already have money in them (or not). It'll just be me and two boxes whose contents are already fixed. I will really want to just take them both.
Maybe the first time. What will you do the second time?
I was surprised by the more general statement "that in a real-life situation even philosophers would one-box." In the specific example of an iterated Newcomb (or directly observing the results of others) I agree that two-boxers would probably move towards a one-box strategy. The reason for this, at least as far as I can introspect, has to do with the saliency of actually experiencing a Newcomb situation. When reasoning about the problem in the abstract I can easily conclude that one-boxing is the obviously correct answer. However, when I sit and really try to imagine the two boxes sitting in front of me, my model of myself in that situation two-boxes more than the person sitting at his computer. I think a similar effect may be at play when I imagine myself physically present as person after person two-boxes and finds one of the boxes empty. So I think we agree that observe(many two-box failures) --> more likely to one-box. I do think that experiencing the problem as traditionally stated (no iteration or actually watching other people) will have a relationship of observe(two physical boxes, predictor gone) --> more likely to two-box. The second effect is probably weak as I think I would be able to override the impulse to two-box with fairly high probability.
By a "real-life situation" I meant a Newcomb-like problem we routinely face but don't recognize as such, like deciding on the next move in a poker game, or on the next play in a sports game. Whenever I face a situation where my opponent has likely precommitted to a course of action based on their knowledge of me, and I have reliable empirical evidence of that knowledge, and betting against such evidence carries both risks and rewards, I am in a Newcomb situation.
I don't see how those are Newcomb situations at all. When I try to come up with an example of a Newcomb-like sports situation (e.g. football, since plays are preselected and revealed more or less simultaneously) I get something like the following:
1. You have two plays, A and B (one-box, two-box).
2. The opposing coach has two plays, X and Y.
3. If the opposing coach predicts you will select A they will select X, and if they predict you will select B they will select Y.
4. A vs X results in a moderate gain for you. A vs Y results in no gain for you. B vs Y results in a small gain for you. B vs X results in a large gain for you.
5. You both know all this.
The problem lies in the 3rd assumption. Why would the opposing coach ever select play X? Symmetrically, if Omega was actually competing against you and trying to minimize your winnings, why would it ever put a million dollars in the second box? Newcomb's works, in part, due to Omega's willingness to select a dominated strategy in order to mess with you. What real-life situation involves an opponent like that?
Newcomb's problem does happen (and has happened) in real life. Also, Omega is trying to maximize his stake rather than minimize yours; he made a bet with Alpha with much higher stakes than the $1,000,000. Not to mention that Newcomb's problem bears some vital resemblance to the prisoner's dilemma, which occurs in real life.
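The dominance claim behind the objection to the 3rd assumption can be checked with a small sketch. The numeric gains below are hypothetical stand-ins for "moderate", "none", "small" and "large" (only their ordering comes from the scenario), and the coach is assumed to want to minimize your gain.

```python
# Hypothetical gains for "you"; only the ordering
# none < small < moderate < large is taken from the scenario.
GAIN = {
    ("A", "X"): 5,   # moderate
    ("A", "Y"): 0,   # none
    ("B", "Y"): 2,   # small
    ("B", "X"): 10,  # large
}


def coach_play_is_dominated(play, other):
    """True if `other` leaves you strictly worse off than `play`
    against every play of yours, i.e. the coach never prefers `play`."""
    return all(GAIN[(mine, other)] < GAIN[(mine, play)] for mine in ("A", "B"))


x_dominated = coach_play_is_dominated("X", "Y")
y_dominated = coach_play_is_dominated("Y", "X")
print("X dominated by Y:", x_dominated)
print("Y dominated by X:", y_dominated)
```

Under these assumptions a gain-minimizing coach would always pick Y, whatever the prediction, which is why the football case does not reproduce Omega's behaviour.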
1Eliezer Yudkowsky10y
And Parfit's Hitchhiker scenarios, and blackmail attempts, not to mention voting.
Sure, I didn't mean to imply that there were literally zero situations that could be described as Newcomb-like (though I think that particular example is a questionable fit). I just think they are extremely rare (particularly in a competitive context such as poker or sports). edit: That example is more like a prisoner's dilemma where Kate gets to decide her move after seeing Joe's. Agree that Newcomb's definitely has similarities with the relatively common PD.
Oddly enough, that problem is also solved better by a time-variable agent: Joe proposes sincerely, being an agent who would never back out of a commitment of this level. If his marriage turns out poorly enough, Joe, while remaining the same agent that previously wouldn't back out, backs out. And the prisoner's dilemma as it is written cannot occur in real life, because it requires no further interaction between the agents.
If I have even a little bit of reason to believe the problem is newcomboid (like, I saw it make two or three successful predictions, and no unsuccessful ones, or I know the omega would face bad consequences for predicting wrongly (even just in terms of reputation), or I know the omega studied me well), I'd one box in real-life too.
Well, I am referring specifically to an instinctive/emotional impulse driven by the heavily ingrained belief that money does not appear or disappear from closed boxes. If you don't experience that impulse or will always be able to override it then yes, one-boxing in real life would be just as easy as in the abstract. As per my above response to shminux, I think this effect would be diminished and eventually reversed after personally observing enough successful predictions.
I agree, if the accuracy was high and there was a chance for learning. It would also be interesting to ask those who favor two-boxing how they think their views would evolve if they repeatedly experienced such situations. Some may find they are not reflectively consistent on the point.
Right, good point about revealed reflective inconsistency. I'd guess that repeated experiments would probably turn any two-boxer into a one-boxer pretty quickly, if the person actually cares about the payoff, not about making a point, like Asimov supposedly would, as quoted by William Craig in this essay pointed out by Will Newsome. And those who'd rather make a point than make money can be weeded out by punishing predicted two-boxing sufficiently harshly.
This isn't an argument (at least not a direct argument) that one-boxing is more rational. Two-boxers grant that one-boxing gets you more money. They just say that sometimes, a situation might arise that punishes rational decision making. And they may well agree that in such a situation, they would one box irrationally. The question isn't 'what would you do?', the question is 'what is it rational to do?' You might reply that what's rational is just what gets you the most money, but that's precisely the point that's up for dispute. If you assume that rationality is just whatever makes you richer, you beg the question.
I'm not sure they do.

Are the various people actually being presented with the same problem? It makes a difference if the predictor is described as a skilled human rather than as a near omniscient entity.

The method of making the prediction is important. It is unlikely that a mere human without computational assistance could simulate someone in sufficient detail to reliably make one boxing the best option. But since the human predictor knows that the people he is asking to choose also realize this he still might maintain high accuracy by always predicting two boxing.


3Eliezer Yudkowsky10y
(Plausible, but then the mere human should have a low accuracy / discrimination rate. You can't have this and a high accuracy rate at the same time. Also in practice there are plenty of one-boxers out there.)
But if you're playing against a mere human, it is in your interest to make your behavior easy to predict, so that your Omega can feel confident in oneboxing you. (Thus, social signalling) This is one of the rare cases where evolution penalizes complexity.
Social signalling doesn't make one easier to accurately predict. Costly signalling and precommitment costs might, but everyone rational would implement a free signal that made the judge more likely to put the money in the box, regardless of their actual intent.
If it was free, it wouldn't make the judge more likely to put the money in the box. Unless the judge was really bad at his job.
What if it was sunk cost? Should that convince a judge? What if the precommitment cost is lower than the difference between the high reward and the low reward? Should that convince a judge? Where does social signalling actually help to make the decision?
I think the idea is that, given an assumption of having a fairly typical mind, the signal is supposed to be unlikely if one is not precommitted to whatever one is signalling allegiance to. Though honestly, I have no idea how you'd convincingly signal that you're following TDT. Evolution did not prepare me for that situation! :)
If the judge knows that you are trying to convince him, then there should be nothing you can do which convinces him short of committing to a penalty cost if you take a different action (which is the same as changing the payoff matrix). If I manage to commit to giving $1500 to a charity that I hate (e.g. Westboro) if I take both boxes, and communicate that commitment to the judge, then I can convince the judge that I will take one box. I don't have to convince him of my decision process, only of my actions.
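To see how the commitment changes the payoff matrix, here is a minimal sketch. It assumes the standard $1,000 / $1,000,000 box values used elsewhere in this thread and treats the $1500 donation as a straight deduction from the two-boxer's payoff; the function names are made up for illustration.

```python
ONE_BOX, TWO_BOX = "one-box", "two-box"


def payoff(action, box_b_full, penalty=0):
    """Dollar payoff for `action`, given whether box B was filled.

    `penalty` is the amount forfeited on two-boxing
    (the committed donation to the hated charity)."""
    box_b = 1_000_000 if box_b_full else 0
    if action == TWO_BOX:
        return box_b + 1_000 - penalty
    return box_b


def dominant_action(penalty):
    """Return the action that is strictly better whatever box B holds, or None."""
    for action, other in ((ONE_BOX, TWO_BOX), (TWO_BOX, ONE_BOX)):
        if all(payoff(action, full, penalty) > payoff(other, full, penalty)
               for full in (True, False)):
            return action
    return None


print(dominant_action(penalty=0))      # no commitment
print(dominant_action(penalty=1_500))  # with the $1500 commitment
```

Any penalty above $1,000 flips the dominant action, so the judge only has to believe the commitment is binding, not anything about the chooser's decision theory.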
Saying you'll two-box does make it easier to predict... Cue the slow clap on the people who say they'll two-box (of course they only say that since they discount the possibility that this will ever actually happen).
Don't you believe us? I do discount the possibility that the impossible version will happen, but not the possibility that a near-analogue will happen. I withhold my judgement on near-analogues until and unless I have sufficient information to estimate the results.

More formal logic and philosophy training -> a greater chance to over-think it and think explicitly about decision theory (and possibly even have loyalties to particular rigid theories) rather than just doing what gets you more money? A case of thinking too deeply about the matter just leading a large fraction of people into confusion?

A case of thinking too deeply about the matter just leading a large fraction of people into confusion?

Confusion doesn't explain the firm directionality of the trend.

(and possibly even have loyalties to particular rigid theories)

This doesn't explain which one gets dominance, although it allows some amplification of noise or bias caused by other factors.

I think the "think explicitly about decision theory" part was supposed to indicate the direction, since CDT has been the leader.
At the time you are asked to make the decision, taking both boxes gets you more money than taking one box does. People who take one box take $1,000,000 instead of $1,001,000; people who take two boxes take $1000 instead of $0.
Um, huh? I don't enjoy word games, but what does "instead of" mean here?
The one-boxers had a choice between $1m and $1m+1k; the two boxers had a choice between $0 and $1k. The "instead of" refers to their reward if they had done the opposite of what it had already been predicted that they do.
The problem statement is assuming a perfect predictor, though, so that 'instead of' clause is mostly noise.
Yeah, it designates the counterfactual; what they didn't take.
It's not referring to a possible state of reality.
In which case there aren't possible states of reality; only exemplified and counterfactual states.

My guess is that a large part of the divergence relates to the fact that LWers and philosophers are focused on different questions. Philosophers (two-boxing philosophers, at least) are focused on the question of which decision "wins" whereas LWers are focused on the question of which theory "wins" (or, at least, this is what it seems to me that a large group of LWers is doing, more on which soon).

So philosophical proponents of CDT will almost all (all, in my experience) agree that it is rational if choosing a decision theory to follow t...

3Eliezer Yudkowsky10y
From the standpoint of reflective consistency, there should not be a divergence between rational decisions and rational algorithms; the rational algorithm should search for and output the rational decision, and the rational decision should be to adopt the rational algorithm. Suppose you regard Newcomb's Problem as rewarding an agent with a certain decision-type, namely the sort of agent who one-boxes. TDT can be viewed as an algorithm which searches a space of decision-types and always decides to have the decision-type such that this decision-type has the maximal payoff. (UDT and other extensions of TDT can be viewed as maximizing over spaces broader than decision-types, such as sensory-info-dependent strategies or (in blackmail) maximization vantage points). Once you have an elegant theory which does this, and once you realize that a rational algorithm can just as easily maximize over its own decision-type as the physical consequences of its acts, there is just no reason to regard two-boxing as a winning decision or winning action in any sense, nor regard yourself as needing to occupy a meta-level vantage point in which you maximize over theories. This seems akin to precommitment, and precommitment means dynamic inconsistency means reflective inconsistency. Trying to maximize over theories means you have not found the single theory which directly maximizes without any recursion or metaness, and that means your theory is not maximizing the right thing. Claiming that TDTers are maximizing over decision theories, then, is very much a CDT standpoint which is not at all how someone who sees logical decision theories as natural would describe it. From our perspective we are just picking the winning algorithm output (be the sort of agent who picks one box) in one shot, and without any retreat to a meta-level. The output of the winning algorithm is the winning decision, that's what makes the winning algorithm winning.
Yes. Which is to say, clearly you fall into the second class of people (those who have studied decision theory a lot) and hence my explanation was not meant to apply to you. Which isn't to say I agree with everything you say. Decisions can have different causal impacts to decision theories and so there seems to be no reason to accept this claim. Insofar as the rational decision is the decision which wins which depends on the causal effects of the decision and the rational algorithm is the algorithm which wins which depends on the causal effects of the algorithm then there seems to be no reason to think these should coincide. Plus, I like being able to draw distinctions that can't be drawn using your terminology. Agreed (if you are faced with a decision of which algorithm to follow). Of course, this is not the decision that you're faced with in NP (and adding more options is just to deny the hypothetical) Yes, and I think this is an impressive achievement and I find TDT/UDT to be elegant, useful theories. The fact that I make the distinction between rational theories and rational decisions does not mean I cannot value the answers to both questions. Well...perhaps. Obviously just because you can maximise over algorithms, it doesn't follow that you can't still talk about maximising over causal consequences. So either we have a (boring) semantic debate about what we mean by "decisions" or a debate about practicality: that is, the argument would be that talk about maximising over algorithms is clearly more useful than talk about maximising over causal consequences so why care about the second of these. For the most part, I buy this argument about practicality (but it doesn't mean that two-boxing philosophers are wrong, just that they're playing a game that both you and I feel little concern for). I know what all these phrases mean but don't know why it follows that your theory is not maximising the "right" thing. Perhaps it is not maximising a thing that you find t

Well...perhaps. Obviously just because you can maximise over algorithms, it doesn't follow that you can't still talk about maximising over causal consequences. So either we have a (boring) semantic debate about what we mean by "decisions" or a debate about practicality: that is, the argument would be that talk about maximising over algorithms is clearly more useful than talk about maximising over causal consequences so why care about the second of these.

No, my point is that TDT, as a theory, maximizes over a space of decisions, not a space of algorithms, and in holding TDT to be rational, I am not merely holding it to occupy the most rational point in the space of algorithms, but saying that on its target problem class, TDT's output is indeed always the most rational decision within the space of decisions. I simply don't believe that it's particularly rational to maximize over only the physical consequences of an act in a problem where the payoff is determined significantly by logical consequences of your algorithm's output, such as Omega's prediction of your output, or cohorts who will decide similarly to you. Your algorithm can choose to have any sort of decision-...

Interesting. I have a better grasp of what you're saying now (or maybe not what you're saying, but why someone might think that what you are saying is true). Rapid responses to information that needs digesting are unhelpful so I have nothing further to say for now (though I still think my original post goes some way to explaining the opinions of those on LW that haven't thought in detail about decision theory: a focus on algorithm rather than decisions means that people think one-boxing is rational even if they don't agree with your claims about focusing on logical rather than causal consequences [and for these people, the disagreement with CDT is only apparent]). ETA: On the CDT bit, which I can comment on, I think you overstate how "increasingly contorted" the CDTers' "redefinitions of winning" are. They focus on whether the decision has the best causal consequences. This is hardly contorted (it's fairly straightforward) and doesn't seem to be much of a redefinition: if you're focusing on "winning decisions" as the CDTer does (rather than "winning agents") it seems to me that the causal consequences are the most natural way of separating out the part of the agent's winning that relates to the decision from the parts that relate to the agent more generally. As a definition of a winning decision, I think the definition used on LW is more revisionary than the CDTer's definition (as a definition of winning algorithm or agent, the definition on LW seems natural, but as a way of separating out the part of the agent's winning that relates to the decision, logical consequences seem far more revisionary). In other words, everyone agrees what winning means. What people disagree about is when we can attribute the winningness to the decision rather than to some other factor and I think the CDTer takes the natural line here (which isn't to say they're right but I think the accusations of "contorted" definitions are unreasonable).
4Eliezer Yudkowsky10y
If agents whose decision-type is always the decision with the best physical consequences ignoring logical consequences, don't end up rich, then it seems to me to require a good deal of contortion to redefine the "winning decision" as "the decision with the best physical consequences", and in particular you must suppose that Omega is unfairly punishing rationalists even though Omega has no care for your algorithm apart from the decision it outputs, etc. I think that to believe that the Prisoner's Dilemma against your clone or Parfit's Hitchhiker or voting are 'unfair' situations requires explicit philosophical training, and most naive respondents would just think that the winning decision was the one corresponding to the giant heap of money on a problem where the scenario doesn't care about your algorithm apart from its output.
To clarify: everyone should agree that the winning agent is the one with the giant heap of money on the table. The question is how we attribute parts of that winning to the decision rather than other aspects of the agent (because this is the game the CDTers are playing and you said you think they are playing the game wrong, not just playing the wrong game). CDTers use the following means to attribute winning to the decision: they attribute the winning that is caused by the decision. This may be wrong and there may be room to demonstrate that this is the case but it seems unreasonable to me to describe it as "contorted" (it's actually quite a straightforward way to attribute the winning to the decision) and I think that using such descriptions skews the debate in an unreasonable way. This is basically just a repetition of my previous point so perhaps further reiteration is not of any use to either of us... In terms of NP being "unfair", we need to be clear about what the CDTer means by this (using the word "unfair" makes it sound like the CDTer is just closing their eyes and crying). On the basic level, though, the CDTer simply means that the agent's winning in this case isn't entirely determined by the winning that can be attributed to the decision and hence that the agent's winning is not a good guide to what decision wins. More specifically, the claim is that the agent's winning is determined in part by things that are correlated with the agent's decision but which aren't attributable to the agent's decision and so the agent's overall winning in this case is a bad guide to determining which decision wins. Obviously you would disagree with the claims they're making but this is different to claiming that CDTers think NP is unfair in some more everyday sense (where it seems absurd to think that Omega is being unfair because Omega cares only about what decision you are going to make). I don't necessarily think the CDTers are right but I don't think the way you outlin
So to summarise. On LW the story is often told as follows: CDTers don't care about winning (at least not in any natural sense) and they avoid the problems raised by NP by saying the scenario is unfair. This makes the CDTer sound not just wrong but also so foolish it's hard to understand why the CDTer exists. But expanded to show what the CDTer actually means, this becomes: CDTers agree that winning is what matters to rationality but because they're interested in rational decisions they are interested in what winning can be attributed to decisions. Specifically, they say that winning can be attributed to a decision if it was caused by that decision. In response to NP, the CDTer notes that the agent's overall winning is not a good guide to the winning decision as in this case, the agent's winning is also determined by factors other than their decisions (that is, the winning cannot be attributed to the agent's decision). Further, because the agent's winnings correlate with their decisions, even though they can't be attributed to their decisions, the case can be particularly misleading when trying to determine the winning decisions. Now this second view may be both false and may be playing the wrong game but it at least gives the CDTer a fair hearing in a way that the first view doesn't.
In Newcomb the outcome "pick two boxes, get $1.001M" is not in the outcome space, unless you fight the hypothetical, so the properly restricted CDT one-boxes. In the payoff matrix [1000, 0; 1001000, 1000000] the off-diagonal cases are inconsistent with the statement that Omega is a perfect predictor, so if you take them into account, you are not solving Newcomb, but some other problem where Omega is imperfect with unknown probability. Once the off-diagonal outcomes are removed, CDT trivially agrees with EDT.
First, removal of those scenarios is inconsistent with CDT as it is normally interpreted: CDT evaluates the utility of an act by the expected outcome of an exogenous choice being set without dependence on past causes, i.e. what would happen if a force from some unanticipated outside context came in and forced you to one-box or two-box, regardless of what you would otherwise have done. It doesn't matter if the counterfactual computed in this way is unphysical, at least without changing the theory. Second, to avoid wrangling over this, many presentations add small or epsilon error rates (e.g. the Predictor flips a weighted coin to determine whether to predict accurately or inaccurately, and is accurate 99% of the time, or 99.999999% of the time). What's your take with that adjustment?
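For the error-rate variant, the conditional (evidential) expected payoffs can be sketched as below. This is the calculation a one-boxer would appeal to, not the CDT calculation described in the first paragraph of the comment; `accuracy` is the assumed probability that the Predictor predicts correctly, and dollars are treated as utility for simplicity.

```python
def conditional_expected_payoff(action, accuracy):
    """Expected dollars conditional on taking `action`, for a Predictor
    that is right with probability `accuracy`."""
    if action == "one-box":
        # Box B is full exactly when the Predictor correctly foresaw one-boxing.
        return accuracy * 1_000_000
    # Two-boxing: box B is full only when the Predictor erred.
    return 1_000 + (1 - accuracy) * 1_000_000


def break_even_accuracy():
    """Accuracy at which both actions have equal conditional expectation:
    a * 1e6 = 1e3 + (1 - a) * 1e6  =>  a = 1_001_000 / 2_000_000."""
    return 1_001_000 / 2_000_000


for a in (0.99, 0.99999999):
    print(a,
          conditional_expected_payoff("one-box", a),
          conditional_expected_payoff("two-box", a))
print("break-even accuracy:", break_even_accuracy())
```

On this calculation one-boxing comes out ahead for any accuracy above roughly 50.05%, so the 99% and 99.999999% versions give the same ranking as the perfect-predictor version.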
Are you saying that the "CDT as it is normally interpreted" cannot help but fight the hypothetical? Then the Newcomb problem with a perfect predictor is not one where such CDT can be applied at all; it's simply not in the CDT domain. Or you can interpret CDT as dealing with the possible outcomes only, and happily use it to one-box. In the second case, first, you assume the existence of the limit if you extrapolate from imperfect to perfect predictor, which is a non-trivial mathematical assumption of continuity and is not guaranteed to hold in general (for example, a circle, no matter how large, is never topologically equivalent to a line). That notwithstanding, CDT does take probabilities into account, at least the CDT as described in Wikipedia, so the question is, what is the counterfactual probability that if I were to two-box, then I get $1.001M, as opposed to the conditional probability of the same thing. The latter is very low, the former has to be evaluated on some grounds. The standard two-boxer reasoning is that Unpacking this logic, I conclude that "even if the prediction is for the player to take only B, then taking both boxes yields $1,001,000, and taking only B yields only $1,000,000; taking both boxes is still better" means assigning equal counterfactual probability to both outcomes, which goes against the problem setup, as it discards the available information ("it does not matter what Omega did, the past is past, let's pick the dominant strategy"). This also highlights the discontinuity preventing one from taking this "information-discarding CDT" limit. This is similar to the information-discarding EDT deciding to not smoke in the smoking lesion problem.
The standard CDT algorithm computes the value of each action by computing the expected utility conditional on a miraculous intervention changing one's decision to that action, separately from early deterministic causes, and computing the causal consequences of that. See Anna's discussion here, including modifications in which the miraculous intervention changes other things, like one's earlier dispositions (perhaps before the Predictor scanned you) or the output of one's algorithm (instantiated in you and the Predictor's model). Say before the contents of the boxes are revealed our CDTer assigns some probability p to the state of the world where box B is full and her internal makeup will deterministically lead her to one-box, and probability (1-p) to the state of the world where box B is empty and her internal makeup will deterministically lead her to two-box. Altering her action miraculously and exogenously would not change the box contents causally. So the CDTer uses the old probabilities for the box contents: the utility of one-boxing is computed to be $1,000,000 times p, and the utility of two-boxing is calculated to be $1,001,000 times p plus $1,000 times (1-p). If she is confident that she will apply CDT based on past experience, or introspection, she will have previously updated to thinking that p is very low.
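The calculation in the comment above can be written out directly. This is only a sketch of that arithmetic, with dollars standing in for utility; `p` is the CDTer's prior probability that box B is full, held fixed across actions because the intervention on the action is assumed not to affect the box contents.

```python
def cdt_utilities(p):
    """Return (one_box, two_box) expected dollars when the probability p
    that box B is full is the same whichever action is taken."""
    one_box = p * 1_000_000 + (1 - p) * 0
    two_box = p * 1_001_000 + (1 - p) * 1_000
    return one_box, two_box


for p in (0.0, 0.25, 0.5, 1.0):
    one_box, two_box = cdt_utilities(p)
    print(f"p={p}: one-box={one_box:,.0f}  two-box={two_box:,.0f}  "
          f"difference={two_box - one_box:,.0f}")
```

The difference is $1,000 in favour of two-boxing for every value of p, so the ranking of the two actions does not depend on how low the CDTer takes p to be; p only affects how much she expects to end up with.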
Right, I forgot. The reasoning is "I'm a two-boxer because I follow a loser's logic and Omega knows it, so I may as well two-box." There is no anticipation of winning $1,001,000. No, that does not sound quite right...
The last bit about p going low with introspection isn't necessary. The conclusion (two-boxing preferred, or at best indifference between one-boxing and two-boxing if one is certain one will two-box) follows under CDT with the usual counterfactuals for any value of p. The reasoning is "well, if the world is such that I am going to two-box, then I should two-box, and if the world is such that I am going to one-box, then I should two-box" Optional extension: "hmm, sounds like I'll be two-boxing then, alas! No million dollars for me..." (Unless I wind up changing my mind or the like, which keeps p above 0).
CDT doesn't assign credences to outcomes in the way you are suggesting. One way to think about it is as follows: Basically CDT says that you should use your prior probability in a state (not an outcome) and update this probability only in those cases where the decision being considered causally influences the state. So whatever prior credence you had in the "box contains $M" state, given that the decision doesn't causally influence the box contents, you should have that same credence regardless of decision and same for the other state. There are so many different ways of outlining CDT that I don't intend to discuss why the above account doesn't describe each of these versions of CDT but some equivalent answer to that above will apply to all such accounts.
How can one simultaneously
* consider it rational, when choosing a decision theory, to pick one that tells you to one-box; and
* be a proponent of CDT, a decision theory that tells you to two-box?
It seems to me that this is possible only for those who (1) actually think one can't or shouldn't choose a decision theory (cf. some responses to Pascal's wager) and/or (2) think it reasonable to be a proponent of a theory it would be irrational to choose. Those both seem a bit odd. [EDITED to replace some "you"s with "one"s and similar locutions, to clarify that I'm not accusing PhilosophyStudent of being in that position.]
We need to distinguish two meanings of "being a proponent of CDT". If by "be a proponent of CDT" we mean "think CDT describes the rational decision", then the answer is simply that the CDTer thinks that rational decisions relate to the causal impact of decisions and rational algorithms relate to the causal impact of algorithms, and so there's no reason to think that the rational decision must be endorsed by the rational algorithm (as we are considering different causal impacts in the two cases). If by "be a proponent of CDT" we mean "think we should decide according to CDT in all scenarios including NP", then we definitely have a problem, but no smart person should be a proponent of CDT in this way (all CDTers should have decided to become one-boxers if they have the capacity to do so, because CDT itself entails that this is the best decision).
I think this elides distinctions too quickly. You can describe things this way. This description in hand, what does one do if dropped into NP (the scan has already been made, the boxes filled or not)? Go with the action dictated by algorithm and collect the million, or the lone action and collect the thousand? Are you thinking of something like hiring a hitman to shoot you unless you one-box, so that the payoffs don't match NP? Or of changing your beliefs about what you should do in NP? For the former, convenient ways of avoiding the problem aren't necessarily available, and one can ask why the paraphernalia are needed when no one is stopping you from just one-boxing. For the latter, I'd need a bit more clarification.
This comment was only meant to suggest how it was internally consistent for a CDTer to: In other words, I was not trying here to offer a defence of a view (or even an outline of my view) but merely to show why it is that the CDTer can hold both of these things without inconsistency. I'm thinking about changing your dispositions to decide. How one might do that will depend on their capabilities (for myself, I have some capacity to resolutely commit to later actions without changing my beliefs about the rationality of that decision). For some agents, this may well not be possible.
You didn't, quite. CDT favors modifying to one-box on all problems where there is causal influence from your physical decision to make the change. So it favors one-boxing on Newcomb with a Predictor who predicts by scanning you after the change, but two-boxing with respect to earlier causal entanglements, or logical/algorithmic similarities. In the terminology of this post CDT (counterfactuals over acts) attempts to replace itself with counterfactuals over earlier innards at the time of replacement, not counterfactuals over algorithms.
Yes. So it is consistent for a CDTer to believe that: (1) When picking a decision theory, you should pick one that tells you to one-box in instances of NP where the prediction has not yet occurred; and (2) CDT correctly describes two-boxing as the rational decision in NP. I committed the sin of brevity in order to save time (LW is kind of a guilty pleasure rather than something I actually have the time to be doing).
OK, that's all good, but already part of the standard picture and leaves almost all the arguments intact over cases one didn't get to precommit for, which is the standard presentation in any case. So I'd say it doesn't much support the earlier claim: Also: No pressure.
Perhaps my earlier claim was too strong. Nevertheless, I do think that people on LW who haven't thought about the issues a lot might well not have a solid enough opinion to be either agreeing or disagreeing with the LW one-boxing view or the two-boxing philosopher's view. I suspect some of these people just note that one-boxing is the best algorithm and think that this means that they're agreeing with LW when in fact this leaves them neutral on the issue until they make their claim more precise. I also think one of the reasons for the lack of two-boxers on LW is that LW often presents two-boxing arguments in a slogan form which fails to do justice to these arguments (see my comments here and here). Which isn't to say that the two-boxers are right but is to say I think the debate gets skewed unreasonably in one-boxers' favour on LW (not always, but often enough to influence people's opinions).

Other 142 / 217 (65.4%)

What is this "Other" that's so popular that more than half of the people choose it?

I reported the "coarse" results, which lump together "don't know," "insufficiently familiar to answer," and several others. Here are the fine data for undergraduates:

Newcomb's problem: one box or two boxes?

Insufficiently familiar with the issue 114 / 217 (52.5%)
Lean toward: one box 22 / 217 (10.1%)
Accept: one box 18 / 217 (8.3%)
Lean toward: two boxes 18 / 217 (8.3%)
Accept: two boxes 17 / 217 (7.8%)
Skip 13 / 217 (6.0%)
Agnostic/undecided 9 / 217 (4.1%)
There is no fact of the matter 2 / 217 (0.9%)
Accept an intermediate view 2 / 217 (0.9%)
Reject both 2 / 217 (0.9%)
The most charitable interpretation for that would be ‘not sure’.
It allows for brave rebellion against the establishment without the need to actually state what you're rebelling for.
No, it's mainly people who weren't familiar enough with the problem to have a view.

Newcomb's paradox (and Newcomb's paradox variants) gets discussed a lot here. But nothing from the poll indicates that kind of background knowledge is present among those polled. In fact, the opposite appears to be indicated, based on this link:

Newcomb's problem: one box or two boxes?

Insufficiently familiar with the issue 1254 / 3226 (38.9%)

That seems to be the problem with the greatest posted amount of insufficient familiarity among any of the po...

The faculty and especially decision theorists polled report much higher familiarity.

The obvious guess is that theists are more comfortable imagining their decisions to be, at least in principle, completely predictable and do not "fight the hypothetical". Perhaps atheists are more likely to think they can trick Omega because they are not familiar and comfortable with the idea of a magic mind reader, so they don't tend to properly integrate the stipulation that Omega is always right.

To me, the fact that I have been told to assume that I believe the Predictor seems extremely relevant. If we assume that I am able to believe that, then it would likely be the single most important fact that I had ever observed, and to say that it would cause a significant update on my beliefs regarding causality would be an understatement. On the basis that I would have strong reason to believe that causality could flow backwards, I would likely choose the one box.

If you tell me that somehow, I still also believe that causality always flows forward with r...

The standard formulation to sidestep that is that the Predictor treats choosing a mixed strategy as two-boxing.
My initial reaction is to find that aggravating and to try to come up with another experiment that would allow me to poke at the universe by exploiting the Predictor, but it seems likely that this too would be sidestepped using the same tactic. So we could generalize to say that any experiment you come up with that involves the Predictor and gives evidence regarding the temporal direction of causation will be sidestepped so as to give you no new information. But intuitively, it seems like this condition itself gives new information in the paradox, yet I haven't yet wrapped my head around what evidence can be drawn from it. On another note, even if causality flows always forward, it is possible that humans might be insufficiently affected by nondeterministic phenomena to produce significantly nondeterministic behavior, at least at the time scale we're talking about. If that is the case, then it could potentially be the case that human reasoning has approximate t-symmetry over short time scales, and that this can be exploited to "violate causality" with respect to humans without actually violating causality with respect to the universe at large. Which means that I have a more general hypothesis, "human reasoning causality can be violated" for which the violation of causality in general would be strong evidence, but the non-violation of causality would only be weak counter-evidence. And in learning of the Predictor's success, I have observed evidence strongly recommending this hypothesis. So upon further consideration, I think that one-boxing is probably the way to go regardless, and it must simply be accepted that if you actually observe the Predictor, you can no longer rely on CDT if you know that such an entity might be involved. The only part of the paradox that still bugs me then is the hand-waving that goes into "assume you believe the Predictor's claims". It is actually hard for me to imagine what evidence I could observe for that which would both clearly d

One of my aversions to Newcomb generalizes to loads of hypotheticals - don't tell me what I think or prefer in the hypothetical. Tell me my observations, and leave the conclusions and inferences to me.

Is it about determinism?

LessWrongian: The whole universe (multiverse) is deterministic. I am deterministic. Therefore, Omega can predict me.
Philosopher/Atheist: I have a free will. Free will beats determinism! Therefore, Omega can't predict me.
Theist: God is omnipotent and omniscient. God beats everything! Therefore, God/Omega can predict me.

Or about plausibility of a higher intelligence?

Theist: I believe in God. Therefore, I believe God/Omega can be so much smarter than me.
Atheist/Philosopher: I don't believe in God. Therefore, I don't believe God/Omega can be...

People who believe in non-deterministic free will are more likely to one-box than those who don't, in the PhilPapers survey. This is consistent with the data, although I don't know that I'd put a lot of weight on it (and the libertarianism correlation above might come from the bundling of libertarianism about will with religion).

It may be that two-boxers perceive the key issue as the (im)possibility of backwards causation. However, Wheeler's delayed choice experiment demonstrates what seems to me to be backwards causation. Because backwards causation is not categorically impossible, I'm a one-boxer.

This is a bad reason to one-box. First, there is no backward causation in Wheeler's delayed choice, unless you accept a Bohm-spirit interpretation of QM (that photons (counterf)actually travel as particles), because you do not measurably affect the past from the future. Second, no backward causation is required for one-boxing to make sense, only that Omega knows you better than you do yourself.
It seems to me that there is backward causation under the decoherence interpretation, as the world we inhabit is affected by the experimental set-up (there's either a diffraction pattern on the back screen characteristic of a wave, or a pattern characteristic of a single slit). I really think people tend to overestimate the latitude that exists among the various quantum interpretations. They are just interpretations, after all. I don't think that Omega knowing a person better than they know themselves is sufficient to explain the 100% accuracy of Omega's prediction.

I answered in the 99% confidence bracket. The intuition pump that got me there is talking about submitting computer programs that output a decision, rather than simply making a decision. Omega gets to look at the program that makes the decision before filling the boxes. It's obvious in this situation - you submit a program that one-boxes, open that box, and get the million dollars (since Omega knows that you are going to one-box, since it's in the computer code).

Now, the real Newcomb problem has people making the decision, not code. But, if you assume dete...
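The submitted-program version of the problem can be sketched in a few lines. Everything here is a hypothetical illustration: the function names are made up, and Omega "reading the code" is modelled by simply running the submitted program ahead of time, which assumes the program is deterministic and takes no inputs.

```python
def one_boxer():
    """A submitted program that always takes only box B."""
    return "one-box"


def two_boxer():
    """A submitted program that always takes both boxes."""
    return "two-box"


def omega_fill_boxes(program):
    """Omega inspects the program before filling the boxes.

    Assumption: inspection is modelled as running the deterministic
    program once and using its output as the prediction.
    """
    predicted = program()
    box_a = 1_000
    box_b = 1_000_000 if predicted == "one-box" else 0
    return box_a, box_b


def play(program):
    """Fill the boxes first, then let the program choose; return the payout."""
    box_a, box_b = omega_fill_boxes(program)
    choice = program()
    return box_b if choice == "one-box" else box_a + box_b


print("one-boxing program wins:", play(one_boxer))
print("two-boxing program wins:", play(two_boxer))
```

Because the prediction and the choice come from the same deterministic code, the two-boxing program never meets a full box B in this sketch.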

All three are likely less familiar with the arguments in favor of two-boxing, relative to their familiarity with arguments for one-boxing, than faculty/atheists [theists are very tightly concentrated in philosophy of religion]/philosophers.

I really wish they hadn't given an "Other" option to those questions (there are plenty of questions where having an "other" option makes sense, but this isn't one: when you're faced with the problem, either you take one box or take them both).

You could believe the problem isn't well-formed. I think the game theorist Ken Binmore believes this. Or you could believe the problem is underspecified and that adding details (e.g. how Omega comes up with the prediction) could affect whether one-boxing or two-boxing is appropriate.
Or you walk away and choose neither... Refusing to choose is itself a choice. (Although in this case, it is a choice which everyone agrees is strictly worse than the other two.)
I've had people choose only Box A because "screw you, that's why."

Newcomb's problem isn't about decision theory; it's about magic and strange causation. Replace the magician with a human agent and one-boxing isn't nearly as beneficial anymore, even when the human's accuracy is very high.

Less Wrongers publicly consider one-boxing the correct answer because it's non-obvious and correct for the very limited problem where decisions can be predicted in advance, just like we (taken as a whole) pretend that we cooperate on one-shot prisoner's dilemma.

People in other areas are more likely to believe other things about the magic involved (for example, that free will exists in a meaningful form), and therefore have different opinions about what the optimal answer is.


Newcomb's problem isn't about decision theory...

Well, it was first introduced into philosophical literature by Nozick explicitly as a challenge to the principle of dominance in traditional decision theories. So, it's probably about decision theory at least a little bit.

From the context, I would presume "about" in the sense of "this is why it's fascinating to the people who make a big deal about it". (I realise the stated reason for LW interest is the scenario of an AI whose source code is known to Omega having to make a decision, but the people being fascinated are humans.)
Given that your source code is known to Omega, your decision cannot be 'made'.
Yes it can.
Perhaps it would sound better: Once a deterministic method of making a determination (along with all of the data that method will take into account) are set, it cannot be reasonably said that a decision is being made. A Customer Service Representative that follows company policy regardless of the outcome isn't making decisions, he's abdicating the decision-making to someone else. It's probable that free will doesn't exist, in which case decisions don't exist and agenthood is an illusion; that would be consistent with the line of thinking which has produced the most accurate observations to date. I will continue to act as though I am an agent, because on the off chance I have a choice it is the choice that I want.
Oddly enough, those are about programming. There's nothing in there that is advice to robots about what decisions to make.
It is all about robots -- deterministic machines -- performing activities that everyone unproblematically calls "making decisions". According to what you mean by "decision", they are inherently incapable of doing any such thing. Robots, in your view, cannot be "agents"; a similar Google search shows that no-one who works with robots has any problem describing them as agents. So, what do you mean by "decision" and "agenthood"? You seem to mean something ontologically primitive that no purely material entity can have; and so you conclude that if materialism is true, nothing at all has these things. Is that your view?
It would be better to say that materialism being true has the prerequisite of determinism being true, in which case "decisions" do not have the properties we're crossing on.
Still not true. The prediction capability of other agents in the same universe does not make the decisions made by an agent into not-decisions. (This is a common confusion that often leads to bad decision-theoretic claims.)
If free will is not the case, there are no agents (anymore?). If it is the case that the universe in the past might lead to an agent making one of two or more decisions, then free will is the case and perfect prediction is impossible; if it is not the case that an entity can take any one of two or more actions, then free will is not the case and perfect prediction is possible. Note that it is possible for free will to exist but for me to not be one of the agents. Sometimes I lose sleep over that.
A starting point.
The scale does not decide the weight of the load.
A sufficiently intelligent and informed AI existing in the orbit of Alpha Centauri but in no way interacting with any other agent (in the present or future) does not by its very existence remove the capability of every agent in the galaxy to make decisions. That would be a ridiculous way to carve reality.
The characteristic of the universe that allows or prevents the existence of such an AI is what is being carved.
Can you clarify what you mean by "agent"?
One of the necessary properties of an agent is that it makes decisions.
I infer from context that free will is necessary to make decisions on your model... confirm?
Yeah, the making of a decision (as opposed to a calculation) and the influence of free will are coincident.
OK, thanks for clarifying your position. A couple of further assumptions... 1) I assume that what's actually necessary for "agency" on your account is that I'm the sort of system whose actions cannot be deterministically predicted, not merely that I have not been predicted... creating Predictor doesn't eliminate my "agency," it merely demonstrates that I never had any such thing, and destroying Predictor doesn't somehow provide me with or restore my "agency". 2) I assume that true randomness doesn't suffice for "agency" on your account... that Schrodinger's Cat doesn't involve an "agent" who "decides" to do anything in particular, even though it can't be deterministically predicted. Yes? So, OK. Assuming all of that: Suppose Sam performs three actions: (A1) climbs to the roof of a high building, (A2) steps off the edge, and (A3) accelerates toward the ground. Suppose further that A1-A3 were predictable, and therefore on your account not "decisions." Is there any useful distinction to be made between A1, A2, and A3? For example, predicting A3 only requires a knowledge of ballistics, whereas predicting A1 and A2 require more than that. Would you classify them differently on those grounds?
If I was classifying things based on how well a given predictor could predict them, I'd give all three events numbers within a range; I suspect that A1 and A2 would be less predictable for most predictors (but more predictable for the class of predictors which can see a short distance into the future, since they happen sooner). If I was classifying things based on the upper limit of how accurately they could be predicted, I'd give them all the same value, but I would give an action which I consider a decision or the outcome of a decision which has not been made yet a different value. 2: I don't deny the possibility that there is an agent involved in anything nondeterministic; I think it is very unlikely that unstable atoms are (or contain) agents, but the world would probably look identical to me either way. It's also possible that things which appear deterministic are in fact determined by agents with a value function entirely foreign to me; again, the world would look the same to me if there was one or more "gravity agents" that pulled everything toward everything. That postulate has a prior so low that I don't think 'epsilon' adequately describes it, and I have no reports of the evidence which would support it but not the standard theory of gravitation (Wingardium Leviosa working, for example). It's not possible to confirm an infinite number of accurate predictions, and any event which has happened as predicted only a finite number of times (e.g. a number of times equal to the age of the universe in Planck time) is not proof that it can always be accurately predicted. * * Just to be sure, I do not believe that this dragon in my garage exists. I also think that it's more likely that I don't exist as a magician with the power to do something that matter in general does not do. It's just that the expected utility of believing that the future is mutable (that I can affect things) is higher than the expected utility of believing that the state of th
Thanks for the clarification. I wasn't asking whether your probability of an agent being involved in, say, unstable atom decay was zero. I was just trying to confirm that the mere fact of indeterminacy did not suffice to earn something the label "agent" on your account. That is, confirm that an agent being involved in unstable atom decay was not a certainty on your account. Which I guess you've confirmed. Thanks. I agree that infinite confidence in a prediction is impossible.
Did you mean that there was an upper bound less than 1 on the proper confidence of any nontrivial prediction? That's contrary to materialism, isn't it?
Yes. Trivial ones, too. And no, not as far as I can tell, merely consistent with the existence of error rates. For that matter, I would also say that infinite confidence in a non-prediction is impossible. That is, I'm pretty damned sure I have toenails, but my confidence that I have toenails is not infinite.
What do you suppose that upper bound is?
If I generate a statement at the same confidence level as "I have toenails" every day for a century, I'd be unsurprised to get a few wrong just because my brain glitches every once in a while, I'd be surprised if I got as many as ten wrong, and I'd be only slightly surprised to get them all right. So call that .99998 confidence. Which in practice I refer to as certainty. Of course, better-designed brains are capable of higher confidence than that. What's your confidence that you have toenails?
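As a rough check on the figure above, here is a small sketch of the arithmetic. It assumes one statement per day, independent errors, and 365.25 days per year; the helper names are hypothetical and only illustrate how a confidence level maps to an expected error count over a span of years.

```python
DAYS_PER_YEAR = 365.25  # assumption: one statement per day, leap years averaged in


def statements_made(years):
    """Number of daily statements made over the given number of years."""
    return years * DAYS_PER_YEAR


def expected_errors(confidence, years):
    """Expected number of wrong statements at a fixed per-statement confidence."""
    return (1.0 - confidence) * statements_made(years)


def implied_confidence(errors, years):
    """Per-statement confidence implied by a given number of errors."""
    return 1.0 - errors / statements_made(years)


print("statements in a century:", statements_made(100))
print("expected errors at .99998:", expected_errors(0.99998, 100))
print("confidence implied by one error per century:", implied_confidence(1, 100))
```

Under these assumptions a century is 36,525 statements, and .99998 corresponds to a bit under one expected error (about 0.73) over that span, so it sits at the "few wrong at most" end of the range described.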
Is there anything that anyone can be more certain about than your belief that you have toenails, or is .99998 the upper bound for confidence in any prediction? My confidence that I have toenails is more certain than my confidence that there is no accurate claim of a confidence of exactly 1.
Not at all. For example, as I already said, better-designed brains are capable of higher confidence than that. There may also be other classes of statements for which even my brain is capable of higher confidence, though off-hand I'm not sure what they might be... perception and recognition of concrete familiar objects is pretty basic. Thinking about it now, I suppose the implication of ownership adds some unnecessary complexity and correspondingly lowers MTBF; my confidence in "there are toenails on that foot" might be higher... maybe even as much as an order of magnitude higher. Then again, maybe not... we're really playing down at the level of organic brain failure here, so the semantic content may not matter at all. (nods) Mine, too. What's your confidence that you have toenails?
You can get pretty darn high confidences with negation and conjunctions. I can say with great confidence that I am not a 15 story tall Triceratops with glowing red eyes, and I can say with even greater confidence that I am not a 15 story tall Triceratops with glowing red eyes who is active in the feminist movement.
(Incidentally, now you have me wondering how "Linda is a Triceratops and a bank teller" would work in the classic conjunction fallacy example.) So, as a matter of pure logic, you're of course correct... but in this particular context, I'm not sure. As I say, once I get down to the 5-9s level, I'm really talking about brain failures, and those can affect the machinery that evaluates negations and conjunctions as readily as they can anything else (perhaps more so, I dunno). If I made a statement in which I have as much confidence as I do in "I am not a 15 story tall Triceratops with glowing red eyes" every day for a hundred years, would I expect to get them all correct? I guess so, yes. So, agreed, it's higher than .99998. A thousand years? Geez. No, I'd expect to screw up at least once. So, OK, call it .999999 confidence instead for that class. What about "I am not a 15 story tall Triceratops with glowing red eyes who is active in the feminist movement"? Yeesh. I dunno. I don't think I have .9999999 confidence in tautologies.
Within noise of 1. I couldn't list things that I am that certain of for long enough to expect one of them to be wrong, and I'm bad in general at dealing with probabilities outside of [0.05,0.95]. In one of the ancestors, I asked if there was an upper limit <1 which represented an upper bound on the maximum permissible accurate confidence in something (e.g. some number 0<x<1 such that confidence always fell into either (1-x, x) or [1-x, x]).
I'm happy to say "within noise of 1" (aka "one minus epsilon") is the upper limit for maximum permissible accurate confidence. Does that count as an answer to your question?
What you said is an answer, but the manner in which you said it indicates that it isn't the answer you intend. I'm asking if there is a lower bound above zero for epsilon, and you just said yes, but you didn't put a number on it.
I didn't, it's true. I don't know any way to put a number to it; for any given mind, I expect there's an upper limit to how confident that mind can be about anything, but that upper limit increases with how well-designed the mind is, and I have no idea what the upper limit is to how well-designed a mind can be, and I don't know how to estimate the level of confidence an unspecified mind can have in that sort of proposition (though as at least one data point, a mind basically as fallible as mine but implementing error-checking algorithms can increase that maximum by many orders of magnitude). I'd initially assumed that meant I couldn't answer your question, but when you gave me "within noise of 1" as an answer for your confidence about toenails that suggested that you considered that an acceptable answer to questions about confidence levels, and it was an accurate answer to your question about confidence levels as well, so I gave it.
So... you wouldn't be able to tell the difference between an epsilon > 0 and an epsilon >= 0?
I'm not sure how I could tell the difference between two upper bounds of confidence at all. I mean, it's not like I test them in practice. I similarly can't tell whether the maximum speed of my car is 120 mph or 150 mph; I've never driven above 110. But, to answer your question... nope, I wouldn't be able to tell.
So... hrm. How do I tell whether something is a decision or not?
By the causal chain that goes into it. Does it involve modeling the problem and considering values and things like that?
So if a programmable thermostat turns the heat on when the temperature drops below 72 degrees F, whether that's a decision or not depends on whether its internal structure is a model of the "does the heat go on?" problem, whether its set-point is a value to consider, and so forth. Perhaps reasonable people can disagree on that, and perhaps they can't, but in any case if I turn the heat on when the temperature drops below 72 degrees F most reasonable people would agree that my brain has models and values and so forth, and therefore that I have made a decision. (nods) OK, that's fair. I can live with that.
The thermostat doesn't model the problem. The engineer who designed the thermostat modeled the problem, and the thermostat's gauge is a physical manifestation of the engineer's model. It's in the same sense that I don't decide to be hungry - I just am. ETA: Dangit, I could use a sandwich.
Combining that assertion with your earlier one, I get the claim that the thermostat's turning the heat on is a decision, since the causal chain that goes into it involves modeling the problem, but it isn't the thermostat's decision, but rather the designer's decision. Or, well, partially the designer's. Presumably, since I set the thermostat's set-point, it's similarly not the thermostat's values which the causal chain involves, but mine. So it's a decision being made collectively by me and the engineer, I guess. Perhaps some other agents, depending on what "things like that" subsumes. This seems like an odd way to talk about the situation, but not a fatally odd way.
I felt a weird sort of validation when I saw that Theists tend to 1box more than Atheists, and I think you pretty much nailed why. Theists are more likely to believe that omniscience is possible, so it isn't surprising that fewer theists believe they can beat Omega. I haven't studied the literature on free will well enough to know the terms; I noticed that distribution of beliefs on free will were given in the post, and suspect that if I was up to speed on the terminology that would affect my confidence in my model of why people 1box/2box quite a lot. For now, I'm just noticing that all the arguments in favor of 2boxing that I've read seem to come down to refusal to believe that Omega can be a perfect predictor. But like I said, I'm not well studied on the literature and might not be saying anything meaningful.
That hits what I meant pretty much on the head. If Omega is a perfect predictor, then it is meaningless to say that the human is making a choice.

Theists actually understand that god is going to predict you correctly, non-LW atheists can't take the "god" idea seriously anymore and don't really model playing Newcomb vs. god, and LWers are really good at writing great self-congratulatory just-so stories like this one.

Eliezer Yudkowsky:
I remark once again that Newcomb is just the unfortunately contrived entry point into Prisoner's Dilemma, Parfit's Hitchhiker, blackmail, and voting, which are all "Newcomblike problems".
Oh, and Parfit's Hitchhiker highlights that the concept of honor is a layman's version of reflective consistency: you tell the driver that you are a (wo)man of your word, because you truly are, for decision-theoretical reasons, as well as because you were brought up this way.
So, my hypothesis predicts that theists will not do better on "Newcomblike problems" not involving deities.
I've always had trouble with this part. I went through the reasoning that Newcomb is two PDs side by side, but this side-by-sideness seems to kill the essential part of PD, its unpredictability. Newcomb is perfectly deterministic, whereas in PD you depend on what the other party will do and often hope that they are reflectively consistent. The one-shot counterfactual mugging is again different from one-shot PD, even if one is reflectively consistent.
