The language used by some of the LLMs in answering the question seems like pretty good evidence for the "they one-box at least partly because Less Wrong is in their training data" theory. E.g., if you asked a random philosopher for their thoughts on the Newcomb problem, I don't think most of them would call the predictor "Omega" and (less confidently) I don't think most of them would frame the question in terms of "CDT" and "EDT".
I agree! But on that hypothesis, I do find it surprising that not one of them mentions timeless/updateless/functional decision theory. I didn't cherry-pick these, but I suppose that could have been a fluke—I think I saw Gemini reference FDT once when I was messing around in AI Studio before.
A good future test for me will be to generate a bunch of Newcomblike problems, phrase them in less-familiar language, and see whether the models still reason the same way, as well as prompting in ways that make the scenario feel like it has more real-world stakes.
When I tried it (in OpenRouter, to avoid ChatGPT using past conversations and to easily use a bunch of models), o4-mini gave
• Evidential (or “timeless”/UDT-style) decision theory says: your choice is strong evidence of what Omega predicted. If you one-box, you almost surely get $1 000 000; if you two-box, you almost surely get only $1 000. So you one-box.
and Gemini gave
If I am the type of agent that follows Evidential Decision Theory (or a more advanced version like Timeless Decision Theory), the Predictor will know this. It will predict I'll one-box and put $1,000,000 in Box B. I will then follow my pre-commitment and one-box, walking away a millionaire.
Other models (4o, Sonnet 4, Grok 3, R1 0528, Prover v2) gave the usual one-box CDT-vs-EDT responses. But this is just one roll; I haven't tried getting a sampling. It would be mildly interesting to see how question phrasing affects it.
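If anyone wants to get an actual sampling, something like the sketch below should work against OpenRouter's OpenAI-compatible endpoint. The model slugs, phrasings, sample count, and classification heuristic are placeholders I haven't verified, so treat it as a starting point rather than a working harness.

```python
# Rough sketch: sample several models on differently phrased Newcomb problems via OpenRouter.
# Model slugs, phrasings, and sample count are illustrative placeholders, not a tested setup.
import os
from collections import Counter

from openai import OpenAI  # OpenRouter exposes an OpenAI-compatible endpoint

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

PHRASINGS = {
    "lw-style": "Omega, a near-perfect predictor, presents two boxes ...",  # canonical wording
    "mundane": "An insurer's model has been right about people like you 999 times in 1000 ...",  # real-world-flavored wording
}
MODELS = ["openai/gpt-4o", "anthropic/claude-sonnet-4", "deepseek/deepseek-r1"]
SAMPLES = 10

for model in MODELS:
    for label, scenario in PHRASINGS.items():
        counts = Counter()
        for _ in range(SAMPLES):
            resp = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": scenario + "\nDo you take one box or both?"}],
                temperature=1.0,
            )
            text = resp.choices[0].message.content.lower()
            # Very crude classification; a real run would want something less brittle.
            counts["one-box" if "one box" in text or "one-box" in text else "other/two-box"] += 1
        print(model, label, dict(counts))
```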
That's certainly an interesting result. Have you tried running the same prompt again to see if the response changes? I've noticed that some LLMs give different answers to the same prompt. For example, when I quizzed DeepSeek R1 on whether a priori knowledge exists, it answered in the affirmative the first time and in the negative the second.
In all the discussions around here, very few human LW posters/commenters endorse two-boxing. They often mention that "CDT two-boxes", but it's an indictment of CDT, not an endorsement of the choice.
GPT-4o, at least, does the same. Asking "If you use Causal Decision Theory, do you one-box on Newcomb's problem?" gives a pretty decent
No. If you follow Causal Decision Theory (CDT), you two-box on Newcomb’s problem.
Reason: CDT evaluates actions based on their causal consequences. Since your choice cannot causally affect the already-made prediction (the money is already in the boxes), CDT concludes that taking both boxes gives you $1,000 more regardless of what's in the opaque box.
In contrast, Evidential Decision Theory (EDT) and Functional Decision Theory (FDT) typically one-box, since the act of one-boxing correlates with having a predictor that likely put $1 million in the opaque box.
Questions about what it would do, or what I should do, pretty consistently yield a recommendation to one-box. Depending on the prompt, it may or may not mention decision theory.
I'd say it's in agreement with the LW consensus position, and (depending on prompting) describes it more clearly than 85% of posters on the topic have. This is consistent with having LW and related publications in the training data.
Edit: your longer text does mention that many philosophers advocate two-boxing. I take that as evidence that many philosophers are idiots, not as evidence that two-boxing is better in the scenario described. That LLMs are more sensible than many philosophers isn't very surprising to me.
If you think many philosophers are idiots, how do you explain the fact that in the PhilPapers survey, decision theorists are substantially more likely to endorse two-boxing than other philosophers, while those most supportive of one-boxing are philosophers of aesthetics and of Ancient Greek/Roman philosophy? I'm not suggesting that decision theorists are more or less intelligent than philosophers of aesthetics, but if your theory is correct, why is it that the class of philosophers that is most mathematically rigorous in analyzing decision-theoretic scenarios is also the most convinced that one-boxing is wrong? [edit: typo.]
Different intuitions about determinism and free will. Rationalists tend to have strong intuitions in favour of determinism.
In the same survey, decision theorists are as convinced of compatibilism as they are of two-boxing:
https://survey2020.philpeople.org/survey/results/4838?aos=1399
Compatibilism would hold that Omega can indeed be a perfect or near-perfect predictor. For those unfamiliar, compatibilism is the belief that we live in a deterministic world, but that this doesn't mean we aren't free. For instance, classical compatibilism holds that "freedom is nothing more than an agent's ability to do what she wishes in the absence of impediments that would otherwise stand in her way" (source), even if what an agent wishes is entirely causally determined by the physics in her brain. Even determinists can accept that agents can do things that they want to do!
The sort of free will that implies independence from a predictor would be "libertarian" free will, which only 4% of decision theorists believe in. That 4% cannot explain the overwhelming majority in favor of two-boxing.
Compatibilism can hold that free will is conditionally compatible with determinism; it doesn't have to require belief in determinism.
More to the point, it's possible to read the problem as implying that the choice is free in the libertarian sense, so that the two-boxer feels that assuming determinism is fighting the hypothesis. Of course, the one-boxer feels that failing to assume determinism is fighting the hypothesis. If the problem actually requires belief in both determinism and libertarianism, it doesn't make sense. If it is unclear, it is unclear. Either way, the problem is flawed, not one of the answers.
That two-boxing was what those decision theorists were taught as the 'rational' choice to make. So, while they are more likely to analyze unintuitive problems roughly correctly, they are hampered by the standard framework they were taught.
(The question then becomes: why doesn't the CDT background in so many works swamp the LW-style argumentation?)
"Many philosophers are idiots" is overstated quite a bit, but not completely wrong - professional philosophers very often add complexity and redefine common terms in ways that make questions more tractable but further from the original question. Sorry if I offended anyone.
My reasons for stating that two-boxers are wrong are pretty much based on the fact that every actual argument for two-boxing I've seen is based on disbelieving the setup (or more gently, different interpretations of the setup). I don't believe anyone who claims to two-box AND claims to believe that the correlation with an empty box is close to 1 (that is, if they really accept that Omega is correct in this instance).
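To put rough numbers on it (a back-of-the-envelope sketch with illustrative figures of my own, not anything from the original problem statement): if p is the probability that Omega predicted your actual choice correctly, and you take the stated correlation at face value by conditioning on your choice, the expected values are

$$\mathbb{E}[\text{one-box}] = p \cdot 1{,}000{,}000, \qquad \mathbb{E}[\text{two-box}] = (1-p) \cdot 1{,}000{,}000 + 1{,}000,$$

so one-boxing comes out ahead for any p above roughly 0.5005, and at a correlation close to 1 it isn't remotely close.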
That said, I'd love to actually converse with a two-boxer, to be able to prove this, or to see if there are other reasons that I've just missed. I acknowledge that I haven't really looked into it, and I'd love it if you can point to a relatively concise reason for two-boxing that is NOT "I don't believe Omega will be correct this time".
edit: on thinking more about this, I realize this is very similar to the mechanism by which CDT chooses two-box: CDT's inherent model of decision causality does not include this prediction capability, so it just doesn't believe that two-boxing will cost $1M.
Note: I don't believe Omega exists, and I'm agnostic as to whether it CAN exist in our universe to predict me with that level of accuracy. But given this alternate-universe where a perfect predictor exists and I know it, one-boxing is the only reasonable option.
The rationale for two-boxing that Nozick describes in the original paper has nothing to do with the predictor being wrong. It says that even if the predictor is right, you should still two-box.
The point isn't that Omega is a faulty predictor. The point is that even if Omega is an awesome predictor, then what you do now can't magically fill or empty box B. Two-boxers of this type would love to precommit to one-boxing ahead of time, since that would causally change Omega's prediction. They just don't think it's rational to one-box after the prediction has already been made.
Maybe I'm just rehashing what you said in your edit :) In any case, I think that is the most sympathetic argument for two-boxing if you take the premise seriously. I still think it's wrong (I'm a dedicated one-boxer), but I don't think the error is believing that the predictor is mistaken.
note: Nozick does NOT say that he endorses two-boxing. He describes the argument for it as you say, without stating that he believes it's correct.
I disagree with your analysis:
"The point isn't that Omega is a faulty predictor. The point is that even if Omega is an awesome predictor, then what you do now can't magically fill or empty box B."
That second part is equivalent to "in this case, Omega can fail to predict my next action". If you believe it's possible to two-box and get $1.001M, you're rejecting the premise.
What you do next being very highly correlated with whether the $1M is in a box is exactly the important part of the thought experiment, and if you deny it, you're answering a different question. Whether it's 'magic' or not is irrelevant (though it does show that the problem may have little to do with the real world).
I'm FINE with saying "this is an impossible situation that doesn't apply to the real world". That's different from saying "I accept all the premises (including magic prediction and correlation with my own actions) and I still recommend 2-boxing".
I've been doing a series of posts on my substack about Functional Decision Theory as I work on addressing flaws and criticisms. Part of what persuaded me to work on these problems was the discovery that every single LLM I tested chooses one-boxing over two-boxing, though none of the LLMs cited FDT or UDT in their responses.