jackmastermind — LessWrong

Every Major LLM Endorses Newcomb One-Boxing

In the same survey, decision theorists are as convinced of compatibilism as they are of two-boxing:

https://survey2020.philpeople.org/survey/results/4838?aos=1399

Compatibilism would hold that Omega can indeed be a perfect or near-perfect predictor. For those unfamiliar, compatibilism is the belief that we live in a deterministic world, but that doesn't mean that we aren't free. For instance, classical compatibilism holds that "freedom is nothing more than an agent’s ability to do what she wishes in the absence of impediments that would otherwise stand in her way," (source) even if what an agent wishes is entirely causally determined by the physics in her brain. Even determinists can accept that agents can do things that they want to do!

The sort of free will that implies independence from a predictor would be "libertarian" free will, which only 4% of decision theorists believe in. That 4% cannot explain the overwhelming majority in favor of two-boxing.

Every Major LLM Endorses Newcomb One-Boxing

jackmastermind5mo37

The rationale for two-boxing that Nozick describes in the original paper has nothing to do with the predictor being wrong. It says that even if the predictor is right, you should still two-box.

Omega has already put either $1M or $0 in Box B. It's sitting right there.
If Omega put $1M, then I can one-box for $1M or two-box for $1M + $1,000. Therefore I should two-box.
If Omega put $0, then I can one-box for $0 or two-box for $0 + $1,000. Therefore I should two-box.

The point isn't that Omega is a faulty predictor. The point is that even if Omega is an awesome predictor, what you do now can't magically fill or empty Box B. Two-boxers of this type would love to precommit to one-boxing ahead of time, since that would causally change Omega's prediction. They just don't think it's rational to one-box after the prediction has already been made.
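For anyone who wants the dominance reasoning above spelled out mechanically, here is a minimal sketch. The function and variable names are my own for illustration; the dollar amounts are the ones from the argument.

```python
# Payoffs in dollars. Box A always holds $1,000; Box B holds $1,000,000 or $0.
# All names here are illustrative, not from Nozick's paper.
BOX_A = 1_000
BOX_B_STATES = {"full": 1_000_000, "empty": 0}


def payoff(box_b_state: str, choice: str) -> int:
    """Return the payout once Box B's contents are already fixed."""
    box_b = BOX_B_STATES[box_b_state]
    return box_b + BOX_A if choice == "two-box" else box_b


def dominates(a: str, b: str) -> bool:
    """True if choice `a` pays strictly more than `b` in every fixed state."""
    return all(payoff(state, a) > payoff(state, b) for state in BOX_B_STATES)


# Compare the two choices while holding Box B's contents fixed.
for state in BOX_B_STATES:
    print(state, payoff(state, "one-box"), payoff(state, "two-box"))
print(dominates("two-box", "one-box"))
```

Note that the loop holds Box B's state fixed and says nothing about how that state correlates with your choice. That is the step a one-boxer would push back on.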

Maybe I'm just rehashing what you said in your edit :) In any case, I think that is the most sympathetic argument for two-boxing if you take the premise seriously. I still think it's wrong (I'm a dedicated one-boxer), but I don't think the error is believing that the predictor is mistaken.

Every Major LLM Endorses Newcomb One-Boxing

jackmastermind5mo10

If you think many philosophers are idiots, how do you explain the fact that in the PhilPapers survey, decision theorists are substantially more likely to endorse two-boxing than other philosophers, while those most supportive of one-boxing are aesthetic philosophers and Ancient Greek/Roman philosophers? I'm not suggesting that decision theorists are more or less intelligent than aesthetic philosophers, but if your theory is correct, why is it that the class of philosophers that is more mathematically rigorous in analyzing decision theory scenarios is the one most convinced that one-boxing is wrong? [edit: typo.]

Every Major LLM Endorses Newcomb One-Boxing

jackmastermind5mo30

I agree! But on that hypothesis, I do find it surprising that not one of them mentions timeless/updateless/functional decision theory. I didn't cherry-pick these, but I suppose that could have been a fluke—I think I saw Gemini reference FDT once when I was messing around in AI Studio before.

A good future test for me will be to generate a bunch of Newcomblike problems, phrase them in less-familiar language, and see whether the models still reason the same way. I could also try prompting in a way that makes the scenario feel like it has more real-world stakes.

FDT Does Not Endorse Itself in Asymmetric Games

jackmastermind5mo60

I see. I suppose you'd do this by creating a policy node that is subjunctively upstream of every individual FDT decision, and intervening on that. The possible values would be every combination of FDT decisions, and you'd calculate updateless expected value over them.

This seems to work, though I'll think on it some more. I'm a little disappointed that this isn't the formulation of FDT in the paper, since that feels like a pretty critical distinction. But in any case, I should have read more carefully, so that's on me. Thank you for bringing that up! Your comment is now linked in the introduction :)
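To make the policy-node idea above concrete, here is a rough sketch of how I picture it. Everything specific in it is an assumption for illustration: the two roles, the payoff table, the uniform prior over roles, and the function names are all hypothetical and are not taken from the FDT paper or from the post. The only part that reflects the description above is the shape of the calculation: enumerate every combination of decisions as one joint policy, then score each by expected value before learning which role you occupy.

```python
from itertools import product

# Hypothetical setup: two decision points ("roles"), each filled by an FDT agent.
ROLES = ["role_1", "role_2"]
ACTIONS = ["cooperate", "defect"]

# Assumed prior over which role the agent ends up in (illustrative only).
ROLE_PRIOR = {"role_1": 0.5, "role_2": 0.5}

# Toy asymmetric payoffs: (action_1, action_2) -> (utility for role_1, role_2).
PAYOFFS = {
    ("cooperate", "cooperate"): (4.0, 2.0),
    ("cooperate", "defect"): (0.0, 4.0),
    ("defect", "cooperate"): (5.0, 0.0),
    ("defect", "defect"): (1.0, 1.0),
}


def updateless_value(policy: tuple) -> float:
    """Expected utility of a joint policy, averaged over the role prior."""
    utilities = PAYOFFS[policy]
    return sum(ROLE_PRIOR[role] * utilities[i] for i, role in enumerate(ROLES))


def best_policy() -> tuple:
    """Intervene on the policy node: pick the joint policy with the highest value."""
    return max(product(ACTIONS, repeat=len(ROLES)), key=updateless_value)


best = best_policy()
print(dict(zip(ROLES, best)), updateless_value(best))
```

The design choice that matters is that the search runs over whole combinations of decisions rather than over each decision separately, so the asymmetry between roles is handled inside the payoff table instead of by each agent reasoning from its own position.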
