
Caspar Oesterheld


>We mentioned both.

Did you, though? Besides Roko's basilisk, the references to acausal trade seem vague, but to me they sound like the kinds that could easily make things worse. In particular, you don't explicitly discuss superrationality, right?

>Finally, while it might have been a good idea initially to treat Roko's basilisk as an information hazard to be ignored, that is no longer possible so the marginal cost of mentioning it seems tiny.

I agree that due to how widespread the idea of Roko's basilisk is, it overall matters relatively little whether this idea is mentioned, but I think this applies similarly in both directions.

I agree that some notions of free will imply that Newcomb's problem is impossible to set up. But if one of these notions is what is meant, then the premise of Newcomb's problem is that these notions are false, right?

It also happens that I disagree with these notions as being relevant to what free will is.

Anyway, if this had been discussed in the original post, I wouldn't have complained.

What's the reasoning behind mentioning the fairly controversial, often deemed dangerous Roko's basilisk over less risky forms of acausal trade (like superrational cooperation with human-aligned branches)?

Free will is a controversial, confusing term that, I suspect, different people take to mean different things. I think to most readers (including me) it is unclear what exactly the Case 1 versus 2 distinction means. (What physical property of the world differs between the two worlds? Maybe you take not having free will to mean something very mundane, similar to how I don't have free will about whether to fly to Venus tomorrow because it's just not physically possible for me to fly to Venus, so I have to "decide" not to fly to Venus?)

I generally think that free will is not so relevant in Newcomb's problem. It seems that whether there is some entity somewhere in the world that can predict what I'm doing shouldn't make a difference for whether I have free will or not, at least if this entity isn't revealing its predictions to me before I choose. (I think this is also the consensus on this forum and in the philosophy literature on Newcomb's problem.)

>CDT believers only see the second decision. The key here is realising there are two decisions.

Free will aside, as far as I understand, your position is basically in line with what most causal decision theorists believe: You should two-box, but you should commit to one-boxing if you can do so before your brain is scanned. Is that right? (I can give some references to discussions of CDT and commitment if you're interested.)

If so, how do you feel about the various arguments that people have made against CDT? For example, what would you do in the following scenario?

>Two boxes, B1 and B2, are on offer. You may purchase one or none of the boxes but not both. Each of the two boxes costs $1. Yesterday, Omega put $3 in each box that she predicted you would not acquire. Omega's predictions are accurate with probability 0.75.

In this scenario, CDT always recommends buying a box, which seems like a bad idea because the seller of the boxes profits in expectation whenever you buy from them.
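To make the numbers concrete, here is a minimal sketch of the calculation. The function names and the representation of beliefs as two probabilities are illustrative assumptions of mine; the code only encodes the payoff rules quoted above. It checks that, whatever CDT believes about Omega's prediction, at least one box looks worth buying, while the buyer's actual expected payoff is negative.

```python
PRICE = 1.0      # cost of either box
PRIZE = 3.0      # amount Omega puts in each box she predicts you will not buy
ACCURACY = 0.75  # probability that Omega's prediction is correct


def cdt_values(p_pred_b1, p_pred_b2):
    """Causal expected value of buying B1 and of buying B2.

    The arguments are the agent's fixed beliefs that Omega predicted
    "buys B1" or "buys B2"; the remaining probability mass is on
    "buys neither". A box holds the prize exactly when Omega did not
    predict that you would buy it.
    """
    ev_b1 = PRIZE * (1.0 - p_pred_b1) - PRICE
    ev_b2 = PRIZE * (1.0 - p_pred_b2) - PRICE
    return ev_b1, ev_b2


def worst_case_best_cdt_value(steps=100):
    """Scan a grid of beliefs; return the lowest value the better box can have."""
    worst = float("inf")
    for i in range(steps + 1):
        for j in range(steps + 1 - i):  # keeps p1 + p2 <= 1
            p1, p2 = i / steps, j / steps
            worst = min(worst, max(cdt_values(p1, p2)))
    return worst


def buyer_expected_payoff():
    """Actual expected payoff of buying a box.

    The purchased box is full only if Omega mispredicted,
    which happens with probability 1 - ACCURACY.
    """
    return PRIZE * (1.0 - ACCURACY) - PRICE


if __name__ == "__main__":
    print(worst_case_best_cdt_value())  # stays positive, so CDT always buys
    print(buyer_expected_payoff())      # negative, so the seller gains on average
```

The worst case for CDT is believing each box was predicted with probability 0.5, and even then the better box has a causal expected value of $0.50. Not buying is worth $0, so CDT buys.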

>TDT believers only see the first decision, [...] The key here is realising there are two decisions.

I think proponents of TDT and especially Updateless Decision Theory and friends are fully aware of this possible "two-decisions" perspective. (Though typically Newcomb's problem is described as only having one of the two decision points, namely the second.) They propose that the correct way to make the second decision (after the brain scan) is to take the perspective of the first decision (or similar). Of course, one could debate whether this move is valid and this has been discussed (e.g., here, here, or here).

Also: Note that evidential decision theorists would argue that you should one-box in the second decision (after the brain scan) for reasons unrelated to the first-decision perspective. In fact, I think that most proponents of TDT/UDT/... would agree with this reasoning also, i.e., even if it weren't for the "first decision" perspective, they'd still favor one-boxing. (To really get the first decision/second decision conflict you need cases like counterfactual mugging.)

That's interesting, but I don't give it much weight. A lot of things that are close to Monty Fall are in GPT's training data. In particular, I believe that many introductions to the Monty Hall problem discuss versions of Monty Fall quite explicitly. Most reasonable introductions to Monty Hall discuss that what makes the problem work is that Monty Hall opens a door according to specific rules and not uniformly at random. Also, even humans (famously) get questions related to Monty Hall wrong. If you talk to a randomly sampled human and they happen to get questions related to Monty Hall right, you'd probably conclude (or at least strongly update towards thinking that) they've been exposed to explanations of the problem before (not that they solved it correctly on the spot). So to me the likely way in which LLMs get Monty Fall (or Monty Hall) right is that they learn to better match it onto their training data. Of course, that is progress. But it's (to me) not very impressive/important. Obviously, it would be very impressive if it got any of these problems right if they had been thoroughly excluded from its training data.
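For readers who don't have the difference between the two problems in mind, here is a small simulation sketch. The function name, the trial count, and the choice to discard runs in which the car is accidentally revealed are my own assumptions about how to model "Monty Fall"; it is only meant to illustrate why the host's door-opening rule matters.

```python
import random


def switch_win_rate(host_knows, trials=200_000, seed=0):
    """Estimate P(switching wins | the opened door shows a goat).

    host_knows=True:  classic Monty Hall, the host deliberately opens a
                      goat door among the two doors you did not pick.
    host_knows=False: "Monty Fall", the host opens one of the two other
                      doors uniformly at random; runs in which the car
                      is revealed are discarded.
    """
    rng = random.Random(seed)
    wins = 0
    kept = 0
    for _ in range(trials):
        car = rng.randrange(3)
        pick = rng.randrange(3)
        others = [d for d in range(3) if d != pick]
        if host_knows:
            opened = rng.choice([d for d in others if d != car])
        else:
            opened = rng.choice(others)
            if opened == car:
                continue  # car revealed: not the situation the puzzle asks about
        kept += 1
        remaining = next(d for d in others if d != opened)
        wins += remaining == car
    return wins / kept


if __name__ == "__main__":
    print("Monty Hall, switch:", switch_win_rate(host_knows=True))
    print("Monty Fall, switch:", switch_win_rate(host_knows=False))
```

With a knowing host, switching should win about two thirds of the time; with a host who opens a door at random, about half the time.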

To me Bing Chat actually seems worse/less impressive (e.g., more likely to give incorrect or irrelevant answers) than ChatGPT, so I'm a bit surprised. Am I the only one that feels this way? I've mostly tried the two systems on somewhat different kinds of prompts, though. (For example, I've tried (with little success) to use Bing Chat instead of Google Search.) Presumably some of this is related to the fine-tuning being worse for Bing? I also wonder whether the fact that Bing Chat is hooked up to search in a somewhat transparent way makes it seem less impressive. On many questions it's "just" copy-and-pasting key terms of the question into a search engine and summarizing the top result. Anyway, obviously I've not done any rigorous testing...

There's a Math Stack Exchange question: "Conjectures that have been disproved with extremely large counterexamples?" Maybe some of the examples in the answers over there would count? For example, there's Euler's sum of powers conjecture, which only has large counterexamples (for high k), found via ~brute force search.
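As a quick check, here is a sketch that verifies two known counterexamples with exact integer arithmetic. The specific numbers are not from the Stack Exchange thread as quoted here; they are the Lander and Parkin counterexample for k = 5 and Frye's for k = 4, as I recall them from the literature. The helper name is hypothetical.

```python
def is_euler_counterexample(terms, total, k):
    """True if fewer than k positive k-th powers sum to total**k.

    Euler's sum of powers conjecture says this is impossible for k > 2,
    so any (terms, total, k) passing this check refutes it for that k.
    """
    return (
        1 < len(terms) < k
        and all(t > 0 for t in terms)
        and sum(t ** k for t in terms) == total ** k
    )


if __name__ == "__main__":
    # k = 5: four fifth powers summing to a fifth power
    print(is_euler_counterexample([27, 84, 110, 133], 144, 5))
    # k = 4: three fourth powers summing to a fourth power
    print(is_euler_counterexample([95800, 217519, 414560], 422481, 4))
```

Python integers are arbitrary precision, so the equality test is exact rather than a floating-point approximation.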

>Imagine trying to do physics without being able to say things like, "Imagine we have a 1kg frictionless ball...", mathematics without being able to entertain the truth of a proposition that may be false or divide a problem into cases and philosophy without being allowed to do thought experiments. Counterfactuals are such a basic concept that it makes sense to believe that they - or something very much like them - are a primitive.

In my mind, there's quite some difference between all these different types of counterfactuals. For example, consider the counterfactual question, "What would have happened if Lee Harvey Oswald hadn't shot Kennedy?" I think the meaning of this counterfactual is kind of like the meaning of the word "chair".
- For one, I don't think this counterfactual is very precisely defined. What exactly are we asked to imagine? A world that is like ours, except the laws of physics in Oswald's gun were temporarily suspended to save JFK's life? (Similarly, it is not exactly clear what counts as a chair (or to what extent) and what doesn't.)
- Second, it seems that the users of the English language all have roughly the same understanding of what the meaning of the counterfactual is, to the extent that we can use it to communicate effectively. For example, if I say, "if LHO hadn't shot JFK, US GDP today would be a bit higher than it is in fact", then you might understand that to mean that I think JFK had good economic policies, or that people were generally influenced negatively by the news of his death, or the like. (Maybe a more specific example: "If it hadn't suddenly started to rain, I would have been on time." This is a counterfactual, but it communicates things about the real world, such as: I didn't just get lost in thought this morning.) (Similarly, when you tell me to get a "chair" from the neighboring room, I will typically do what you want me to do, namely to bring a chair.)
- Third, because it is used for communication, some notions of counterfactuals are more useful than others, because they are better for transferring information between people. At the same time, usefulness as a metric still leaves enough open to make it practically and theoretically impossible to identify a unique optimal notion of counterfactuals. (Again, this is very similar to a concept like "chair". It is objectively useful to have a word for chairs. But it's not clear whether it's more useful for "chair" to include or exclude .)
- Fourth, whatever notion of counterfactuals we adopt for this purpose has no normative force outside of communication -- it doesn't interact with our decision theory or anything. For example, causal counterfactuals as advocated by causal decision theorists are kind of similar to the "If LHO hadn't shot JFK" counterfactuals. (E.g., both are happy to consider literally impossible worlds.) As you probably know, I'm partial to evidential decision theory. So I don't think these causal counterfactuals should ultimately be the guide of our decisions. Nevertheless, I'm as happy as anyone to adopt the linguistic conventions related to "if LHO hadn't shot JFK"-type questions. I don't try to reinterpret the counterfactual question as a conditional one. (Note that answers to, "how would you update on the fact that JFK survived the assassination?", would be very different from answers to the counterfactual question. ("I've been lied to all my life. The history books are all wrong.") But other conditionals could come much closer.) (Similarly, using the word "chair" in the conventional way doesn't commit one to any course of action. In principle, Alice might use the term "chair" normally, but never sit on chairs, or only sit on green chairs, or never think about the chair concept outside of communication, etc.)

So in particular, the meaning of counterfactual claims about JFK's survival doesn't seem necessarily very related to the counterfactuals used in decision making. (An example of the latter is the question, "what would happen if I don't post this comment?", which I asked myself prior to posting this comment.)

In math, meanwhile, people seem to consider counterfactuals mainly for proofs by contradiction, i.e., to prove that the claims are contrary to fact. Cf. the principle of explosion (https://en.wikipedia.org/wiki/Principle_of_explosion), which makes it difficult to use the regular rules of logic to talk about counterfactuals.

Do you agree or disagree with this (i.e., with the claim that these different uses of counterfactuals aren't very closely connected)?