Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

I've previously argued that the concept of counterfactuals can only be understood from within the counterfactual perspective.

I will be awarding a $1000 prize for the best post that engages with the idea that counterfactuals may be circular in this sense. The winning entry may be one of the following (these categories aren't intended to be exclusive):

a) A post that attempts to draw out the consequences of this principle for decision theory

b) A post that attempts to evaluate the arguments for and against adopting the principle that counterfactuals only make sense from within the counterfactual perspective

c) A review of relevant literature in philosophy or decision theory

d) A post that restates already existing ideas in a clearer or more accessible manner (I don't think this topic has been explored much on LW, but it may have been explored in the literature on decision theory or philosophy)

Feel free to ask me for clarification about what would be on or off-topic. Probably the main thing I'd like to see is substantial engagement with this principle. The bounty is for posts that engage with the notion that counterfactuals might only make sense from within a counterfactual perspective. I have written on this topic, but the competition isn't limited to posts that engage with my views on this topic. It's perfectly fine to engage with other arguments for this proposition if, for example, you find someone arguing in favour of this in the philosophical/mathematical literature or Less Wrong.

If someone submits a high-quality post that only touches on this issue tangentially, but someone else submits an only okayish post that tries to deeply engage with this issue, then I would likely award it to the latter as I'm trying to incentivise more engagement with this issue rather than just high-quality posts generally. If the bounty is awarded to an unexpected submission, I expect this to be the main cause.

I will be awarding an additional $100 for the best short-form post on this topic. This may be a LW Shortform post, a public Facebook post, a Twitter thread, etc. (I'm not going to include Discord/Slack messages as they aren't accessible).

Why do I believe in this principle?

Roughly, my reasons are as follows:

  1. Rejecting David Lewis' Counterfactual Realism as absurd and therefore concluding that counterfactuals must be at least partially a human construction: either a) in the sense of them being an inevitable and essential part of how we make sense of the world by our very nature or b) in the sense of being a semi-arbitrary and contingent system that we've adopted in order to navigate the world
  2. Insofar as counterfactuals are inherently a part of how we interpret the world, the only way that we can understand them is to "look out through them", notice what we see, and attempt to characterise this as precisely as possible
  3. Insofar as counterfactuals are a somewhat arbitrary and contingent system constructed in order to navigate the world, the way that the system is justified is by imagining adopting various mental frameworks and noticing that a particular framework seems like it would be useful over a wide variety of circumstances. However, we've just invoked counterfactuals twice: a) by imagining adopting different mental frameworks b) by imagining different circumstances over which to evaluate these frameworks[1].
  4. In either case, we seem to be unable to characterise counterfactuals without depending on already having the concept of counterfactuals. Or at least, I find this argument persuasive.

Why do I believe this is important?

I've argued for the importance of agent meta-foundations before. Roughly, there seems to be a lot of confusion about what counterfactuals are and how to construct them. I believe that much of this confusion would be cleared up if we can sort out some of these foundational issues. And the claim that counterfactuals can only be understood from an interior perspective is one such issue.

Why am I posting this bounty?

I believe in this idea, but:

  1. I haven't been able to dedicate nearly as much time to exploring this as I would like in between all of my other commitments
  2. Working on this approach just by myself is kind of lonely and extremely challenging (for example, it's hard to get good quality feedback)
  3. I suspect that more people would be persuaded that this was a fruitful approach if this principle was presented to them in a different light.

How do I submit my entry?

Make a post on LW or the Alignment Forum, then add a link in the comments below. I guess I'm also open to private submissions. Ideally, you should mention that you're submitting your post for the bounty just to make sure that I'm aware of it.

When do I need to submit by?

I'm currently planning to set the submission window to 3 months from the date of this post (that would be the 1st of April, but let's make it April 2nd so people don't think this competition is some kind of prank). Submissions after this date may be refused.

How will this be judged?

I've written on this topic myself, so this probably biases me in some ways, but $1000 is a small enough amount of money that it's probably not worthwhile looking for external judges.

Some Background Info

I guess I started to believe that counterfactuals were circular when I started to ask questions like, "What actually are these things we call counterfactuals?". I noticed that they didn't seem to exist in a literal sense, but that we also seem to be unable to do without them.

Some people have asked why the Bayesian Network approach suggested by Judea Pearl is insufficient (including in the comments below). This approach is firmly rooted in Causal Decision Theory (CDT). Most people on LW have rejected CDT because of its failure to handle Newcomb's Problem.
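To make the disagreement concrete, here is a minimal sketch of how an evidential and a causal expected-value calculation come apart on Newcomb's Problem. The payoffs ($1,000,000 and $1,000) are the conventional ones; the 0.99 predictor accuracy and all function names are assumptions chosen purely for illustration.

```python
# Toy Newcomb's Problem: compare an evidential and a causal calculation.
# Assumptions (not from the post): predictor accuracy 0.99, standard payoffs.

BIG, SMALL = 1_000_000, 1_000
ACCURACY = 0.99  # assumed probability that the predictor guesses the action


def payoff(action: str, box_full: bool) -> int:
    """Money received for an action, given whether the opaque box was filled."""
    return (BIG if box_full else 0) + (SMALL if action == "two-box" else 0)


def edt_value(action: str) -> float:
    """Condition on the action: the action is evidence about the prediction."""
    p_full = ACCURACY if action == "one-box" else 1 - ACCURACY
    return p_full * payoff(action, True) + (1 - p_full) * payoff(action, False)


def cdt_value(action: str, p_full: float) -> float:
    """Intervene on the action: box contents stay fixed at the belief p_full."""
    return p_full * payoff(action, True) + (1 - p_full) * payoff(action, False)


def best(values: dict) -> str:
    """Return the action with the highest value."""
    return max(values, key=values.get)


actions = ["one-box", "two-box"]
edt_choice = best({a: edt_value(a) for a in actions})
# The causal calculation recommends the same action for any fixed belief.
cdt_choices = {best({a: cdt_value(a, p) for a in actions}) for p in (0.0, 0.5, 1.0)}
```

Under these assumed numbers the evidential calculation favours one-boxing, while the causal calculation favours two-boxing whatever it believes about the box, since two-boxing always adds the small payoff once the contents are held fixed.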

MIRI has proposed Functional Decision Theory (FDT) as an alternative, but this theory is dependent on logical counterfactuals and they haven't figured out exactly how to construct these. While I don't exactly agree with the logical counterfactual framing, I agree that these kinds of exotic decision theory problems require us to create a new notion of counterfactuals. And this naturally leads to questions about what counterfactuals really are which I see as further leading to the conclusion that they are circular.

I can see why many people are sufficiently skeptical of the notion of counterfactuals being circular that they dismiss it out of hand. It's entirely possible that I could be mistaken about this thesis, but for these people, I'd suggest reading Eliezer's post Where Recursive Justification Hits Bottom, which argues for a circular epistemology. If you are persuaded by that post, counterfactuals being circular may then be less of a jump.

Fine Print

I'll award the prize assuming that there's at least one semi-decent submission (according to the standards of posts on Less Wrong). If this isn't the case, then I'll donate the money to an AI Safety organization instead. I'd be open to having this money be held in escrow.

I'm intending to award the prize to the top entry, but there's a chance that I split it if I can't make a decision.

  1. ^

    Counterpoint: requiring counterfactuals to justify their own use isn't the same as counterfactuals only making sense from within themselves. Response: It's possible to engage in the appropriate symbol manipulation without a concept of counterfactuals, but we can't have a semantic understanding of what we're doing. We can't even describe this process without being able to say things like "if given string of symbols s, do y". Similarly, counterfactuals aren't just justified by imagining the consequences of applying different mental frameworks over different circumstances; in this case, they are a system for performing well over a variety of circumstances.


I previously wrote a post about reconciling free will with determinism. The metaphysics implicit in Pearlian causality is free will (In Drescher's words: "Pearl's formalism models free will rather than mechanical choice."). The challenge is reconciling this metaphysics with the belief that one is physically embodied. That is what the post attempts to do; these perspectives aren't inherently irreconcilable, we just have to be really careful about e.g. distinguishing "my action" vs "the action of the computer embodying me" in the Bayes net and distingu... (read more)

4Chris_Leong5mo
You've linked me to three different posts, so I'll address them in separate comments. Two Alternatives to Logical Counterfactuals: I actually really liked this post - enough that I changed my original upvote to a strong upvote. I also disagree with the notion that logical counterfactuals make sense when taken literally so I really appreciated you making this point persuasively. I agreed with your criticisms of the material conditional approach and I think policy-dependent source code could be potentially promising. I guess this naturally leads to the question of how to justify this approach. This results in questions like, "What exactly is a counterfactual?" and "Why exactly do we want such a notion?" and I believe that following this path leads to the discovery that counterfactuals are circular. I'm more open to saying that I adopt Counterfactual Non-Realism than I was when I originally commented although I don't see theories based on material conditionals as the only approach within this category. I guess I'm also more enthusiastic about thinking in terms of policies rather than actions mainly because of the lesson I drew from the Counterfactual Prisoner's Dilemma [https://www.lesswrong.com/posts/sY2rHNcWdg94RiSSR/the-counterfactual-prisoner-s-dilemma]. I don't really know why I didn't make this connection at the time, since I had written that post a few months prior, but I appear to have missed this. I still feel that introducing the term "free will" is too loaded to be helpful here, regardless of whether you are or aren't using it in a non-standard fashion. Like I'd encourage you to structure your posts to try to separate: a) This is how we handle counterfactuals b) These are the implications of this for the free will debate. A large part of this is because I suspect many people on Less Wrong are simply allergic to this term.
2Chris_Leong5mo
Thoughts on Modeling Naturalized Logic Decision Theory Problems in Linear Logic: I hadn't heard of linear logic before - it seems like a cool formalisation - although I tend to believe that formalisations are overrated as unless they are used very carefully they can obscure more than they reveal. I believe that spurious counterfactuals are only an issue with the 5 and 10 problem because of an attempt to hack logical-if to substitute for counterfactual-if in such a way that we can reuse proof-based systems. It's extremely cool that we can do as much as we can working in that fashion, but there's no reason why we should be surprised that it runs into limits. So I don't see inventing alternative formalisations that avoid the 5 and 10 problem as particularly hard as the bug is really quite specific to systems that try to utilise this kind of hack. I'd expect that almost any other system in design space will avoid this. So if, as I claim, attempts at formalisation will avoid this issue by default, the fact that any one formalisation avoids this problem shouldn't give us too much confidence in it being a good system for representing counterfactuals in general. Instead, I think it's much more persuasive to ground any proposed system with philosophical arguments (such as your first post was focusing on), rather than mostly just posting a system and observing it has a few nice properties. I mean, your approach in this article is certainly a valuable thing to do, but I don't see it as getting all the way to the heart of the issue. Interestingly enough, this mirrors my position in Why 1-boxing doesn't imply backwards causation [https://www.lesswrong.com/posts/gAAFzqJkfeSHvcwTw/why-1-boxing-doesn-t-imply-backwards-causation] where I distinguish between Raw Reality (the territory) and Augmented Reality (the territory augmented by counterfactuals). I guess I put more emphasis on delving into the philosophical reasons for such a view and I think that's what this post is a bit sh
4jessicata5mo
Thanks for reading all the posts! I'm not sure where you got the idea that this was to solve the spurious counterfactuals problem, that was in the appendix because I anticipated that a MIRI-adjacent person would want to know how it solves that problem. The core problem it's solving is that it's a well-defined mathematical framework in which (a) there are, in some sense, choices, and (b) it is believed that these choices correspond to the results of a particular Turing machine. It goes back to the free will vs determinism paradox, and shows that there's a formalism that has some properties of "free will" and some properties of "determinism". A way that EDT fails to solve 5 and 10 is that it could believe with 100% certainty that it takes $5 so its expected value for $10 is undefined. (I wrote previously [https://www.lesswrong.com/posts/Rcwv6SPsmhkgzfkDw/edt-solves-5-and-10-with-conditional-oracles] about a modification of EDT to avoid this problem.) CDT solves it by constructing physically impossible counterfactuals which has other problems, e.g. suppose there's a Laplace's demon that searches for violations of physics and destroys the universe if physics is violated; this theoretically shouldn't make a difference but it messes up the CDT counterfactuals. It does look like your post overall agrees with the view I presented. I would tend to call augmented reality "metaphysics" in that it is a piece of ontology that goes beyond physics. I wrote about metaphysical free will [https://unstableontology.com/2020/03/22/what-is-metaphysical-free-will/] a while ago and didn't post it on LW because I anticipated people would be allergic to the non-physicalist philosophical language.
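The EDT failure mode described above can be sketched in a few lines. This is a toy illustration with hypothetical names, not the construction from the linked post: the agent's belief is modelled as a distribution over its own action, and the conditional expectation is undefined for any action it assigns probability zero.

```python
from typing import Dict, Optional

# Utility of each action in the 5-and-10 problem.
UTILITY = {"take_5": 5.0, "take_10": 10.0}


def conditional_value(belief: Dict[str, float], action: str) -> Optional[float]:
    """E[utility | action] under the agent's belief about its own action.

    Returns None when P(action) == 0, because conditioning on a
    probability-zero event is undefined.
    """
    p_action = belief.get(action, 0.0)
    if p_action == 0.0:
        return None
    # Joint mass of (action, utility) divided by P(action).
    return p_action * UTILITY[action] / p_action


certain = {"take_5": 1.0, "take_10": 0.0}    # agent is sure it takes $5
uncertain = {"take_5": 0.9, "take_10": 0.1}  # agent leaves some doubt

value_when_certain = conditional_value(certain, "take_10")
value_when_uncertain = conditional_value(uncertain, "take_10")
```

With full certainty the agent has no value at all to compare against $5, whereas any nonzero doubt restores a well-defined comparison.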
4Chris_Leong5mo
Thanks for that clarification. I suppose that demonstrates that the 5 and 10 problem is a broader problem than I realised. I still think that it's only a hard problem within particular systems that have a vulnerability to it. Yeah, we have significant agreement, but I'm more conservative in my interpretations. I guess this is a result of me being, at least in my opinion, more skeptical of language. Like I'm very conscious of arguments where someone says, "X could be described by phrase Y" and then later they rely on connotations of Y that weren't proven. For example, you write, "From the AI’s perspective, it has a choice among multiple actions, hence in a sense 'believing in metaphysical free will'". I would suggest it would be more accurate to write: "The AI models the situation as though it had free will", which leaves open the possibility that it might be just a pragmatic model, rather than the AI necessarily endorsing itself as possessing free will. Another way of framing this: there's an additional step in between observing that an agent acts or models a situation as if it believes in free will and concluding that it actually believes in free will. For example, I might round all numbers in a calculation to integers in order to make it easier for me, but that doesn't mean that I believe that the values are integers.
2Chris_Leong5mo
Comments on A critical agential account of free will, causation, and physics: We can imagine a situation where there is a box containing an apple or a pear. Suppose we believe that it contains a pear, but it actually contains an apple. If we look in the box (and we have good reason to believe looking doesn't change the contents), then we'll falsify our pear hypothesis. Similarly, if we're told by an oracle that if we looked we would see an apple, then there'd be no need for us to actually look, we'd have heard enough to falsify our pear hypothesis. However, the situation you've identified isn't the same. Here you aren't just deciding whether to make an observation or not, but what the value of that observation would be. So in this case, the fact that if you took action B you'd observe the action you took was B doesn't say anything about the case where you don't take action B, unlike knowing that if you looked in the box you'd see an apple provides you information even if you don't look in the box. It simply isn't relevant unless you actually take B. I think it's reasonable to suggest starting from falsification as our most basic assumption. I guess where you lose me is when you claim that this implies agency. I guess my position is as follows: * It seems like agents in a deterministic universe can falsify theories in at least some sense. Like they take two different weights, drop them and see they land at the same time, falsifying the claim that heavier objects fall faster * On the other hand, something like agency or counterfactuals seems necessary for talking about falsifiability in the abstract as this involves saying that we could falsify a theory if we ran an experiment that we didn't. In the second case, I would suggest that what we need is counterfactuals not agency. That is, we need to be able to say things like, "If I ran this experiment and obtained this result, then theory X would be falsified", not "I could have run this experiment and if I d
2jessicata5mo
The main problem is that it isn't meaningful for their theories to make counterfactual predictions about a single situation; they can create multiple situations (across time and space) and assume symmetry and get falsification that way, but it requires extra assumptions. Basically you can't say different theories really disagree unless there's some possible world / counterfactual / whatever in which they disagree; finding a "crux" experiment between two theories (e.g. if one theory says all swans are white and another says there are black swans in a specific lake, the cruxy experiment looks in that lake) involves making choices to optimize disagreement. Those seem pretty much equivalent? Maybe by agency you mean utility function optimization, which I didn't mean to imply was required. The part I thought was relevant was the part where you can believe yourself to have multiple options and yet be implemented by a specific computer.
2Chris_Leong5mo
Agreed, this is yet another argument for considering counterfactuals to be so fundamental that they don't make sense outside of themselves. I just don't see this as incompatible with determinism, b/c I'm grounding using counterfactuals rather than agency. I don't mean utility function optimization, so let me clarify what I see as the distinction. I guess I see my version as compatible with the determinist claim that you couldn't have run the experiment because the path of the universe was always determined from the start. I'm referring to a purely hypothetical running with no reference to whether you could or couldn't have actually run it. Hopefully, my comments here have made it clear where we diverge and this provides a target if you want to make a submission (that said, the contest is about the potential circular dependency of counterfactuals and not just my views. So it's perfectly valid for people to focus on other arguments for this hypothesis, rather than my specific arguments).

I mostly agree with Zack_M_Davis that this is a solved problem, although rather than talking about a formalization of causality I'd say this is a special case of epistemic circularity and thus an instance of the problem of the criterion. There's nothing unusual going on with counterfactuals other than that people sometimes get confused about what propositions are (e.g. they believe propositions have some sort of absolute truth beyond causality because they fail to realize epistemology is grounded in purpose rather than something eternal and external to the... (read more)

2Chris_Leong5mo
Which part are you claiming is a solved problem? Is it: a) That counterfactuals can only be understood within the counterfactual perspective OR b) The implications of this for decision theory OR c) Both
2G Gordon Worley III5mo
I think A is solved, though I wouldn't exactly phrase it like that, more like counterfactuals make sense because they are what they are and knowledge works the way it does. Zack seems to be making a claim to B, but I'm not expert enough in decision theory to say much about it.
2Chris_Leong5mo
Sorry, when you say A is solved, you're claiming that the circularity is known to be true, right? Zack seems to be claiming that Bayesian Networks both draw out the implications and show that the circularity is false. So unless I'm misunderstanding you, your answer seems to be at odds with Zack.
5G Gordon Worley III5mo
I don't think they're really at odds. Zack's analysis cuts off at a point where the circularity exists below it. There's still the standard epistemic circularity that exists whenever you try to ground out any proposition, counterfactual or not, but there's a level of abstraction where you can remove the seeming circularity by shoving it lower or deeper into the reduction of the proposition towards grounding out in some experience. Another way to put this is that we can choose what to be pragmatic about. Zack's analysis chooses to be pragmatic about counterfactuals at the level of making decisions, and this allows removing the circularity up to the purpose of making a decision. If we want to be pragmatic about, say, accurately predicting what we will observe about the world, then there's still some weird circularity in counterfactuals to be addressed if we try to ask questions like "why these counterfactuals rather than others?" or "why can we formulate counterfactuals at all?". Also I guess I should be clear that there's no circularity outside the map. Circularity is entirely a feature of our models of reality rather than reality itself. That's why, for example, the analysis on epistemic circularity I offer is that we can ground things out in purpose and thus the circularity was actually an illusion of trying to ground truth in itself rather than experience. I'm not sure I've made this point very clearly elsewhere before, so sorry if that's a bit confusing. The point is that circularity is a feature of the relative rather than the absolute, so circularity exists in the map but not the territory. We only get circularity by introducing abstractions that can allow things in the map to depend on each other rather than the territory.
2Chris_Leong5mo
I wouldn't be surprised if other concepts such as probability were circular in the same way as counterfactuals, although I feel that this is more than just a special case of epistemic circularity. Like I agree that we can only reason starting from where we are - rather than from the view from nowhere - but counterfactuals feel different because they are such a fundamental concept that appears everywhere. As an example, our understanding of chairs doesn't seem circular in quite the same sense. That said, I'd love to see someone explore this line of thought. I could be wrong, but I suspect Zack would disagree with the notion that there is a circularity below it involving counterfactuals. I wouldn't be surprised though if Zack acknowledged a circularity not involving counterfactuals. Agreed. That said, I don't think counterfactuals are in the territory. I think I said before that they were in the map, although I'm now leaning away from that characterisation as I feel that they are more of a fundamental category that we use to draw the map.
2G Gordon Worley III5mo
Yes, I think there is something interesting going on where human brains seem to operate in a way that makes counterfactuals natural. I actually don't think there's anything special about counterfactuals, though, just that the human brain is designed such that thoughts are not strongly tethered to sensory input vs. "memory" (internally generated experience), but that's perhaps only subtly different than saying counterfactuals rather than something powering them is a fundamental feature of how our minds work.
1tailcalled5mo
I think I disagree here. I'm working on an entry to OP's competition which will contain an argument showing some inherent convergence between different agents' counterfactuals, due to the structure of the universe.
2G Gordon Worley III5mo
I think this is just agreement then? That minds are influenced by the structure of the universe they operate in in similar ways sounds like exactly what we should expect. That doesn't mean we need to elevate such convergence to be something more than intersubjective agreement about reality.
1tailcalled5mo
If minds are influenced by the structure of the universe, then that requires some causal structure of the universe to influence them.
2G Gordon Worley III5mo
Causation is a feature of models, not reality. We need only suppose reality is one thing after another (or not even that! reality is just this moment, which for us contains a sensation we call a memory of past moments), and any causal structure is inferred to exist rather than something we directly observe. I make this argument in some detail here: https://www.lesswrong.com/posts/RMBMf85gGYytvYGBv/no-causation-without-reification
1tailcalled5mo
I feel a bit confused. I agree that causal structure is inferred to exist, and never directly observable. However, the universe has certain properties that make it very hard not to infer a causal structure if we want to model it, in particular: * A constant increase in entropy * Deterministic laws relating the past and the future * ... which have symmetry across time and space It seems exponentially hard to account for this without causality. When opening the post: I immediately disagree here, formally we usually model causality as our observations being generated by some sort of dynamical system. This cannot be specified with a mathematical notation like implication. Sure, I know, but that doesn't mean there's no dynamical process generating the territory, only that we don't know which one (and maybe can't know). A and B are typically high-level features in our models that simplify the territory; as a result, the causality in our models will also be simplifications of the causality in the territory. But without causality, I don't see how you'd get thermodynamics. That seems like a "just is" that is best accounted for causally, even if we don't have the exact causal theory underlying it. (Somehow, thermodynamics has managed to hold even as we've repeatedly updated our models, because it doesn't depend on the exact causal model, but instead follows from deep aspects of the causal structure of reality.) But if causality is describing some feature of reality, and the feature it is describing is not itself causal, then what is the feature it is describing?

I'm still puzzled by your puzzlement.

You are treating https://www.greaterwrong.com/posts/T4Mef9ZkL4WftQBqw/the-nature-of-counterfactuals as though it is still an open problem, but as far as I can see, all the issues raised were answered in the comments.

I think this is a solved problem. Are you familiar with the formalization of causality in terms of Bayesian networks? (You have enough history on this website that you've probably heard of it!)

Make observations using sensors. Abstract your sensory data into variables: maybe you have a weather variable with possible values RAINY and SUNNY, a sprinkler variable with possible values ON and OFF, and a sidewalk variable with possible values WET and DRY. As you make more observations, you can begin to learn statistical relationships between your variables: maybe... (read more)
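As a rough illustration of the setup described above, the sketch below builds a tiny network over the three variables and contrasts observing a wet sidewalk with intervening to make it wet. The graph structure (weather influences sprinkler and sidewalk, sprinkler influences sidewalk) and every probability are assumptions made for this example; the truncated comment does not supply them.

```python
from itertools import product

# Assumed toy parameters (illustrative only, not from the comment).
P_WEATHER = {"RAINY": 0.3, "SUNNY": 0.7}
P_SPRINKLER_ON = {"RAINY": 0.1, "SUNNY": 0.5}  # P(sprinkler=ON | weather)
P_WET = {  # P(sidewalk=WET | weather, sprinkler)
    ("RAINY", "ON"): 0.99, ("RAINY", "OFF"): 0.9,
    ("SUNNY", "ON"): 0.8, ("SUNNY", "OFF"): 0.05,
}


def joint(weather, sprinkler, sidewalk, do_sidewalk=None):
    """Joint probability of one full assignment of the three variables.

    If do_sidewalk is set, the sidewalk's own mechanism is replaced by the
    forced value (graph surgery), cutting the arrows into the sidewalk node.
    """
    p = P_WEATHER[weather]
    p_on = P_SPRINKLER_ON[weather]
    p *= p_on if sprinkler == "ON" else 1 - p_on
    if do_sidewalk is not None:
        return p if sidewalk == do_sidewalk else 0.0
    p_wet = P_WET[(weather, sprinkler)]
    return p * (p_wet if sidewalk == "WET" else 1 - p_wet)


def p_rainy_given_wet(do=False):
    """P(RAINY | sidewalk=WET) if do is False, else P(RAINY | do(sidewalk=WET))."""
    forced = "WET" if do else None
    num = den = 0.0
    for weather, sprinkler in product(P_WEATHER, ["ON", "OFF"]):
        p = joint(weather, sprinkler, "WET", do_sidewalk=forced)
        den += p
        if weather == "RAINY":
            num += p
    return num / den


observed = p_rainy_given_wet(do=False)   # seeing a wet sidewalk is evidence of rain
intervened = p_rainy_given_wet(do=True)  # wetting it yourself says nothing about rain
```

Under these assumed numbers, observing a wet sidewalk raises the probability of rain above its prior, while forcing the sidewalk to be wet leaves it at the prior of 0.3.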

4tailcalled5mo
I don't really agree. The idea of using conditional independencies as measuring causality is cute in theory, but it doesn't IME work in practice, for many reasons: things are rarely truly independent, you don't get enough data to test for independencies in practice, and conditional independence relations are not enough to uniquely identify the causal structure. There's much more to causality than just conditional independence relations.
2Zack_M_Davis5mo
Maybe I'm explaining it badly? I'm trying to point to the Judea Pearl thing [http://bayes.cs.ucla.edu/BOOK-2K/] in my own words. The claim is not that causality "just is" conditional independence relationships. (Pearl repeatedly explicitly disclaims that causal concepts are different from statistical concepts and require stronger assumptions.) Do you have an issue with the graph formalism itself (as an explanation of the underlying reality of how causality and counterfactuals work), separate from practical concerns about how one would learn a particular graph?

Maybe I'm explaining it badly? I'm trying to point to the Judea Pearl thing in my own words. The claim is not that causality "just is" conditional independence relationships. (Pearl repeatedly explicitly disclaims that causal concepts are different from statistical concepts and require stronger assumptions.)

Partly it's explaining it badly. In addition to the points listed above, there's also issues like focusing entirely on rung 2 causality and disregarding rung 3 causality, which is arguably the truer kind of causality.
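For readers unfamiliar with the "rung" terminology (Pearl's ladder of association, intervention, and counterfactuals), the following toy structural causal model sketches the difference between a rung 2 and a rung 3 query. The mechanism (the sidewalk is wet if it rains or the sprinkler runs) and the 0.3 rain probability are assumptions made for this example.

```python
# A toy structural causal model (SCM); the mechanism and numbers are assumptions.
P_RAIN = 0.3  # assumed prior probability of the exogenous rain variable


def sidewalk_wet(rain: bool, sprinkler: bool) -> bool:
    """Structural equation for the sidewalk."""
    return rain or sprinkler


def interventional_p_wet_sprinkler_off() -> float:
    """Rung 2: P(wet | do(sprinkler=OFF)), averaging over the unknown rain."""
    return sum(
        p
        for rain, p in ((True, P_RAIN), (False, 1 - P_RAIN))
        if sidewalk_wet(rain, sprinkler=False)
    )


def counterfactual_wet_sprinkler_off(observed_rain: bool) -> bool:
    """Rung 3: would the sidewalk have been wet had the sprinkler been off,
    given what actually happened?

    Abduction: the observation pins the exogenous rain to its actual value.
    Action: set sprinkler=False.
    Prediction: re-run the structural equation.
    """
    return sidewalk_wet(observed_rain, sprinkler=False)


# Actual world: no rain, sprinkler on, sidewalk wet.
rung2 = interventional_p_wet_sprinkler_off()
rung3 = counterfactual_wet_sprinkler_off(observed_rain=False)
```

The interventional query returns a population-level probability (0.3 here), whereas the counterfactual query uses what was actually observed and returns a definite answer for this particular world.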

Do you have an issue with the graph formalism itself (as an explanation of the underlying reality of how causality and counterfactuals work), separate from practical concerns about how one would learn a particular graph?

I assume that here we are understanding the graph formalism sufficiently broadly as to include e.g. differential equations, as otherwise there's definitely a problem already there. And in the same vein, for most problems both DAGs and differential equations are too rigid/vector-spacey to work, and we probably need new formalisms that can better handle systems with varying structure of variables.

Regardless, I don't think the question of how one would ... (read more)

2Chris_Leong5mo
Yeah, I'm aware of Bayesian Networks. Two points: 1. Bayesian Networks don't solve Newcomb's problem, but I assume you're aware of it. So I'm guessing your point is that if standard counterfactuals can be constructed outside of the counterfactual perspective that more general counterfactuals would most likely be the same? 2. Does the concept of a variable even make sense without counterfactuals? It's not immediately obvious that it does, although I haven't thought through this enough to assert that it doesn't. Update: Having spent a few minutes thinking this through, I've concluded that the concept of a variable over time, or a variable over space, etc., makes sense without counterfactuals. However, this is a more limited notion of variable than that which we normally deal with as, if for example, the variable L representing the state of a lightswitch is "ON" at t=0, then we wouldn't have the notion that it could have been "OFF" instead. Update 2: Upon further thought, this seems more limited than I first thought. For example, we can't say let a be how many apples there would be at time t if we counted them, because "if we counted them" is invoking counterfactual reasoning, unless we really did count the apples at each time period. In any case, the issue of whether or not Bayesian Networks are circular seems to be complex enough that it is deserving of further investigation.

How much are you interested in a positive vs normative theory of counterfactuals? For example, do you feel like you understand how humans do counterfactual reasoning, and how and why it works for them (insofar as it works for them)? If not, is such an understanding what you're looking for? Or do you think humans are not perfect at counterfactual reasoning (e.g. maybe because people disagree with each other about Newcomb's problem etc.) and there's some deep notion of "correct counterfactual reasoning" that humans are merely approximating, and the deeper "c... (read more)

4Chris_Leong5mo
Update: I should further clarify that even though I provided a rough indication of how important I consider various approaches, this is off-the-cuff and I could be persuaded an approach was more valuable than I think, particularly if I saw good quality work. I guess my ultimate interest is normative as the whole point of investigating this area is to figure out what we should do. However, I am interested in descriptive theories insofar as they can contribute to this investigation (and not insofar as the details aren't useful for normative theories). For example, when I say that counterfactuals only make sense from within the counterfactual perspective and further that counterfactuals are ultimately grounded as an evolutionary adaption I'm making descriptive statements. The latter seems to be more of a positive statement, while the former doesn't seem to be (it seems to be justified by philosophical reasoning more than empirical investigation). In any case, it feels like there is more work to be done in taking these high-level abstract statements and making them more precise. I think that further investigation here could be useful - although not in the sense that 40% use this style of reasoning and 60% use this style - exact percentages aren't the relevant things here - at least not at this early stage. I'd also lean towards saying that how experts operate is more important than average humans and that the behavior of especially stupid humans is probably of limited importance. I guess I see the behaviour of normal humans mattering for two reasons: a) Firstly because I see making use of counterfactuals as evolutionarily grounded (in a more primitive form than the highly cognitive and mathematically influenced versions that we tend to use on LW) b) Secondly because the experts are more likely to discard intuitions that don't agree with their theories. And I think we need to use our reasoning to produce a consistent theory from our intuitions at some point, but th
8Steven Byrnes5mo
I think brains build a generative world-model, and that world-model is a certain kind of data structure, and "counterfactual reasoning" is a class of operations that can be performed on that data structure. (See here [https://www.lesswrong.com/posts/SkcM4hwgH3AP6iqjs/can-you-get-agi-from-a-transformer] .) I think that counterfactual reasoning relates to reality only insofar as the world-model relates to reality. (In map-territory terminology: I think counterfactual reasoning is a set of things that you can do with the map, and those things are related to the territory only insofar as the map is related to the territory.) I also think that there are lots of specific operations that are all "counterfactual reasoning" (just as there are lots of specific operations that are all "paying attention"—paying attention to what?), and once we do a counterfactual reasoning operation, there are also a lot of things that we can do with the result of the operation. I think that, over our lifetimes, we learn metacognitive heuristics that guide these decisions (i.e. exactly what "counterfactual reasoning"-type operations to do and when, and what to do with the result of the operation), and some people's learned metacognitive heuristics are better than others (from the perspective of achieving such-and-such goal). Analogy: If you show me a particular trained ConvNet that misclassifies a particular dog picture as a cat, I wouldn't say that this reveals some deep truth about the nature of image classification, and I wouldn't conclude that there is necessarily such a thing as a philosophically-better type of image classifier that fundamentally doesn't ever make mistakes like that. (The brain image classifier makes mistakes too [https://en.wikipedia.org/wiki/Optical_illusion], albeit different mistakes than ConvNets make, but that's besides the point.) Instead I would be more inclined to look for a very complicated explanation of the mistake, related to details of its training data and
4Chris_Leong5mo
Agreed. This is definitely something that I would like further clarity on. I guess the real-world reasons for a mistake are sometimes not very philosophically insightful (i.e. Bob was high when reading the post, James comes from a Spanish-speaking background and they use their equivalent of a word differently than English speakers, Sarah has a terrible memory and misremembered it). I'm guessing your position might be that there are just mistakes, and there aren't mistakes that are more philosophically fruitful or less fruitful? Is that correct? Or were you just responding to my specific claim that it might be useful to know how the average person responds to problems because we are evolved creatures? If so, then I definitely agree that we'd have to delve into the details and not just remain on the level of averages. Update: Actually, I'll add an analogy that might be helpful. Let's suppose you didn't know what a dog was. Actually, that's kind of the case: once you start diving into any definition you end up running into fuzzy cases, such as: does a robotic dog count as a dog? Then if humans had built a bunch of different classifiers and you didn't have access to the humans (say they went extinct), you might want to analyse the different classifiers to try to figure out how humans defined the term dog, even though much of the behaviour might only tell you about the flaws such classifiers tend to have rather than about the human concept. Similarly, we don't have exact access to our evolutionary history, but examining human intuitions about counterfactuals might provide insights about which heuristics have worked well, whilst also recognising that it's hard, arguably impossible, to even talk about "working well" without embracing the notion of counterfactuals. And I agree that there are probably different ways we could emphasise various heuristics rather than a unique, principled solution. I'm not claiming the situation is precisely this - in fact I'm not
2Steven Byrnes5mo
Hmm, my hunch is that you're misunderstanding me here. There are a lot of specific operations that are all "making a fist". I can clench my fingers quickly or slowly, strongly or weakly, left hand or right hand, etc. By the same token, if I say to you "imagine a rainbow-colored tree; are its leaves green?", there are a lot of different specific mental models that you might be invoking. (It could have horizontal rainbow stripes on the trunk, or it could have vertical rainbow stripes on its branches, etc.) All those different possibilities involve constructing a counterfactual mental model and querying it, in the same nuts-and-bolts way. I just meant, there are many possible counterfactual mental models that one can construct. Suppose I ask "There's a rainbow-colored tree somewhere in the world; are its leaves green?" You think for a second. What's happening under the surface when you think about this? Inside your head are various different models pushing in different directions. Maybe there's a model that says something like "rainbow-colored things tend to be rainbow-colored in all respects". So maybe you're visualizing a rainbow-colored tree, and querying the color of the leaves in that model, and this model is pushing on your visualized tree and trying to make it have a color scheme that's compatible with the kinds of things you usually see, e.g. in cartoons, which would be rainbow-colored leaves. But there's also a botany model that says "tree leaves tend to be green, because that's the most effective for photosynthesis, although there are some exceptions like Japanese maples and autumn colors". In scientifically-educated people, probably there will also be some metacognitive knowledge that principles of biology and photosynthesis are profound deep regularities in the world that are very likely to generalize , whereas color-scheme knowledge comes from cartoons etc. and is less likely to generalize. So what's at play is not "the nature of counterfactuals", but th
4Chris_Leong5mo
I agree that there isn't a single uniquely correct notion of a counterfactual. I'd say that we want different things from this notion and there are different ways to handle the trade-offs. I find this confusing as CDT counterfactuals where you can only project forward seem very different from things like FDT where you can project back in time as well. Well, we need the information encoded in our DNA rather than what is actually implemented in humans (clarification: what is implemented in humans is significantly influenced by society), though we aren't at the level where we can access that by analysing the DNA directly, or people's brain structure for that matter, so we have to reverse engineer it from behaviour. I've very much focused on trying to understand how to solve these problems in theory rather than on how we can correct any cognitive flaws in humans or how to adapt decision theory to be easier or more convenient to use. Insofar as I'm interested in how average humans reason counterfactually, it's mostly about trying to understand the various heuristics that are the basis of counterfactuals. I guess I believe that we need counterfactuals to understand and evaluate these heuristics, but I'm hoping that we can construct something reflexively consistent.

By the same token, I think every neurotypical human thinking about Newcomb's problem is using counterfactual reasoning, and I think that there isn't any interesting difference in the general nature of the counterfactual reasoning that they're using.

I find this confusing as CDT counterfactuals where you can only project forward seem very different from things like FDT where you can project back in time as well.

I think there is "machinery that underlies counterfactual reasoning" (which incidentally happens to be the same as "the machinery that underlies imagination"). My quote above was saying that every human deploys this machinery when you ask them a question about pretty much any topic.

I was initially assuming (by default) that if you're trying to understand counterfactuals, you're mainly trying to understand how this machinery works. But I'm increasingly confident that I was wrong, and that's not in fact what you're interested in. Instead it seems that your interests are more like "how would an AI, equipped with this kind of machinery, reach correct conclusions about the world?" (After all, the machinery by itself can lead to both correct and incorrect conclusions—just as "thinki... (read more)

3Chris_Leong5mo
I agree that counterfactual reasoning is contingent on certain brain structures, but I would say the same about logic as well and it's clear that the logic of a kindergartener is very different from that of a logic professor - although perhaps we're getting into a semantic debate - and what you mean is that the fundamental machinery is more or less the same. Yeah, this seems accurate. I see understanding the machinery as the first step towards the goal of learning to counterfactually reason well. As an analogy, suppose you're trying to learn how to reason well. It might make sense to figure out how humans reason, but if you want to build a better reasoning machine and not just duplicate human performance, you'd want to be able to identify some of these processes as good reasoning and some as biases. I guess I don't see why there would need to be a separation in order for the research direction I've suggested to be insightful. In fact, if there isn't a separation, this direction could even be more fruitful as it could lead to rather general results. I would say (as a slight simplification) that our goal in studying counterfactual reasoning should be to get counterfactuals to a point where we can answer questions about them using our normal reasoning. That post certainly seems to contain an awful lot of philosophy to me. And I guess even though this post and my post On the Nature of Counterfactuals [https://www.lesswrong.com/posts/T4Mef9ZkL4WftQBqw/the-nature-of-counterfactuals] don't make any reference to decision theory, that doesn't mean that it isn't in the background influencing what I write. I've written a lot of posts here, many of which discuss specific decision theory questions. I guess I would still consider Joe Carlsmith's post a high-quality post if it had focused exclusively on the more philosophical aspects. And I guess philosophical arguments are harder to evaluate than mathematical ones and it can be disconcerting for some people, especially thos

My entry. Ultimately I'm not sure whether I agree or disagree with your point, but I hope I've brought up some valuable things.

I'm not sure how strong you are in physics; the "Causality is real, counterfactuals are not" section is a brief summary of some fairly abstract and general properties of physics, so we might need to discuss it further in the comments if they do not immediately ring true to you.

2Chris_Leong4mo
Thanks for your submission. I'm still thinking about it, but I really appreciated how your entry engaged with the topic. Yeah, I did at one point have a brief passing thought that you could make an argument along the lines you followed (that counterfactuals are a construction, but that they are built on top of underlying rules of the universe which have a real existence). Ideally, I would have thought through this line of thought before writing The Nature of Counterfactuals, but I lack the patience to spend a long time polishing before I release a post, so I mentally tagged it as something to think more about later. I guess one reason why I might have tagged this as a "later" thought is that I'm still trying to figure out my way around the debate between those who believe that the universe has laws vs. the more Humean perspective that things just are. Thanks for developing this perspective. At the very least, it'll provide a more solid target for me to engage with (vs. the vague intuition I had that an argument along these lines might be viable), but it's also possible that I may come to agree with it after I've thought it through.
1tailcalled4mo
I'm not familiar with Hume's philosophy, but the idea that "things just are" without being restricted to follow some patterns/laws seems to lose badly in a Bayesian way to theories which accept the laws that exist.
2Chris_Leong4mo
Perhaps, I've only heard them vaguely, second-hand, so I'm reluctant to take a position on this yet.

I've spent some hours yesterday writing an entry for this competition, but before publishing it I thought it might be best for me to try to talk briefly about my thoughts in a comment here. I think my post would go under this heading:

b) A post that attempts to evaluate the arguments for and against adopting the principle that counterfactuals only make sense from within the counterfactual perspective

Specifically, I think I disagree with your thesis that counterfactuals only make sense from a counterfactual perspective. Here's a sketch of my reason (which... (read more)

4Chris_Leong5mo
Interestingly enough I simultaneously hold that both: a) Counterfactuals only make sense from within themselves b) Counterfactuals are grounded by being an evolutionary adaption Given this, I just wanted to encourage you to make sure that you don't assume that it must be a) OR b), not both, without arguing for these being mutually exclusive possibilities. It's possible that evolution may provide us with a notion of counterfactuals that aren't recursively dependent upon themselves, although this would have to overcome the challenge of talking about evolution without invoking counterfactuals. Anyway, looking forward to reading your post.
1tailcalled5mo
Hm, now I wonder if I should try to come up with a causally-incorrect account of evolutionary history that still makes the same distributional predictions that the causally-correct ones do. This seems like it could produce a perspective on how different counterfactual models would interpret the grounding of our counterfactuals. Because ultimately evolution is just a feature of the universe that any theory must account for, whether it makes causally correct predictions or not.
1tailcalled5mo
Update: coming up with an alternative account of the causality involved in evolutionary history is actually... Really hard? Which of course is to be expected because I'm essentially trying to come up with a false theory that can account for a real phenomenon. But I think there might be something to be learned about the nature of causality from the difficulty of coming up with alternative causal explanations for evolutionary history, even though any set of mere observations should in theory be able to have an infinitude of causal explanations.
1tailcalled5mo
Aha, I think I've got it! Assuming "reasonable" theories, there is only one notion of causality that allows you to talk about the causal effects of an organism's genetics. Lemme explain: One way we could create incorrect causal accounts of evolution would be to break various physical symmetries. Just because a ball falls to the ground when I drop it does not mean it would have fallen to the ground if it had been dropped elsewhere; thus maybe our universe is extremely causally unusual, because it just happens to "thread the needle" between an infinitude of states where the laws of nature would have been entirely different. The above sort of approach would permit pretty much any kind of counterfactual, but it would also be completely unable to explain why our universe just happens to thread the needle so perfectly. (One might imagine that one could explain it with a "common cause" model, since after all confounding is the big alternative to direct causal effects. However, the common cause would have to encode the entire trajectory of our universe, which is an enormous amount of information; this just makes the problem recursive, in that one then needs to come up with a causal model to explain this information.) So an account which breaks the equational laws of physics needs to appeal to a leap of faith on the order of all of the complexity of the entire universe's trajectory, which seems "unreasonable" to me - if nothing else, it doesn't seem computationally viable to represent such accounts. But the laws of physics can be seen as non-causal equations, rather than as causal effects; generally they're directly reversible, and even when they are not, they are still bijective and volume-preserving. That is, you can take any physical state and extrapolate it backwards, not just forwards. And you can also take a complicated jumble of pieces of physical states across different times, and find trajectories that trace through them. So you could, for instance, pi
2Chris_Leong5mo
I'm kind of confused here. I can understand individual sentences, but not where you're going as a whole. So your aim here is to figure out why causality is forwards and not backwards? If not, what do you mean by there only being one notion of causality that allows threading the needle?
3tailcalled5mo
I was thinking about the question "Why don't agents construct crazy counterfactuals?", and decided that I wanted a clearer idea of what crazy counterfactuals would look like in the case of evolution. As in, if you asked someone who had a crazy set of counterfactuals what would have happened if some organism had had some different DNA, what would they answer?
2Chris_Leong5mo
Okay, that makes more sense now! I'll try to circle back and take a look at your original comment again when I have time.
3tailcalled5mo
I think perhaps one distinction that needs to be made is between "counterfactuals exist only in our imagination" and "causality exists only in our imagination". Counterfactuals definitely exist only in our imagination. We're literally making up some modified version of the world, and then extrapolating its imaginary consequences. Often, we might define causality in terms of counterfactuals; "X causes Y if Y has a counterfactual dependence on X". So in that sense we might imagine that causality too only exists in our imagination. But at least in the Pearlian paradigm, it's actually the opposite way around. You start with some causal (dynamical) system, and then counterfactuals are defined to be made-up/"mutilated" versions of that system. The reason we use counterfactuals in the Pearlian paradigm is that they are a convenient interface for "querying" the aggregated properties of causality. I'd argue that there is some real underlying causality that generates the universe. Though it's easy to be confused about this, because we do not have direct access to this causality; instead we always think about massively-simplified caricatural models, which boil the enormous complexity of reality down into something manageable.
2Chris_Leong5mo
Yeah, sounds like a plausible theory.

Counterfactuals (in the potential outcome sense used in statistics) and Pearl's structural equation causality semantics are equivalent.

4Chris_Leong5mo
What are your thoughts on Newcomb's, etc.?
2IlyaShpitser5mo
I gave a talk at FHI ages ago on how to use causal graphs to solve Newcomb type problems. It wasn't even an original idea: Spohn had something similar in 2012. I don't think any of this stuff is interesting, or relevant for AI safety. There's a pretty big literature on model robustness and algorithmic fairness that uses causal ideas. If you want to worry about the end of the world, we have climate change, pandemics, and the rise of fascism.
2Chris_Leong5mo
Why did you give a talk on causal graphs if you didn't think this kind of work was interesting or relevant? Maybe I'm misunderstanding what you're saying isn't interesting or relevant.

Oh hey, I already have slides for this.

 

Here you go: https://www.lesswrong.com/posts/vuvS2nkxn3ftyZSjz/what-is-a-counterfactual-an-elementary-introduction-to-the

 

I took the approach: if I very clearly explain what counterfactuals are and how to compute them, then it will be plain that there is no circularity. I attack the question more directly in a later paragraph, when I explain how counterfactuals can be implemented in terms of two simpler operations: prediction and intervention. And that's exactly how it is implemented in our causal probabilis... (read more)

2Chris_Leong5mo
Hey Darmani, I enjoyed reading your post - it provides a very clear explanation of the three levels of the causal hierarchy - but it doesn't seem to really engage with the issue of circularity. I guess the potential circularity becomes important when we start asking the question of how to model taking different actions. After intervening on our decision node do we just project forward as per Causal Decision Theory or do we want to do something like Functional Decision Theory that allows back-projecting as well? If it's the latter, how exactly do we determine what is subjunctively linked to what? When trying to answer these questions, this naturally leads us to ask, "What exactly are these counterfactual things anyway?" and that path (in my opinion) leads to circularity. These issues seem to occur even in situations when we know perfectly how to forwards predict and where we are given sufficient information that we don't need to use abduction. Anyway, thanks for your submission! I'm really happy to have at least one submission already.
1Darmani5mo
I'm not surprised by this reaction, seeing as I jumped on banging it out rather than checking to make sure that I understand your confusion first. And I still don't understand your confusion, so my best hope was giving a very clear, computational explanation of counterfactuals with no circularity in hopes it helps. Anyway, let's have some back and forth right here. I'm having trouble teasing apart the different threads of thought that I'm reading. I think I'll need to see some formulae to be sure I know what you're talking about. I understand the core of decision theory to be about how to score potential actions, which seems like a pretty separate question from understanding counterfactuals. More specifically, I understand that each decision theory provides two components: (1) a type of probabilistic model for modeling relevant scenarios, and (2) a probabilistic query that it says should be used to evaluate potential actions. Evidential decision theory uses an arbitrary probability distribution as its model, and evaluates actions by P(outcome | action). Causal decision theory uses a causal Bayes net (a set of interventional distributions) and the query P(outcome | do(action)). I understand FDT less well, but basically view it as similar to CDT, except that it intervenes on the input to a decision procedure rather than on the output. But all this is separate from the question of how to compute counterfactuals, and I don't understand why you bring this up. I still understand this to be the core of your question. Can you explain what questions remain about "what is a counterfactual" after reading my post?
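To make the difference between the two queries above concrete, here is a minimal sketch applied to Newcomb's problem. The payoffs (1,000,000 and 1,000), the 99% predictor accuracy, and the function names are all assumptions chosen for illustration; none of them come from this thread. The point is only that P(outcome | action) lets the action shift your belief about the box contents, while P(outcome | do(action)) holds that belief fixed.

```python
from fractions import Fraction

# Hypothetical numbers: standard textbook Newcomb payoffs and an assumed accuracy.
BIG, SMALL = 1_000_000, 1_000
ACCURACY = Fraction(99, 100)


def payoff(action, box_full):
    """Money received for an action, given whether the opaque box is full."""
    return (BIG if box_full else 0) + (SMALL if action == "two-box" else 0)


def edt_value(action):
    """E[payoff | action]: conditioning on the action is evidence about the box."""
    p_full = ACCURACY if action == "one-box" else 1 - ACCURACY
    return p_full * payoff(action, True) + (1 - p_full) * payoff(action, False)


def cdt_value(action, p_full):
    """E[payoff | do(action)]: the intervention leaves the belief p_full untouched."""
    return p_full * payoff(action, True) + (1 - p_full) * payoff(action, False)


if __name__ == "__main__":
    for a in ("one-box", "two-box"):
        print(a, "EDT:", edt_value(a), "CDT (p_full=1/2):", cdt_value(a, Fraction(1, 2)))
```

Under these assumptions the conditional query favours one-boxing, while the interventional query favours two-boxing by exactly the small payoff, whatever fixed belief about the box is plugged in.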
2Chris_Leong5mo
While I can see this working in theory, in practice it's more complicated as it isn't obvious from immediate inspection to what extent an argument is or isn't dependent on counterfactuals. I mean counterfactuals are everywhere! Part of the problem is that the clearest explanation of such a scheme would likely make use of counterfactuals, even if it were later shown that these aren't necessary. The best source for learning about FDT is this MIRI paper [https://intelligence.org/2017/10/22/fdt/], but given its length, you might find the summary in this blog post [https://intelligence.org/2017/03/18/new-paper-cheating-death-in-damascus/] answers your questions more quickly. The key unanswered question (well, some people claim to have solutions) in Functional Decision Theory is how to construct the logical counterfactuals that it depends on. What do I mean by logical counterfactuals? MIRI models agents as programs, i.e. logic, so that imagining an agent taking an action other than the one it takes becomes imagining logic being such that a particular function provides a different output on a given input than it actually does. Now I don't quite agree with the logical counterfactuals framing, but I have been working on the question of constructing appropriate counterfactuals for this situation.
1Darmani5mo
1. Is the explanation in the "What is a Counterfactual" post linked above circular? 2. Is the explanation in the post somehow not an explanation of counterfactuals? I read a large chunk of the FDT paper while drafting my last comment. The quoted sentence may hint at the root of the trouble that I and some others here seem to have in understanding what you want. You seem to be asking about the way "counterfactual" is used in a particular paper, not in general. It is glossed over and not explained in full detail in the FDT paper, but it seems to mainly rely on extra constraints on allowable interventions, similar to the "super-specs" in one of my other papers: https://www.jameskoppel.com/files/papers/demystifying_dependence.pdf . I'm going to go try to model Newcomb's problem and some of the other FDT examples in Omega. If I'm successful, it's evidence that there's nothing more interesting going on than what's in my causal hierarchy post.
3Chris_Leong5mo
Is the explanation in the post somehow not an explanation of counterfactuals? Oh, it's definitely an explanation of counterfactuals, but I wouldn't say it's a complete explanation of counterfactuals as it doesn't handle exotic cases (ie Newcomb's). I added some more background info after I posted the bounty and maybe I should have done that originally, but I posted the bounty on LW/alignment forum and that led me towards taking a certain background context as given, although I can now see that I should have clarified this originally. Is the explanation in the "What is a Counterfactual" post linked above circular? It seems that way, although maybe this circular dependence isn't essential. Take for example the concept of prediction. This seems to involve imagining different outcomes. How can we do this without counterfactuals? I guess I have the same question with interventions. This seems to depend on the notion that we could intervene or we could not intervene. Only one of these can happen - the other is a counterfactual.
1Darmani5mo
I don't understand what counterfactuals have to do with Newcomb's problem. You decide either "I am a one-boxer" or "I am a two-boxer," the boxes get filled according to a rule, and then you pick deterministically according to a rule. It's all forward reasoning; it's just a bit weird because the action in question happens way before you are faced with the boxes. I don't see any updating on a factual world to infer outcomes in a counterfactual world. "Prediction" in this context is a synonym for conditioning. P(x|y) is defined as P(x,y)/P(y). If intervention sounds circular... I don't know what to say other than read Chapter 1 of Pearl ( https://www.amazon.com/Causality-Reasoning-Inference-Judea-Pearl/dp/052189560X ). To give a two-sentence technical explanation: A structural causal model is a straight-line program with some random inputs. They look like this: [code block not preserved]. They are usually written with nodes and graphs, but these are equivalent to straight-line programs, and one can translate easily between the two presentations. In the basic Pearl setup, an intervention consists of replacing one of the assignments above with an assignment to a constant. Here is an intervention setting the sprinkler off: [code block not preserved]. From this, one can easily compute that P(wet_grass|do(sprinkler=false)) = 1/2. If you want the technical development of counterfactuals that my post is based on, read Pearl Chapter 7, or Google around for the "twin network construction." Or I'll just show you in code below how you compute the counterfactual "I see the sprinkler is on, so, if it hadn't come on, the grass would not be wet," which is written P(wet_grass|sprinkler=true,do(sprinkler=false)) = 0. We construct a new program: [code block not preserved]. This is now reduced to a pure statistical problem. Run this program a bunch of times, filter down to only the runs where sprinkler_factual is true, and you'll find that wet_grass_counterfactual is false i
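The code from the comment above did not survive in this copy of the thread, so what follows is only a sketch of what such a program might look like, not the original. It assumes one particular model that happens to be consistent with both quoted probabilities: rain is a fair coin flip, the sprinkler runs exactly when it does not rain, and the grass is wet if either occurs. The function names and the `do_sprinkler` parameter are made up for illustration.

```python
import random


def sample_noise(rng):
    """Exogenous randomness: in this assumed model the only random input is rain."""
    return {"rain": rng.random() < 0.5}


def run_model(noise, do_sprinkler=None):
    """Straight-line program. Passing do_sprinkler replaces the sprinkler
    assignment with a constant, which is the intervention do(sprinkler=...)."""
    rain = noise["rain"]
    sprinkler = (not rain) if do_sprinkler is None else do_sprinkler
    wet_grass = rain or sprinkler
    return {"rain": rain, "sprinkler": sprinkler, "wet_grass": wet_grass}


def estimate_interventional(n=20_000, seed=0):
    """Monte Carlo estimate of P(wet_grass | do(sprinkler=false))."""
    rng = random.Random(seed)
    wet = sum(run_model(sample_noise(rng), do_sprinkler=False)["wet_grass"]
              for _ in range(n))
    return wet / n


def estimate_counterfactual(n=20_000, seed=0):
    """Monte Carlo estimate of P(wet_grass | sprinkler=true, do(sprinkler=false)).

    Twin-network style: the factual and counterfactual copies share the same
    noise, and we keep only the runs whose factual copy has the sprinkler on."""
    rng = random.Random(seed)
    kept = wet = 0
    for _ in range(n):
        noise = sample_noise(rng)
        factual = run_model(noise)
        counterfactual = run_model(noise, do_sprinkler=False)
        if factual["sprinkler"]:
            kept += 1
            wet += counterfactual["wet_grass"]
    return wet / kept


if __name__ == "__main__":
    print("P(wet | do(sprinkler=false)) ~", estimate_interventional())
    print("P(wet | sprinkler=true, do(sprinkler=false)) ~", estimate_counterfactual())
```

In this assumed model, seeing the sprinkler on tells you it is not raining, and because the two copies share that noise, the counterfactual copy with the sprinkler forced off has dry grass in every kept run.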
2Chris_Leong5mo
Everyone agrees what you should do if you can precommit. The question becomes philosophically interesting when an agent faces this problem without having had the opportunity to precommit.
1Darmani5mo
Okay, I see how that technique of breaking circularity in the model looks like precommitment. I still don't see what this has to do with counterfactuals though.
2Chris_Leong5mo
"You decide either "I am a one-boxer" or "I am a two-boxer," the boxes get filled according to a rule, and then you pick deterministically according to a rule. It's all forward reasoning; it's just a bit weird because the action in question happens way before you are faced with the boxes." So you wouldn't class this as precommitment?
1Darmani5mo
I realize now that this expressed as a DAG looks identical to precommitment. Except, I also think it's a faithful representation of the typical Newcomb scenario. Paradox only arises if you can say "I am a two-boxer" (by picking up two boxes) while you were predicted to be a one-boxer. This can only happen if there are multiple nodes for two-boxing set to different values. But really, this is a problem of the kind solved by superspecs in my Onward! paper. There is a constraint that the prediction of two-boxing must be the same as the actual two-boxing. Traditional causal DAGs can only express this by making them literally the same node; super-specs allow more flexibility. I am unclear how exactly it's handled in FDT, but it has a similar analysis of the problem ("CDT breaks correlations").
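The "same node" point above can be sketched as a straight-line program. This is an illustration under stated assumptions, not the super-spec formalism from the paper and not FDT's own analysis: the predictor is assumed perfect, the payoffs are the usual textbook values, and both function names are hypothetical.

```python
BIG, SMALL = 1_000_000, 1_000  # hypothetical standard Newcomb payoffs


def newcomb_single_node(disposition):
    """One `disposition` node feeds both the prediction and the later choice,
    so the prediction can never disagree with the action."""
    prediction = disposition  # assumed perfect predictor
    opaque_box = BIG if prediction == "one-box" else 0
    action = disposition      # the choice is the same node, not a fresh variable
    return opaque_box + (SMALL if action == "two-box" else 0)


def newcomb_two_nodes(prediction, action):
    """Separate nodes for prediction and action: nothing stops them from being
    set to different values, which is where the apparent paradox comes from."""
    opaque_box = BIG if prediction == "one-box" else 0
    return opaque_box + (SMALL if action == "two-box" else 0)


if __name__ == "__main__":
    print(newcomb_single_node("one-box"), newcomb_single_node("two-box"))
    print(newcomb_two_nodes("one-box", "two-box"))
```

With a single node the only reachable outcomes are the one-boxer's and the two-boxer's payoffs; the larger "predicted one-box, took both" payoff is reachable only in the two-node variant.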

My entry. Focuses on the metaphysics of counterfactuals, arguing that there are two types based upon two different possible states of a person's mental model of causal relationships. This agrees with circularity. In general, I concur with principles 1-4 which you outline. My post hits on a bit of criteria a), b), and d).

https://www.lesswrong.com/posts/EvDsnqvmfnjdQbacb/circular-counterfactuals-only-that-which-happens-is-possible

3JohnBuridan2mo
Also, to the people who see everything confusing about counterfactuals as solved, this seems like a failure to ask new questions. If counterfactuals were "solved", I would expect to be living in a world where there would be no difficulty reverse engineering anything, the theory and practice of prior formation would also be solved, and decision theory would be unified into one model. We don't live in that world. I think there is still tons of fertile ground for thinking about the use of counterfactuals and we have not yet really scratched the surface of what's possible.
1TAG2mo
Being solved at the level that philosophy operates doesn't imply being solved at the engineering level.
1JohnBuridan2mo
You are right, of course. But even at the "level of philosophy" there are different levels, corridors, and extrapolations possible. For example, it is not a question of engineering whether counterfactuals on chaotic systems are conditional predictions, or whether counterfactuals of different types of relationships have less necessary connection.

Some people have asked why the Bayesian Network approach suggested by Judea Pearl is insufficient (including in the comments below). This approach is firmly rooted in Causal Decision Theory (CDT). Most people on LW have rejected CDT because of its failure to handle Newcomb's Problem.

I'll make a counter-claim and say that most people on LW in fact have rejected the use of Newcomb's Problem as a test that will say something useful about decision theories.

That being said, there is definitely a sub-community which believes deeply in the relevance of Newcomb... (read more)

2Chris_Leong5mo
Firstly, I don't see why that would interfere with evaluating possible arguments for and against circular dependency. It's possible for an article to be "here's why these 3 reasons why we might think counterfactuals are circular are all false" (I'm not saying that an article would necessarily have to engage with 3 different arguments to win). Secondly, I guess my issue with most of the attempts to say "use system X for counterfactuals" is that people seem to think merely not mentioning counterfactuals means that there isn't a dependence on them. So there likely needs to be some part of such an article discussing why things that look counterfactual really aren't. I briefly skimmed your article and I'm sure if I read it further I'd learn something interesting, but as is, it wouldn't be in scope.
1Koen.Holtman5mo
OK, so if I understand you correctly, you posit that there is something called 'circular epistemology'. You said in the earlier post you link to at the top: You further suspect that circular epistemology might have something useful to say about counterfactuals, in terms of offering a justification for them without 'hitting a point where we can provide no justification at all'. And you have a bounty for people writing more about this. Am I understanding you correctly?
2Chris_Leong5mo
Yeah, I believe epistemology to be inherently circular. I think it has some relation to counterfactuals being circular, but I don't see it as quite the same, as counterfactuals seem a lot harder to avoid using than most other concepts. The point of mentioning circular epistemology was to persuade people that my theory isn't as absurd as it sounds at first.
1Koen.Holtman5mo
Wait, I was under the impression from the quoted text that you make a distinction between 'circular epistemology' and 'other types of epistemology that will hit a point where we can provide no justification at all'. i.e. these other types are not circular because they are ultimately defined as a set of axioms, rewriting rules, and observational protocols for which no further justification is being attempted. So I think I am still struggling to see what flavour of philosophical thought you want people to engage with, when you mention 'circular'. Mind you, I see 'hitting a point where we provide no justification at all' as a positive thing in a mathematical system, a physical theory, or an entire epistemology, as long as these points are clearly identified.
2Chris_Leong5mo
If you're referring to the Wittgensteinian quote, I was merely quoting him, not endorsing his views.
1Koen.Holtman5mo
Not aware of which part would be a Wittgensteinian quote. Long time ago that I read Wittgenstein, and I read him in German. In any case, I remain confused on what you mean with 'circular'.
2Chris_Leong5mo
Hmm... Oh, I think that was elsewhere on this thread. Probably not to you. Eliezer's Where Recursive Justification Hits Bottom seems to embrace a circular epistemology despite its title.
1TAG5mo
He doesn't show much sign of embracing the validity of all circular arguments, and neither do you.
2Chris_Leong5mo
I never said all circular arguments are valid
1TAG5mo
That doesn't help. If recursive justification is a particular kind of circular argument that's valid, so that others are invalid, then something makes it valid. But what? EY doesn't say. And how do we know that the additional factor isn't doing all the work?
1Koen.Holtman5mo
??? I don't follow. You meant to write "use system X instead of using system Y which calls itself a definition of counterfactuals"?
2Chris_Leong5mo
What I mean is that some people seem to think that if they can describe a system that explains counterfactuals without mentioning counterfactuals when explaining them that they've avoided a circular dependence. When of course, we can't just take things at face value, but have to dig deeper than that.
1Koen.Holtman5mo
OK thanks for explaining. See my other recent reply for more thoughts about this.

You're asking questions of the form "which concepts are more fundamentally real". Such questions almost always lead nowhere useful (unless you consider continental philosophy useful). I'm happy to go through some examples to build intuition as to why I feel that; please let me know when you reply.

I'd reframe "circular dependency of counterfactuals" as "counterfactuals are fundamental". The circularity you see is more fundamental concepts leading to less fundamental ones, and the less fundamental ones leading to more fundamental ones. But less fundamenta... (read more)

2Chris_Leong5mo
Firstly, thanks for engaging with the circularity argument as there, unfortunately, hasn't been much engagement with it on the thread. I guess I don't see a reason to reframe it like this. Your objection to circularity is: But that only makes sense if you've already reframed it. If I simply talk about circularity and avoid defining any concept in the circle as more or less fundamental, then that argument doesn't get off the ground. So I guess it seems stronger to leave the framing as is since that dodges the argument you just provided. I agree that asking the question involves assuming the existence of a lot more concepts, but why would this affect the claimed circularity of counterfactuals?
1acylhalide5mo
Oh okay, maybe then I haven't understood what you mean by "circularity of counterfactuals", or what specific claims you are making. To quote from your post: I'm not totally sure how to interpret this. "Counterfactuals" feel more fundamental to me than "purpose". Counterfactuals don't exist because X, counterfactuals exist, period. Using "because" in a sentence assumes counterfactuals exist. You can say "counterfactuals exist and they can be used for X" instead of "counterfactuals exist because X". Here I'm guessing your usage of "because" also assumes the existence of free will / choice, and what we can do using that free will. I'd say you can't use free will to stop constructing counterfactuals. Like maybe you can stop thinking altogether, but if you switch on the part of your brain that does thinking, that part is only capable of thinking in a way that assumes counterfactuals exist.
2Chris_Leong5mo
Perhaps they do, but I guess I'm challenging this by suggesting that counterfactuals only make sense from within the counterfactual perspective. Or reframing this, counterfactuals only make sense from a cognitive frame. I don't see this as connecting to the free will debate. "Because" assumes that humans have such a thing as a will, but there's no requirement for it to be free. I agree with this, although I can see why my position is confusing. I guess I believe both that: a) Humans automatically make use of some intuitive notion or notions of counterfactuals, and b) People interested in decision theory intentionally try to construct a more principled and consistent notion of counterfactuals. I guess it was the latter question I was referring to when I was asking why humans construct counterfactuals.
1acylhalide5mo
Makes sense. Why is that only perhaps? For me, saying a) is true is the same thing as saying counterfactuals exist (as a concept). Could you please elaborate on the meaning of cognitive frame?
2Chris_Leong5mo
I guess I'd roughly describe it as something that forms models of the world.
1acylhalide5mo
(just made small edits in my reply)
1acylhalide5mo
That is kinda true but I'll frame it differently. I can recognise the existence of counterfactuals even before I recognise "wait, my universe follows ordered rules?" (laws of physics) or "my mind is also constrained by the same rules and therefore I can build models of myself?" or "why don't I try defining ideal models called decision theories?" So I mean, sure, the formal notion of counterfactuals as in decision theory (point b) is obviously a product of human imagination, but the intuitive notion (point a) to me exists as very fundamental in the sense that I am literally not capable of conscious experience or thought without already reasoning using counterfactuals. It exists in my phenomenal world even before I open my eyes and realise there's a physical world out there. (I personally see the phenomenal world as more fundamental than the physical, but I'm sure others will debate otherwise.) If you believe you can yourself reason (about anything) without adopting the intuitive notion of counterfactuals do tell (I don't think you can). If you're talking about decision theories that don't use counterfactuals, then sure, you can define a model, and this will be a model of something, but is it really a model of either a human or AI ideal if it doesn't use counterfactuals? It is trivial to define a Turing machine that doesn't build world models or use counterfactuals, but then this is not a decision theory. You need to ground formal models in actual experience. "Decision theories that don't recognise the existence of counterfactuals" sounds a little bit like "theorem provers that don't recognise the notion of true or false". Theorem provers recognise a formal notion of true / false, which is grounded on my intuitive experience of true / false.
2Chris_Leong5mo
Well, this is why I proposed that counterfactuals only make sense from within the counterfactual view - by which I meant that when we try to explain what counterfactuals are we inevitably find ourselves making use of the notion of counterfactuals - but perhaps you think my framing/interpretation could be improved. I think one thing that this discussion has highlighted is that I should be highlighting and paying more attention to the distinction between our primitive, intuitive notions of counterfactuals and the more formal notions that we construct. I guess another thing I find myself wondering about upon reading this approach is how the notion of fundamentality fits into a circular epistemology. I think they are compatible - one way this could occur is if some notions are outside of the loop, but are contingent on concepts that do form such a loop. Unfortunately, this is much harder to explain just via text - ie. without a diagram.
1acylhalide5mo
Yep agreed with all three paras. I'd be keen on your diagram! I'm still not sure I get the loop thing though. Maybe what you call circularity I see as coherence. Like I get that our phenomenal world helps us see the physical world, and then our models of the physical world help us understand others' phenomenal experiences. (If we accept that a physical configuration of an entity does in fact determine the phenomenal experience of that entity.) But I'd either interpret that as "phenomenal world is more fundamental than physical world" or "phenomenal world and physical world are both fundamental and coherent". (Idk which is a better interpretation or whether it even matters tbh.) You may also be interested in foundationalism versus coherentism [https://plato.stanford.edu/entries/justep-coherence/#CohVerFou] btw. I guess foundationalist thinking works when you have strong foundations, but if you go probing the foundations themselves you end up doing a lot more coherentist thinking. Now you can call that coherentist thinking, or circular foundationalism. Is that what you mean?
2Chris_Leong5mo
I'm not 100% sure on the definition of coherentism, but I reject attempts to define truth in terms of coherence whilst also thinking that our epistemological process should be primarily about seeking coherence (I want to leave myself an out here to acknowledge that sometimes forcing coherence can take us further away from the truth). I guess when we're searching for coherence we need to make decisions about which nodes we let update other nodes, so this seems to provide room for some nodes to be considered more foundational than other nodes.
1acylhalide5mo
I see. How do you define truth then?
2Chris_Leong5mo
I think of truth in terms of correspondence. Of course, we don't actually have access to the territory.
1acylhalide5mo
Fair, so for you: truth is correspondence with what? (can I call that thing phenomenal experience?)
2Chris_Leong5mo
Phenomenal experience with external reality.
1acylhalide5mo
Also yeah sorry if I'm taking this convo on a different tangent, I don't see anything more to directly add on the topic of counterfactuals. Feel free to end convo if you feel like.
1acylhalide5mo
What if your phenomenal experience doesn't match external reality, which one decides truth? [Say you're experiencing phantom limb pain, is it a true statement that "you are in pain"]
2Chris_Leong5mo
Phenomenal experience is technically a subset of reality.
1acylhalide5mo
Sure, you can define it that way. Do let me know if you make the diagram (the epistemic justification thing on why there's circularity) - we can discuss anything further then.

I've previously argued that the concept of counterfactuals can only be understood from within the counterfactual perspective.

I think this goes too far. We can give an account of counterfactuals from assumptions of symmetry. This account is unsatisfactory in many ways - for one thing, it implies that counterfactuals exist much more rarely than we want them to. Nonetheless, it seems to account for some properties of a counterfactual and is able to stand up without counterfactual assumptions to support it. I think it also provides an interesting lens for exam... (read more)

What are you trying to get/do? I'm asking very seriously, as I can't quite tell where we land between philosophy of language, human behaviour and cognition, AI architecture or some unification problem of them.

 

From philosophy of language perspective, I personally like to argue that hypotheticals in past tense are just wrong, but are used in the same way present and future tense versions are: expressing internal belief about how causality will play out for the sake of aligning them in a group.

I'm aware of other approaches, but that has a convenient pro... (read more)

2Chris_Leong5mo
A post that attempts to evaluate the arguments for and against this principle would likely be more philosophical. A post that tried to draw out the practical consequences would tend to be more on the side of decision theory, though I expect it would involve delving into the philosophy as well.
1TAG5mo
I don't see why philosophy of language would tell you how reality works.

Rejecting David Lewis’ Counterfactual Realism as absurd and therefore concluding that counterfactuals must be at least partially a human construction: either a) in the sense of them being an inevitable and essential part of how we make sense of the world by our very nature or b) in the sense of being a semi-arbitrary and contingent system that we’ve adopted in order to navigate the world

There are at least three possibilities. David Lewis level realism, where counterfactual worlds seem fully real to their inhabitants, is an extreme. Moderate realism abou... (read more)

2Chris_Leong5mo
Regarding moderate realism, if what happened didn't have to happen, then that implies that other things could have happened (these are counterfactuals). But this raises the question, what are these counterfactuals? You've already rejected Counterfactual Realism which seems to lead towards the two possibilities I suggested: a) Counterfactuals are an inevitable and essential part of how we make sense of the world by our very nature b) Counterfactuals are a semi-arbitrary and contingent system that we've adopted in order to navigate the world (Some combination of the two is another possibility.) Presumably, you don't think moderate realism leads you down this path. Where do you think it leads instead? "Even if you accept the Kantian framework, it involves N>1 basic categories" Interesting point. I'm somewhat skeptical of this, but I wouldn't completely rule it out either. (One thing I think plausible is that there could be a category A reducible to a category B which is then reducible back to A; but this wouldn't avoid the circularity) "Well, that's two examples of circular dependency" - Yes, that's what I said. I guess I'm confused why you're repeating it
1TAG5mo
I haven't rejected counterfactual realism. I've pointed out that Lewis's modal realism doesn't deal with counterfactuals as such, because it is a matter of perspective whether a world is factual (i.e. contains me) or counterfactual (doesn't). What I have called moderate realism is the only position that holds counterfactuals to be both intrinsically counterfactual and real. Kantianism about counterfactuals might be true, but if it is, you are also going to have problems with causality etc. There's no special problem of counterfactuals. That's an odd thing to say. Kant lays out his categories, and there is more than one.
2Chris_Leong5mo
How so? I would have said the opposite. Yeah, if Kantianism about counterfactuals were true, it would be strange to limit it. My expectation would be that it would apply to a bunch of other things as well. Sorry, I should have been clearer. I wasn't disagreeing with there being more than one category, but your conclusion from this.
1TAG5mo
I wasn't saying that that is true per se, I was saying it's Lewis's view. Well, if you think there is a special problem with counterfactuals, then that needs a basis other than general Kantian issues.
2Chris_Leong5mo
Ah, okay. I get it now.

So, this post only deals with agent counterfactuals (not environmental counterfactuals), but I believe I have solved the technical issue you mention about the construction of logical counterfactuals as it concerns TDT. See: https://www.alignmentforum.org/posts/TnkDtTAqCGetvLsgr/a-possible-resolution-to-spurious-counterfactuals

I have fewer thoughts about environmental counterfactuals but think a similar approach could be used to make statements along those lines, i.e. construct alternate agents receiving a different observation about the world. I'm not sure... (read more)

2Chris_Leong5mo
I added a comment on the post directly, but I will add: we seem to roughly agree on counterfactuals existing in the imagination in a broad sense (I highlighted two ways this can go above - with counterfactuals being an intrinsic part of how we interact with the world or a pragmatic response to navigating the world). However, I think that following this through and asking why we care about them if they're just in our imagination ends up taking us down a path where counterfactuals being circular seems plausible. On the other hand, you seem to think that this path takes us somewhere where there isn't any circularity. Anyway, that's the difference in our positions as far as I can tell from having just skimmed your link.
3JoshuaOSHickman5mo
I was attempting to solve a relatively specific technical problem related to self-proofs using counterfactuals. So I suppose I do think (at least non-circular ones) are useful. But I'm not sure I'd commit to any broader philosophical statement about counterfactuals beyond "they can be used in a specific formal way to help functions prove statements about their own output in a way that avoids Löb's Theorem issues". That being said, that's a pretty good use, if that's the type of thing you want to do? It's also not totally clear if you're imagining counterfactuals the same way I am. I am using the English term because it matches the specific thing I'm describing decently well, but the term has a broad meaning, and without having an extremely specific imagining, it's hard to make any more statements about what can be done with them.