Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

I'm finally beginning to feel that I have a clear idea of the true nature of counterfactuals. In this post I'll argue that counterfactuals are intrinsically part of how we make sense of the world. However, it would be inaccurate to present them as purely a human invention, as we were shaped by evolution in such a way as to ground these conceptions in reality.

Unless you're David Lewis, you're probably going to be rather dubious of the claim that all possibilities exist (i.e. that counterfactuals are ontologically real). Instead, you'll probably be willing to concede that they're something we construct; that they're in the map rather than in the territory.

Things in the map are tools: they are constructed because they are useful. In other words, they are constructed for a purpose or a number of purposes. So what is the purpose (or the purposes) of counterfactuals?

I first raised this question in Counterfactuals are an Answer, Not a Question and I struggled with it for around a year. Eventually, I realised that a big part of the challenge is just how abstract the question is. So I replaced it with something more concrete: "Why don't agents construct crazy counterfactuals?" One example would be expecting the world to explode if I made this post. Another would be filling in the future with randomly generated events. Why shouldn't I do either of these?

I'll make a modest claim: it's not about aesthetics. We don't construct counterfactuals because we want them to be pretty or funny or entertaining. We want them to be useful. The reason why we don't just construct counterfactuals in a silly or arbitrary manner is that we believe, in some vague sense, that it'd lead to outcomes that are sub-optimal, or that in expectation it'll lead to sub-optimal outcomes.

I suspect most people will agree that the answer must be something along these lines, but I've hardly defined it very precisely. So let's attempt to clarify. To keep this discussion as general as possible, note that we could have stated similar sentiments in terms of achieving good outcomes, avoiding bad outcomes, or achieving better outcomes. But regardless of how we word it, we're comparing worlds and deciding that one is better than another. It's not just about considering one world and comparing it to a standard, because we can't produce such a standard without constructing a world non-identical to the first.

Essentially, we conceive of certain worlds as being possible, then we consider the expected value or the median outcome or some other metric over these worlds, and finally we suggest that, according to this metric, agents constructing a sane theory of counterfactuals tend to do better than agents with crazy theories.
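
To make this concrete, here is a minimal sketch (the worlds, probabilities, and payoffs are entirely made up for illustration) of scoring two counterfactual theories by expected value over a posited set of possible worlds:

```python
# Toy sketch: evaluating counterfactual theories by expected value over possible worlds.
# The worlds, probabilities, and payoffs below are invented purely for illustration.

possible_worlds = [
    {"prob": 0.7, "payoffs": {"post": 10, "stay_silent": 0}},
    {"prob": 0.3, "payoffs": {"post": -5, "stay_silent": 0}},
]

# Each "theory" is a counterfactual model: what it predicts each action would lead to.
sane_theory  = {"post": 5.5, "stay_silent": 0}     # roughly tracks the true expectation
crazy_theory = {"post": -1e9, "stay_silent": 0}    # "the world explodes if I post"

def choose(theory):
    """Pick the action the theory predicts is best."""
    return max(theory, key=theory.get)

def expected_value(action, worlds):
    """Score the chosen action against the worlds we conceive of as possible."""
    return sum(w["prob"] * w["payoffs"][action] for w in worlds)

for name, theory in [("sane", sane_theory), ("crazy", crazy_theory)]:
    action = choose(theory)
    print(name, action, expected_value(action, possible_worlds))
# sane post 5.5
# crazy stay_silent 0.0
```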

This naturally leads to another question: what worlds should we conceive of as being possible? Again, we can make this concrete by asking what would happen if we were to choose a crazy set of possible worlds - say a world just like this one and then a world with unicorns and fountains of gold - and no other worlds. Well again, the reason we wouldn't do this is that we'd expect an agent building its decision theory on these possible worlds to perform poorly.

What do I mean by poorly? Well, again it seems like we're conceiving of certain worlds as possible, imagining how agents constructing their decision theory based on different notions of possibility perform in these worlds and utilising some kind of metric to evaluate performance.

So we're back where we were before. That is, we're going around in circles. Suppose an agent believes we should consider the set W of worlds as possible and construct a decision theory based on this. Then this agent will evaluate agents who adopt W in order to develop their decision theory as making optimal decisions, and it will evaluate agents who adopt a different set of worlds (one that leads to a different decision theory) as making sub-optimal decisions, except in the rare cases where this doesn't make a difference. In other words, such an agent will reaffirm what it already believes about what worlds are possible.
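
Here is a hypothetical sketch of that circularity (worlds and utilities invented for illustration): an evaluator that scores decisions by expected utility over its own set of worlds will, by construction, rate agents sharing that set as optimal and agents using a different set as sub-optimal whenever their decisions diverge.

```python
# Toy sketch of the circularity: an evaluator scores decisions using its OWN set
# of possible worlds, so agents that share its worlds are rated optimal by construction.
# Worlds and utilities are invented for illustration.

def best_action(worlds, actions):
    """The action with highest expected utility under a given set of worlds."""
    return max(actions, key=lambda a: sum(p * u[a] for p, u in worlds))

actions = ["one_box", "two_box"]
worlds_W = [(0.9, {"one_box": 100, "two_box": 1}), (0.1, {"one_box": 0, "two_box": 1})]
worlds_V = [(0.5, {"one_box": 0, "two_box": 1}), (0.5, {"one_box": 0, "two_box": 1})]

evaluator_worlds = worlds_W
for name, agent_worlds in [("agrees_with_W", worlds_W), ("uses_V_instead", worlds_V)]:
    choice = best_action(agent_worlds, actions)
    verdict = "optimal" if choice == best_action(evaluator_worlds, actions) else "sub-optimal"
    print(name, choice, verdict)
# agrees_with_W one_box optimal
# uses_V_instead two_box sub-optimal
```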

You might think that the circularity is a problem, but circular epistemology turns out to be viable (see Eliezer's Where Recursive Justification Hits Bottom). And while circular reasoning is less than ideal, if the alternative is eventually hitting a point where we can provide no justification at all, then circular justification might not seem so bad after all.

Kant theorised that certain aspects of phenomena were the result of intrinsic ways in which we interpret the world, and that it is impossible for us to step outside this perspective. He called this Transcendental Idealism and suggested that it yielded a form of a priori synthetic knowledge providing the basic assumptions we need to begin reasoning about the world (such as causation).

My approach is slightly different as I'm using circular epistemology rather than a priori synthetic knowledge to provide a starting place for reason. By having our starting claims amenable to updating based on evidence, I avoid a particular problem in the Kantian approach that is best highlighted by Einstein's Theory of Relativity. Namely, Kant claimed that space and time existed a priori, but experimental results were able to convince us otherwise, which should not be possible with an a priori result.

However, I agree with him that certain basic concepts are frames that we impose on the world due to our cognitive structure (in my case I'm focusing on the notion of possibility). I'm not picturing this as a straitjacket that is completely impossible to escape; indeed these assumptions may be subsumed by something similar, as they were in the case of relativity. The point is more that to even begin reasoning we have to begin within a cognitive frame.

Imagine trying to do physics without being able to say things like, "Imagine we have a 1kg frictionless ball...", mathematics without being able to entertain the truth of a proposition that may be false or divide a problem into cases, and philosophy without being allowed to do thought experiments. Counterfactuals are such a basic concept that it makes sense to believe that they - or something very much like them - are a primitive.

Another aspect that adds to the merits of this theory: it is simple enough to be plausible (this seems to me like the kind of thing that should have a simple answer), yet also complicated enough to explain why it has been surprisingly difficult to make progress on.

After writing this post I found myself in a strange position. I felt certain I had dramatically improved my conceptual understanding of counterfactuals, yet at the same time I found myself struggling to understand where to go from here in order to produce a concrete theory of counterfactuals, and even had trouble articulating how it helps in this regard at all.

A big part of the challenge for me is that I have almost no idea of how we should handle circular epistemology in the general case. There are far too many different strategies you could attempt to produce something consistent. I hope to have more clarity on this in the future.

Note: I posted some additional speculation in a shortform post. I decided to separate it out as I don't feel it's as high quality as the core of the post.

  • The lack of performance metrics for CDT versus EDT, etc. - Caspar Oesterheld - This article suggests that there might be no performance metric for comparing decision theories, as the question of evaluating them may potentially be decision-theory complete, which I see as very similar to the claim that decision theories are circularly justified.

A "counterfactual" seems to be just any output of a model given by inputs that were not observed. That is, a counterfactual is conceptually almost identical to a prediction. Even in deterministic universes, being able to make predictions based on incomplete information is likely useful to agents, and ability to handle counterfactuals is basically free if you have anything resembling a predictive model of the world.

If we have a model that Omega's behaviour requires that anyone choosing box B must receive 10 utility, then our counterfactuals (model outputs) should reflect that. We can of course entertain the idea that Omega doesn't behave according to such a model, because we have more general models that we can specialize. We must have, or we couldn't make any sense of text such as "let's suppose Omega is programmed in such a way...". That sentence in itself establishes a counterfactual (with a sub-model!), since I have no knowledge in reality of anyone named Omega nor of how they are programmed.

We might also have (for some reason) near-certain knowledge that Amy can't choose box B, but that wasn't stated as part of the initial scenario. Finding out that Amy in fact chose box A doesn't utterly erase the ability to employ a model in which Amy chooses box B, and so asking "what would have happened if Amy chose box B" is still a question with a reasonable answer using our knowledge about Omega. A less satisfactory counterfactual question might be "what would happen if Amy chose box A and didn't receive 5 utility".
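
As a rough sketch of how a supposition about Omega establishes a sub-model that can then be queried counterfactually (only the 5 and 10 utility figures come from the scenario above; the function and its structure are invented for illustration):

```python
# Toy sketch: specializing a general model with a supposition about Omega,
# then querying it with choices that were and were not actually made.
# Only the utilities mentioned above are taken from the scenario; the rest is invented.

def make_omega_model(utility_for_b: int, utility_for_a: int):
    """Build a sub-model under the supposition 'Omega is programmed such that...'."""
    def model(choice: str) -> int:
        return utility_for_b if choice == "B" else utility_for_a
    return model

omega_model = make_omega_model(utility_for_b=10, utility_for_a=5)

actual = omega_model("A")          # Amy in fact chose box A
counterfactual = omega_model("B")  # "what would have happened if Amy chose box B?"

print(actual, counterfactual)  # 5 10
```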

"And ability to handle counterfactuals is basically free if you have anything resembling a predictive model of the world" - ah, but a predictive model also requires counterfatuals.

TAG:

No, prediction and counterfactuals share a common mechanism that is neutral between them.

Decision theory is about choosing possible courses of action according to their utility, which implies choosing them for, among other things, their probability. A future action is an event that has not happened yet. A past counterfactual is an event that didn't happen. There's a practical difference between the two, but they share a theoretical component: "What would be the output given input Y?" Note how that verbal formulation gives no information about whether a future state or a counterfactual is being considered. The black box making the calculation doesn't know whether the input it's receiving represents something that will happen, or something that might have happened.

I'm puzzled that you are puzzled. JBlack's analysis, which I completely agree with, shows how and why agents with limited information consider counterfactuals. What further problems are there? Even the issue of highly atypical agents with perfect knowledge doesn't create that much of a problem, because they can just pretend to have less knowledge -- build a simplified model -- in order to expand the range of non-contradictory possibilities.

>Imagine trying to do physics without being able to say things like, "Imagine we have a 1kg frictionless ball...", mathematics without being able to entertain the truth of a proposition that may be false or divide a problem into cases and philosophy without being allowed to do thought experiments. Counterfactuals are such a basic concept that it makes sense to believe that they - or something very much like them - are a primitive.

In my mind, there's quite some difference between all these different types of counterfactuals. For example, consider the counterfactual question, "What would have happened if Lee Harvey Oswald hadn't shot Kennedy?" I think the meaning of this counterfactual is kind of like the meaning of the word "chair".
- For one, I don't think this counterfactual is very precisely defined. What exactly are we asked to imagine? A world that is like ours, except the laws of physics in Oswald's gun were temporarily suspended to save JFK's life? (Similarly, it is not exactly clear what counts as a chair (or to what extent) and what doesn't.)
- Second, it seems that the users of the English language all have roughly the same understanding of what the meaning of the counterfactual is, to the extent that we can use it to communicate effectively. For example, if I say, "if LHO hadn't shot JFK, US GDP today would be a bit higher than it is in fact", then you might understand that to mean that I think JFK had good economic policies, or that people were generally influenced negatively by the news of his death, or the like. (Maybe a more specific example: "If it hadn't suddenly started to rain, I would have been on time." This is a counterfactual, but it communicates things about the real world, such as: I didn't just get lost in thought this morning.) (Similarly, when you tell me to get a "chair" from the neighboring room, I will typically do what you want me to do, namely to bring a chair.)
- Third, because it is used for communication, some notions of counterfactuals are more useful than others, because they are better for transferring information between people. At the same time, usefulness as a metric still leaves enough open to make it practically and theoretically impossible to identify a unique optimal notion of counterfactuals. (Again, this is very similar to a concept like "chair". It is objectively useful to have a word for chairs. But it's not clear whether it's more useful for "chair" to include or exclude .)
- Fourth, adopting whatever notion of counterfactual we adopt for this purpose has no normative force outside of communication -- they don't interact with our decision theory or anything. For example, causal counterfactuals as advocated by causal decision theorists are kind of similar to the "If LHO hadn't shot JFK" counterfactuals. (E.g., both are happy to consider literally impossible worlds.) As you probably know, I'm partial to evidential decision theory. So I don't think these causal counterfactuals should ultimately be the guide of our decisions. Nevertheless, I'm as happy as anyone to adopt the linguistic conventions related to "if LHO hadn't shot JFK"-type questions. I don't try to reinterpret the counterfactual question as a conditional one. (Note that answers to, "how would you update on the fact that JFK survived the assassination?", would be very different from answers to the counterfactual question. ("I've been lied to all my life. The history books are all wrong.") But other conditionals could come much closer.) (Similarly, using the word "chair" in the conventional way doesn't commit one to any course of action. In principle, Alice might use the term "chair" normally, but never sit on chairs, or only sit on green chairs, or never think about the chair concept outside of communication, etc.)

So in particular, the meaning of counterfactual claims about JFK's survival doesn't seem necessarily very related to the counterfactuals used in decision making. (For example, the question, "what would happen if I don't post this comment?", which I asked myself prior to posting this comment.)

In math, meanwhile, people seem to consider counterfactuals mainly for proofs by contradiction, i.e., to prove that the claims are contrary to fact. Cf. https://en.wikipedia.org/wiki/Principle_of_explosion, which makes it difficult to use the regular rules of logic to talk about counterfactuals.

Do you agree or disagree with this (i.e., with the claim that these different uses of counterfactuals aren't very closely connected)?

I agree with almost everything you've written in this comment.

I think you're right that I am brushing over some differences between different kinds of counterfactuals a bit too much here. If you know of any articles that do a good job of separating out the different kinds of counterfactuals, then that's something I'd really appreciate having.

Thinking over this again, it might have been better if I had written that (1) humans seem to have some kind of inner simulator and that (2) one of its main properties is that it is able to simulate situations that aren't actually true. In fact, if this simulator only worked for simulating the actual world, it'd be pretty useless as we are mistaken about some facts, and for other facts, we need to use our best guess.

Further, I could claim that (3) this simulator plays an incredibly fundamental and core role in human thought, as opposed to, say, our understanding of music or Russian history, such that it's difficult to talk about this simulator without "using its own language", so to speak.

I'm open to the possibility that we might have multiple simulators, such as counterfactual mathematical statements being handled by a different system than counterfactual statements of the world. In fact, these days I tend to make a distinction between our evolved intuitions regarding counterfactuals and the higher-level cognitive concept of counterfactuals that we've built up from these intuitions.

I also like your almost Wittgensteinian language-game model of counterfactuals. I agree that there are a lot of social conventions around how counterfactuals are constructed and that in a lot of circumstances, people aren't really aiming to produce a consistent counterfactual. Hmm... I probably needed to engage with that more for one of my most recent posts on counterfactuals. In that post, I provided an argument for consistent counterfactuals, but I'm now thinking that I need to think more about in what circumstances we actually want consistency and in which circumstances we don't really care.

All this said, I don't agree with the claim that these definitions don't have very much to do with each other, although maybe you meant this in a sense other than I'm taking it here. I believe that different notions of counterfactual are likely built upon the same underlying intuitions, such that these definitions are likely to be very closely related.

The underlying thought behind both this and the previous post seems to be the notion that counterfactuals are somehow mysterious or hard to grasp. This looks like a good chance to plug our upcoming ICML paper, which reduces counterfactuals to a programming language feature. It gives a new meaning to "programming Omega." http://www.zenna.org/publications/causal.pdf

In addition to self-consistency, we can also imagine agents that interact with an environment and learn how to model it by taking actions and evaluating how good their predictions are (thereby having an effective standard for counterfactuals - or an explicit one, if we hand-code the agents to choose actions by explicitly considering counterfactuals).
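
A toy sketch of that idea (the environment, learning rule, and numbers are all invented for illustration): the agent acts, predicts the outcome, scores the prediction, and updates its model, which can then answer counterfactual queries about actions it never took.

```python
import random

# Toy sketch: an agent that learns a model of its environment by acting,
# predicting, and scoring its predictions. Environment and learning rule are invented.

def environment(action: float) -> float:
    """Hidden dynamics the agent is trying to model."""
    return 3.0 * action + random.gauss(0, 0.1)

model_slope = 0.0    # the agent's current (learnable) model: outcome ≈ slope * action
learning_rate = 0.1

for step in range(1000):
    action = random.uniform(-1, 1)          # take an action
    predicted = model_slope * action        # predict its outcome
    observed = environment(action)          # see what actually happens
    error = observed - predicted            # evaluate how good the prediction was
    model_slope += learning_rate * error * action  # update the model

# The learned model now answers counterfactual queries about actions never taken.
print(round(model_slope, 2))        # typically close to 3.0
print(round(model_slope * 0.5, 2))  # predicted outcome of the un-taken action 0.5
```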

Interesting to read; here are a couple of comments on parts of what you say:

 

>the claim that all possibilities exist (ie. that counterfactuals are ontologically real)

'counterfactuals are ontologically real' seems like a bad way of re-expressing 'all possibilities exist'. Counterfactuals themselves are sentences or propositions, and even people who think there's e.g. no fact of the matter with many counterfactuals should agree that they themselves are real.

Secondly, most philosophers who would be comfortable with talking seriously about possibilities or possible worlds as real things would not go along with Lewis in holding them to be concrete. The view that possibilities really truly exist is quite mainstream and doesn't commit you to modal realism.

 

>what worlds should we conceive of as being possible? Again, we can make this concrete by asking what would happen if we were to choose a crazy set of possible worlds - say a world just like this one and then a world with unicorns and fountains of gold - and no other worlds

I think it's crucial to note that it's not the presence of the unicorns world that makes trouble here, it's the absence of all the others. So what you're gesturing at here is, I think, the need for a kind of plenitude in the possibilities one believes in.


 

I would express this as it not making sense to include some worlds without others.
 

TAG:

> That is, we're going around in circles. Suppose an agent believes we should consider the set W of worlds as possible and construct a decision theory based on this. Then this agent will evaluate agents who adopt W in order to develop their decision theory as making optimal decisions, and it will evaluate agents who adopt a different set of worlds (one that leads to a different decision theory) as making sub-optimal decisions,

Why? If agent B has a different state of knowledge to A, B's set of (apparently) possible worlds will be different, but that doesn't mean it's worse. If B has more knowledge than A, its ideas of possibility will be correspondingly better. In common sense terms, I should accept what an expert tells me about what is and isn't possible. An agent should not regard itself as having the best decision theory, because it should not regard itself as omniscient.

> You might think that the circularity is a problem, but circular epistemology turns out to be viable (see Eliezer's Where Recursive Justification Hits Bottom).

No, the usual objections remain. If I say "you owe me £1000 because you owe me £1000", you would not accept that as validly justified.

Edit: i.e., there are an infinite number of circular arguments, most of which you still reject.

It's not clear whether Eliezer’s Where Recursive Justification Hits Bottom is supposed to be a defense of circular justification, and it's not clear how it works if it is. One could reconstruct the argument as "we have knowledge, and it's not foundationally justified, so it's circularly justified", but that depends on our having validly justified knowledge, and on circular justification being the only alternative to foundational justification. And on circular justification actually being feasible.

And yet, despite epistemic circularity being our epistemic reality up to our circularly reasoned limited ability to assess that this is in fact the situation we find ourselves in, we manage to reason anyway.

TAG:

We can only short-circuit the various circularities, and directly demonstrate our ability to reason successfully, by using pragmatism and prediction... and that is only applicable to reasoning in some domains. The areas where they don't work coincide with philosophical concerns.

> The areas where they don't work coincide with philosophical concerns.

As always, this is an interesting topic, because many of the philosophical concerns I can think of here end up being questions about metaphysics (i.e. the nature of stuff that lies beyond your epistemic ability to resolve the question), and I think there's some reasonable perspective from which you might say that metaphysics "doesn't matter": it consists of answers to questions that, while interesting, don't change what actions you take in the world once known, because we can already know enough to figure out practical answers that serve our within-world purposes.

TAG:

It all depends on what you value. If you personally value knowing what things really are, then adopting instrumentalism or pragmatism will lose you some potential value.

I argue that in this case it doesn't, i.e. my case for how the problem of the criterion gets resolved is that you can't help but be pragmatic, because that's a description of how epistemology is physically instantiated in our universe. The only way you might lose value is if you have some desire to resolve metaphysical questions and you stop short of resolving them; then of course you will fail to receive the full value possible because you didn't get the answer. I argue that getting such answers is impossible, but nonetheless trying to find them may be worthwhile to someone.

TAG:

> I argue that getting such answers is impossible,

Ok, but meta-level arguments are still subject to the problem of the criterion.

TAG:

> what worlds should we conceive of as being possible?

  1. All realistic agents have finite and imperfect knowledge.

  2. Therefore, for any one agent, there is a set of counterfactual claims that are crazy in the sense of contradicting what they already know.

  3. Likewise, for any one agent, there is a set of counterfactual claims that are sane in the sense of not contradicting what they already know.

TAG:

And we should believe in the possibility of whatever is actually possible, but that's easier said than done.