Counterfactuals are an Answer, Not a Question

AI Alignment Forum


I'm going to subtly contradict my last post on logical counterfactuals. I now think that raw or actually-consistent counterfactuals are just an especially useful model of what counterfactuals are, as opposed to the be-all and end-all.

An Old Classic

You may have heard this one before: The Ship of Theseus is a very old ship, so old that all the original parts have been replaced. Is it still the Ship of Theseus? What if we gathered all of the original parts and used them to rebuild the ship? Would this new construct be the Ship of Theseus now?

If you've internalised the idea of the map and the territory, the answer should be clear. Terms like "The Ship of Theseus" are human constructions with no real existence. They can be defined in whatever manner is most useful.

Counterfactuals Don't Exist in the Universe

So let's start with regular counterfactuals. What are they? Do they exist in the universe itself? Unless you're a modal realist (i.e. David Lewis and his clones), the answer is no. Given the exact state of the universe and an agent, the agent can only make one decision*. In other words, counterfactuals are something we construct.

This means that it is a mistake to search for an objectively true definition. Just as with the Ship of Theseus, our expectation should be that multiple definitions could have value. Of course, some definitions may be more useful or natural than others and indeed, as I've argued, Raw Counterfactuals are particularly natural. However, I wouldn't go as far as I went in that post, where I argued that insofar as any other kind of counterfactual had meaning, it was derived from Raw Counterfactuals.

Logical counterfactuals are no different. We may not know everything about logic, but this doesn't mean that logic could have been different. We can construct a model where we imagine logic being different from what it is, but there isn't a fact of the matter about how logic would be if 1+1=3 instead of 2 that exists as part of standard logic without any extensions, any more than counterfactuals exist in-universe without any extensions.

Since counterfactuals don't have an objectively true definition, asking, "What are counterfactuals?" is a confused question. It'd be better to ask, "What definitions of counterfactuals are useful?", but this is still kind of vague. A better approach is to figure out the kinds of questions that we tend to want to answer when counterfactuals arise. This will vary, but in general counterfactuals relate to problems that can be solved via iteration.

An Initial Question

Raw counterfactuals are especially important because they answer a particularly common kind of question related to partial knowledge. One common simplification is to assume that we can enumerate all the states of the universe that would match our given knowledge. Since states are only valid if they are consistent, the states we enumerate will be raw counterfactuals. Indeed, as I previously argued, most uses of CDT-style counterfactuals are justified in terms of how they approximate raw counterfactuals. Pretending that an agent magically decides to turn left instead of right at the last moment isn't that different from assuming that the universe was slightly different so that the agent was always going to turn left, but that this change wasn't ever going to affect anything else.
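
The enumeration described above can be pictured with a minimal sketch. The toy world, its variable names, and the consistency rule are all assumptions invented for illustration; nothing here is meant as the definitive formalisation of raw counterfactuals.

```python
from itertools import product

# Hypothetical toy world: each state fixes three boolean facts.
VARIABLES = ("agent_disposed_left", "agent_turns_left", "door_left_open")


def is_consistent(state):
    # Assumed law of this toy world: the agent's action always matches
    # its disposition, so a state that violates this is not a real possibility.
    return state["agent_disposed_left"] == state["agent_turns_left"]


def raw_counterfactuals(knowledge):
    """Enumerate every consistent state that agrees with what we know."""
    results = []
    for values in product([True, False], repeat=len(VARIABLES)):
        state = dict(zip(VARIABLES, values))
        matches = all(state[name] == value for name, value in knowledge.items())
        if matches and is_consistent(state):
            results.append(state)
    return results


# We only know that the left door is open; the rest is partial knowledge.
worlds = raw_counterfactuals({"door_left_open": True})
```

In this sketch the "agent turns left" world differs from the "agent turns right" world in the agent's disposition as well, which is the sense in which the universe was always slightly different rather than magically altered at the last moment.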

When the Story Breaks

So why doesn't the story end here? Why might we desire other notions of counterfactual? I won't claim that all of the following situations would lead to an alternative notion of counterfactual, but I wouldn't want to rule it out.

Firstly, the assumption in the previous question is that we already have abstracted out a world model; that is, for any observation, we have an idea of what worlds are possible and other relevant facts like probabilities. While this describes ideal Bayesian agents, it doesn't really describe more realistic agents who choose their model based at least somewhat upon their observations.

Secondly, we might sometimes want to consider the possibility that our logical assumptions or deductions might be incorrect. This is possible, but tricky, since if our logic is incorrect, we'll be using our flawed logic to determine how to handle this issue.

Thirdly, it's not always immediately obvious whether a particular state is consistent or not. For example, if you know that you're a utility maximiser, then it would be inconsistent for you to pick any option that provides suboptimal utility, but if you knew the utility was suboptimal you wouldn't need to examine that option. It would often be useful to be able to assign a utility value to inconsistent situations so we could use the common technique of iterating over a list of possibly-inconsistent situations and picking whichever was assigned the highest utility.

Ambient Decision Theory (much of which has been wrapped into FDT) takes advantage of this. You consider various counterfactuals (X chooses action1, action2, etc.), only one of which should be consistent given the fact that the program is a utility maximiser. The hope is that the only consistent result will represent the best move, and in the right circumstances it will.
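
The iteration technique can be sketched roughly as follows. This is not Ambient Decision Theory itself, which reasons about proofs; it is only the "score every supposition, pick the highest" loop, and the payoff numbers and function names are hypothetical.

```python
# Hypothetical payoff table; the numbers are invented for illustration.
PAYOFFS = {"action1": 3, "action2": 7, "action3": 5}


def utility_if_taken(action):
    # Score the supposition "the agent takes `action`", even though for a
    # utility maximiser at most one such supposition can be consistent.
    return PAYOFFS[action]


def choose(actions):
    # Iterate over all suppositions and keep the highest-scoring one.
    return max(actions, key=utility_if_taken)


chosen = choose(list(PAYOFFS))

# Only the supposition matching the maximiser's actual choice is consistent
# with "the agent is a utility maximiser"; the others were scored anyway.
consistent = [action for action in PAYOFFS if action == chosen]
```

Note that the loop needed a utility for every supposition before it could discover which single one was consistent, which is exactly the difficulty the next section takes up.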

Inconsistency

This last strategy is neat, but it faces significant challenges. For one, what does it mean to say that when we choose the only consistent result we choose the best outcome? "Best" requires a comparison, which requires a value. In other words, we have assumed a way of assigning utility to inconsistent situations. What does this mean?

From a logical viewpoint, if we ever have P and Not P, then we have a contradiction and can prove anything. So we shouldn't strictly view these inconsistent situations as logical entities, at least in the classical sense. Instead, it may be more useful to view them as a datastructure and the utility function as something that extracts a particular element or elements, then runs a regular utility function over these.

For example, suppose we have:

Viewed as logical propositions, we can prove that the mouse is also dead from the contradiction, but viewed as a datastructure, the mouse is only alive. This doesn't perfectly map to the situation where you're given an inconsistent list of logical propositions and told to look for contradictions, but it is close enough to be analogous, as in either case the inconsistent object is more a pseudo-logical object than a logical object. Similarly, we could imagine assigning a utility based upon the consistent part of such a datastructure.
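
A rough sketch of the datastructure view, under stated assumptions: the claims about the cat are a hypothetical stand-in for whatever contradiction the situation contains, and the helper names and utility values are invented for illustration.

```python
from collections import defaultdict

# A possibly-inconsistent situation stored as plain (subject, status) claims.
# The contradictory cat entries are an assumed example.
situation = [("cat", "alive"), ("cat", "dead"), ("mouse", "alive")]


def statuses_by_subject(claims):
    """Group the recorded statuses for each subject; no inference is done."""
    table = defaultdict(set)
    for subject, status in claims:
        table[subject].add(status)
    return dict(table)


def consistent_part(claims):
    """Keep only the subjects that have exactly one recorded status."""
    return {
        subject: next(iter(found))
        for subject, found in statuses_by_subject(claims).items()
        if len(found) == 1
    }


def utility(claims, value_of):
    """Run an ordinary utility function over the consistent part only."""
    part = consistent_part(claims)
    return sum(value_of.get(item, 0) for item in part.items())


table = statuses_by_subject(situation)
score = utility(situation, {("mouse", "alive"): 1, ("mouse", "dead"): -1})
```

Looking up the mouse returns only "alive": the lookup never consults the cat entries, so the contradiction does not spread the way it would under classical inference.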

So given these datastructures, we can define a notion of "best" for inconsistent situations, but this raises the question, "Why do we care about the best such (inconsistent) datastructure?" Do we only care about these inconsistent objects insofar as they stand in for an actually consistent situation? I suspect the answer could be no. I don't currently have a very good answer for this, but I'm hoping to have one soon. In any case, my previous stance of completely dismissing these possibilities without solid reasons was too absolutist for me to still maintain.

*Quantum mechanics allows this to be a probability distribution, but then it's just probabilistically deterministic instead, so it only complicates the issue without really changing anything.


Those of a Bayesian leaning will tend to say things like "probability is subjective", and claim this is an important insight into the nature of probability -- one might even go so far as to say "probability is an answer, not a question". But this doesn't mean you can believe what you want; not exactly. There are coherence constraints. So, once we see that probability is subjective, we can then seek a theory of the subjectivity, which tells us "objective" information about it (yet which leaves a whole lot of flexibility).

The same might be true of counterfactuals. I personally lean toward the position that the constraints on counterfactuals are that they be consistent with evidential predictions, but I don't claim to be unconfused. My position is a "counterfactuals are subjective but have significant coherence constraints" type position, but (arguably) a fairly minimal one -- the constraint is a version of "counterfacting on what you actually did should yield what actually happened", one of the most basic constraints on what counterfactuals should be.

On the other hand, my theory of counterfactuals is pretty boring and doesn't directly solve problems -- it more says "look elsewhere for the interesting stuff".

Edit --

Oh, also, I wanted to pitch the idea that counterfactuals, like a whole bunch of things, should be thought of as "constructed rather than real". This is subtly different from "subjective". We humans are pretty far along in an ongoing process of figuring out how to be and act in the world. Sometimes we come up with formal theories of things like probability, utility, counterfactuals, and logic. The process of coming up with these formal theories informs our practice. Our practice also informs the formal theories. Sometimes a theory seems to capture what we wanted really nicely. My argument is that in an important sense we've invented, not discovered, what we wanted.

So, for example, utility functions. Do utility functions capture human preferences? No, not really, they are pretty far from preferences observed in the wild. However, we're in the process of figuring out what we prefer. Utility functions capture some nice ideas about idealized preferences, so that when we're talking about idealized versions of what we want (trying to figure out what we prefer upon reflection) it is (a) often pretty convenient to think in terms of utilities, and (b) somewhat difficult to really escape the framework of utilities. Similarly for probability and logic as formal models of idealized reasoning.

So, just as utility functions aren't really out there in the world, counterfactuals aren't really out there in the world. But just as it might be that we should think about our preferences in terms of utility anyway (...or maybe abandon utility in favor of better theoretical tools), we might want to equip our best world-model with counterfactuals anyway (...or abandon them in favor of better theoretical tools).

Hopefully I tie up my old job soon so that I can dive deeper into Agent Foundations, including your sequence on CDT=EDT.

Anyway, I'm slightly confused by your comment, because I get the impression you think there is more divergence between our ideas than I think exists. When you talk about it being constructed rather than real, it's very similar to what I meant when I (briefly) noted that some definitions are more natural than others (https://www.lesswrong.com/posts/peCFP4zGowfe7Xccz/natural-structures-and-definitions).

It's on this basis that I argue raw counterfactuals are particularly important, as opposed to before when I was arguing that all other definitions of counterfactuals need to be justified in terms of them.

Anyway, the next step for me will probably be to look at the notions of counterfactual that exist and try to see which ones, if any, aren't ultimately relying on raw counterfactuals to justify their value.

"counterfacting on what you actually did should yield what actually happened" - what do you mean by this? I can think of one definition where this is pretty much trivial and another where it is essentially circular.

TAG

So let’s start with regular counterfactuals. What are they? Do they exist in the universe itself? Unless you’re a modal realist (ie. David Lewis and his clones) the answer is no. Given the exact state of universe and an agent, the agent can only make one decision*.

Yudkowsky and Deutsch's MWI has exactly the same implications as Lewisian modal realism in this regard. (One of the basic problems with this).

Quantum mechanics allows this to be a probability distribution, but then it’s just probabilistically deterministic instead, so it only complicates the issue without really changing anything

It changes something: it allows real counterfactuals in a single universe. There is a (contentious) argument that free will based on quantum indeterminism isn't free enough to be considered genuine free will because it is still constrained to probability distributions... but, whether true or false, that isn't relevant to the topic at hand, the existence of real counterfactuals. Real counterfactuals don't depend purely on some fairly "thick" concept of free will; they can follow from indeterminism alone. If it really was the case that a coin toss could have gone differently, then the case where it did is a real counterfactual.

(But not an actual counterfactual. MR and MW have the further implication that all outcomes are actual from their own perspective).

P.S. If the overall point is that there is not, and should not be, a single notion of counterfactuals, I would agree... so long as real counterfactuals are included!

We can construct a model where we imagine that logic being different than it is, but there isn't a fact of the matter about how logic would be if 1+1=3 instead of 2 that exists as part of standard logic without any extensions, any more than counterfactuals exist in-universe without any extensions.

Aside from a world where "3" denoted 2, I'm unclear on any aspect of how such a model would work.

[This comment is no longer endorsed by its author]

If you are trying to reach those who think that "If Oswald had not shot Kennedy, then someone else would have" is not a confused question, good luck!

TAG

Well, it's a statement, not a question.
