Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

I recently became unstuck on counterfactuals. I now believe that counterfactuals are confusing, in large part, because they entail preserving our values through an ontological shift[1].

In our naive ontology[2], when we are faced with a decision, we conceive of ourselves as having free will in the sense of there being multiple choices that we could actually take[3]. These choices are conceived of as actual, and when we think about the notion of the "best possible choice" we see ourselves as comparing actual possible ways that the world could be. However, when we start investigating the nature of the universe, we realise that it is essentially deterministic[4] and hence that our naive ontology doesn't make sense. This forces us to ask what it means to make the "best possible choice" in a deterministic ontology where we can't literally take a choice other than the one that we make.

This means that we have to try to find something in our new ontology that roughly maps to our old one. For example, CDT pretends that, at the point of the decision, we magically make a decision other than the one we actually end up making: we were intending to turn left, but at the instant of the decision we're magically altered to turn right instead. This works well for many scenarios, but notably fails for Newcomb-like problems. Here updateless counterfactuals are arguably a better fit since they allow us to model the existence of a perfect predictor. However, there is still too much uncertainty about how they should be constructed for this answer to be satisfactory.

Given that we're trying to map a notion[5] onto another ontology that doesn't natively support it, it isn't that surprising that there isn't an obvious, nice, neat way of doing it.

Newcomb's problem exposes the tension between the following two factors:

a) wanting to ensure that the past is the same in each counterfactual to ensure that they are comparable with each other, in the sense of it being fair to compare the counterfactuals in order to evaluate the decision
b) maintaining certain consistency conditions that are present in the problem statement[6].

The tension is as follows: If we hold the past constant between counterfactuals, then we've violated the consistency conditions, as we have a point in time when the agent takes a decision despite the fact that this is inconsistent with the previous moment in time plus the laws of physics. However, if we backpropagate the impacts of the agent's choice through time to create a consistent counterfactual, it's unclear whether the two counterfactuals are comparable, since the agent is facing different scenarios in terms of how much money is in the box.
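To make the two constructions concrete, here is a minimal sketch in Python. It is only an illustration, not a proposed decision theory: the function names are hypothetical, and the payoffs ($1,000,000 in the opaque box, $1,000 in the transparent one) are the usual ones from statements of Newcomb's Problem rather than anything specified in this post.

```python
# Illustrative sketch; names and payoff amounts are assumptions, not from the post.
BIG, SMALL = 1_000_000, 1_000
ACTIONS = ("one-box", "two-box")

def payoff(action: str, predicted: str) -> int:
    """Money received, given the action and the predictor's earlier prediction."""
    opaque = BIG if predicted == "one-box" else 0
    return opaque if action == "one-box" else opaque + SMALL

def fixed_past_counterfactuals(actual_prediction: str) -> dict:
    """Hold the past (the prediction) constant and vary only the action.
    The counterfactuals are comparable, but one of them has an action
    that does not match the prediction, so it breaks consistency."""
    return {a: payoff(a, actual_prediction) for a in ACTIONS}

def consistent_counterfactuals() -> dict:
    """Propagate each action back so that the prediction matches it.
    Each counterfactual is consistent, but the box contents now differ
    between them, so it is less clear that they are comparable."""
    return {a: payoff(a, predicted=a) for a in ACTIONS}

print(fixed_past_counterfactuals("one-box"))
print(fixed_past_counterfactuals("two-box"))
print(consistent_counterfactuals())
```

Under the fixed-past construction two-boxing comes out ahead by $1,000 whichever prediction was made; under the consistent construction one-boxing comes out ahead. The code does not settle which construction is the right one, which is exactly the open question.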

It turns out that Newcomb's Problem is more complicated than I previously realised. In the past, I thought I had conclusively demonstrated that we should one-box by refuting the claim that one-boxing only made sense if you accepted backwards causation. However, this now seems insufficient, as I haven't explained why we should maintain the consistency conditions over comparability after making the ontological shift.

In the past, I might have said that these consistency conditions are what define the problem and that if we dropped them it would no longer be Newcomb's Problem. However, that seems to take us towards a model of counterfactuals being determined by social convention, which only seems useful as a descriptive model, not a prescriptive model. My current approach now tends to put more focus on the evolutionary[7] process that created the intuitions and instincts underlying these incompatible demands as I believe that this will help us figure out the best way to stitch them together.

In any case, understanding that there is an ontological shift here seems like an important part of the puzzle. I can't exactly say what the consequences are of this yet and maybe this post is just stating the obvious, but my general intuition is that the more we explicitly label the mental moves we are making, the less likely we are to trip ourselves up. In particular, I suspect that it'll allow us to make our arguments less handwavey than they otherwise would have been. I'll try to address the different ways that we could handle this shift in the future.

Thanks to Justis for providing feedback.




  1. ^

    If we knew how to handle ontological shifts, we'd be able to handle this automatically; however, it is likely easier to go the other way, where we figure out how to handle counterfactuals as part of our investigations into how to handle ontological shifts.

  2. ^

    I mean the way in which we naturally interface with the world when we aren't thinking philosophically or scientifically.

  3. ^

    As opposed to "being able to take this choice" being just a convenient model of the world.

  4. ^

    Quantum mechanics only shifts us from the state of the world being deterministic, to the probability distribution being deterministic. It doesn't provide scope for free will, so it doesn't avoid the ontological shift.

  5. ^

    Our naive notion of having different actual choices as opposed to this merely being a useful model.

  6. ^

    Specifically, the consistency conditions are that a) the action taken should match the prediction of the oracle, b) the box should contain the million if and only if the oracle predicted the agent would one-box, and c) each moment of time should follow from the previous one given the laws of physics.

  7. ^

    Evolutionary primarily in the sense of natural selection shaping our intuitions, but without ruling out societal influences.



35 comments

I'm guessing a good way to think about free will under determinism is with logical time that's different from physical time. The points/models that advance in logical time are descriptions of environment with different amount of detail, so that you advance in logical time by filling in more details, and sometimes it's your decisions that are filled in (at all of your instances and predictions-of simultaneously). This is different from physical time, where you fill in details in a particular way determined by laws of physics.

The ingredient of this point of view that's usually missing is that concrete models of environment (individual points of states of knowledge) should be allowed to be partial, only specifying some of the data about the environment. Then, actual development of models in response to decisions is easier to see; it's not inherently a kind of illusion born of lack of omniscience. This is in contrast to the usual expectation that the only thing with partial details is the states of knowledge about complete models of environment (with all possible details already filled in), so that partiality is built on top of lack of partiality.

The filling-in of partial models with logical time probably needs to be value-laden. Most counterfactuals are fictional, and the legible details of decision relevant fiction should preserve its moral significance. So it's veering in the direction of "social convention", though in a normative way, in the sense that value is not up for grabs. On the other hand, it's a possible way of understanding CEV as a better UDT, instead of as a separate additional construction with its own desiderata (the simulations of possible civilizations from CEV reappear in decision theory as counterfactuals developing in logical time).

Determinism doesn't seem like a central example of ontological shift, and bargaining seems like the concept of dealing with more general ontological shifts. You bargain with your variant in a different ontological context for doing valuable things. This starts with extrapolation of value to that context, so that it's not beyond the goodhart boundary; you grow confident in legible proxy goals that talk about that territory. It also seems to be a better framing for updatelessness, as bargaining among possible future epistemic states, acausal trade among them, or at least those that join the coalition of abiding by the decision of the epistemic past. This way, considering varying possible future moral states (~partial probutility functions) is more natural. The motivation to do that is so that the assumption of unchanging preference is not baked into the decision theory, and it gets a chance of modeling mild optimization.

AFAIK the best known way of reconciling physical causality with "free will" like choice is constructor theory, which someone pointed out was similar to my critical agential approach.

No disrespect to David Deutsch, but every time I try to make sense of constructor theory I run into objections like "there are solutions to the Einstein equations, like Godel universe and Kerr black hole, that contain closed timelike curves and so are not derivable with any constructor... meaning they are "impossible". But one can happily solve the covariant version, not the ADM-decomposed version and find these solutions without any transformations required by the constructor theory." I might be missing something here... on the other hand, Deutsch insists that the double-slit experiment is evidence of MWI (it's not), so some skepticism is warranted.

I commented directly on your post.

Note the preceding

Let's first, within a critical agential ontology, disprove some very basic forms of determinism.

I'm assuming use of a metaphysics in which you, the agent, can make choices. Without this metaphysics there isn't an obvious motivation for a theory of decisions. As in, you could score some actions, but then there isn't a sense in which you "can" choose one according to any criterion.

Maybe this metaphysics leads to contradictions. In the rest of the post I argue that it doesn't contradict belief in physical causality including as applied to the self.

As in, you could score some actions, but then there isn't a sense in which you "can" choose one according to any criterion.


I've noticed that issue as well. Counterfactuals are more a convenient model/story than something to be taken literally. You've grounded decision by taking counterfactuals to exist a priori. I ground them by noting that our desire to construct counterfactuals is ultimately based on evolved instincts and/or behaviours so these stories aren't just arbitrary stories but a way in which we can leverage the lessons that have been instilled in us by evolution. I'm curious, given this explanation, why do we still need choices to be actual?

Do you think of counterfactuals as a speedup on evolution? Could this be operationalized by designing AIs that quantilize on some animal population, therefore not being far from the population distribution, but still surviving/reproducing better than average?

Speedup on evolution?

Maybe? Might work okayish, but doubt the best solution is that speculative.

Without this metaphysics there isn't an obvious motivation for a theory of decisions.

There isn't really any need for "choices", except in the sense of "internal states are also inputs that affect the outputs". Sufficiently complex agents can have one or more decision theories encoded into their internal state, and it seems like being able to communicate, evaluate, and update such theories would at least sometimes be a useful trait.

It's easy to imagine agents that can't do that, but it's more interesting (and more reflective of "intelligence") to assume they can.

I tripped pretty hard on the word "actual" there. When you are listing the options available, those are fictitious predictions, i.e. very unactual.

Say you have a choice over 4 options. You are going to tell yourself 4 stories, feel about them and then enact a story which may or may not correspond to any of the previous stories.

An agent faces a choice over 0$, 50$ and 100$ and picks 50$. They are a terrible chooser. Another agent faces a choice over 0$ and 50$ and picks 50$. They are an excellent picker. If f({a,b,c}) = max({a,b,c}), the agent is good; otherwise it is bad. I feel like I am not really understanding what is/was the problem here. Is there a temptation to say stuff like f({a,b,c}) = a and f({a,b,c}) = b in the same breath? That the agent produces an output and that it is stable and unambiguous just makes things straightforward.
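The criterion above can be written out as a tiny Python sketch. The function name is hypothetical, and it assumes the options are directly comparable numbers (dollar amounts), which the comment implies but does not spell out.

```python
def is_good_chooser(options: set, picked) -> bool:
    """A pick is judged relative to the option set it was drawn from:
    it counts as good only if it equals the best available option."""
    if picked not in options:
        raise ValueError("picked value must be one of the options")
    return picked == max(options)

# Same pick (50), different option sets, different verdicts.
terrible = is_good_chooser({0, 50, 100}, 50)
excellent = is_good_chooser({0, 50}, 50)
print(terrible, excellent)
```

The verdict depends on the option set as well as the pick, which is why the question of where the option set (the stories) comes from still has to be answered separately.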

That works if we have the counterfactuals/stories, but how do we determine what these should be? Assuming we reject modal realism, they don't directly correspond to anything real, so what should they be?

In the case that the agent generates only one story, it's not really a decision point but rather a reflexive action. We could design a bad agent that, when faced with a genuine decision point, would just reflex through it with some predecided action. So in my mind this is turning into a question of when it is proper to drop out of reflexive action and go through multiple stories, i.e. how we know we are at a decision point.

If legitimate (Bayesian) prediction would put significant probability mass on future outcomes that are widely apart in welfare, then choices could matter. If the uncertainty is due to non-self, the agent should maybe be anxious but should not start to decide. If the uncertainty is due to the state of the agent's actuators, then decision should start. Actuator can be taken in a wide sense where everything that is influenceable by the agent is an actuator. Now there is a danger that modality is just retreating to influenceability. However, I think that close correlation between the core self and (potential) actuator can make this issue live in the past or present rather than the future. Maybe if the nerves to your arm have just now been cut you would mistakenly take your hand to be your actuator. But if the hand has until this point obeyed your will, it is prudent to make this assumption, although the agent can't actually know whether the decision will in fact be causally linked with the arm when the decision is carried out.

What your brain is causally coupled to is subject to correct and incorrect beliefs, and this forms the basis of what options you have or don't have. I guess some of the decision theories could have different rationale how and why it would be prudent to produce what stories (a brain that ignores its causal linkages and just picks stories in lockstep with its copies could be more steerable by those that create the copies, but I guess even then perceiving the state of the world is a kind of causality bind).

I use the term Trivial Decision Theory Problem to refer to circumstances when an agent can only make one decision.

I guess some of the decision theories could have different rationale how and why it would be prudent to produce what stories

Yeah, my approach is to note that the stories are the result of intuitions and/or instincts that are evolved and so that they aren't purely arbitrary.

Well, it helps that the concept is quite well defined, but I think I was focusing on another aspect.

It seems to me that in a Trivial Decision Theory Problem, the list of stories is generated but then one of the stories hogs all the pros with everything else getting all the cons.

Whereas I was thinking of situations that the agent doesn't perceive as having options, i.e. where there is only one thinkable course of action (and whether this is due to impossibility or lack of imagination is not important).

Even in the "trivial analysis" of the examples for trivial decision problems there is a sense that 2 things are possible, and a "screw this option" kind of thought is produced that refers to a course of events/action that is not undertaken. This makes it a "decision point". Things that are not decisions do not involve referring to such representations. I have a little trouble coming up with non-hardware examples, but the closest I have got is darkness making humans sleepy (not asleep, but having the night hormone levels). The human doesn't decide to be sleepy; it's a mechanism in them. Another candidate would be visual field processing. It's more like there is visual qualia popping into consciousness rather than there being decisions on "how should I see this?"; you don't have to decide to see.

Anyway, the point was that even if they are crappy stories, them being stories saves us from the ontological troubles. Here is another attempt at construing the apparent contradiction that needs explaining.

T1: "I could take candy at T20"

T2: "I could take cake at T20"

T3: "I can't take both candy and cake at the same time"

T4: "If I take candy my teeth will hurt at T25"

T5: "If I take cake my stomach will hurt at T25"


T15: "Okay so take the candy"


T20: "Soon my teeth will hurt"

T21: "whoops I was supposed to pick up the candy"

The "equal modalities" view would say that it's a problem that two mutually exclusive events are designated for T20. But them being stories emphasises that all of those thought steps are what is supposed to be singularly determined. T20 taking place is a different thing than having a thought with a reference to T20 in it.

If you had a scheme like

T17: "I am hungry I need to eat"

T20: pick up candy

T20: pick up cake

T25: Feel pain in teeth

T25: Feel pain in stomach

And if it actually takes place, then we would in fact be in trouble with determinism unless we are picking up a superposition of candy+cake to eat. And it can seem like the previous planning is about such a world. But that ain't how the classical world is at all. Exercising choice doesn't branch you in that way. It's not that one of the stories is more privileged than the rest, but rather that the whole scheme is inapplicable.

It seems to me that in a Trivial Decision Theory Problem, the list of stories is generated but then one of the stories hogs all the pros with everything else getting all the cons.

That wasn't how I defined it. I defined it as a decision theory problem with literally one option.

The "screw this" option is available when we don't insist that an agent is actually in a situation, just that a situation be simulated.

I feel like I am having reading comprehension difficulties.

So the Triviality Perspective claims that you should one-box, but also that this is an incredibly boring claim that doesn't provide much insight into decision theory.

This passage seems a lot like applying the concept to get an answer out of two option scenario.

If you accept the premise of a perfect predictor, then seeing $1 million in the transparent box implies that you were predicted to one-box which implies that you will one-box.

This seems to me to be that one option is deemed possible and the other impossible. Deeming an option impossible is a form of "screw this". So this approach forms opinions of two counterfactuals.

A true situation that is so trivial that it's not a decision would be something like: "You come across a box. You can take it." If we complicate it even a little bit with "You come across a box. You can take it. You can leave it be." it's a non-trivial decision problem.

It might be baked into the paradigm of providing a decision theory that it should process and opine about all the affordances available. In a given hypothetical situation the affordances are magically fixed. But in a real situation, part of the cognition is responsible for turning the situation into a decision moment if it is warranted.

If you come across a fork in the road, one agent might process it as a decision problem, "Do I go left or right?", and another might ask "Do I go north or south?". The chopping of the situation into affordances might also be perspective relative: "Do I go left or right or turn back?" is a way to see three affordances in the same situation where another perspective would see two. An agent that just walks without pondering does not engage in deciding. The "question" of "how many affordances should I see in this situation" can be answered in a more functional manner and a less functional manner (your navigation might be greatly hampered if you can't turn on roads).

The question of counterfactuals is placed before the problem is formulated and not after it.

Quantum mechanics only shifts us from the state of the world being deterministic, to the probability distribution being deterministic. It doesn’t provide scope for free will

But counterfactuals aren't defined in terms of free will, only probability. Which is to say that, as far as everyone who is not a Yudkowskian rationalist is concerned, counterfactuals aren't defined in terms of free will, only probability. Rationalists have their own problem of counterfactuals because they have their own definition of counterfactuals.

A probabilistic world, a world in which it is an objective fact that things could have happened differently, cannot be a deterministic world...even if it lacks free will. There are more than two quadrants.

It is Many Worlds that portrays a deterministic evolution of probability distributions...not quantum mechanics.

Counterfactuals are defined in terms of probability, but not of objective probability. Subjective probability is always available because subjects have limited knowledge, so subjective counterfactuals are always available.

Counterfactuals are defined relative to models.

If the model is based on probabilities, then its counterfactuals are defined in terms of probabilities. If the model is not based on probabilities, then its counterfactuals are not defined in terms of probabilities. If the model has something in it called "free will", then its counterfactuals will be defined in terms of "free will".

Even if it's true that counterfactuals are only defined within models, it doesn't follow that you can always define counterfactuals within any given model. A model that contains (libertarian) free will embeds possibilities/probabilities anyway ... they are doing the lifting.

So if I'm not a Yudkowskian rationalist and I want to say that if, in Game of Life, the configuration of cells had been different (so instead of configuration1, it had been configuration2), the outcome would've also been different (outcome2 instead of outcome1), that's not a counterfactual? (Since it's not defined in terms of subjective or objective probability.)
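As a concrete illustration of this kind of counterfactual, here is a minimal Python sketch of Conway's Game of Life on an unbounded grid. The two starting patterns (a block and a blinker) are arbitrary choices made for the example, and the helper names `step` and `run` are hypothetical; only the names `configuration1`, `configuration2`, `outcome1` and `outcome2` come from the comment above.

```python
from collections import Counter

def step(live: frozenset) -> frozenset:
    """Advance one generation; `live` is the set of (x, y) cells that are alive."""
    # Count, for every cell adjacent to a live cell, how many live neighbours it has.
    counts = Counter(
        (x + dx, y + dy)
        for (x, y) in live
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # Birth on exactly 3 neighbours; survival on 2 or 3.
    return frozenset(
        cell for cell, n in counts.items()
        if n == 3 or (n == 2 and cell in live)
    )

def run(live, generations: int) -> frozenset:
    """Apply `step` repeatedly; the rule is fully deterministic."""
    state = frozenset(live)
    for _ in range(generations):
        state = step(state)
    return state

configuration1 = {(0, 0), (1, 0), (0, 1), (1, 1)}  # block (still life)
configuration2 = {(0, 0), (1, 0), (2, 0)}          # blinker (period 2)

outcome1 = run(configuration1, 1)
outcome2 = run(configuration2, 1)
print(outcome1 != outcome2)
```

No probabilities appear anywhere: "had the configuration been configuration2, the outcome would have been outcome2" is evaluated simply by running the same deterministic rule on a different input.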

One of the problems Rationalists have with counterfactuals is motivational: why would you think of an alternate history of a game of life, when there is zero probability that it started in a different state?

So you're saying that it is a counterfactual (despite not involving subjective or objective probability), but you're saying there is a problem in nobody being motivated to think about said counterfactual?

I'm saying the Rationalists are saying that ...I don't have a problem with counterfactuals myself.

So you're neither saying it's not a counterfactual (despite it not involving either subjective or objective probability), nor are you saying there is a problem with nobody being motivated to think about them.

So what are you saying?

If you want to think about the outcomes of a counterfactual, it's just a conditional whose antecedent didn't happen.

But that's not the problem Rationalists have.

If you want to think about the outcomes of a counterfactual, it's just a conditional whose antecedent didn't happen.

But that's not the problem Rationalists have.

So what is the problem?

The motivational problem is "why think about alternative decisions when you could only have made one decision?".

The ontological problem is "where do counterfactuals exist?"

In a deterministic universe (the jury is still out as to whether the indeterminism of our universe impacts our decisions), free will is hidden in the other if-branches of the computation-which-is-you. It could've made another decision, but it didn't. You can imagine that as another possible world with that computation being slightly different (such that it makes another decision).

Counterfactuals don't have ontological existence. We talk about them to talk about other possible worlds which are similar to ours in some aspects and different in others.

Of course, zero not being a probability means that you don't know with infinite strength which one is the real history of the game of life. There is a perspective that probabilities fundamentally are think-resource allocations: giving a low probability means you don't/shouldn't think with/about it.

The universe is probably indeterministic, in that quantum mechanics has true randomness. Or to put it another way, the Heisenberg uncertainty in a quantum atom is true randomness, rather than uncertainty. So yeah, our universe is almost certainly not deterministic, and thus you don't need to update your ontology.

[This comment is no longer endorsed by its author]

Randomness does not save you in this case, since it's low-level and not agent-driven.

Unless agents are randomness driven.

Well, either randomness is inherent in everything, in which case there is nothing agenty about it, or agents have some special kind of randomness, which does not mesh with our current understanding of physics at all. 

Or free agents blend randomness and determinism in a particular way.

See footnote 4: "Quantum mechanics only shifts us from the state of the world being deterministic, to the probability distribution being deterministic. It doesn't provide scope for free will, so it doesn't avoid the ontological shift."