Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

I recently became unstuck on counterfactuals. I now believe that counterfactuals are confusing, in large part, because they entail preserving our values through an ontological shift[1].

In our naive ontology[2], when we are faced with a decision, we conceive of ourselves as having free will in the sense of there being multiple choices that we could actually take. These choices are conceived of as actual, and when we think about the notion of the "best possible choice" we see ourselves as comparing actual possible ways that the world could be. However, when we start investigating the nature of the universe, we realise that it is essentially deterministic[3] and hence that our naive ontology doesn't make sense. This forces us to ask what it means to make the "best possible choice" in a deterministic ontology where we can't literally take a choice other than the one that we make.

This means that we have to try to find something in our new ontology that roughly maps to our old one. For example, CDT pretends that, at the point of the decision, we magically make a decision other than the one we actually end up making. For instance, we were intending to turn left, but at the point of the decision we're magically altered to turn right instead. This works well for many scenarios, but notably fails for Newcomb's problem and similar. Here updateless counterfactuals are arguably a better fit since they allow us to model the existence of a perfect predictor. However, there is still too much uncertainty about how they should be constructed for this answer to be satisfactory.

Given that we're trying to map a notion[4] onto another ontology that doesn't natively support it, it isn't that surprising there isn't an obvious, nice, neat way of doing it.

Newcomb's problem exposes the tension between the following two factors:

a) wanting to ensure that the past is the same in each counterfactual to ensure that they are comparable with each other, in the sense of it being fair to compare the counterfactuals in order to evaluate the decision
b) maintaining certain consistency conditions that are present in the problem statement[5].

The tension is as follows: If we hold the past constant between counterfactuals, then we've violated the consistency conditions, as we have a point in time when the agent takes a decision despite the fact that this is inconsistent with the previous time plus the laws of physics. However, if we backpropagate the impacts of the agent's choice through time to create a consistent counterfactual, it's unclear whether the two counterfactuals are comparable, since the agent is facing different scenarios in terms of how much money is in the box.
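One way to make this tension concrete is the minimal Python sketch below. It assumes the usual payoffs (a million in the opaque box, plus an assumed $1,000 in the transparent box) and a perfect predictor. The function names are hypothetical, and this is only an illustration of the two ways of constructing counterfactuals, not a proposed decision theory.

```python
# Assumed payoffs: the $1,000 transparent box is the conventional figure,
# not something specified in this post.
BIG = 1_000_000   # opaque box, filled only if one-boxing was predicted
SMALL = 1_000     # transparent box, always present

ACTIONS = ("one-box", "two-box")

def payoff(action, prediction):
    """Money received for an action, given what the predictor foresaw."""
    opaque = BIG if prediction == "one-box" else 0
    return opaque if action == "one-box" else opaque + SMALL

def fixed_past_counterfactuals(prediction):
    """Factor (a): hold the past (the prediction) constant, vary only the action.
    One of the two entries breaks the 'action matches prediction' condition."""
    return {a: payoff(a, prediction) for a in ACTIONS}

def consistent_counterfactuals():
    """Factor (b): propagate the choice backwards so the prediction matches it.
    The two entries now differ in how much money is in the box."""
    return {a: payoff(a, prediction=a) for a in ACTIONS}

def best(options):
    """Action with the highest payoff among the compared counterfactuals."""
    return max(options, key=options.get)

print(best(fixed_past_counterfactuals("one-box")))  # two-box
print(best(fixed_past_counterfactuals("two-box")))  # two-box
print(best(consistent_counterfactuals()))           # one-box
```

Holding the past fixed recommends two-boxing whatever the prediction was, while enforcing consistency recommends one-boxing. The sketch doesn't say which comparison is the fair one; that is exactly the open question.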

It turns out that Newcomb's Problem is more complicated than I previously realised. In the past, I thought I had conclusively demonstrated that we should one-box by refuting the claim that one-boxing only made sense if you accepted backwards causation. However, this now seems insufficient as I haven't explained why we should maintain the consistency conditions over comparability after making the ontological shift.

In the past, I might have said that these consistency conditions are what define the problem and that if we dropped them it would no longer be Newcomb's Problem. However, that seems to take us towards a model of counterfactuals being determined by social convention, which only seems useful as a descriptive model, not a prescriptive model. My current approach now tends to put more focus on the evolutionary process that created the intuitions and instincts underlying these incompatible demands as I believe that this will help us figure out the best way to stitch them together.

In any case, understanding that there is an ontological shift here seems like an important part of the puzzle. I can't exactly say what the consequences are of this yet and maybe this post is just stating the obvious, but my general intuition is that the more we explicitly label the mental moves we are making, the less likely we are to trip ourselves up. In particular, I suspect that it'll allow us to make our arguments less handwavey than they otherwise would have been. I'll try to address the different ways that we could handle this shift in the future.

Thanks to Justis for providing feedback.

  1. ^

    If we knew how to handle ontological shifts, we'd be able to handle this automatically. However, it is likely easier to go the other way, where we figure out how to handle counterfactuals as part of our investigations into how to handle ontological shifts.

  2. ^

    I mean the way in which we naturally interface with the world when we aren't thinking philosophically or scientifically.

  3. ^

    Quantum mechanics only shifts us from the state of the world being deterministic, to the probability distribution being deterministic. It doesn't provide scope for free will, so it doesn't avoid the ontological shift.

  4. ^

    Our naive notion of having different actual choices as opposed to this merely being a useful model.

  5. ^

     Specifically, the consistency conditions are that: a) the action taken should match the prediction of the oracle, b) the box should contain the million if and only if the oracle predicted the agent would one-box, and c) each moment of time should follow from the previous one given the laws of physics.

12 comments

I'm guessing a good way to think about free will under determinism is with logical time that's different from physical time. The points/models that advance in logical time are descriptions of environment with different amount of detail, so that you advance in logical time by filling in more details, and sometimes it's your decisions that are filled in (at all of your instances and predictions-of simultaneously). This is different from physical time, where you fill in details in a particular way determined by laws of physics.

The ingredient of this point of view that's usually missing is that concrete models of environment (individual points of states of knowledge) should be allowed to be partial, specifying only some of the data about the environment. Then, actual development of models in response to decisions is easier to see; it's not inherently a kind of illusion born of lack of omniscience. This is in contrast to the usual expectation that the only thing with partial details is the states of knowledge about complete models of environment (with all possible details already filled in), so that partiality is built on top of lack of partiality.


The filling-in of partial models with logical time probably needs to be value-laden. Most counterfactuals are fictional, and the legible details of decision relevant fiction should preserve its moral significance. So it's veering in the direction of "social convention", though in a normative way, in the sense that value is not up for grabs. On the other hand, it's a possible way of understanding CEV as a better UDT, instead of as a separate additional construction with its own desiderata (the simulations of possible civilizations from CEV reappear in decision theory as counterfactuals developing in logical time).

Determinism doesn't seem like a central example of ontological shift, and bargaining seems like the concept of dealing with more general ontological shifts. You bargain with your variant in a different ontological context for doing valuable things. This starts with extrapolation of value to that context, so that it's not beyond the Goodhart boundary: you grow confident in legible proxy goals that talk about that territory. It also seems to be a better framing for updatelessness, as bargaining among possible future epistemic states, acausal trade among them, or at least those that join the coalition of abiding by the decision of the epistemic past. This way, considering varying possible future moral states (~partial probutility functions) is more natural. The motivation to do that is so that the assumption of unchanging preference is not baked into the decision theory, and it gets a chance of modeling mild optimization.

AFAIK the best known way of reconciling physical causality with "free will" like choice is constructor theory, which someone pointed out was similar to my critical agential approach.

No disrespect to David Deutsch, but every time I try to make sense of constructor theory I run into objections like "there are solutions to the Einstein equations, like the Gödel universe and the Kerr black hole, that contain closed timelike curves and so are not derivable with any constructor... meaning they are 'impossible'. But one can happily solve the covariant version, not the ADM-decomposed version, and find these solutions without any transformations required by constructor theory." I might be missing something here... on the other hand, Deutsch insists that the double-slit experiment is evidence of MWI (it's not), so some skepticism is warranted.

I tripped pretty hard on the word "actual" there. When you are listing the options available, those are fictitious predictions, i.e. very unactual.

Say you have a choice over 4 options. You are going to tell yourself 4 stories, feel about them and then enact a story which may or may not correspond to any of the previous stories.

An agent faces a choice over $0, $50 and $100 and picks $50. They are a terrible chooser. Another agent faces a choice over $0 and $50 and picks $50. They are an excellent picker. If f({a,b,c}) = max({a,b,c}), the agent is good; otherwise it is bad. I feel like I am not really understanding what is/was the problem here. Is there a temptation to say things like f({a,b,c}) = a and f({a,b,c}) = b in the same breath? That the agent produces an output, and that it is stable and unambiguous, just makes things straightforward.
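As a rough sketch of the criterion in this comment, the Python below treats an agent as a function f from a set of options to a single pick, and calls it good when the pick equals the maximum. The names `is_good_chooser` and `picks_fifty` are hypothetical, introduced only for illustration.

```python
def is_good_chooser(f, options):
    """True if the agent's pick f(options) is the best available option."""
    return f(options) == max(options)

def picks_fifty(options):
    """Hypothetical agent that always outputs 50, whatever else is on offer."""
    return 50

# The same output is judged differently depending on the option set.
print(is_good_chooser(picks_fifty, {0, 50, 100}))  # False
print(is_good_chooser(picks_fifty, {0, 50}))       # True
```

Note that f returns exactly one value per option set, so the evaluation never needs the agent to "have picked otherwise"; it only compares the one output against the listed options.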

"Quantum mechanics only shifts us from the state of the world being deterministic, to the probability distribution being deterministic. It doesn’t provide scope for free will"

But counterfactuals aren't defined in terms of free will, only probability. Which is to say that, as far as everyone who is not a Yudkowskian rationalist is concerned, counterfactuals aren't defined in terms of free will, only probability. Rationalists have their own problem of counterfactuals because they have their own definition of counterfactuals.

A probabilistic world, a world in which it is an objective fact that things could have happened differently, cannot be a deterministic world...even if it lacks free will. There are more than two quadrants.

It is Many Worlds that portrays a deterministic evolution of probability distributions...not quantum mechanics.

Counterfactuals are defined in terms of probability, but not of objective probability. Subjective probability is always available because subjects have limited knowledge..so subjective counterfactuals are always available.

Counterfactuals are defined relative to models.

If the model is based on probabilities, then its counterfactuals are defined in terms of probabilities. If the model is not based on probabilities, then its counterfactuals are not defined in terms of probabilities. If the model has something in it called "free will", then its counterfactuals will be defined in terms of "free will".

So if I'm not a Yudkowskian rationalist and I want to say that if, in Game of Life, the configuration of cells had been different (so instead of configuration1, it had been configuration2), the outcome would've also been different (outcome2 instead of outcome1), that's not a counterfactual? (Since it's not defined in terms of subjective or objective probability.)
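To illustrate the kind of model-relative counterfactual meant here, the following is a minimal Python sketch of one Game of Life step applied to two starting configurations. The particular patterns (a blinker and a block) and the names `configuration1`, `configuration2`, `outcome1` and `outcome2` are just assumed examples; no probabilities appear anywhere.

```python
from collections import Counter

def step(live):
    """Advance a Game of Life configuration (a set of live (x, y) cells) by one tick."""
    # Count, for every cell, how many live neighbours it has.
    counts = Counter(
        (x + dx, y + dy)
        for (x, y) in live
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # Birth on exactly 3 neighbours; survival on 2 or 3.
    return {cell for cell, n in counts.items()
            if n == 3 or (n == 2 and cell in live)}

configuration1 = {(0, 1), (1, 1), (2, 1)}          # horizontal blinker
configuration2 = {(0, 0), (0, 1), (1, 0), (1, 1)}  # 2x2 block

outcome1 = step(configuration1)  # what happens under the rules
outcome2 = step(configuration2)  # what would have happened instead

print(sorted(outcome1))      # [(1, 0), (1, 1), (1, 2)]
print(sorted(outcome2))      # [(0, 0), (0, 1), (1, 0), (1, 1)]
print(outcome1 != outcome2)  # True
```

The statement "had the configuration been configuration2, the outcome would have been outcome2" is fixed entirely by the deterministic update rule.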

The universe is probably indeterministic, in that quantum mechanics has true randomness. Or to put it another way, the Heisenberg uncertainty in a quantum atom is true randomness, rather than mere uncertainty. So yeah, our universe is almost certainly not deterministic, and thus you don't need to update your ontology.

Randomness does not save you in this case, since it's low-level and not agent-driven.

Unless agents are randomness driven.

Well, either randomness is inherent in everything, in which case there is nothing agenty about it, or agents have some special kind of randomness, which does not mesh with our current understanding of physics at all. 

Or free agents blend randomness and determinism in a particular way.
