Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Update: Originally I set the utility for AB or BA to -10, -10; but I've now realised that unnecessarily complicates the problem.


Logical counterfactuals (as in Functional Decision Theory) are more about your state of knowledge than the actual physical state of the universe. I will illustrate this with a relatively simple example.

Suppose there are two players in a game where each can choose A or B, with the following payoffs for each combination:

AA: 10, 10

BB, AB or BA: 0, 0

Situation 1:

Suppose you are told that you will make the same decision as the other player. You can quickly conclude that A provides the highest utility.

Situation 2:

Suppose you are told that the other player chooses A. You then reason that A provides the highest utility.

Generalised Situation:

This situation combines elements of the previous two. Player 1 is an agent that will choose A, although this is not known by Player 2 unless option b) in the next sentence is true. Player 2 is told one of the following:

a) They will inevitably make the same decision as Player 1

b) Player 1 definitely will choose A

If Player 2 is a rational timeless agent, then they will choose A regardless of which one they are told. This means that both agents will choose A, making both a) and b) true statements.
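The reasoning in these situations can be made concrete with a minimal Python sketch. It is only an illustration under stated assumptions: the names `payoff`, `KNOWLEDGE` and `best_action` are hypothetical, each statement Player 2 is told is modelled as a predicate over worlds (Player 1's choice, Player 2's choice), and an action is scored by the worst payoff among the worlds consistent with what Player 2 was told.

```python
ACTIONS = ("A", "B")

def payoff(p1, p2):
    """Utility to each player: 10 if both choose A, otherwise 0."""
    return 10 if (p1, p2) == ("A", "A") else 0

# Assumption: each statement is a predicate over worlds (p1, p2).
KNOWLEDGE = {
    "a": lambda p1, p2: p1 == p2,   # a) same decision as Player 1
    "b": lambda p1, p2: p1 == "A",  # b) Player 1 will choose A
}

def best_action(told):
    """Return Player 2's best action and the value of each action."""
    consistent = KNOWLEDGE[told]
    value = {}
    for p2 in ACTIONS:
        # Worlds Player 2 considers possible if they take action p2.
        worlds = [(p1, p2) for p1 in ACTIONS if consistent(p1, p2)]
        # Assumption: score an action by its worst consistent world.
        value[p2] = min(payoff(p1, mine) for p1, mine in worlds)
    return max(value, key=value.get), value

for told in ("a", "b"):
    print(told, best_action(told))
```

In both cases the sketch selects A, even though under a) Player 1's choice varies with Player 2's action while under b) it is held fixed. The only thing that differs between the two runs is the predicate describing what Player 2 knows.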

Analysis:

Consider the Generalised Situation, where you are Player 2. Comparing the two cases, we can see that the physical situation is identical, apart from the information you (Player 2) are told. Even the information Player 1 is told is identical. But in one situation we model Player 1's decision as counterfactually varying with yours, while in the other situation, Player 1 is treated as a fixed element of the universe.

On the other hand, if you were told that the other player would choose A and that they would make the same choice as you, then the only choice compatible with that would be to choose A. We could easily end up in all kinds of tangles trying to figure out the logical counterfactuals for this situation. However, the decision problem is really just trivial in this case and the only (non-strict) counterfactual is what actually happened. There is simply no need to attempt to figure out logical counterfactuals given perfect knowledge of a situation.
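A small sketch of this case, assuming worlds are written as (other player's choice, my choice) and using the hypothetical helper `consistent_worlds`: once both statements are known, only one of my actions is compatible with any world at all.

```python
ACTIONS = ("A", "B")

def consistent_worlds(my_action):
    """Worlds (other, me) compatible with BOTH statements and my action."""
    return [
        (other, my_action)
        for other in ACTIONS
        if other == "A"          # told: the other player chooses A
        and other == my_action   # told: the other player chooses as I do
    ]

print(consistent_worlds("A"))  # [('A', 'A')]
print(consistent_worlds("B"))  # []
```

Choosing B leaves no consistent world to evaluate, which is why the problem is trivial here rather than a matter of constructing counterfactuals.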

It is a mistake to focus too much on the world itself, as given precisely what happened, all (strict) counterfactuals are impossible. The only thing that is possible is what actually happened. This is why we need to focus on your state of knowledge instead.

Resources:

A useful level distinction: A more abstract argument that logical counterfactuals are about mutations of your model rather than an attempt to imagine an external inconsistent universe.

What a reduction of "could" could look like: A conception of "could" in terms of what the agent can prove.

Reducing collective rationality to individual optimization in common-payoff games using MCMC: Contains a similar game.

26 comments

Yeah, I've known since my first post on formalizing UDT that counterfactuals are about what the agent will prove, not what's true. I think of them as a complicated but finite tree of tendrils, representing the theorems that the agent will prove before making a decision, living inside the infinite tree of all theorems provable from the agent's axioms. All the interesting questions are about the shape of the tree, given that the agent can affect its shape by exploring more in one direction or the other. I'd love to understand it better.

I've known since my first post on formalizing UDT that counterfactuals are about what the agent will prove, not what's true.

What's "true"? If it's things that can be knowledge, proofs are affected. Even observations may affect proof strategies, as actions in a world where you get particular observations may be monstrously more efficient to find if you take observations into account. You'd just want to coordinate with other versions of yourself that observed differently and used different proof strategies.

Yeah, I've spent the last two weeks slowly making my way through the posts on Ambient Decision Theory. I found that absolutely fascinating, but rather difficult for me to get my head around. I guess my question was more about whether there are any other simple scenarios that demonstrate that logical counterfactuals are more about your state of knowledge than the physical state of the universe. I think that would help me understand what exactly is going on better. This one took me much longer than you'd expect to construct.

What in particular would you love to understand better?

My current guess is that logical counterfactuals are the wrong thing to focus on in decision making. They have their place in figuring out imperfect reasoning about yourself, but that's specific to coordination with imperfect reasoners of a particular kind, not decision making itself (so it should probably be done on the meta level, as a symbol pushing game not working on objects of knowledge for the agent).

A trick that works well for that (I'm not sure it was much discussed) is to aim to prove that you win, given provability (validity in form of box modality) of a decision you can make, instead of aiming to discover consequences of a decision (discovering and exploiting dependencies). This way, you don't deal with problematic consequences of non-actual decisions, only with the enacted requirements for an actual proof of winning. This also abstracts away from utility maximization, so there is just winning and not winning (and the issue of how to get back to maximization is more difficult than in the dependency-based approaches, it becomes more like bargaining).

I'm having difficulty following this comment. I don't know what you mean by a "symbol pushing game". Also, does box modality simply refer to modal logic?

Anyway, re: problems like Perfect Parfit's Hitchhiker, proving you win by paying still requires you to define what the predictor actually predicts. Otherwise we can't say that a never paying agent doesn't end up in town and hence win. So I don't understand how this avoids the "problematic consequences of non-actual decisions".

No one "knows" this. You can come up with a DT that optimises metaphysical neutrality, if you're interested, but it won't optimise utility. To optimise utility, you have to find the DT that is the best fit for your universe. But deciding that you value X over Y is not discovering a fact.

What do you mean by "metaphysical neutrality"?

That requires the minimum number of assumptions about the world.

The only thing that is possible is what actually happened.

Not true, for MWI believers. You experienced only one thing, but other possible things happened in other universes. In that conception, it's quite sane to talk about "possible counterfactual" as distinct from "impossible". Possible means "happens in at least one timeline", where impossible means "logically contradictory".

Note that this doesn't change your conclusion - uncertainty is still about your knowledge. It's just about your knowledge of what happens in your timeline, as well as your knowledge of which timeline you're in.

Yeah, given the tendency of multi-world to complicate things I usually ignore it and leave it up to others to figure out how to adapt my arguments to this theory.

The problem with this kind of analysis is that one is using the intuition of a physical scenario to leverage an ambiguity in what we mean by agent and decision.

Ultimately, the notions of decisions and agents are idealizations. Any actual person or AI only acts as the laws of physics dictate, and agents, decisions or choices don't appear in any description in terms of fundamental physics. Since people (and programs) are complex systems that often make relatively sophisticated choices about their actions, we introduce the idealization of agents and decisions.

That idealization is basically what one sees in the standard formulation of game theory in terms of trees, visibility conditions and payoffs with decisions simply being nodes on the tree and agents being a certain kind of function from visible outcomes and nodes to children of those nodes. The math is all perfectly clear and there is nothing paradoxical or troubling.

What makes it seem like there is a problem is when we redescribe the situation in terms of guarantees that the other player will have predicted your choice in a certain way or the like. Formally, that doesn't really make sense... or at least it corresponds to a radically different game, e.g., restricting the tree so that only those outcomes are allowed. However, because we have this other non-formal notion of choice and agent stuck in our heads (choice is something like picking what socks to wear; agent is something like a person), we don't realize that our idealization just changed drastically even though in common language we are still playing the same game.

In other words there are no extra facts to be found about which decision theory is best. There are facts about what actual physical systems will do and there are mathematical facts about trees and functions on them but there isn't any room for further facts about what kind of decision theory is the true one.

Was there also no room for the fact that VNM utility maximization is useful? I'm looking for the next step in usefulness after VNM and UDT. Or you could say I'm looking for some nice math to build into an AI. My criteria for judging that math are messy, but that doesn't mean they don't exist.

Looks like our views are not that far apart :) Your approach is way more prescriptive than mine, though. Indeed any kind of counterfactuals (or factuals for that matter, there is not much difference) are in the observer's model of the world.

There is simply no need to attempt to figure out logical counterfactuals given perfect knowledge of a situation.

Right, you enumerate the possible worlds and note which one gives the best outcome. In your setup:

Situation 1 has two possible worlds, AA and BB, and the observer 1 who thinks they chose A ends up with higher utility.

Situation 2 has two possible worlds, AA and BA. If the observer 1 lives in the world where they "chose" A, they get higher utility.

Generalised Situation has the possible worlds (AA or BB) or (AA or BA), so AA, BA, BB, and, again, if the observer 1 lives in the world where they "chose" A, they end up with higher utility.
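This enumeration can be sketched in a few lines of Python. It is a rough illustration, assuming each world is a two-letter string whose first letter is the observer's own choice and whose second letter is the other player's choice (which matches the worlds listed above); the names `utility` and `best_world` are hypothetical.

```python
from itertools import product

# All four conceivable worlds: AA, AB, BA, BB.
ALL_WORLDS = {"".join(w) for w in product("AB", repeat=2)}

def utility(world):
    """Payoff from the post: 10 for AA, 0 for everything else."""
    return 10 if world == "AA" else 0

situation_1 = {w for w in ALL_WORLDS if w[0] == w[1]}  # same choice
situation_2 = {w for w in ALL_WORLDS if w[1] == "A"}   # other chooses A
generalised = situation_1 | situation_2                # either statement

def best_world(worlds):
    """The possible world with the highest utility."""
    return max(sorted(worlds), key=utility)

print(sorted(situation_1), sorted(situation_2), sorted(generalised))
print(best_world(generalised))
```

In every case the best possible world is AA, the one in which the observer "chose" A.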

It is a mistake to focus too much on the world itself as given precisely what happened all (strict) counterfactuals are impossible. The only thing that is possible is what actually happened. This is why we need to focus on your state of knowledge instead.

I don't know if this has been discussed enough, since people are prone to the mind projection fallacy, rather vainly thinking that their models correspond to factual or counterfactual worlds, in Eliezer's worlds, branches of the MWI wave function, or Tegmark's universes. And, as you said, "We could easily end up in all kinds of tangles trying to figure out the logical counterfactuals"

How is my approach more prescriptive than yours? Also, what do you mean by "Eliezer's worlds"?

(P.S. I asked about observer 2's behaviour, not observer 1's)

When you say something like "we need to focus on your state of knowledge instead" it is a prescription :)

Sorry if I renamed your observers, unless I misunderstood the whole setup, which is also possible.

Eliezer often writes, or used to write, something like "It may help to visualize a collection of worlds—Everett branches or Tegmark duplicates" when talking about counterfactuals.

Counterfactuals are only mind projection if there is nothing in the world corresponding to them. There is a surreptitious ontological assumption there. It is hard to see how someone could come to correct conclusions about the nature of reality by thinking about decision theory. It is easy to see how particular decision theories embed implicit assumptions about ontology.

It is a mistake to focus too much on the world itself as given precisely what happened all (strict) counterfactuals are impossible

Perfect knowledge about everything is only possible if strict determinism holds. The non-existence of real, as opposed to merely logical, counterfactuals follows trivially from determinism, but determinism is a very non-trivial assumption. It is not known whether the world is deterministic, and it is a fact which needs to be established by empiricism, not surreptitiously assumed in armchair reflection on decision theory.

Perfect knowledge about everything is only possible if strict determinism holds.

What's "perfect knowledge"? If you know there are two copies of you in identical rooms, with different consequences of decisions taken there, it seems like a perfect enough description of what's going on. Determinism seems irrelevant in this sense as well; it's only superficially similar to what it takes for a decision algorithm to be runnable, so that its results affect things, and that property is not "determinism", just a setup with some sort of laws of computation.

Perfect knowledge about everything is only possible if strict determinism holds

It's normally quite trivial to extend results from deterministic situations to probabilistically deterministic situations. But are you concerned about the possible existence of libertarian free will?

The non-existence of real, as opposed to merely logical, counterfactuals follows trivially from determinism, but determinism is a very non-trivial assumption

If we already know what decision you are going to take, we can't answer questions about what decision is best in a non-trivial sense without constructing a new situation where this knowledge has been erased.

It’s normally quite trivial to extend results from deterministic situations to probabilistically deterministic situations. But are you concerned about the possible existence of libertarian free will?

"Probabilistically deterministic" means "indeterministic" for all purposes relevant to the argument. If you are forced to calculate probabilities because more than one thing can actually happen, then you are in a world with real counterfactuals and an open future, and it is therefore automatically false that counterfactuals are only logical.

Given a deterministic toy world that supports computation, like Game of Life, what's the right way to implement decision-making agents in that world? Keep in mind that, by the diagonal lemma, you can build a Game of Life world containing a computer that knows an exact compressed description of the whole world, including the computer.

To me, this framing shows that determinism vs nondeterminism is irrelevant to decision theory. Chris is right that any satisfactory way to handle counterfactuals should focus on what is known or can be inferred by the agent, not on whether the world is deterministic.

Determinism vs indeterminism is relevant to what you can get out of DT: if you can make real choices, you can grab utility by choosing more optimal futures; if you can't, you are stuck with predetermined utility. It's kind of true that you can define a sort of minimal outcome of DT that's available even in a deterministic universe, but to stick to that is to give up on potential utility, and why would a rational agent want to do that?

In what kind of situation would we miss out on potential utility?

In a situation where you are not even trying to make the utility-maximising choice, but only to find out which world you are in.

So like some kind of clone situation where deterministic world views would say both must make the same decision?

Mere determinism leaves you somewhat worse off, because physics can mean you can't choose a logically possible action that yields more DT. Choosing to adopt the view that DT is about understanding the world, not grabbing utility, also leaves you worse off, even if determinism isn't true, because resources that are going into the project of understanding the world aren't going into maximising utility. One would expect a businessman to make more money than a scientist.