We know that physics does not support the idea of metaphysical free will. By metaphysical free will I mean the magical ability of agents to change the world by just making a decision to do so.
According to my understanding of the ordinary, everyday, non-magical meanings of the words "decide", "act", "change", etc., we do these things all the time. So do autonomous vehicles, for that matter. So do cats and dogs. Intention, choice, and steering the world into desired configurations are what we do, as do some of our machines.
It is strange that people are so ready to deny these things to people, when they never make the same arguments about machines. Instead, for example, they want to know what a driverless car saw and decided when it crashed, or protest that engine control software detected when it was under test and tuned the engine to misleadingly pass the emissions criteria. And of course there is a whole mathematical field called "decision theory". It's about decisions.
After all, what's the point of making decisions if you are just a passenger spinning a fake steering wheel not attached to any actual wheels?
The simile contradicts yo...
Great post overall, you're making interesting points!
Couple of comments:
There are 8 possible worlds here, with different utilities and probabilities
Some decision theorists tend to get confused over this because they think of this magical thing they call "causality," the qualia of your decisions being yours and free, causing the world to change upon your metaphysical command. They draw fancy causal graphs like this one:
That seems like an unfair criticism of the FDT paper. Drawing such a diagram doesn't imply one believes causality to be magic any more than making your table of possible worlds.
Specifically, the diagrams in the FDT paper don't say decisions are "yours and free", at least if I understand you correctly. Your decisions are caused by your decision algorithm, which in some situations is implemented in other agents as well.
This seems to cut through a lot of confusion present in decision theory, so I guess the obvious question to ask is why don't we already work things this way instead of the way they are normally approached in decision theory?
To the extent that this approach is a decision theory, it is some variant of UDT (see this explanation). The problems with applying and formalizing it are the usual problems with applying and formalizing UDT:
“I am a one-boxer” and “I am a two-boxer” are both possible worlds, and by watching yourself work through the problem you learn in which world you live. Maybe I misunderstand what you are saying though.
The interesting formal question here is: given a description of the world you are in (like the descriptions in this post), how do you enumerate the possible worlds? A solution to this problem would be very useful for decision theory.
If an agent knows its source code, then "I am a one-boxer" and "I am a two-boxer" could be taken to refer to currently-unknown logical facts about what its source code outputs. You could be proposing a decision theory whereby the agent uses some method for reasoning about logical uncertainty (such as enumerating logical worlds), and selects the action such that its expected utility is highest conditional on the event that its source code outputs this action. (I am not actually sure exactly what you are proposing, this is just a guess).
If the logical uncertainty is represented by a logical inductor, then this decision theory is called "LIEDT" (logical inductor EDT) at MIRI, and it has a few problems, as explained in this p...
I note here that simply enumerating possible worlds evades this problem as far as I can tell.
The analogous unfair decision problem would be "punish the agent if they simply enumerate possible worlds and then choose the action that maximizes their expected payout". Not calling something a decision theory doesn't mean it isn't one.
Great post!
I have a question, though, about the “adversarial predictor” section. My question is: how is world #3 possible? You say:
- Agent uses DT1 when rewarded for using DT1 and DT2 when rewarded for using DT2
However, the problem statement said:
Imagine I have a copy of Fiona, and I punish anyone who takes the same action as the copy.
Are we to suppose that the copy of Fiona that the adversarial predictor is running does not know that an adversarial predictor is punishing Fiona for taking certain actions, but that the actual-Fiona does know this, ...
Hey, noticed what might be errors in your lesion chart: No lesion, no cancer should give +1m utils in both cases. And your probabilities don't add to 1. Including p(lesion) explicitly doesn't meaningfully change the EV difference, so eh. However, my understanding is that the core of the lesion problem is recognizing that p(lesion) is independent of smoking; EYNS seems to say the same. Might be worth including it to make that clearer?
(I don't know much about decision theory, so maybe I'm just confused.)
Assuming that an agent who doesn't have the lesion gains no utility from smoking OR from having cancer changes the problem.
But apart from that, this post is pretty good at explaining how to approach these problems from the perspective of Timeless Decision Theory. Worth reading about it if you aren't familiar.
Also, is generally agreed that in a deterministic world we don't really make decisions as per libertarian free will. The question is then how to construct the counterfactuals for the decision problem. I'm in agreement with you TDT is much more consistent as the counterfactuals tend to describe actually consistent worlds.
From Arif Ahmed's Evidence, Decision and Causality (ch. 5.4, p. 142-143; links mine):
Deliberating agents should take their choice to be between worlds that differ over the past as well as over the future. In particular, they differ over the effects of the present choice but also over its unknown causes. Typically these past differences will be microphysical differences that don’t matter to anyone. But in Betting on the Past they matter to Alice.
. . .
...On this new picture, which arises naturally from [evidential decision theory]. . ., it is misleading t
I'm slightly confused. Is it that we're learning about which world we are in or, given that counterfactuals don't actually exist, are we learning what our own decision theory is given some stream of events/worldline?
The compatibilist concept of free will is practical. It tells you under which circumstances someone can be held legally or ethically responsible. It does not require global additions about how the laws of the universe work. Only when compatibilist free will is asserted as being the only kind does it become a metaphysical claim, or rather an anti metaphysical one. The existence of compatibilist free will isn't worth arguing about: it's designed to be compatible with a wide variety of background assumptions.
Magical, or "counter causal" free will is...
We know that physics does not support the idea of metaphysical free will. By metaphysical free will I mean the magical ability of agents to change the world by just making a decision to do so. To the best of our knowledge, we are all (probabilistic) automatons who think themselves as agents with free choice
If a probablistic agent can make a decision that is not fully determined by previous events, then the consequences of that decision trace back to the agent, as a whole system, and no further. That seems to support a respectable enough version of "...
Again, this is just a calculation of expected utilities, though an agent believing in metaphysical free will may take it as a recommendation to act a certain way.
Are you not recommending agents to act in a certain way? You are answering questions from EYNS of the form "Should X do Y?", and answers to such questions are generally taken to be recommendations for X to act in a certain way. You also say things like "The twins would probably be smart enough to cooperate, at least after reading this post" which sure sounds like a recommendation of cooperation (if they do not cooperate, you are lowering their status by calling them not smart)
Cross-posted from my blog.
Epistemic status: Probably discussed to death in multiple places, but people still make this mistake all the time. I am not well versed in UDT, but it seems along the same lines. Or maybe I am reinventing some aspects of Game Theory.
We know that physics does not support the idea of metaphysical free will. By metaphysical free will I mean the magical ability of agents to change the world by just making a decision to do so. To the best of our knowledge, we are all (probabilistic) automatons who think themselves as agents with free choices. A model compatible with the known laws of physics is that what we think of as modeling, predicting and making choices is actually learning which one of the possible worlds we live in. Think of it as being a passenger in a car and seeing new landscapes all the time. The main difference is that the car is invisible to us and we constantly update the map of the expected landscape based on what we see. We have a sophisticated updating and predicting algorithm inside, and it often produces accurate guesses. We experience those as choices made. As if we were the ones in the driver's seat, not just the passengers.
Realizing that decisions are nothing but updates, that making a decision is a subjective experience of discovering which of the possible worlds is the actual one immediately adds clarity to a number of decision theory problems. For example, if you accept that you have no way to change the world, only to learn which of the possible worlds you live in, then the Newcomb's problem with a perfect predictor becomes trivial: there is no possible world where a two-boxer wins. There are only two possible worlds, one where you are a one-boxer who wins, and one where you are a two-boxer who loses. Making a decision to either one-box or two-box is a subjective experience of learning what kind of a person are you, i.e. what world you live in.
This description, while fitting the observations perfectly, is extremely uncomfortable emotionally. After all, what's the point of making decisions if you are just a passenger spinning a fake steering wheel not attached to any actual wheels? The answer is the usual compatibilism one: we are compelled to behave as if we were making decisions by our built-in algorithm. The classic quote from Ambrose Bierce applies:
"There's no free will," says the philosopher; "To hang is most unjust."
"There is no free will," assents the officer; "We hang because we must."
So, while uncomfortable emotionally, this model lets us make better decisions (the irony is not lost on me, but since "making a decision" is nothing but an emotionally comfortable version of "learning what possible world is actual", there is no contradiction).
An aside on quantum mechanics. It follows from the unitary evolution of the quantum state, coupled with the Born rule for observation, that the world is only predictable probabilistically at the quantum level, which, in our model of learning about the world we live in, puts limits on how accurate the world model can be. Otherwise the quantum nature of the universe (or multiverse) has no bearing on the perception of free will.
Let's go through the examples some of which are listed as the numbered dilemmas in a recent paper by Eliezer Yudkowsky and Nate Soares, Functional decision theory: A new theory of instrumental rationality. From here on out we will refer to this paper as EYNS.
Psychological Twin Prisoner’s Dilemma
An agent and her twin must both choose to either “cooperate” or “defect.” If both cooperate, they each receive $1,000,000. If both defect, they each receive $1,000. If one cooperates and the other defects, the defector gets $1,001,000 and the cooperator gets nothing. The agent and the twin know that they reason the same way, using the same considerations to come to their conclusions. However, their decisions are causally independent, made in separate rooms without communication. Should the agent cooperate with her twin?
First we enumerate all the possible worlds, which in this case are just two, once we ignore the meaningless verbal fluff like "their decisions are causally independent, made in separate rooms without communication." This sentence adds zero information, because the "agent and the twin know that they reason the same way", so there is no way for them to make different decisions. These worlds are
There is no possible world, factually or counterfactually, where one twin cooperates and the other defects, no more than there are possible worlds where 1 = 2. Well, we can imagine worlds where math is broken, but they do not usefully map onto observations. The twins would probably be smart enough to cooperate, at least after reading this post. Or maybe they are not smart enough and will defect. Or maybe they hate each other and would rather defect than cooperate, because it gives them more utility than money. If this was a real situation we would wait and see which possible world they live in, the one where they cooperate, or the one where they defect. At the same time, subjectively to the twins in the setup it would feel like they are making decisions and changing their future.
The absent-minded Driver problem:
An absent-minded driver starts driving at START in Figure 1. At X he can either EXIT and get to A (for a payoff of 0) or CONTINUE to Y. At Y he can either EXIT and get to B (payoff 4), or CONTINUE to C (payoff 1). The essential assumption is that he cannot distinguish between intersections X and Y, and cannot remember whether he has already gone through one of them.
There are three possible worlds here, A, B and C, with utilities 0, 4 and 1 correspondingly, and by observing the driver "making a decision" we learn which world they live in. If the driver is a classic CDT agent, they would turn and end up at A, despite it being the lowest-utility action. Sucks to be them, but that's their world.
The Smoking Lesion Problem
An agent is debating whether or not to smoke. She knows that smoking is correlated with an invariably fatal variety of lung cancer, but the correlation is (in this imaginary world) entirely due to a common cause: an arterial lesion that causes those afflicted with it to love smoking and also (99% of the time) causes them to develop lung cancer. There is no direct causal link between smoking and lung cancer. Agents without this lesion contract lung cancer only 1% of the time, and an agent can neither directly observe, nor control whether she suffers from the lesion. The agent gains utility equivalent to $1,000 by smoking (regardless of whether she dies soon), and gains utility equivalent to $1,000,000 if she doesn’t die of cancer. Should she smoke, or refrain?
The problem does not specify this explicitly, but it seems reasonable to assume that the agents without the lesion do not enjoy smoking and get 0 utility from it.
There are 8 possible worlds here, with different utilities and probabilities:
An agent who "decides" to smoke has higher expected utility than the one who decides not to, and this "decision" lets us learn which of the 4 possible worlds could be actual, and eventually when she gets the test results we learn which one is the actual world.
Note that the analysis would be exactly the same if there was a “direct causal link between desire for smoking and lung cancer”, without any “arterial lesion”. In the problem as stated there is no way to distinguish between the two, since there are no other observable consequences of the lesion. There is 99% correlation between the desire to smoke and and cancer, and that’s the only thing that matters. Whether there is a “common cause” or cancer causes the desire to smoke, or desire to smoke causes cancer is irrelevant in this setup. It may become relevant if there were a way to affect this correlation, say, by curing the lesion, but it is not in the problem as stated. Some decision theorists tend to get confused over this because they think of this magical thing they call "causality," the qualia of your decisions being yours and free, causing the world to change upon your metaphysical command. They draw fancy causal graphs like this one:
instead of listing and evaluating possible worlds.
Parfit’s Hitchhiker Problem
An agent is dying in the desert. A driver comes along who offers to give the agent a ride into the city, but only if the agent will agree to visit an ATM once they arrive and give the driver $1,000.
The driver will have no way to enforce this after they arrive, but she does have an extraordinary ability to detect lies with 99% accuracy. Being left to die causes the agent to lose the equivalent of $1,000,000. In the case where the agent gets to the city, should she proceed to visit the ATM and pay the driver?
We note a missing piece in the problem statement: what are the odds of the agent lying about not paying and the driver detecting the lie and giving a ride, anyway? It can be, for example, 0% (the driver does not bother to use her lie detector in this case) or the same 99% accuracy as in the case where the agent lies about paying. We assume the first case for this problem, as this is what makes more sense intuitively.
As usual, we draw possible worlds, partitioned by the "decision" made by the hitchhiker and note the utility of each possible world. We do not know which world would be the actual one for the hitchhiker until we observe it ("we" in this case might denote the agent themselves, even though they feel like they are making a decision).
So, while the highest utility world is where the agent does not pay and the driver believes they would, the odds of this possible world being actual are very low, and the agent who will end up paying after the trip has higher expected utility before the trip. This is pretty confusing, because the intuitive CDT approach would be to promise to pay, yet refuse after. This is effectively thwarted by the driver's lie detector. Note that if the lie detector was perfect, then there would be just two possible worlds:
Once the possible worlds are written down, it becomes clear that the problem is essentially isomorphic to Newcomb's.
Another problem that is isomorphic to it is
The Transparent Newcomb Problem
Events transpire as they do in Newcomb’s problem, except that this time both boxes are transparent — so the agent can see exactly what decision the predictor made before making her own decision. The predictor placed $1,000,000 in box B iff she predicted that the agent would leave behind box A (which contains $1,000) upon seeing that both boxes are full. In the case where the agent faces two full boxes, should she leave the $1,000 behind?
Once you are used to enumerating possible worlds, whether the boxes are transparent or not, does not matter. The decision whether to take one box or two already made before the boxes are presented, transparent or not. The analysis of the conceivable worlds is identical to the original Newcomb’s problem. To clarify, if you are in the world where you see two full boxes, wouldn’t it make sense to two-box? Well, yes, it would, but if this is what you "decide" to do (and all decisions are made in advance, as far as the predictor is concerned, even if the agent is not aware of this), you will never (or very rarely, if the predictor is almost, but not fully infallible) find yourself in this world. Conversely, if you one-box even if you see two full boxes, that situation is always, or almost always happens.
If you think you pre-committed to one-boxing but then are capable of two boxing, congratulations! You are in the rare world where you have successfully fooled the predictor!
From this analysis it becomes clear that the word “transparent” is yet another superfluous stipulation, as it contains no new information. Two-boxers will two-box, one-boxers will one-box, transparency or not.
At this point it is worth pointing out the difference between world counting and EDT, CDT and FDT. The latter three tend to get mired in reasoning about their own reasoning, instead of reasoning about the problem they are trying to decide. In contrast, we mindlessly evaluate probability-weighted utilities, unconcerned with the pitfalls of causality, retro-causality, counterfactuals, counter-possibilities, subjunctive dependence and other hypothetical epicycles. There are only recursion-free possible worlds of different probabilities and utilities, and a single actual world observed after everything is said and done. While reasoning about reasoning is clearly extremely important in the field of AI research, the dilemmas presented in EYNS do not require anything as involved. Simple counting does the trick better.
The next problem is rather confusing in its original presentation.
The Cosmic Ray Problem
An agent must choose whether to take $1 or $100. With vanishingly small probability, a cosmic ray will cause her to do the opposite of what she would have done otherwise. If she learns that she has been affected by a cosmic ray in this way, she will need to go to the hospital and pay $1,000 for a check-up. Should she take the $1, or the $100?
A bit of clarification is in order before we proceed. What does “do the opposite of what she would have done otherwise” mean, operationally?. Here let us interpret it in the following way:
Deciding and attempting to do X, but ending up doing the opposite of X and realizing it after the fact.
Something like “OK, let me take $100… Oops, how come I took $1 instead? I must have been struck by a cosmic ray, gotta do the $1000 check-up!”
Another point is that here again there are two probabilities in play, the odds of taking $1 while intending to take $100 and the odds of taking $100 while intending to take $1. We assume these are the same, and denote the (small) probability of a cosmic ray strike as p.
The analysis of the dilemma is boringly similar to the previous ones:
Thus attempting to take $100 has a higher payoff as long as the “vanishingly small” probability of the cosmic ray strike is under 50%. Again, this is just a calculation of expected utilities, though an agent believing in metaphysical free will may take it as a recommendation to act a certain way.
The following setup and analysis is slightly more tricky, but not by much.
The XOR Blackmail
An agent has been alerted to a rumor that her house has a terrible termite infestation that would cost her $1,000,000 in damages. She doesn’t know whether this rumor is true. A greedy predictor with a strong reputation for honesty learns whether or not it’s true, and drafts a letter:
I know whether or not you have termites, and I have sent you this letter iff exactly one of the following is true: (i) the rumor is false, and you are going to pay me $1,000 upon receiving this letter; or (ii) the rumor is true, and you will not pay me upon receiving this letter.
The predictor then predicts what the agent would do upon receiving the letter, and sends the agent the letter iff exactly one of (i) or (ii) is true. 13 Thus, the claim made by the letter is true. Assume the agent receives the letter. Should she pay up?
The problem is called “blackmail” because those susceptible to paying the ransom receive the letter when their house doesn’t have termites, while those who are not susceptible do not. The predictor has no influence on the infestation, only on who receives the letter. So, by pre-committing to not paying, one avoids the blackmail and if they receive the letter, it is basically an advanced notification of the infestation, nothing more. EYNS states “the rational move is to refuse to pay” assuming the agent receives the letter. This tentatively assumes that the agent has a choice in the matter once the letter is received. This turns the problem on its head and gives the agent a counterintuitive option of having to decide whether to pay after the letter has been received, as opposed to analyzing the problem in advance (and precommitting to not paying, thus preventing the letter from being sent, if you are the sort of person who believes in choice).
The possible worlds analysis of the problem is as follows. Let’s assume that the probability of having termites is p, the greedy predictor is perfect, and the letter is sent to everyone “eligible”, i.e. to everyone with an infestation who would not pay, and to everyone without the infestation who would pay upon receiving the letter. We further assume that there are no paranoid agents, those who would pay “just in case” even when not receiving the letter. In general, this case would have to be considered as a separate world.
Now the analysis is quite routine:
Thus not paying is, not surprisingly, always better than paying, by the “blackmail amount” 1,000(1-p).
One thing to note is that the case of where the would-pay agent has termites, but does not receive a letter is easy to overlook, since it does not include receiving a letter from the predictor. However, this is a possible world contributing to the overall utility, if it is not explicitly stated in the problem.
Other dilemmas that yield to a straightforward analysis by world enumeration are Death in Damascus, regular and with a random coin, the Mechanical Blackmail and the Psychopath Button.
One final point that I would like to address is that treating the apparent decision making as a self- and world-discovery process, not as an attempt to change the world, helps one analyze adversarial setups that stump the decision theories that assume free will.
Immunity from Adversarial Predictors
EYNS states in Section 9:
“There is no perfect decision theory for all possible scenarios, but there may be a general-purpose decision theory that matches or outperforms all rivals in fair dilemmas, if a satisfactory notion of “fairness” can be formalized.” and later “There are some immediate technical obstacles to precisely articulating this notion of fairness. Imagine I have a copy of Fiona, and I punish anyone who takes the same action as the copy. Fiona will always lose at this game, whereas Carl and Eve might win. Intuitively, this problem is unfair to Fiona, and we should compare her performance to Carl’s not on the “act differently from Fiona” game, but on the analogous “act differently from Carl” game. It remains unclear how to transform a problem that’s unfair to one decision theory into an analogous one that is unfair to a different one (if an analog exists) in a reasonably principled and general way.”
I note here that simply enumerating possible worlds evades this problem as far as I can tell.
Let’s consider a simple “unfair” problem: If the agent is predicted to use a certain decision theory DT1, she gets nothing, and if she is predicted to use some other approach (DT2), she gets $100. There are two possible worlds here, one where the agent uses DT1, and the other where she uses DT2:
So a principled agent who always uses DT1 is penalized. Suppose another time the agent might face the opposite situation, where she is punished for following DT2 instead of DT1. What is the poor agent to do, being stuck between Scylla and Charybdis? There are 4 possible worlds in this case:
The world number 3 is where a the agent wins, regardless of how adversarial or "unfair" the predictor is trying to be to her. Enumerating possible worlds lets us crystallize the type of an agent that would always get maximum possible payoff, no matter what. Such an agent would subjectively feel that they are excellent at making decisions, whereas they simply live in the world where they happen to win.