Related: Conceptual Problems with UDT and Policy Selection, Formalising decision theory is hard

# Target

Anyone interested in decision theory. The post is fairly general and not very technical; some familiarity with counterfactual mugging can be useful, but little background knowledge is required overall.

# Outline

The post develops the claim that identifying the correct solution to some decision problems might be intricate, if not impossible, when certain details about the specific scenario are not given. First I show that, in counterfactual mugging, some important elements in the problem description and in a possible formalisation are actually underspecified. Next I describe issues related to the concept of perfect prediction and briefly discuss whether they apply to other decision scenarios involving predictors. Then I present some advantages and disadvantages of the formalisation of agents as computer programs. A summary with bullet points concludes.

# Missing parts of a “correct” solution

I focus on the version of the problem with cards and two humans since, to me, it feels more grounded in reality—a game that could actually be played—but what I say applies also to the version with a coin toss and Omega.

What makes the problem interesting is the conflict between these two intuitions:

- Before Player A looks at the card, the best strategy seems to be to never show the card, because that strategy makes Player A lose the least in expectation, given the uncertainty about the value of the card (50/50 high or low).
- After Player A sees a low card, showing it seems like a really good idea, because that action gives Player A a loss of 0, which is the best possible result considering that the game is played only once and never again. Thus, the incentive to not reveal the card seems to disappear after Player A knows that the card is low.

[In the other version, the conflict is between paying before the coin toss and refusing to pay after knowing the coin landed tails.]

One attempt at formalising the problem is to represent it as a tree (a formalisation similar to the following one is considered here). The root is a 50/50 chance node representing the possible values of the card. Then Player A chooses between showing and not showing the card; each action leads to a leaf with a value which indicates the loss for Player A. The peculiarity of counterfactual mugging is that some payoffs depend on actions taken in a different subtree.

[The tree of the other version is a bit different since the player has a choice only when the coin lands tails; anyway, the payoff in the heads case is “peculiar” in the same sense of the card version, since it depends on the action taken when the coin lands tails.]

With this representation, it is easy to see that we can assign an expected value (EV) to each deterministic policy available to the player: we start from the root of the tree, then we follow the path prescribed by the policy until we reach a payoff, which is assigned a weight according to the chance nodes that we’ve run into.

Therefore it is possible to order the policies according to their expected values and determine which one gives the lowest expected loss [or, in the other version, the highest EV] with respect to the root of the tree. This is the formalism behind the first of the two intuitions presented before.

On the other hand, one could object that it is far from trivial that the correct thing to do is to minimise expected loss from the root of the tree. In fact, in the original problem statement, the card is low [tails], so the relevance of the payoffs in the other subtree—where the card is high [heads]—is not clear and the focus should be on the decision node with the low card, not on the root of the tree. This is the formalism behind the second intuition.
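As a rough illustration of the two evaluations above, the sketch below enumerates Player A's deterministic policies in the card version. It rests on assumptions that are not stated in this post: Player B knows A's policy and reports the Bayesian probability p that the card is high, and A's loss is p². The names `belief_high`, `loss_at_node` and `loss_from_root` are hypothetical, chosen only for this example.

```python
from itertools import product

PRIOR = {"high": 0.5, "low": 0.5}  # the 50/50 chance node at the root


def belief_high(card, policy):
    """B's probability that the card is high, assuming B knows A's policy."""
    if policy[card]:  # the card is shown, so B sees its value
        return 1.0 if card == "high" else 0.0
    # The card is hidden: condition on the set of cards that this policy hides.
    hidden_mass = sum(p for c, p in PRIOR.items() if not policy[c])
    hidden_high = 0.0 if policy["high"] else PRIOR["high"]
    return hidden_high / hidden_mass


def loss_at_node(card, policy):
    """A's loss once the card is fixed (assumed payoff rule: loss = p**2)."""
    return belief_high(card, policy) ** 2


def loss_from_root(policy):
    """A's expected loss evaluated before the card is drawn."""
    return sum(PRIOR[c] * loss_at_node(c, policy) for c in PRIOR)


if __name__ == "__main__":
    for show_high, show_low in product([False, True], repeat=2):
        policy = {"high": show_high, "low": show_low}
        print(policy, loss_from_root(policy), loss_at_node("low", policy))
```

Under these assumptions, never showing the card minimises the loss from the root (0.25 against 0.5 for the other three policies), whereas at the low-card node never showing costs 0.25 and every other policy costs 0. The code only restates the conflict; it does not say which of the two evaluations is the right one.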

Even though the objection related to the second intuition sounds reasonable, I think one could point to other, more important issues underlying the problem statement and formalisation. Why is there a root in the first place and what does it represent? What do we mean when we say that we minimise loss “from the start”?

These questions are more complicated than they seem: let me elaborate on them. Suppose that the advice of maximising EV “from the start” is generally correct from a decision theory point of view. It is not clear how we should apply that advice in order to make correct decisions as humans, or to create an AI that makes correct decisions. Should we maximise value...

- ...from the instant in which we are “making the decision”? This seems to bring us back to the second intuition, where we want to show the card once we’ve seen it is low.
- ...from our first conscious moment, or from when we started collecting data about the world, or maybe from the moment that the earliest data point in our memory refers to? In the case of an AI, this would correspond to the moment of the “creation” of the AI, whatever that means, or maybe to the earliest instant that the data we put into the AI refers to.
- ...from the very first moment since the beginning of space-time? After all, the universe we are observing could be one possible outcome of a random process, analogous to the 50/50 high/low card [or the coin toss].

Regarding the first point, I’ve mentioned the second intuition, but other interpretations could be closer to the first intuition instead. The root could represent the moment in which we settle our policy, and this is what we would mean by “making the decision”.

Then, however, other questions should be answered about policy selection. Why and when should we change policy? If selecting a policy is what constitutes a decision, what exactly is the role of actions, or how is changing policy fundamentally different from other actions? It seems we are treating policies and actions as concepts belonging to two different levels in a hierarchy: if this is a correct model, it is not clear to me why we do not use further levels, or why we need two different levels, especially when thinking in terms of embedded agency.

Note that giving precise answers to the questions in the previous paragraph could help us find a criterion to distinguish fair problems from unfair ones, which would be useful to compare the performance of different decision theories, as pointed out in the conclusion of the paper on FDT. Considering fair all the problems in which *the outcome depends only on the agent’s behavior in the dilemma at hand* (p.29) is not a satisfactory criterion when all the issues outlined before are taken into account: the lack of clarity about the role of root, decision nodes, policies and actions makes the “borders” of a decision problem blurred, and leaves *the agent’s behaviour* as an underspecified concept.

Moreover, resolving the ambiguities in the expression “from the start” could also explain why it seems difficult to apply updatelessness to game theory (see the sections “Two Ways UDT Hasn’t Generalized” and “What UDT Wants”).

# Predictors

## A weird scenario with perfect prediction

So far, we’ve reasoned as if Player B (who determines the loss of Player A by choosing the value that best represents his belief that the card is high) can perfectly guess the strategy that Player A adopts. Analogously, in the version with the coin toss, Omega is capable of perfectly predicting what the decision maker does when the coin lands tails, because that information is necessary to determine the payoff in case the coin lands heads.

However, I think the concept of perfect prediction also deserves further investigation: not because it is an implausible idealisation of a highly accurate prediction, but because it can lead to strange conclusions, if not downright contradictions, even in very simple settings.

Consider a human who is going to choose exactly one of two options: M or N. Before the choice, a perfect predictor analyses the human and writes the letter (M or N) corresponding to the predicted choice on a piece of paper, which is given to the human. Now, what exactly prevents the human from reading the piece of paper and choosing the other option instead?

From a slightly different perspective: assume there exists a human, facing a decision between M and N, who is capable of reading a piece of paper containing only one letter, M or N, and choosing the opposite—seems quite a weak assumption. Is a “perfect predictor” that writes the predicted option on a piece of paper and gives it to the human… always wrong?

Note that allowing probabilities doesn’t help: a human capable of always choosing M when reading a prediction like “probability p of choosing M, probability 1-p of choosing N” with p < 1 (and of choosing N when p = 1) seems as plausible as the previous human, but again would make the prediction always wrong, since the actual probability of choosing M is then either 1 or 0.
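The argument in this section can be checked mechanically. The sketch below is only an illustration: `contrarian` and `stubborn` are hypothetical names for the two humans described above, and the tie-break in `stubborn` (choosing N when the stated probability of M is exactly 1) is an assumption added here so that no stated probability can match the actual one.

```python
OPTIONS = ("M", "N")


def contrarian(prediction):
    """Reads the predicted letter and picks the other option."""
    return "N" if prediction == "M" else "M"


def stubborn(p_m):
    """Reads a stated probability of choosing M and answers deterministically.

    Picks M unless the predictor is certain of M (assumed tie-break), so the
    actual probability of M is 1 or 0 and never equals the stated p_m.
    """
    return "M" if p_m < 1.0 else "N"


def deterministic_predictor_ever_right():
    """True if some revealed letter would match the contrarian's choice."""
    return any(contrarian(guess) == guess for guess in OPTIONS)


def probabilistic_predictor_ever_right(grid=101):
    """True if some revealed probability on the grid matches the actual one."""
    for i in range(grid):
        p_m = i / (grid - 1)
        actual_p_m = 1.0 if stubborn(p_m) == "M" else 0.0
        if actual_p_m == p_m:
            return True
    return False
```

Both checks return `False`: whatever the predictor writes on the paper, the written prediction is what makes it wrong. The sketch says nothing about predictors that keep their prediction hidden.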

## Other predictions

Unlike the previous example, Newcomb’s and other problems involve decision makers who are not told about the prediction outcome. However, the difference might not be as clear-cut as it first appears. If the decision maker regards some information—maybe elements of the deliberation process itself—as evidence about the imminent choice, the DM will also have information about the prediction outcome, since the predictor is known to be reliable. To what extent is this information about the prediction outcome different from the piece of paper in the previous example? What exactly can be considered evidence about one’s own future choices? The answer seems to be related to the details of the prediction process and how it is carried out.

It may be useful to consider how a prediction is implemented as a specific program. In this paper by Critch, the algorithm plays the prisoner’s dilemma by cooperating if it successfully predicts that the opponent will cooperate, and defecting otherwise. Here the “prediction” consists in a search for proofs, up to a certain length, that the other algorithm outputs Cooperate when given the algorithm itself as input. Thanks to a bounded version of Löb’s theorem, this specific prediction implementation allows the algorithm to cooperate when playing against itself.

Results of this kind (open-source game theory / program equilibrium) could be especially relevant in a future in which important policy choices are made by AIs that interact with each other. Note, however, that no claim is made about the rationality of the algorithm’s overall behaviour: it is debatable whether its decision to cooperate against a program that always cooperates is correct.
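To make the idea of "prediction as a program" more concrete, here is a toy sketch. It is not the construction from the paper: the bounded proof search is replaced by a bounded simulation of the opponent, and the optimistic base case at depth 0 is an assumption standing in for the Löbian step. All function names are hypothetical.

```python
def conditional_cooperator(opponent, depth):
    """Cooperate iff a bounded simulation says the opponent cooperates with us."""
    if depth == 0:
        # Assumed optimistic base case; the paper uses proof search instead.
        return "C"
    prediction = opponent(conditional_cooperator, depth - 1)
    return "C" if prediction == "C" else "D"


def always_cooperate(opponent, depth):
    """Ignores the opponent and cooperates."""
    return "C"


def always_defect(opponent, depth):
    """Ignores the opponent and defects."""
    return "D"


if __name__ == "__main__":
    for other in (conditional_cooperator, always_cooperate, always_defect):
        print(other.__name__, conditional_cooperator(other, depth=5))
```

In this toy version the program cooperates with itself and with `always_cooperate`, and defects against `always_defect`. Whether cooperating with `always_cooperate` is the right call is exactly the point left open above.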

Moreover, seeing decision makers as programs can be confusing and less precise than one would intuitively think, because it is still unclear how to properly formalise concepts such as action, policy and decision-making procedure, as discussed previously. If actions in certain situations correspond to program outputs given certain inputs, does policy selection correspond to program selection? If so, why is policy selection not an action like the other ones? And—related to what I said before about using a hierarchy of exactly two levels—why don’t we also “select” the code fragment that does policy selection?

In general, approaches that use some kind of formalism tend to be more precise than purely philosophical approaches, but there are some disadvantages as well. Focusing on low-level details can make us lose sight of the bigger picture and limit lateral thinking, which can be a great source of insight for finding alternative solutions in certain situations. In a blackmail scenario, besides the decision to pay or not, we could consider what factors caused the leakage of sensitive information, or the exposure of something we care about, to adversarial agents. Another example: in a prisoner’s dilemma, the equilibrium can shift to mutual cooperation thanks to the intervention of an external actor that makes the payoffs for defection worse (the chapter on game theory in Algorithms to Live By gives a nice presentation of this equilibrium shift and related concepts).
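The equilibrium shift in the prisoner’s dilemma example can be shown with a small computation. The payoff numbers below (3 for mutual cooperation, 5 and 0 for unilateral defection, 1 for mutual defection) and the size of the penalty are assumptions picked for illustration, not values taken from the book or from this post; `payoffs` and `pure_nash` are hypothetical helper names.

```python
from itertools import product

ACTIONS = ("C", "D")

# Assumed textbook-style payoffs: (row player's utility, column player's utility).
BASE = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def payoffs(penalty=0):
    """Payoff table after an external actor fines every defection by `penalty`."""
    return {
        (a, b): (ua - penalty * (a == "D"), ub - penalty * (b == "D"))
        for (a, b), (ua, ub) in BASE.items()
    }


def pure_nash(game):
    """All pure-strategy profiles where neither player gains by deviating alone."""
    equilibria = []
    for a, b in product(ACTIONS, repeat=2):
        ua, ub = game[(a, b)]
        row_ok = all(ua >= game[(x, b)][0] for x in ACTIONS)
        col_ok = all(ub >= game[(a, y)][1] for y in ACTIONS)
        if row_ok and col_ok:
            equilibria.append((a, b))
    return equilibria
```

With these numbers, `pure_nash(payoffs())` contains only mutual defection, while `pure_nash(payoffs(penalty=3))` contains only mutual cooperation. The solution comes from changing the game, not from choosing better within the original one.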

We may also take into account that, for efficiency reasons, predictions in practice might be made with methods different from close-to-perfect physical or algorithmic simulation, and the specific method used could be relevant for an accurate analysis of the situation, as mentioned before. In the case of human interaction, sometimes it is possible to infer something about one’s future actions by reading facial expressions; but this also means that a predictor can be tricked if one is capable of masking their own intentions by keeping a poker face.

# Summary

- The claim that a certain decision is correct because it maximises utility may require further explanation, since every decision problem sits in a context which might not be fully captured in the problem formalisation.
- Perfect prediction leads to seemingly paradoxical situations. It is unclear whether these problems underlie other scenarios involving prediction. This does not mean the concept must be rejected; but our current understanding of prediction might lack critical details. Certain problems may require clarification of how the prediction is made before a solution is claimed as correct.
- The use of precise mathematical formalism *can* resolve some ambiguities. At the same time, interesting solutions to certain situations may lie “outside” the original problem statement.

*Thanks to Abram Demski, Wolfgang Schwarz and Caspar Oesterheld for extensive feedback.*

*This work was supported by CEEALAR.*

# Appendix

## Biases

There are biases in favor of the there-is-always-a-correct-solution framework. Uncovering the right solution in decision problems can be fun, and finding the Decision Theory to solve them all can be appealing.

## On “wrong” solutions

Many of the reasons provided in this post also explain why it’s tricky to determine what a certain decision theory does in a problem, and whether the given solution is wrong. But I want to provide another reason, namely the following informal...

**Conjecture**: for any decision problem that you believe CDT/EDT gets wrong, there exists a paper or book in which a particular version of CDT/EDT gives the solution that you believe is correct, and/or a paper or book that argues that the solution you believe is correct is actually wrong.

Here’s an example about Newcomb’s problem.