Reference Post: Trivial Decision Theory Problem

Chris_Leong

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

A trivial decision problem is one where there is only a single option that the agent can take. In that case, the most natural answer to the question, "What action should we take?" would be "The only action that we can take!". We will call this the Triviality Perspective.

A particularly interesting example is Transparent Newcomb's Problem. If you accept the premise of a perfect predictor, then seeing $1 million in the transparent box implies that you were predicted to one-box which implies that you will one-box. So the Triviality Perspective claims that you should one-box, but also that this is an incredibly boring claim that doesn't provide much insight into decision theory.
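To make the triviality concrete, here is a minimal sketch that enumerates the worlds consistent with the problem statement. The encoding (a world as a prediction/action pair, and the helper names `consistent_worlds` and `available_actions`) is my own illustrative assumption, not a standard formalism.

```python
from itertools import product

ACTIONS = ("one-box", "two-box")

def consistent_worlds(observed_million):
    """Return the (prediction, action) pairs allowed by the problem statement.

    Illustrative assumptions: the predictor is perfect, and the transparent
    box holds $1 million exactly when one-boxing was predicted.
    """
    worlds = []
    for prediction, action in product(ACTIONS, ACTIONS):
        predictor_is_right = prediction == action
        box_is_full = prediction == "one-box"
        if predictor_is_right and box_is_full == observed_million:
            worlds.append((prediction, action))
    return worlds

def available_actions(observed_million):
    """Actions that appear in at least one consistent world."""
    return sorted({action for _, action in consistent_worlds(observed_million)})

print(available_actions(True))   # ['one-box']
print(available_actions(False))  # ['two-box']
```

Once we condition on what is seen in the box, only one action survives the filter, which is exactly why the Triviality Perspective finds the question boring.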

We can see that in general, any decision theory problem with a perfectly defined universe and a perfectly defined agent will be trivial. Evidential decision theory treats the fact of the matter about which counterfactual action we select the same as any other fact in the problem statement and hence arguably embraces the Triviality Perspective.

Alternatively, the Triviality Perspective can be seen as an overly restrictive and literal interpretation of what the problem is. We could interpret "What action should we take?" to be asking not about the set of actions that are consistent with the problem statement, but instead about a set of counterfactual worlds, each corresponding to a different action. We will call this the Counterfactual Perspective. From this perspective, the problem is only trivial before we have augmented it with counterfactuals.

Here are some examples: In Causal Decision Theory, we construct counterfactuals by setting the value of a node in the causal graph to whatever we want and removing any inbound links. In Functional Decision Theory, we imagine that a particular program outputs a value that it does not, and then update the other programs that subjunctively depend on that program's output. The Erasure Approach reinterprets the problem by removing an assumption, so that there are then multiple possible counterfactuals consistent with the problem statement.
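As a rough illustration of how the first two constructions differ, here is a toy sketch for Transparent Newcomb's Problem. It is a deliberate simplification rather than a formal statement of either theory: the function names are hypothetical, and the $1,000 in the second box is the usual figure from the standard problem, assumed here because it is not stated above.

```python
ACTIONS = ("one-box", "two-box")

def payoff(action, box_full):
    """Payoff in dollars. The $1,000 small box is an assumed standard value."""
    big = 1_000_000 if box_full else 0
    small = 1_000 if action == "two-box" else 0
    return big + small

def cdt_counterfactuals(box_full):
    """CDT-style: set the action node directly and cut its inbound links,
    so the contents of the box stay fixed across counterfactuals."""
    return {action: payoff(action, box_full) for action in ACTIONS}

def fdt_counterfactuals():
    """FDT-style: suppose the agent's program outputs `action`, then update
    the predictor, which subjunctively depends on that output."""
    return {action: payoff(action, box_full=(action == "one-box"))
            for action in ACTIONS}

print(cdt_counterfactuals(box_full=True))
# {'one-box': 1000000, 'two-box': 1001000}
print(fdt_counterfactuals())
# {'one-box': 1000000, 'two-box': 1000}
```

In both cases the augmented problem contains two counterfactual worlds, so it is no longer trivial. What differs is which other facts are held fixed when the action is varied.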

Combining perspectives

It is actually possible to be sympathetic to both the Triviality Perspective and the Counterfactual Perspective. Instead of being seen as opposed perspectives, they can be seen as two different lenses for viewing the same situation, so long as we don't try to mix both at the same time. We will call this the Dual Perspective.

One area where combining both perspectives could be useful is when considering fatalistic arguments. Suppose there is a student who has a crucial exam in a week. They have the option to study or to go to the beach. Now the student reasons that it was determined at the start of time whether or not they were going to pass the exam and nothing they can do can change that. Therefore they decide to go to the beach. What is wrong with this reasoning?

One resolution would be to say that when we limit ourselves to considering the factual, the Triviality Perspective applies: the student can only pick one option and therefore can only obtain one outcome. On the other hand, when we allow ourselves to augment the situation with counterfactuals, the Counterfactual Perspective applies and there are both multiple outcomes and multiple possible choices. Here we are applying the first perspective when discussing what actually occurs in the world, and the second when analysing decisions (see The Prediction Problem for an application of this to Newcomb's problem).

(Sometimes it is useful to have a short post that contains a clear definition of a single concept for linking to, even if it doesn't contain any fundamentally new content. I'm still uncertain about the norms for the Alignment Forum, so please let me know if you think this isn't the best place to post this.)

This post was written with the support of the AI Safety Research Program

Charlie Steiner (4y)

I'm definitely satisfied with this kind of content.

The names suggest you're classifying decision procedures by what kind of thoughts they have in special cases. But "sneakily" the point is this is relevant because these are the kinds of thoughts they have all the time.

I think the next place to go is to put this in the context of methods of choosing decision theories - the big ones being reflective modification and evolutionary/population level change. Pretty generally it seems like the trivial perspective is unstable under these, but there are some circumstances where it's not.

Chris_Leong (4y)

"I think the next place to go is to put this in the context of methods of choosing decision theories - the big ones being reflective modification and evolutionary/population level change. Pretty generally it seems like the trivial perspective is unstable is under these, but there are some circumstances where it's not." - sorry, I'm not following what you're saying here

Charlie Steiner (4y)

Reflective modification flow: Suppose we have an EDT agent that can take an action to modify its decision theory. It will try to choose based on the average outcome conditioned on taking the different decision. In some circumstances, EDT agents are doing well so it will expect to do well by not changing; in other circumstances, maybe it expects to do better conditional on self-modifying to use the Counterfactual Perspective more.

Evolutionary flow: If you put a mixture of EDT and FDT agents in an evolutionary competition where they're playing some iterated game and high scorers get to reproduce, what does the population look like at large times, for different games and starting populations?

Pattern (4y)

Some problems/posts are also about

a) implications which may or may not be trivial

b) what do you value? (If you can only take one box, and the boxes hold things that are harder to compare than money and money, which would you choose?)
