[Comment edited for clarity]

Since when does CDT include backtracking on noticing other people's predictive inconsistency?

I agree that CDT does not include backtracking on noticing other people's predictive inconsistency. My assumption is that a decision theory (including CDT) takes a world-map and outputs an action. I'm claiming that this post conflates an error in constructing an accurate world-map with an error in the decision theory.

CDT cannot notice that Omega's prediction aligns with its hypothetical decision because Omega's prediction is causally "before" CDT's decision, so no causal decision graph can condition on it. This is why post-TDT decision theories are also called "acausal."

Here is a more explicit version of what I'm talking about. CDT decides how to act based on the expected value of its action, so to produce an action we first need to estimate an expected value. In the original post, there are two parts to this:

Part 1 (Building a World Model):

  • I believe that the predictor modeled my reasoning process and made a prediction based on that model. This prediction happens before I actually instantiate my reasoning process.
  • I believe this model to be accurate/quasi-accurate.
  • I start unaware of what my causal reasoning process will conclude, so I have no idea what the predictor will do. In any case, the causal reasoning process must continue, because I'm thinking.
  • As I think, I get more information about my causal reasoning process. Because I know that the predictor is modeling my reasoning process, this lets me update my prediction of the predictor's prediction.
  • Because the above step was itself part of my causal reasoning process, and information about my causal reasoning process affects my model of the predictor's model of me, I must update on the above step as well.
  • [The Dubious Step] Because I am modeling myself as CDT, I will make a statement intended to invert the predictor's prediction. Because I believe the predictor is modeling me, this requires me to invert myself. That is to say, every update my causal reasoning process makes to my probabilities inverts the previous update.
    • Note that this only works if I believe my reasoning process (but not necessarily the ultimate action) gives me information about the predictor's prediction.
  • The above leads to an infinite regress (sketched in code just after this list).
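
In code, that regress looks something like the following minimal sketch (my own toy model; the function names and the iteration cap are illustrative assumptions, not anything from the post):

```python
# `guess` is my current best guess of what the predictor will say, given
# everything I have concluded so far. Because I intend to say the opposite of
# whatever I expect, and I believe the predictor is modeling that very
# intention, each reasoning step inverts the previous guess, so there is no
# stable guess for the world model to settle on.

def next_guess(guess: str) -> str:
    # I plan to say the opposite of `guess`; the predictor models that plan,
    # so my updated guess about its prediction is the opposite of the old one.
    return "zero" if guess == "one" else "one"

def build_world_model(initial_guess: str = "one", max_steps: int = 1000):
    guess = initial_guess
    for _ in range(max_steps):
        updated = next_guess(guess)
        if updated == guess:       # convergence would mean a stable prediction
            return updated
        guess = updated
    return None                    # infinite regress: no prediction to report

print(build_world_model())         # None -- the world model never settles
```

The loop lives entirely inside the world-model-building step; CDT has not been asked for a decision yet.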

Part 2 (CDT):

  • Ask the world model what the odds are that the predictor said "one" or "zero".
  • Find the one with the higher likelihood and invert it.

I believe Part 1 fails, and that this isn't the fault of CDT. For instance, imagine the above problem with zero stakes, so that decision theory is irrelevant. If you ask any agent to give the inverse of its probabilities that Omega will say "one" or "zero", with the added information that Omega will perfectly predict those inverses and align with them, that agent won't be able to give you probabilities. Hence, the failure occurs in building a world model rather than in implementing a decision theory.
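
To make that division of labour concrete, here is a small sketch (again my own framing, with assumed names like `cdt_choose`): the CDT step is a near-trivial rule over whatever probabilities the world model supplies, and it only breaks down when the epistemic step hands it nothing usable.

```python
import math

def cdt_choose(p_one: float) -> str:
    """The CDT step proper: given P(predictor says "one"), pick whichever
    answer is less likely to match the prediction."""
    if math.isnan(p_one):
        raise ValueError("world model supplied no coherent probabilities")
    return "zero" if p_one >= 0.5 else "one"

print(cdt_choose(0.7))              # 'zero': CDT is fine given real probabilities
try:
    cdt_choose(float("nan"))        # what the Part 1 regress actually hands it
except ValueError as err:
    print(err)                      # the failure happens upstream of CDT
```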



-------------------------------- Original version

Since when does CDT include backtracking on noticing other people's predictive inconsistency?

Ever since the process of updating a causal model of the world based on new information was considered an epistemic question outside the scope of decision theory.

To see how this is true, imagine the exact same situation as described in the post with zero stakes. Then ask any agent with any decision theory about the inverse of the prediction it expects the predictor to make. The answer will always be "I don't know", independent of decision theory. Ask that same agent if it can assign probabilities to the answers and it will say "I don't know; every time I try to come up with one, the answer reverses."

All I'm trying to do is compute the probability that the predictor will guess "one" or "zero" and failing. The output of failing here isn't "well, I guess I'll default to fifty-fifty so I should pick at random"[1], it's NaN.

Here's a causal explanation:

  • I believe the predictor modeled my reasoning process and has made a prediction based on that model.
  • I believe this model to be accurate/quasi-accurate
  • I start unaware of what my causal reasoning process will conclude, so I have no idea what the predictor will do. But my prediction of the predictor depends on my causal reasoning process.
  • Because my causal reasoning process is contingent on my prediction and my prediction is contingent on my causal reasoning process, I end up in an infinite loop where my causal reasoning process cannot converge on an actual answer. Every time it tries, it just keeps updating.
  • I quit the game because my prediction is incomputable

Predictors exist: CDT going bonkers... forever

by Stuart_Armstrong, 14th Jan 2020


I've been wanting to get a better example of CDT (causal decision theory) misbehaving, where the behaviour is more clearly suboptimal than it is in the Newcomb problem (which many people don't seem to accept as CDT being suboptimal), and simpler to grasp than Death in Damascus.

The "predictors exist" problem

So consider this simple example: the player is playing against Omega, who will predict their actions[1]. The player can take three actions: "zero", "one", or "leave".

If ever they do "leave", then the experiment is over and they leave. If they choose "zero" or "one", then Omega will predict their action, and compare this to their actual action. If the two match, then the player loses utility and the game repeats; if the action and the prediction differ, then the player gains utility and the experiment ends.

Assume that actually Omega is a perfect or quasi-perfect predictor, with a good model of the player. An FDT or EDT agent would soon realise, after a few tries, that they couldn't trick Omega, and would quickly end the game.

But the CDT player would be incapable of reaching this reasoning. Whatever distribution they compute over Omega's prediction, they will always estimate that they (the CDT player) have at least a 1/2 chance of choosing the other option[2], and hence, by their own lights, a positive expected utility from playing on.

Basically, the CDT agent can never learn that Omega is a good predictor of themselves[3]. And so they will continue playing, and continue losing... for ever.
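
A rough simulation sketch of this dynamic (illustrative only: the payoff values, the frequency-based estimate, and all names here are assumptions of the sketch rather than anything fixed by the setup above):

```python
LOSS_ON_MATCH = -1       # assumed payoff when Omega's prediction matches
GAIN_ON_MISMATCH = 3     # assumed payoff when it does not

def cdt_action(past_predictions):
    """One CDT step: estimate P(prediction = "one") from observed frequencies,
    pick whichever answer seems less likely to be predicted, and evaluate it
    as if the (already made) prediction were independent of the choice."""
    if past_predictions:
        p_one = past_predictions.count("one") / len(past_predictions)
    else:
        p_one = 0.5
    p_win = max(p_one, 1.0 - p_one)   # subjective chance of a mismatch, always >= 1/2
    subjective_ev = p_win * GAIN_ON_MISMATCH + (1.0 - p_win) * LOSS_ON_MATCH
    action = "zero" if p_one >= 0.5 else "one"
    return action, subjective_ev

def play(max_rounds=20):
    utility, history = 0, []
    for _ in range(max_rounds):
        action, ev = cdt_action(history)
        if ev <= 0:                   # CDT would leave if playing looked bad; it never does
            break
        prediction = action           # a perfect predictor simply matches the chosen action
        history.append(prediction)
        utility += GAIN_ON_MISMATCH if action != prediction else LOSS_ON_MATCH
    return utility

print(play())                         # -20: the agent keeps playing and keeps losing
```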


  1. Omega need not make this prediction before the player takes their action, nor even without seeing this action, but it still makes the prediction independently of this knowledge. And that's enough for CDT. ↩︎

  2. For example, suppose the CDT agent estimates the prediction will be "zero" with probability p, and "one" with probability 1-p. Then if p ≥ 1/2, they can say "one", and have a probability p of winning, in their own view. If p < 1/2, they can say "zero", and have a subjective probability 1-p of winning. ↩︎

  3. The CDT agent has no problem believing that Omega is a perfect predictor of other agents, however. ↩︎
