FDT Does Not Endorse Itself in Asymmetric Games

by jackmastermind
15th Jun 2025
3 comments, sorted by top scoring
Menotim:

In the FDT paper there is this footnote:

  1. In the authors’ preferred formalization of FDT, agents actually iterate over policies (mappings from observations to actions) rather than actions. This makes a difference in certain multi-agent dilemmas, but will not make a difference in this paper.

And it does seem that using FDT, but as a function that returns a policy rather than an action, solves this problem. So this is not an intrinsic problem with FDT that UDT doesn't have, it's a problem that arises in simpler versions of both theories and can be solved in both with the same modification.

jackmastermind:

I see. I suppose you'd do this by creating a policy node that is subjunctively upstream of every individual FDT decision, and intervening on that. The possible values would be every combination of FDT decisions, and you'd calculate updateless expected value over them.

This seems to work, though I'll think on it some more. I'm a little disappointed that this isn't the formulation of FDT in the paper, since that feels like a pretty critical distinction. But in any case, I should have read more carefully, so that's on me. Thank you for bringing that up! Your comment is now linked in the introduction :)

quetzal_rainbow:

Thanks, I finally understood the problem with UDT 1.0.

A twin guard-inmate dilemma (twin GID) is an asymmetric game that breaks FDT. [Image: GPT Image-1]

0. Introduction

TL;DR: FDT and UDT diverge in how they handle "behave as you would have ideally precommitted to behaving" in asymmetric games where a player is assigned a role after a deterministic clone is made. FDT updates, whereas UDT does not. ∴ an agent who knows in advance that they will enter one of these games would convert to UDT, not FDT, on this problem. [UPDATE: this applies to the formulation of FDT in the paper, but not necessarily to Yudkowsky & Soares' "preferred" version of FDT; see Menotim's comment]

I wrote a version of this post on my substack; it was for a less technical audience, and at the time I didn't understand updateless decision theory. I assumed that UDT and FDT just used different methods to compute the same recommendations. I was wrong! In fact, there are very simple scenarios in which FDT does not recommend precommitting to itself. 

1. Definitions

According to Yudkowsky & Soares' "Functional Decision Theory: A New Theory of Instrumental Rationality," FDT, CDT, and EDT all maximize expected utility as defined by this formula:

$$EU(a) := \sum_{j=1}^{N} P(a \hookrightarrow o_j; x) \cdot U(o_j)$$

where $o_1, o_2, o_3, \ldots$ are the possible outcomes from some countable set $O$; $a$ is an action from some finite set $A$; $x$ is an observation history from some countable set $X$; $P(a \hookrightarrow o_j; x)$ is the probability that $o_j$ will obtain in the hypothetical scenario where the action $a$ is executed after receiving observations $x$; and $U$ is a real-valued utility function bounded in such a way that [the above equation] is always finite.

 …

From this perspective, the three decision theories differ only in two ways: how they prescribe representing [the world-model] $M$, and how they prescribe constructing hypotheticals $M_{a\hookrightarrow}$ from $M$. (emphasis mine)

From here, the three decision theories are formalized by:

$$\mathrm{EDT}(P, x) := \arg\max_{a \in A} \, E(V \mid \mathrm{Obs} = x, \mathrm{Act} = a)$$

$$\mathrm{CDT}(P, G, x) := \arg\max_{a \in A} \, E(V \mid \mathrm{do}(\mathrm{Act} = a), \mathrm{Obs} = x)$$

$$\mathrm{FDT}(P, G, x) := \arg\max_{a \in A} \, E(V \mid \mathrm{do}(\mathrm{FDT}(\underline{P}, \underline{G}, \underline{x}) = a))$$

where $V$ is a variable representing $U(\mathrm{Outcome})$, $G$ is a Pearl-style digraph (of causal relations for CDT, subjunctive relations for FDT), and $\mathrm{FDT}(\underline{P}, \underline{G}, \underline{x})$ is notation for a variable representing the output of FDT given $P$, $G$, and $x$.
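To make the shared skeleton concrete, here is a minimal Python sketch (my own framing, not code from the paper): every theory runs the same argmax over actions, and only the `make_hypothetical` step, which stands in for the theory-specific construction of $M_{a\hookrightarrow}$, differs.

```python
# Minimal sketch of the shared EU-maximization skeleton; the function names
# and toy payoffs are illustrative, not taken from the paper.

def expected_utility(hypothetical, utility):
    """EU(a) = sum_j P(a -> o_j; x) * U(o_j), with the hypothetical given as
    a list of (probability, outcome) pairs."""
    return sum(p * utility(o) for p, o in hypothetical)

def decide(actions, make_hypothetical, utility):
    """Generic argmax over actions. The theories differ only in how
    make_hypothetical builds the distribution: conditioning on Act = a (EDT),
    do(Act = a) (CDT), or do(FDT(P, G, x) = a) (FDT)."""
    return max(actions, key=lambda a: expected_utility(make_hypothetical(a), utility))

# Trivial usage: a one-shot choice whose action deterministically fixes the outcome.
payoff = {"coop": 3, "defect": 5}
print(decide(["coop", "defect"],
             make_hypothetical=lambda a: [(1.0, a)],
             utility=payoff.get))  # -> defect
```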

Given the equation for FDT and the equation for expected utility maximization, it's a little unclear whether FDT is totally updateless here. In the FDT equation, V is not conditioned on x, but in the EU equation, x is an input to P in all three theories. If FDT is updateful, its problems get much worse (as described in my original article), so I'll assume FDT is updateless in how it assesses outcomes and show that there is still a problem.

In any case, notice that an FDT agent constructs hypotheticals by considering interventions on FDT's recommendation for agents with its precise priors, digraph, and observation history, not by considering interventions on FDT's recommendation for agents in the same scenario with a different observation history.

This is the first clue that what an agent would ideally precommit to could diverge from FDT's recommendations. When precommitting, you can choose a single policy which stipulates a full strategy profile for you and any clones of you who might have different observation histories in the scenario. But FDT only considers what agents with your observation history should do.

2. The Twin Guard-Inmate Dilemma

Let a "guard-inmate dilemma" (GID) be a prisoners' dilemma, with one twist: one player is randomly assigned the role of "guard", the other the role of "inmate". The guard has slightly different payoffs that are overall more favorable but do not change the Nash equilibrium of the problem. Here is the payoff matrix I use, where the guard gets a consistent +1 relative to the inmate:

|  | Guard: Coop | Guard: Defect |
|---|---|---|
| Inmate: Coop | 3, 4 | -5, 6 |
| Inmate: Defect | 5, -4 | -1, 0 |
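
As a quick sanity check on the equilibrium claim, here is a short Python snippet (my own encoding of the table, with each cell written as (inmate payoff, guard payoff)) verifying that the guard's extra +1 leaves defection strictly dominant for both roles:

```python
# Hypothetical encoding of the GID payoff matrix above; keys are
# (inmate_move, guard_move), values are (inmate_payoff, guard_payoff).
payoffs = {
    ("C", "C"): (3, 4),
    ("C", "D"): (-5, 6),
    ("D", "C"): (5, -4),
    ("D", "D"): (-1, 0),
}

# "D" strictly beats "C" for the inmate whatever the guard does, and for the
# guard whatever the inmate does, so (D, D) is still the unique Nash equilibrium.
inmate_prefers_D = all(payoffs[("D", g)][0] > payoffs[("C", g)][0] for g in "CD")
guard_prefers_D = all(payoffs[(i, "D")][1] > payoffs[(i, "C")][1] for i in "CD")
print(inmate_prefers_D, guard_prefers_D)  # True True
```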

Here is the setup for a twin GID: a deterministic agent is cloned and made to play a GID against its twin. Each is told their own role in the dilemma but cannot communicate with the other. So the two agents now have different observation histories, since the twin GID is asymmetric: one learns that they are a guard, the other that they are an inmate. Yudkowsky & Soares describe how they model behaving in response to different observations:

When CDT's observation history updates from $x$ to $y$, CDT changes from conditioning its model on $\mathrm{Obs}=x$ to conditioning its model on $\mathrm{Obs}=y$, whereas FDT changes from intervening on the variable $\mathrm{FDT}(\underline{P}, \underline{G}, \underline{x})$ to $\mathrm{FDT}(\underline{P}, \underline{G}, \underline{y})$ instead.

Therefore, here is the digraph for the problem:

[Image: subjunctive digraph for the twin GID]

where $g$ is the observation history corresponding to learning you're a guard, and $i$ is the observation history corresponding to learning you're an inmate.

Immediately, there's a problem: neither the guard's nor the inmate's FDT-recommendation descends from the other! Suppose you, as an FDT agent, find yourself as the guard. According to Yudkowsky & Soares' equations, you choose your action by considering the (updateless) expected utility of each possible intervention on $\mathrm{FDT}(\underline{P}, \underline{G}, \underline{g})$. But neither $\mathrm{FDT}(\underline{P}, \underline{G}, \underline{i})$ nor the inmate's action is subjunctively altered by your intervention. Thus, FDT treats the guard and inmate actions as independent and therefore recommends defection. The inmate will reason similarly and also defect. Thus, following FDT leads to mutual defection.

However, agents with the policy "always cooperate in twin GIDs, no matter your role" will achieve mutual cooperation, outperforming FDT agents. Therefore, if a winning agent knew in advance that they were going to face a twin GID, they would not want to act like an FDT agent. So either FDT does not endorse the winning strategy, or it does not endorse itself. In Yudkowsky & Soares' words,

A decision theory that (like CDT) advises agents to change their decision-making methodology as soon as possible can be lauded for its ability to recognize its own flaws, but is not a strong candidate for the normative theory of rational choice.
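
To spell the argument out numerically, here is a small sketch (my own construction, not from the paper). Because intervening on your own FDT node leaves the other role's node untouched, each role just best-responds to a fixed opposing action, and dominance pins the outcome at mutual defection; the precommitted always-cooperate policy does strictly better for both players.

```python
# Payoffs as (inmate, guard), matching the table in section 2.
payoffs = {("C", "C"): (3, 4), ("C", "D"): (-5, 6),
           ("D", "C"): (5, -4), ("D", "D"): (-1, 0)}

def br_inmate(guard_move):
    """Inmate's best response when the guard's action is held fixed."""
    return max("CD", key=lambda m: payoffs[(m, guard_move)][0])

def br_guard(inmate_move):
    """Guard's best response when the inmate's action is held fixed."""
    return max("CD", key=lambda m: payoffs[(inmate_move, m)][1])

# The only action pair where each side is best-responding to the other -- the
# outcome the per-observation FDT equations settle on -- is mutual defection.
fdt_outcome = next((i, g) for i in "CD" for g in "CD"
                   if br_inmate(g) == i and br_guard(i) == g)
print(fdt_outcome, payoffs[fdt_outcome])  # ('D', 'D') (-1, 0)
print(("C", "C"), payoffs[("C", "C")])    # the always-cooperate policy gets (3, 4)
```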

3. Implications

One objection is that my digraph for the twin GID might neglect some subjunctive relationship between $\mathrm{FDT}(\underline{P}, \underline{G}, \underline{g})$ and $\mathrm{FDT}(\underline{P}, \underline{G}, \underline{i})$. However, it is fully logically consistent to have a function which recommends different actions based on whether you are a guard or an inmate, so it cannot be the case that either one is subjunctively downstream of the other. It could be the case that $\mathrm{FDT}(\underline{P}, \underline{G}, \underline{g})$ and $\mathrm{FDT}(\underline{P}, \underline{G}, \underline{i})$ are both subjunctively downstream of some other computation $C$:

[Image: digraph with both FDT nodes downstream of a common computation $C$]

The problem is that FDT does not intervene on $C$. Instead, it is stipulated to intervene directly on $\mathrm{FDT}(\underline{P}, \underline{G}, \underline{x})$, and Pearl-style intervention breaks all incoming arrows to the node; this is the difference between the do-operator and Bayesian conditioning.

So this doesn't solve the problem. However, it does show how the problem might be solved: by having a decision theory that intervenes on upstream policies themselves, rather than on the outputs of policies. This is what UDT does! The problem I've described is very similar to the problem Wei Dai found in an earlier version of UDT; he suggested that timeless decision theory might share the same bug. This prompted the switch from this action-based equation (where $o \in O$ is your observation history):

$$\mathrm{choice}(o) := \arg\max_{a \in A} E_P(U \mid \mathrm{choice}(o) = a)$$

to a policy-based equation (where $\pi : O \to A$):

$$\mathrm{choice}(o) := \pi^*(o)$$

$$\pi^* := \arg\max_{\pi \in \Pi} E_P(U \mid \pi^* = \pi)$$

Correspondingly, UDT does not appear to have the same problem with asymmetry.
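
A toy illustration of that switch on the twin GID (my own construction, assuming a 50/50 prior over which role you are assigned): the argmax over whole policies $\pi : \{g, i\} \to \{C, D\}$, scored before updating on your role, selects unconditional cooperation.

```python
from itertools import product

# Payoffs as (inmate, guard), matching the table in section 2.
payoffs = {("C", "C"): (3, 4), ("C", "D"): (-5, 6),
           ("D", "C"): (5, -4), ("D", "D"): (-1, 0)}

def prior_eu(policy):
    """Updateless score of a policy: both clones run it, and (by assumption of
    this sketch) you end up as guard or inmate with probability 1/2 each."""
    inmate_u, guard_u = payoffs[(policy["inmate"], policy["guard"])]
    return 0.5 * inmate_u + 0.5 * guard_u

# All four policies mapping the observations {guard, inmate} to moves {C, D}.
policies = [dict(zip(("guard", "inmate"), moves)) for moves in product("CD", repeat=2)]
best = max(policies, key=prior_eu)
print(best, prior_eu(best))  # {'guard': 'C', 'inmate': 'C'} 3.5
```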

Therefore, for the moment I believe that UDT, or a UDT-like theory, should be the optimal decision theory for a self-modifying AI, not FDT.