Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Janos and I started working on extending the fixed point theorem to the infinite case at MIRI's June decision theory workshop with Patrick and Benja. This post attempts to exhibit the finite version of the theorem and speculates about a plausible extension.

Algorithm for the finite case

Suppose we're given variables , and statements where each is fully modalized (no variables occur outside modal subformulas). We will describe an algorithm for constructing sentences with no free variables, such that the original statements are equivalent to .

For simplification purposes, we will assume that each is only singly modalized (none of the modal subformulas contain further modal subformulas). If not, we can introduce new variables for each subformula of the form that occurs in a fully modalized context.

Now consider a sequence of theories , where .

In , it's easy to determine the truth value of each : every modal subformula can be replaced with , leaving a propositional formula with no variables.

Now suppose we've done this for . Then, a statement will be true in iff

Therefore will be true in iff is true in ; this will let us evaluate the truth value of in all of the theories .

It's clearly the case that every modal subformula will have truth value stabilizing, therefore every will also. So there is an such that has the same truth values as for . Now if , construct ; otherwise construct . These are variable-free formulas.

Infinite case example: Procrastination Bot

def ProcrastinationBot(X):

if :

return C

else: return D

Constructing the fixed point

Let denote whether cooperates with . Then where stands for (e.g. ).

Then the following statements hold

In , any statement with a box in front of it is true, so any statement where a boxed statement is implied is also true. Thus, all the relevant statements are true in :


In , all the implied statements are boxed statements that were true in , e.g. or . Thus, all the relevant statements are true in :


Thus, any Procrastination Bot cooperates with any other Procrastination Bot.

Existence and uniqueness of the fixed point via quining

Given a formula , there exists formula such that . The quine construction (thanks Benja!) is

Let . Then we have

Now consider that . Using the above, we have

By Lob's theorem, , so a fixed point exists.

Assume there are two fixed points and . Then we have Thus, , so the fixed points are the same.


Ω 2

New Comment