In my previous post, I explained how a set of programs predicting each
other using a probabilistic oracle and taking utility-maximizing
actions will naturally play a Nash equilibrium. However, this seems
suspiciously CDT-like: a program playing the Prisoner's dilemma against
itself will, for example, defect, since mutual defection is the Nash
equilibrium.
Naturally, probabilistic oracle machines are a good way to formalize
Newcomblike problems (since we have prediction), so we can explore
alternative decision theories in this framework. Assume we have some
probabilistic world program W(O) that runs the world and calls the
oracle (which is not considered part of the world), returning an
outcome. There may be (probabilistic) agent programs embedded in this
world program (as is common in formalizations of UDT). Different
agents value outcomes differently: define Vi(w) to be the utility
player i assigns to the outcome W(O)=w. I am using Vi
instead of Ui to differentiate this utility function from the
game-theoretic utility function in the previous post.
Assume that player i's output is ai(O)=O(Ei,1/2) for some
machine Ei. This is a convenient assumption; it may turn out to be
fully general, but I haven't actually proven that.
Here's an example of player 1 playing Prisoner's dilemma against player 2:
W(O)=let x=O(E1,1/2),y=O(E2,1/2) in (x,y)
Note that x and y are the results of two independent oracle calls, one to E1 and one to E2.
We assume that action 1 is defection and action 0 is cooperation.
As a side note, perhaps I should be using monadic do-notation to show
that these calls are independent, but this is simple enough that
it doesn't seem necessary.
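As a concrete (and heavily simplified) sketch, we can model an oracle as a lookup table of output probabilities and run the world program against it. The dict representation, the string query names, and the hard-coded equilibrium probabilities below are illustrative assumptions of mine, not part of the formalism:

```python
import random

# Toy model of a probabilistic oracle: `probs` maps a machine's name to
# the probability that the machine outputs 1. O(s, p) answers whether
# P(s() = 1) exceeds p, randomizing at the boundary p itself.
def make_oracle(probs):
    def O(s, p):
        q = probs[s]
        if q > p:
            return 1
        if q < p:
            return 0
        return random.randint(0, 1)  # arbitrary when P(s() = 1) = p
    return O

# Player 1's Prisoner's dilemma world program from the post:
# one independent oracle call per player's machine.
def W(O):
    x = O("E1", 0.5)  # player 1's action (1 = defect, 0 = cooperate)
    y = O("E2", 0.5)  # player 2's action
    return (x, y)

# At the one-shot PD's Nash equilibrium both machines defect with
# certainty, so every run of the world returns (1, 1).
O = make_oracle({"E1": 1.0, "E2": 1.0})
print(W(O))  # -> (1, 1)
```

A genuine reflective oracle is a fixed point over all machines' behavior; the table here just stands in for one such fixed point.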
Here's an example of player 1 playing Newcomb's problem:
W(O)=let x=O(E1,1/2),y=O(E1,1/2) in (x,y)
Here, action 1 is 2-boxing and action 0 is 1-boxing. x is
the player's actual action, while y is Omega's prediction of the
player's action; since both are independent calls with the same query
(E1,1/2), the prediction has the same distribution as the action.
Note the strong resemblance between Newcomb's
problem and playing prisoner's dilemma against yourself.
How might agents in this environment act to maximize utility? In
the previous post, we had a quantity we wanted to estimate,

E[(Ui(a−i,1) − Ui(a−i,0) + 1)/2],

which lies in [0,1] when utilities do. We had P(Ei=1) equal to this
expectation, so O(Ei,1/2) will return a utility-maximizing action for
player i. The quantities
Ui(a−i,1) and Ui(a−i,0) are causal counterfactuals, but
maybe we want to replace them with other kinds of counterfactuals.
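Concretely, the construction works because the affine map (u1, u0) ↦ (1 + u1 − u0)/2 sends a positive utility difference above 1/2 and a negative one below it, assuming utilities in [0,1]. A rough sketch (function names are mine):

```python
import random

# Argmax-via-threshold trick: with utilities in [0, 1],
# (1 + u1 - u0) / 2 lies in [0, 1] and exceeds 1/2 exactly
# when action 1 has the higher utility.
def action_prob(u1, u0):
    return (1 + u1 - u0) / 2

# O(E, 1/2) compares P(E = 1) against the threshold 1/2.
def oracle_call(prob, p=0.5):
    if prob > p:
        return 1
    if prob < p:
        return 0
    return random.randint(0, 1)  # ties broken arbitrarily

print(oracle_call(action_prob(0.9, 0.2)))  # -> 1: action 1 is better
print(oracle_call(action_prob(0.1, 0.7)))  # -> 0: action 0 is better
```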
To do this, we could define masked oracles Mask(s,p,q,O) for state s, probabilities p and q, and oracle O as follows:
P(Mask(s,p,q,O)(s2,p2)=1)=P(O(s2,p2)=1) for s≠s2∨p≠p2
P(Mask(s,p,q,O)(s,p)=1)=q
This is the same as O except that, when given the query (s,p),
it returns 1 with probability q. Note that Mask(s,p,q,O) might
not be a reflective oracle even if O is.
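A minimal sketch of the masking operation, treating oracles as plain Python functions (the representation is an illustrative assumption):

```python
import random

def Mask(s, p, q, O):
    """An oracle identical to O except that the single query (s, p)
    is answered 1 with probability q."""
    def masked(s2, p2):
        if s2 == s and p2 == p:
            return 1 if random.random() < q else 0
        return O(s2, p2)
    return masked

# Base oracle that answers 0 to everything (a stand-in, not reflective).
O = lambda s, p: 0

M = Mask("E1", 0.5, 1, O)
print(M("E1", 0.5))  # -> 1: the masked query is forced to 1
print(M("E2", 0.5))  # -> 0: every other query falls through to O
```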
For a UDT-style decision theory, we would like to define the
machine Ei to compute a utility difference between oracular counterfactuals (similar to logical counterfactuals):

P(Ei=1) = (1 + E[Vi(W(Mask(Ei,1/2,1,O)))] − E[Vi(W(Mask(Ei,1/2,0,O)))])/2

In other words, we try masking the oracle to make our agent
O(Ei,1/2) return either always 1 or always 0, and then simulate the world
twice with these different oracles and get the expected utility difference.
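Here is a sketch of this computation for the Newcomb world program above. The payoff numbers are illustrative (the standard Newcomb payoffs rescaled into [0,1]), and since the fully-masked worlds are deterministic here, a single run of each stands in for the expectation:

```python
import random

def Mask(s, p, q, O):
    # identical to O except that query (s, p) returns 1 with probability q
    def masked(s2, p2):
        if (s2, p2) == (s, p):
            return 1 if random.random() < q else 0
        return O(s2, p2)
    return masked

def W(O):
    # Newcomb's problem: both calls query the player's machine E1.
    x = O("E1", 0.5)  # the player's action (1 = two-box, 0 = one-box)
    y = O("E1", 0.5)  # Omega's prediction, an independent sample
    return (x, y)

def V1(w):
    # Illustrative Newcomb payoffs rescaled into [0, 1]; the box is
    # full iff Omega predicted one-boxing (y = 0).
    return {(0, 0): 0.999, (1, 0): 1.0, (0, 1): 0.0, (1, 1): 0.001}[w]

def E1_prob(O):
    # Mask the agent's own query to always-1 and always-0, run the
    # world under each mask, and normalize the utility difference.
    u1 = V1(W(Mask("E1", 0.5, 1, O)))
    u0 = V1(W(Mask("E1", 0.5, 0, O)))
    return (1 + u1 - u0) / 2

O = lambda s, p: 0  # never actually consulted: every query is masked
p = E1_prob(O)
print(p < 0.5)  # -> True: P(E1 = 1) < 1/2, so O(E1, 1/2) one-boxes
```

Because the mask hits both of the world's calls at once, the forced action and the prediction move together, which is exactly what makes this agent one-box.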
It is not hard to confirm that this reasoner succeeds in cooperating
on the prisoner's dilemma with itself and one-boxing in Newcomb's
problem. However, since the masked oracle need not be reflective,
it will not propagate any consequences of the change we made, and we get a few
counterintuitive results, such as not cooperating in a prisoner's dilemma with a
slightly different but logically equivalent program.
There's also an issue with Newcomb's problem when we add a step of indirection
to Omega's prediction:
W(O)=let x=O(E1,1/2),y=O(a1,1/2) in (x,y)
Since the counterfactual only modifies O(E1,1/2) and not O(a1,1/2),
the player in this case will conclude that 2-boxing is correct.
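A sketch of why the indirect version fails, using the same toy representation of oracles as functions (payoffs and names are illustrative). Only the query (E1, 1/2) is masked, so Omega's query (a1, 1/2) keeps its unmasked answer; here the base oracle answers 1 to it, consistent with a fixed point in which the player two-boxes:

```python
import random

def Mask(s, p, q, O):
    # identical to O except that query (s, p) returns 1 with probability q
    def masked(s2, p2):
        if (s2, p2) == (s, p):
            return 1 if random.random() < q else 0
        return O(s2, p2)
    return masked

def W(O):
    # Newcomb with a step of indirection: Omega queries the player's
    # *program* a1 rather than the machine E1 directly.
    x = O("E1", 0.5)  # the player's action (1 = two-box, 0 = one-box)
    y = O("a1", 0.5)  # Omega's prediction of a1(O) = O(E1, 1/2)
    return (x, y)

def V1(w):
    # Illustrative Newcomb payoffs rescaled into [0, 1]; the box is
    # full iff Omega predicted one-boxing (y = 0).
    return {(0, 0): 0.999, (1, 0): 1.0, (0, 1): 0.0, (1, 1): 0.001}[w]

# Base oracle: answers 1 to the query (a1, 1/2), i.e. Omega predicts
# two-boxing; the mask never touches this query.
O = lambda s, p: 1 if s == "a1" else 0

u1 = V1(W(Mask("E1", 0.5, 1, O)))  # forced two-box: (1, 1) -> 0.001
u0 = V1(W(Mask("E1", 0.5, 0, O)))  # forced one-box: (0, 1) -> 0.0
p = (1 + u1 - u0) / 2
print(p > 0.5)  # -> True: the player concludes two-boxing is better
```

The mask moves the player's action but not the prediction, so the counterfactuals behave causally and two-boxing looks dominant.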
So, maybe we want a modified version O′ of O such that
P(O′(s,p)=1)=q, but where O′ is also reflective and otherwise as
similar to O as possible. However, we have no guarantee that such an
O′ exists. For example, if M is a machine that always returns 1,
there is no reflective oracle with O′(M,1/2)=0.
Perhaps we can construct our Ei machines so that such an oracle
always exists. I'm not sure what approach makes sense here. This
seems similar to the problem that playing chicken with the universe is
meant to solve, but I don't know how well this translates to a context
based on probabilistic oracles rather than logic.
If we succeeded in specifying more reasonable modified oracles, we might end up
with a nice alternative formalism of UDT that can be computed using
Nash-equilibrium-style fixed points, as in the previous post.
One natural approach is to try to specify an algorithm that uses reflective oracles to reason about itself under logical uncertainty (under axioms characterizing the reflective oracle), and then to implement UDT by conditioning on logical facts. I don't know if this adds anything beyond implementing logical uncertainty with a halting oracle. If this doesn't work, there are a few other similar approaches (e.g. restrict to Sigma_1 sentences, or see if the oracle's behavior can motivate the definition of an idealized math intuition module with nicer behavior than our current candidates).
I suspect that something like this could give us clearer insight into whether or not conditioning on logical facts is an acceptable method of taking counterfactuals. Right now that seems hard to settle, because the answer seems dependent on the algorithm used for reasoning under logical uncertainty. I think most people aren't very optimistic about conditioning, but it's still my leading contender (unless I've missed some recent discouraging news).