This post collects a few situations where agents might want to make their decisions either predictable or unpredictable to certain methods of prediction, and considers a method of making a decision unpredictable by "diagonalizing" a hypothetical prediction of that decision. The last section takes a stab at applying this tool to the ASP problem.
The diagonal step
To start off, consider the halting problem, interpreted in terms of agents and predictors. Suppose that there is a Universal Predictor, an algorithm that is able to decide whether any given program halts or runs forever. Then, it's easy for a program (agent) to evade its gaze by including a diagonal step in its decision procedure: the agent checks (by simulation) if Universal Predictor comes to some decision about the agent, and if it does, the agent acts contrary to the Predictor's decision. This makes the prediction wrong, and Universal Predictors impossible.
The same trick could be performed against something that could exist, normal non-universal Predictors, which allows an agent to make itself immune to their predictions. In particular, ability of other agents to infer decisions of our agent may be thought of as prediction that an agent might want to hinder. This is possible so long as the predictors in question can be simulated in enough detail, that is it's known what they do (what they know) and our agent has enough computational resources to anticipate their hypothetical conclusions. (If an agent does perform the diagonal step with respect to other agents, the predictions of other agents don't necessarily become wrong, as they could be formally correct by construction, but they cease to be possible, which could mean that the predictions won't be made at all.)
Knowing own decision
In UDT/ADT, an agent is given definitions of its action (strategy) and utility value, and evaluates an action (strategy) by inferring the logical implications of taking it on the utility value (see this post for a more detailed description of the decision procedure and this post for an explanation of how it might be better to optimize strategy, rather than individual decisions). However, this algorithm breaks down if an agent becomes able to infer its own decision (or that a certain decision is impossible): in that case, it can infer anything from the hypothesis that it makes an impossible decision. This is one case where an agent might want to make its decision unpredictable to itself!
It seems that normally an agent won't be able to predict its own decision (see these posts for some discussion), but it's unclear how to set up an agent so that it provably doesn't do that. Worse, an ADT agent is a program that reasons using first-order logic and knows its own source code, which means that a statement declaring agent's actual decision is provable (by a standard inference system that enumerates all valid statements), and so the best we can hope for is that the agent won't stumble upon such a proof before it made a decision (see the section of this post on provability by the agent).
Countable diagonal step
Adding the diagonal step to ADT agents, to make them unable to predict (infer) their own decision, in the simplest case requires the agents to simulate themselves up until the point where they make a decision. In general this is too hard, so one way out is to grant the agents provability oracles. The resulting setting is described in this post.
The inference (prediction) system of an agent with a provability oracle can be completely described as a formal system S, which the agent uses (with help of the oracle) to check validity of any statement (in one step). Trouble with predicting own decisions then corresponds to S being too strong, able to infer agent's decision. The ("countable") diagonal step can be implemented by the agent checking, for each possible action A, whether S proves that actual action doesn't equal A, and performing A if S proves so. If S is sound, that situation is never realized, which means that S is weak enough to fail to predict agent's decision. The agent makes itself just complicated enough for its own inference system to become unable to reason about (origins of) its decision. This condition is then sufficient to prove that the agent satisfactorily resolves some thought experiments, such as Newcomb's Problem.
Avoiding spurious statements (without oracles)
An ADT agent without provability oracle only needs to hold off on predicting its decision until the decision is made. This can be arranged by use of the diagonal step if it's known when the decision can be made. Let's consider an example.
box1 = 1000
box2 = (agent() == 2) ? 0 : 1000000
return box2 + ((agent() == 2) ? box1 : 0)
Define constant symbols A=agent(), U=world(). It's easy to see that statements [A=1 => U=1000000] and [A=2 => U=1000] are valid, no matter what agent() is (assuming it terminates and always outputs 1 or 2). But if actually A=2, then A=1 is false, and so the statement [A=1 => U=0] is true as well. How do we know that the agent won't find specifically the statements [A=1 => U=0] and [A=2 => U=1000], choose A=2 as a result, with A=1 indeed becoming false?
The problem is that even when we know that in the intended situation certain statements assigning value to possible actions are true, that doesn't automatically guarantee that the agent will find these same statements, and thus that it will make the decision implied by the statements we know to be true. In Newcomb's problem, an agent could in principle encounter true statements [A=1 => U=0] and [A=2 => U=1000] listed at the very beginning of a contrived inference system and two-box as a result, which is what makes those statements true.
Thus, it matters how agent's inference system works, even if we know that it's correct (i.e. enumerates all valid first order statements following from the stated axioms). It has to be sufficiently "natural" in some sense, so that we can rely on [A=1 => U=1000000] coming before statements like [A=1 => U=0], with agent acting on the more "natural" arguments about value of a possible action, and not on one of the spurious ones (implied by the possible action not being actually taken).
Finite diagonal step
In the preceding example, we can point out specific proofs of [A=1 => U=1000000] and [A=2 => U=1000] that don't depend on definition of A. Knowing the length of such proofs (say, L) and using an inference system that enumerates all valid proofs in the order of their length allows predicting when those particular proofs will be encountered, and so when, at the latest, the decision will be made (if it's not made before that, but using some other, shorter proof). It is then only until that point that the agent has to hold off on predicting its own decision.
If we know the number of steps N for the duration of which the inference system needs to be protected (while it considers the proofs no longer than L), the ("finite") diagonal step consists in running agent's inference system S for N steps, and contradicting (by action) any prediction about agent's action reached during these N steps. This step is decidable and doesn't require an oracle. Implementing it guarantees that (if S is sound and N is selected as discussed above) agent won't be able to predict its decision before it's made.
Sufficiently increasing the threshold N allows to prove that no other statements of the form [A=A1 => U=U1] will be proved before a decision is made (and so that the decision will be as expected). Specifically, let M>N be the new threshold. We know that (for example) [A=1 => U=1000000] is proved by S within N steps, so if (say) [A=1 => U=0] is proved also within N steps, then the proof of negation of [A=1] is only a few steps longer than 2*L, and so if we choose M to include all proofs of that length, the disproof of [A=1] will be encountered within the first M steps, which would trigger the diagonal step. Since the diagonal step can't be triggered if S is sound, we've thus proved that presence of the diagonal step guarding the first M steps of the inference system allows to conclude that the agent won't infer any spurious statements of the form [A=A1 => U=U1], and will behave as expected.
This is the same guarantee that countable diagonal rule gives, albeit at the cost of having to know L.
Prisoner's dilemma presents a situation where some measure of predictability is advantageous to both agents. Specifically, if an agent can prove that cooperating implies cooperation of the opponent, and defecting implies defection of the opponent, it can then cooperate, reaching the outcome (C,C) that is better than the equilibrium solution (D,D). It's harder to arrange cooperation without being able to predict the opponent's reaction to a hypothetical decision, so taking away all predictability of decisions will hurt both agents.
The trick is that UDT only wants that statements like [A1=C] be unprovable (in N steps), but statements like [A1=C => A2=C] may remain provable, and it's conditional statements like this that are needed to broker cooperation in PD. It seems that cooperation is possible both for simple agents without oracles or use of the diagonal step, and for agents with oracles that use countable diagonal step.
Agent simulates predictor
Finally, there is the ASP problem that involves a weak predictor and a powerful agent that need to cooperate on Newcomb's problem. The difficulty is that it seems that since the agent can anticipate predictor's decision by simulating it, it would take both boxes no matter what, and so the predictor has to predict two-boxing (which is unfortunate). The predictor is defenseless, it can't insist on making itself unpredictable, and as a result the agent's predictions of predictor's decision (given either hypothetical action of the agent) coincide.
Just as it's necessary to entertain multiple hypothetical actions to gauge their consequences, it's necessary for there to be multiple hypothetical outcomes for their comparison to yield a nontrivial decision. The preceding sections addressed the problem of making agent's own decision unpredictable, and in this case it seems that the agent has to make the decision of its opponent (the weak predictor) unpredictable (to the agent).
One way to patch this up seems to be requiring that the weak predictor is trying to prove that agent one-boxes (with however little computational power it has), and predicting two-boxing otherwise. This way, if predictor says that agent one-boxes, agent must really be one-boxing, and so we can use this condition in a diagonal step in the agent, making this situation unpredictable to the agent. Namely, the agent will two-box if it predicts that the predictor decides that the agent one-boxes (while running agent's inference system for at most M steps, as with finite diagonal step). As a result, the agent won't be able to predict that the predictor one-boxes (within M steps of agent's inference system), which might make the agent simple enough for the predictor to predict, and the two players seem to regain an equal footing.
- Formalize this analysis of ASP, prove that predictor and agent one-box
- What happens if we try making the opponent in PD "unpredictable" in the same sense?
- See if making all kinds of facts unpredictable in this way can be made to retain their conditional predictability (predictability given an assumption of what agent's action is)