A Short Note on UDT


Chris_Leong

In my last post, I stumbled across some ideas which I thought were original, but which were already contained in Updateless Decision Theory (UDT). I suspect I missed this because these ideas haven’t been given much emphasis in any of the articles I’ve read about UDT, so I wanted to highlight them here.

We begin with some definitions. Some inputs in an Input-Output map will be possible for some agents to experience, but not for others. We will describe such inputs, and the situations they represent, as conditionally consistent. Given a particular agent, we will call an input/situation compatible if the agent is consistent with the corresponding situation and incompatible otherwise. Likewise, given a particular conditionally consistent input/situation, we will call an agent compatible if it is consistent with that situation and incompatible if it isn’t.

We note the following points:

  • UDT uses an Input-Output map instead of a Situation-Output map. It is easy to miss how important this choice is. Suppose we have an input representing a situation that is conditionally consistent. Asking what an incompatible agent does in such a situation is problematic, or at least difficult: the assumption that the agent is in the situation is contradictory, and by the Principle of Explosion anything follows from a contradiction, so all such situations are equivalent. On the other hand, it is much easier to ask how the agent responds to a sequence of inputs representing an incompatible situation. The agent must respond somehow to such an input, even if it is by doing nothing or crashing. Situations are also modelled (via the Mathematical Intuition Function), but the point is that UDT models inputs and situations separately.
  • Given the previous point, it is convenient to define an agent’s counterfactual action in an incompatible situation as its response to the input representing the situation. For compatible situations, the same definition produces the same action as if we’d simply asked what the agent would do in such a situation. For conditionally consistent situations the agent is incompatible with, it explains the incompatibility: any agent that would respond a certain way to particular inputs won’t be put in such a situation. (There might be conditionally consistent situations where compatibility doesn’t depend on responses to inputs, i.e. only agents running particular source code are placed in a particular position, but UDT isn’t designed to optimise for this.)
  • Similarly, UDT predictors don’t actually predict what an agent does in a situation, but what an agent does when given an input representing a situation. This is a broader concept that allows them to predict behaviours in inconsistent situations. For a more formal explanation of these ideas, see Logical Counterfactuals for Perfect Predictors.
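
The three points above can be illustrated with a toy sketch. This is not the UDT formalism, only a minimal illustration under assumptions of my own: a transparent-box Newcomb-style setup in which the predictor fills the box only for agents that one-box when shown a full box. All names here (`Agent`, `predictor_fills_box`, `is_compatible`, and so on) are hypothetical and chosen for this example.

```python
from typing import Callable

# An agent is modelled as an input-output map: observation -> action.
# (Hypothetical type alias for this illustration.)
Agent = Callable[[str], str]


def one_boxer(observation: str) -> str:
    """Takes one box no matter what it observes."""
    return "one-box"


def two_boxer_on_full(observation: str) -> str:
    """Takes both boxes if it sees a full box, one box otherwise."""
    return "two-box" if observation == "sees-full" else "one-box"


def predictor_fills_box(agent: Agent) -> bool:
    # The predictor does not ask "what does the agent do in the
    # situation where the box is full?". It asks how the agent
    # responds to the *input* representing that situation.
    return agent("sees-full") == "one-box"


def actual_observation(agent: Agent) -> str:
    """The input this agent really receives in the toy setup."""
    return "sees-full" if predictor_fills_box(agent) else "sees-empty"


def is_compatible(agent: Agent, observation: str) -> bool:
    """An agent is compatible with an input if it can actually receive it."""
    return actual_observation(agent) == observation


def counterfactual_action(agent: Agent, observation: str) -> str:
    # Defined for every input, including inputs that represent
    # situations the agent is incompatible with.
    return agent(observation)


if __name__ == "__main__":
    for agent in (one_boxer, two_boxer_on_full):
        print(
            agent.__name__,
            "| compatible with sees-full:", is_compatible(agent, "sees-full"),
            "| response to sees-full:", counterfactual_action(agent, "sees-full"),
        )
```

In this sketch the input "sees-full" is conditionally consistent: `one_boxer` can receive it, while `two_boxer_on_full` cannot. The second agent's response to that input is still well defined, and that response is exactly what makes it incompatible with the situation.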