Using modal fixed points to formalize logical causality

[-]Benya_Fallenstein11yΩ470

I know this is supposed to be just introductory, but I actually think that the complete reformulation of UDT-with-a-halting-oracle in terms of modal logic is really interesting! For starters, it allows us to compare UDT and modal agents in the same framework (with the right $o$'s, we can see this version of UDT as a modal agent). It would also be really neat if we could write an "interpreter" that allows us to write UDT as a program calling a halting oracle, and then evaluate what it does by way of modal logic.

But also, it allows us to give a nice definition of "decision theory" and "decision problem" in the halting-oracle setting. I was planning on posting soon about the definition that I showed you when you were visiting, which is designed for actually computable agents with bounded proof lengths, and is more complicated because of that. As a stepping stone to that, using the provability logic framework, I think we can define:

a decision theory to be a set of fully modalized formulas $F_{i} (a_{1}, \dots, a_{m}, o_{1}, \dots, o_{n})$ , for $i = 1, \dots, m$ (fully modalized meaning that the propositional arguments only appear inside boxes);
and a decision problem to be a set of formulas $G_{j} (a_{1}, \dots, a_{m})$ , for $j = 1, \dots, n$ , which do not need to be modalized (which isn't a problem because they're not self-referential).

The condition that $F_{i}$ must be fully modalized, but $G_{j}$ doesn't need to be, seems to be the natural counterpart of how, in the bounded case, we allow the universe to run the agent, but the agent must reason abstractly about the universe; it can't just run it.

Given such a set, we can define $\tilde{F}_{i} (a_{1}, \dots, a_{m})$, for $i = 1, \dots, m$, as follows: $\tilde{F}_{i} (a_{1}, \dots, a_{m}) := F_{i} (a_{1}, \dots, a_{m}, G_{1} (a_{1}, \dots, a_{m}), \dots, G_{n} (a_{1}, \dots, a_{m}))$. Then the actual action taken by decision theory $F$ when run on the decision problem $G$ is the modal fixed point of the equations $A_{i} \leftrightarrow \tilde{F}_{i} (A_{1}, \dots, A_{m})$, for $i = 1, \dots, m$.
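A minimal sketch of how such a fixed point could be evaluated mechanically. It uses the standard trick of evaluating formulas on a finite linear Kripke chain, where $\Box \varphi$ holds at world $k$ iff $\varphi$ holds at every world below $k$, and reading off the values once they stabilize. Everything named here is an assumption made for illustration, not something defined above: the combinators, `fixed_point`, `tilde_F` (one possible "take the first provable (action, outcome) pair" decision theory), the toy decision problem `G`, and the depth of 8.

```python
from typing import Callable, Dict, List

# A formula is a function (world k, history) -> bool, where history[name][j]
# is the truth value of action variable `name` at world j of a linear Kripke chain.
History = Dict[str, List[bool]]
Formula = Callable[[int, History], bool]

def var(name: str) -> Formula:
    return lambda k, h: h[name][k]

def neg(f: Formula) -> Formula:
    return lambda k, h: not f(k, h)

def conj(*fs: Formula) -> Formula:
    return lambda k, h: all(f(k, h) for f in fs)

def disj(*fs: Formula) -> Formula:
    return lambda k, h: any(f(k, h) for f in fs)

def imp(f: Formula, g: Formula) -> Formula:
    return lambda k, h: (not f(k, h)) or g(k, h)

def box(f: Formula) -> Formula:
    # Box f holds at world k iff f holds at every world below k.
    return lambda k, h: all(f(j, h) for j in range(k))

def fixed_point(defs: Dict[str, Formula], depth: int) -> History:
    """Solve A_i <-> F~_i(A_1..A_m) world by world; needs fully modalized F~_i."""
    hist: History = {name: [] for name in defs}
    for k in range(depth):
        # Fully modalized formulas only read worlds < k, which are already filled in.
        row = {name: f(k, hist) for name, f in defs.items()}
        for name, value in row.items():
            hist[name].append(value)
    return hist

def tilde_F(action, pairs, G, default) -> Formula:
    """F~_action (hypothetical): the first provable pair (a, o) in `pairs` has a == action."""
    prov = [box(imp(var(a), G[o])) for a, o in pairs]
    clauses = [conj(prov[t], *[neg(p) for p in prov[:t]])
               for t, (a, _) in enumerate(pairs) if a == action]
    if action == default:  # fall back to `default` when nothing is provable
        clauses.append(conj(*[neg(p) for p in prov]))
    return disj(*clauses)

# Hypothetical decision problem: outcome o1 (preferred) happens iff a2 is taken.
G = {"o1": var("a2"), "o2": var("a1")}
pairs = [("a1", "o1"), ("a2", "o1"), ("a1", "o2"), ("a2", "o2")]
defs = {a: tilde_F(a, pairs, G, default="a1") for a in ("a1", "a2")}
hist = fixed_point(defs, depth=8)
print({a: values[-1] for a, values in hist.items()})
```

In this toy run the values settle after the first world on `a2`, the action that yields the preferred outcome. A formula that is not fully modalized would read a variable at the current world and fail with an `IndexError`, which is one way to see why the agent's formulas need the modalization condition while the $G_j$ do not.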

[-]cousin_it11yΩ000

Yes, modal logic seems to be the most natural setting for these kinds of ideas. Also, the "chicken rule" from the usual oracle formulations is gone now; I can't remember why we needed it.

[-]abramdemski11yΩ000

In a probabilistic setting (with a prior over logical theories), EDT wants to condition on the possible actions with a Bayesian conditional, in order to then find the expected utility of each action.

If the agent can prove that it will take a particular action, then conditioning on this action may yield inconsistent stuff (divide-by-zero error for Bayesian conditional). This makes the result ill-defined.

The chicken rule makes this impossible, ensuring that the conditional probabilities are well defined.

So, the chicken rule at least seems useful for EDT.

[-]Benya_Fallenstein11yΩ000

"A model of UDT with a halting oracle" searches only for one utility value for each action. I'm guessing the other formulation just wasn't obvious at the time? (I don't remember realizing the possibility of playing chicken implicitly before Will Sawin advertised it to me, though I think he attributed it to you.)

[-]Benya_Fallenstein11yΩ120

However, the approach is designed so that these "spurious" logical implications are unprovable, so they don’t interfere with decision-making. The proof of that is left as an easy exercise.

I don't think this is technically true as stated; it seems to be possible that the agent proves some spurious counterfactuals as long as the outcome it does in fact obtain is the best possible one. (This is of course harmless!) Say the agent has two possible actions, $\bar{a}^{(5)}$ and $\bar{a}^{(10)}$, leading to outcomes $\bar{o}^{(5)}$ and $\bar{o}^{(10)}$, respectively. The latter is preferred, and these are the only two outcomes. Suppose that $\bar{a}^{(10)}$ happens to be lexicographically lower than $\bar{a}^{(5)}$ in the agent's reasoning. Then it seems to be provable that the agent will in fact choose $\bar{a}^{(10)}$, meaning that it's provable that it won't choose $\bar{a}^{(5)}$, meaning that it finds both $(\bar{a}^{(5)}, \bar{o}^{(5)})$ and the spurious $(\bar{a}^{(5)}, \bar{o}^{(10)})$ in the first step.

So I think the correct statement is a disjunction: The agent obtains the highest possible outcome or it finds no spurious counterfactuals.
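A worked check of the disjunction, under assumptions that are not stated above: model the decision problem as $\bar{o}^{(10)} \leftrightarrow \bar{a}^{(10)}$ and $\bar{o}^{(5)} \leftrightarrow \bar{a}^{(5)}$, let the agent take the first action $a$ in tie-breaking order with $\Box(a \to \bar{o}^{(10)})$, and write $A^{(5)}, A^{(10)}$ for the fixed points. Since $\Box(A^{(10)} \to A^{(10)})$ always holds, the search never gets past the top outcome.

If $\bar{a}^{(10)}$ is tried first:

$A^{(10)} \leftrightarrow \Box(A^{(10)} \to A^{(10)}) \leftrightarrow \top, \qquad A^{(5)} \leftrightarrow \bot, \qquad \Box(A^{(5)} \to A^{(10)}) \leftrightarrow \top.$

The spurious counterfactual is provable, and the agent obtains the best outcome.

If $\bar{a}^{(5)}$ is tried first:

$A^{(5)} \leftrightarrow \Box(A^{(5)} \to A^{(10)}), \qquad A^{(10)} \leftrightarrow \neg \Box(A^{(5)} \to A^{(10)}) \leftrightarrow \neg A^{(5)},$

so $A^{(5)} \leftrightarrow \Box \neg A^{(5)}$, whose fixed point is $A^{(5)} \leftrightarrow \Box \bot$. Assuming the underlying theory is consistent, $\Box \bot$ is false, so the agent takes $\bar{a}^{(10)}$, and the spurious $\Box(A^{(5)} \to A^{(10)}) \leftrightarrow \Box \bot$ is not provable. Both disjuncts hold in this case.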

[-]orthonormal11yΩ000

Ooh, nice: we don't need to eliminate all spurious counterfactuals, only the malignant ones!

[-]cousin_it11yΩ000

Yes, that's correct. Thanks!

[-]abramdemski11yΩ000

This can evidently deal with probabilistic settings; the $\vec{o}$ would encode an expected value which the agent is trying to maximize, with the obvious preference ordering. It would get unrealistic for very complicated situations, since expected values are very often intractable and must be approximated. Given that we know the exact probabilistic model, this is a sort of logical uncertainty.

How might this be modified to deal with logical uncertainty?

A simple idea that comes to mind is to modify the procedure to look for lower bounds on expected values, rather than trying to prove exact expected values. The procedure of choosing the best & breaking ties remains unmodified. (I'm not really trying to think about the modalized form at the moment.)
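A rough sketch of the selection step only, assuming the proof search has already returned, for each action, the lower bounds on expected value it managed to prove. The function `choose_action` and the `proven_lower_bounds` table are hypothetical names for illustration; the proof search itself is not modeled here.

```python
from typing import Dict, List, Optional

def choose_action(actions: List[str],
                  proven_lower_bounds: Dict[str, List[float]]) -> Optional[str]:
    """Pick the action with the greatest proven lower bound on expected value.

    `actions` is listed in tie-breaking order, so on a tie the earlier action wins.
    Returns None if no bound was proven for any action.
    """
    best: Optional[str] = None
    best_bound = float("-inf")
    for action in actions:
        bounds = proven_lower_bounds.get(action, [])
        # Only the strongest bound proven for an action matters.
        if bounds and max(bounds) > best_bound:
            best, best_bound = action, max(bounds)
    return best

# Hypothetical bounds: both actions reach 0.5, so the tie goes to "a".
print(choose_action(["a", "b"], {"a": [0.2, 0.5], "b": [0.5]}))
```

The strict comparison is what implements the tie-breaking: a later action replaces the current best only if its proven bound is strictly higher.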

This should work for a lot more cases, but still has a feeling of being potentially unrealistic. It doesn't take advantage of any approaches to logical uncertainty that have been discussed.

[-]abramdemski11yΩ000

One really nice thing about the approach is that we do not have to identify the agent within the universe. All that is needed is the logical implications of supposing that the agent takes various actions. As a result, the agent will treat programs provably equivalent to its own as copies of itself with no fuss.

We can take a prior on all programs as our universe, with no special interface between the agent and the program as in AIXI. (We do need to somehow specify utilities on programs, which is a non-obvious procedure.) The agent then searches for proofs that if it takes some particular action, then at least some particular expected utility is achieved. There are no longer finitely many possible outcomes, so modal tricks to do this without a halting oracle won't work. Instead, the agent must be doing this for some bounded time. It then takes the action with highest proven utility.

[-]abramdemski11yΩ000

This implicitly contains a sort of chicken rule, since if the agent can prove that it will not take a particular $\vec{a}$, it can proceed to prove arbitrarily good $(\vec{a}, \vec{o})$ for that $\vec{a}$. So, it will want to take that action.

I wouldn't really call this logical causality, by the way; to me, that suggests the ability to take arbitrary mathematical counterfactuals ("What if $\pi = 3$?"), not only counterfactuals in service of actions.
