Modeling naturalized decision problems in linear logic

by jessicata6 min read6th May 20202 comments


Ω 6

Decision TheoryAI
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

The following is a model of a simple decision problem (namely, the 5 and 10 problem) in linear logic. Basic familiarity with linear logic is assumed (enough to know what it means to say linear logic is a resource logic), although knowing all the operators isn't necessary.

The 5 and 10 problem is, simply, a choice between taking a 5 dollar bill and a 10 dollar bill, with the 10 dollar bill valued more highly.

While the problem itself is trivial, the main theoretical issue is in modeling counterfactuals. If you took the 10 dollar bill, what would have happened if you had taken the 5 dollar bill? If your source code is fixed, then there isn't a logically coherent possible world where you took the 5 dollar bill.

I became interested in using linear logic to model decision problems due to noticing a structural similarity between linear logic and the real world, namely irreversibility. A vending machine may, in linear logic, be represented as a proposition "$1 → CandyBar", encoding the fact that $1 may be exchanged for a candy bar, being consumed in the process. Since the $1 is consumed, the operation is irreversible. Additionally, there may be multiple options offered, e.g. "$1 → Gumball", such that only one option may be taken. (Note that I am using "→" as notation for linear implication.)

This is a good fit for real-world decision problems, where e.g. taking the $10 bill precludes also taking the $5 bill. Modeling decision problems using linear logic may, then, yield insights regarding the sense in which counterfactuals do or don't exist.

First try: just the decision problem

As a first try, let's simply try to translate the logic of the 5 and 10 situation into linear logic. We assume logical atoms named "Start", "End", "$5", and "$10". Respectively, these represent: the state of being at the start of the problem, the state of being at the end of the problem, having $5, and having $10.

To represent that we have the option of taking either bill, we assume the following implications:

TakeFive : Start → End ⊗ $5

TakeTen : Start → End ⊗ $10

The "⊗" operator can be read as "and" in the sense of "I have a book and some cheese on the table"; it combines multiple resources into a single linear proposition.

So, the above implications state that it is possible, starting from the start state, to end up in the end state, yielding $5 if you took the five dollar bill, and $10 if you took the 10 dollar bill.

The agent's goal is to prove "Start → End ⊗ $X", for X as high as possible. Clearly, "TakeTen" is a solution for X = 10. Assuming the logic is consistent, no better proof is possible. By the Curry-Howard isomorphism, the proof represents a computational strategy for acting in the world, namely, taking the $10 bill.

Second try: source code determining action

The above analysis is utterly trivial. What makes the 5 and 10 problem nontrivial is naturalizing it, to the point where the agent is a causal entity similar to the environment. One way to model the agent being a causal entity is to assume that it has source code.

Let "M" be a Turing machine specification. Let "Ret(M, x)" represent the proposition that M returns x. Note that, if M never halts, then Ret(M, x) is not true for any x.

How do we model the fact that the agent's action is produced by a computer program? What we would like to be able to assume is that the agent's action is equal to the output of some machine M. To do this, we need to augment the TakeFive/TakeTen actions to yield additional data:

TakeFive : Start → End ⊗ $5 ⊗ ITookFive

TakeTen : Start → End ⊗ $10 ⊗ ITookTen

The ITookFive / ITookTen propositions are a kind of token assuring that the agent ("I") took five or ten. (Both of these are interpreted as classical propositions, so they may be duplicated or deleted freely).

How do we relate these propositions to the source code, M? We will say that M must agree with whatever action the agent took:

MachineFive : ITookFive → Ret(M, "Five")

MachineTen : ITookTen → Ret(M, "Ten")

These operations yield, from the fact that "I" have taken five or ten, that the source code "M" eventually returns a string identical with this action. Thus, these encode the assumption that "my source code is M", in the sense that my action always agrees with M's.

Operationally speaking, after the agent has taken 5 or 10, the agent can be assured of the mathematical fact that M returns the same action. (This is relevant in more complex decision problems, such as twin prisoner's dilemma, where the agent's utility depends on mathematical facts about what values different machines return)

Importantly, the agent can't use MachineFive/MachineTen to know what action M takes before actually taking the action. Otherwise, the agent could take the opposite of the action they know they will take, causing a logical inconsistency. The above construction would not work if the machine were only run for a finite number of steps before being forced to return an answer; that would lead to the agent being able to know what action it will take, by running M for that finite number of steps.

This model naturally handles cases where M never halts; if the agent never executes either TakeFive or TakeTen, then it can never execute either MachineFive or MachineTen, and so cannot be assured of Ret(M, x) for any x; indeed, if the agent never takes any action, then Ret(M, x) isn't true for any x, as that would imply that the agent eventually takes action x.

Interpreting the counterfactuals

At this point, it's worth discussing the sense in which counterfactuals do or do not exist. Let's first discuss the simpler case, where there is no assumption about source code.

First, from the perspective of the logic itself, only one of TakeFive or TakeTen may be evaluated. There cannot be both a fact of the matter about what happens if the agent takes five, and a fact of the matter about what happens if the agent takes ten. This is because even defining both facts at once requires re-using the Start proposition.

So, from the perspective of the logic, there aren't counterfactuals; only one operation is actually run, and what "would have happened" if the other operation were run is undefinable.

On the other hand, there is an important sense in which the proof system contains counterfactuals. In constructing a linear logic proof, different choices may be made. Given "Start" as an assumption, I may prove "End ⊗ $5" by executing TakeFive, or "End ⊗ $10" by executing TakeTen, but not both.

Proof systems are, in general, systems of rules for constructing proofs, which leave quite a lot of freedom in which proofs are constructed. By the Curry-Howard isomorphism, the freedom in how the proofs are constructed corresponds to freedom in how the agent behaves in the real world; using TakeFive in a proof has the effect, if executed, of actually (irreversibly) taking the $5 bill.

So, we can say, by reasoning about the proof system, that if TakeFive is run, then $5 will be yielded, and if TakeTen is run, then $10 will be yielded, and only one of these may be run.

The logic itself says there can't be a fact of the matter about both what happens if 5 is taken and if 10 is taken. On the other hand, the proof system says that both proofs that get $5 by taking 5, and proofs that get $10 by taking 10, are possible.

How to interpret this difference? One way is by asserting that the logic is about the territory, while the proof system is about the map; so, counterfactuals are represented in the map, even though the map itself asserts that there is only a singular territory.

And, importantly, the map doesn't represent the entire territory; it's a proof system for reasoning about the territory, not the territory itself. The map may, thus, be "looser" than the territory, allowing more possibilities than could possibly be actually realized.

What prevents the map from drawing out logical implications to the point where it becomes clear that only one action may possibly be taken? Given the second-try setup, the agent simply cannot use the fact of their source code being M, until actually taking the action; thus, no amount of drawing implications can conclude anything about the relationship between M and the agent's action. In addition to this, reasoning about M itself becomes harder the longer M runs, i.e. the longer the agent is waiting to make the decision; so, simply reasoning about the map, without taking actions, need not conclude anything about which action will be taken, leaving both possibilities live until one is selected.


This approach aligns significantly with the less-formal descriptions given of subjective implication decision theory and counterfactual nonrealism. Counterfactuals aren't real in the sense that they are definable after having taken the relevant action; rather, an agent in a state of uncertainty about which action it will take may consider multiple possibilities as freely selectable, even if they are assured that their selection will be equal to the output of some computer program.

The linear logic formalization increases my confidence in this approach, by providing a very precise notion of the sense in which the counterfactuals do and don't exist, which would be hard to make precise without similar formalism.

I am, at this point, less worried about the problems with counterfactual nonrealism (such as global accounting) than I was when I wrote the post, and more worried about the problems of policy-dependent source code (which requires the environment to be an ensemble of deterministic universes, rather than a single one), such that I have updated towards counterfactual nonrealism as a result of this analysis, although I am still not confident.

Overall, I find linear logic quite promising for modeling embedded decision problems from the perspective of an embedded agent, as it builds critical facts such as non-reversibility into the logic itself.

Appendix: spurious counterfactuals

The following describes the problem of spurious counterfactuals in relation to the model.

Assume the second-try setup. Suppose the agent becomes assured that Ret(M, "Five"); that is, that M returns the action "Five". From this, it is provable that the agent may, given Start, attain the linear logic proposition 0, by taking action "Ten" and then running MachineTen to get Ret(M, "Ten"), which yields inconsistency with Ret(M, "Five"). From 0, anything follows, e.g. $1000000, by the principle of explosion.

If the agent is maximizing guaranteed utility, then they will take the $10 bill, to be assured of the highest utility possible. So, it cannot be the case that the agent can be correctly assured that they will take action five, as that would lead to them taking a different action.

If, on the other hand, the agent would have provably taken the $5 bill upon receiving the assurance (say, because they notice that taking the $10 bill could result in the worst possible utility), then there is a potential issue with this assurance being a self-fulfilling prophecy. But, if the agent is constructing proofs (plans for action) so as to maximize guaranteed utility, this will not occur.

This solution is essentially the same as the one given in the paper on UDT with a known search order.


Ω 6