# Ω 7

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

The following is a model of a simple decision problem (namely, the 5 and 10 problem) in linear logic. Basic familiarity with linear logic is assumed (enough to know what it means to say linear logic is a resource logic), although knowing all the operators isn't necessary.

The 5 and 10 problem is, simply, a choice between taking a 5 dollar bill and a 10 dollar bill, with the 10 dollar bill valued more highly.

While the problem itself is trivial, the main theoretical issue is in modeling counterfactuals. If you took the 10 dollar bill, what would have happened if you had taken the 5 dollar bill? If your source code is fixed, then there isn't a logically coherent possible world where you took the 5 dollar bill.

I became interested in using linear logic to model decision problems due to noticing a structural similarity between linear logic and the real world, namely irreversibility. A vending machine may, in linear logic, be represented as a proposition "\$1 → CandyBar", encoding the fact that \$1 may be exchanged for a candy bar, being consumed in the process. Since the \$1 is consumed, the operation is irreversible. Additionally, there may be multiple options offered, e.g. "\$1 → Gumball", such that only one option may be taken. (Note that I am using "→" as notation for linear implication.)

This is a good fit for real-world decision problems, where e.g. taking the \$10 bill precludes also taking the \$5 bill. Modeling decision problems using linear logic may, then, yield insights regarding the sense in which counterfactuals do or don't exist.

## First try: just the decision problem

As a first try, let's simply try to translate the logic of the 5 and 10 situation into linear logic. We assume logical atoms named "Start", "End", "\$5", and "\$10". Respectively, these represent: the state of being at the start of the problem, the state of being at the end of the problem, having \$5, and having \$10.

To represent that we have the option of taking either bill, we assume the following implications:

TakeFive : Start → End ⊗ \$5

TakeTen : Start → End ⊗ \$10

The "⊗" operator can be read as "and" in the sense of "I have a book and some cheese on the table"; it combines multiple resources into a single linear proposition.

So, the above implications state that it is possible, starting from the start state, to end up in the end state, yielding \$5 if you took the five dollar bill, and \$10 if you took the 10 dollar bill.

The agent's goal is to prove "Start → End ⊗ \$X", for X as high as possible. Clearly, "TakeTen" is a solution for X = 10. Assuming the logic is consistent, no better proof is possible. By the Curry-Howard isomorphism, the proof represents a computational strategy for acting in the world, namely, taking the \$10 bill.

## Second try: source code determining action

The above analysis is utterly trivial. What makes the 5 and 10 problem nontrivial is naturalizing it, to the point where the agent is a causal entity similar to the environment. One way to model the agent being a causal entity is to assume that it has source code.

Let "M" be a Turing machine specification. Let "Ret(M, x)" represent the proposition that M returns x. Note that, if M never halts, then Ret(M, x) is not true for any x.

How do we model the fact that the agent's action is produced by a computer program? What we would like to be able to assume is that the agent's action is equal to the output of some machine M. To do this, we need to augment the TakeFive/TakeTen actions to yield additional data:

TakeFive : Start → End ⊗ \$5 ⊗ ITookFive

TakeTen : Start → End ⊗ \$10 ⊗ ITookTen

The ITookFive / ITookTen propositions are a kind of token assuring that the agent ("I") took five or ten. (Both of these are interpreted as classical propositions, so they may be duplicated or deleted freely).

How do we relate these propositions to the source code, M? We will say that M must agree with whatever action the agent took:

MachineFive : ITookFive → Ret(M, "Five")

MachineTen : ITookTen → Ret(M, "Ten")

These operations yield, from the fact that "I" have taken five or ten, that the source code "M" eventually returns a string identical with this action. Thus, these encode the assumption that "my source code is M", in the sense that my action always agrees with M's.

Operationally speaking, after the agent has taken 5 or 10, the agent can be assured of the mathematical fact that M returns the same action. (This is relevant in more complex decision problems, such as twin prisoner's dilemma, where the agent's utility depends on mathematical facts about what values different machines return)

Importantly, the agent can't use MachineFive/MachineTen to know what action M takes before actually taking the action. Otherwise, the agent could take the opposite of the action they know they will take, causing a logical inconsistency. The above construction would not work if the machine were only run for a finite number of steps before being forced to return an answer; that would lead to the agent being able to know what action it will take, by running M for that finite number of steps.

This model naturally handles cases where M never halts; if the agent never executes either TakeFive or TakeTen, then it can never execute either MachineFive or MachineTen, and so cannot be assured of Ret(M, x) for any x; indeed, if the agent never takes any action, then Ret(M, x) isn't true for any x, as that would imply that the agent eventually takes action x.

## Interpreting the counterfactuals

At this point, it's worth discussing the sense in which counterfactuals do or do not exist. Let's first discuss the simpler case, where there is no assumption about source code.

First, from the perspective of the logic itself, only one of TakeFive or TakeTen may be evaluated. There cannot be both a fact of the matter about what happens if the agent takes five, and a fact of the matter about what happens if the agent takes ten. This is because even defining both facts at once requires re-using the Start proposition.

So, from the perspective of the logic, there aren't counterfactuals; only one operation is actually run, and what "would have happened" if the other operation were run is undefinable.

On the other hand, there is an important sense in which the proof system contains counterfactuals. In constructing a linear logic proof, different choices may be made. Given "Start" as an assumption, I may prove "End ⊗ \$5" by executing TakeFive, or "End ⊗ \$10" by executing TakeTen, but not both.

Proof systems are, in general, systems of rules for constructing proofs, which leave quite a lot of freedom in which proofs are constructed. By the Curry-Howard isomorphism, the freedom in how the proofs are constructed corresponds to freedom in how the agent behaves in the real world; using TakeFive in a proof has the effect, if executed, of actually (irreversibly) taking the \$5 bill.

So, we can say, by reasoning about the proof system, that if TakeFive is run, then \$5 will be yielded, and if TakeTen is run, then \$10 will be yielded, and only one of these may be run.

The logic itself says there can't be a fact of the matter about both what happens if 5 is taken and if 10 is taken. On the other hand, the proof system says that both proofs that get \$5 by taking 5, and proofs that get \$10 by taking 10, are possible.

How to interpret this difference? One way is by asserting that the logic is about the territory, while the proof system is about the map; so, counterfactuals are represented in the map, even though the map itself asserts that there is only a singular territory.

And, importantly, the map doesn't represent the entire territory; it's a proof system for reasoning about the territory, not the territory itself. The map may, thus, be "looser" than the territory, allowing more possibilities than could possibly be actually realized.

What prevents the map from drawing out logical implications to the point where it becomes clear that only one action may possibly be taken? Given the second-try setup, the agent simply cannot use the fact of their source code being M, until actually taking the action; thus, no amount of drawing implications can conclude anything about the relationship between M and the agent's action. In addition to this, reasoning about M itself becomes harder the longer M runs, i.e. the longer the agent is waiting to make the decision; so, simply reasoning about the map, without taking actions, need not conclude anything about which action will be taken, leaving both possibilities live until one is selected.

## Conclusion

This approach aligns significantly with the less-formal descriptions given of subjective implication decision theory and counterfactual nonrealism. Counterfactuals aren't real in the sense that they are definable after having taken the relevant action; rather, an agent in a state of uncertainty about which action it will take may consider multiple possibilities as freely selectable, even if they are assured that their selection will be equal to the output of some computer program.

The linear logic formalization increases my confidence in this approach, by providing a very precise notion of the sense in which the counterfactuals do and don't exist, which would be hard to make precise without similar formalism.

I am, at this point, less worried about the problems with counterfactual nonrealism (such as global accounting) than I was when I wrote the post, and more worried about the problems of policy-dependent source code (which requires the environment to be an ensemble of deterministic universes, rather than a single one), such that I have updated towards counterfactual nonrealism as a result of this analysis, although I am still not confident.

Overall, I find linear logic quite promising for modeling embedded decision problems from the perspective of an embedded agent, as it builds critical facts such as non-reversibility into the logic itself.

## Appendix: spurious counterfactuals

The following describes the problem of spurious counterfactuals in relation to the model.

Assume the second-try setup. Suppose the agent becomes assured that Ret(M, "Five"); that is, that M returns the action "Five". From this, it is provable that the agent may, given Start, attain the linear logic proposition 0, by taking action "Ten" and then running MachineTen to get Ret(M, "Ten"), which yields inconsistency with Ret(M, "Five"). From 0, anything follows, e.g. \$1000000, by the principle of explosion.

If the agent is maximizing guaranteed utility, then they will take the \$10 bill, to be assured of the highest utility possible. So, it cannot be the case that the agent can be correctly assured that they will take action five, as that would lead to them taking a different action.

If, on the other hand, the agent would have provably taken the \$5 bill upon receiving the assurance (say, because they notice that taking the \$10 bill could result in the worst possible utility), then there is a potential issue with this assurance being a self-fulfilling prophecy. But, if the agent is constructing proofs (plans for action) so as to maximize guaranteed utility, this will not occur.

This solution is essentially the same as the one given in the paper on UDT with a known search order.

# Ω 7

New Comment

I haven't tried as hard as I could have to understand, so, sorry if this comment is low quality.

But I currently don't see the point of employing linear logic in the way you are doing it.

The appendix suggests that the solution to spurious counterfactuals here is the same as known ideas for resolving that problem. Which seems right to me. So solving spurious counterfactuals isn't the novel aspect here.

But then I'm confused why you focus on 5&10 in the main body of the post, since that's the main point of the 5&10 problem.

Maybe 5&10 is just a super simple example to illustrate things. But then, I don't know what it is you are illustrating. What is linear logic doing for you that you could not do some other way?

I have heard the suggestion that linear logic should possibly be used to aid in the difficulties of logical counterfactuals, before. But (somehow) those suggestions seemed to be doing something more radical. Spurious counterfactuals were supposed to be blocked by something about the restrictive logic. By allowing the chosen action to be used only once (since it gets consumed when used), something nicer is supposed to happen, perhaps avoiding problematic self-referential chains of reasoning.

(As I see it at the moment, linear logic seems to -- if anything -- work against the kind of thing we typically want to achieve. If you can use "my program, when executed, outputs 'one-box'" only once, you can't re-use the result both within Omega's thinking and within the physical choice of box. So linear logic would seem to make it hard to respect logical correlations. Of course this doesn't happen for your proposal here, since you treat the program output as classical.)

But your use (iiuc!) seems less radical. You are kind of just using linear logic as a way to specify a world model. But I don't see what this does for you. What am I missing?

CDT and EDT have known problems on 5 and 10. TDT/UDT are insufficiently formalized, and seem like they might rely on known-to-be-unfomalizable logical counterfactuals.

So 5 and 10 isn't trivial even without spurious counterfactuals.

What does this add over modal UDT?

• No requirement to do infinite proof search
• More elegant handling of multi-step decision problems
• Also works on problems where the agent doesn't know its source code (of course, this prevents logical dependencies due to source code from being taken into account)

Philosophically, it works as a nice derivation of similar conclusions to modal UDT. The modal UDT algorithm doesn't by itself seem entirely well-motivated; why would material implication be what to search for? On the other hand, every step in the linear logic derivation is quite natural, building action into the logic, and encoding facts about what the agent can be assured of upon taking different actions. This makes it easier to think clearly about what the solution says about counterfactuals, e.g. in a section of this post.