
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

This is a follow-up to a previous post on critical agentialism, to explore the straightforward decision-theoretic consequences. I call this subjective implication decision theory, since the agent is looking at the logical implications of their decision according to their beliefs.

We already covered observable action-consequences. Since these are falsifiable, they have clear semantics in the ontology. So we will in general assume observable rewards, as in reinforcement learning, while leaving unobservable goals for later work.

Now let's look at a sequence of decision theory problems. We will assume, as before, the existence of some agent that falsifiably believes itself to run on at least one computer, C.

## 5 and 10

Assume the agent is in front of a table containing a 5 dollar bill and a 10 dollar bill. The agent will decide which dollar bill to take. Thereafter, the agent will receive a reward signal: 5 if the 5 dollar bill is taken, and 10 if the 10 dollar bill is taken.

The agent may have the following beliefs about action-consequences: "If I take action 5, then I will get 5 reward. If I take action 10, then I will get 10 reward." These beliefs follow directly from the problem description. Notably, the beliefs include beliefs about actions that might not actually be taken; it is enough that these actions are possible for their consequences to be falsifiable.

Now, how do we translate these beliefs about action-consequences into decisions? The most straightforward way to do so is to select the policy that is believed to return the most reward. (This method is ambiguous under conditions of partial knowledge, though that is not a problem for 5 and 10).

This method (which I will call "subjective implication decision theory") yields the action 10 in this case.

This is all extremely straightforward. We directly translated the problem description into a set of beliefs about action-consequences. And these beliefs, along with the rule of subjective implication decision theory, yield an optimal action.
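The selection rule above can be sketched as a one-line argmax over believed action-consequences. This is a minimal illustrative sketch; the names (`sidt_choose`, `believed_reward`) are mine, not from the post.

```python
# A minimal sketch of subjective implication decision theory (SIDT)
# for the 5-and-10 problem. Names here are illustrative.

def sidt_choose(actions, believed_reward):
    """Select the action whose believed consequence has the highest reward."""
    return max(actions, key=believed_reward)

# The agent's beliefs about action-consequences, read directly off the
# problem description: taking the 5 yields 5 reward, taking the 10 yields 10.
beliefs = {"take_5": 5, "take_10": 10}

choice = sidt_choose(beliefs.keys(), beliefs.get)
print(choice)  # take_10
```

Under full knowledge of action-consequences, as here, the rule is unambiguous; the partial-knowledge case mentioned above would require some further rule for comparing uncertain beliefs.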

The difficulty of 5 and 10 comes when the problem is naturalized. The devil is in the details: how to naturalize the problem? The previous post examined a case of both external and internal physics, compatible with free will. There is no obvious obstacle to translating these physical beliefs to the 5 and 10 case: the dollar bills may be hypothesized to follow physical laws, as may the computer C.

Realistically, the agent should assume that the proximate cause of the selection of the dollar bill is not their action, but C's action. Recall that the agent falsifiably believes it runs on C, in the sense that its observations/actions necessarily equal C's.

Now, "I run on C" implies in particular: "If I select 'pick up the 5 dollar bill' at time t, then C does. If I select 'pick up the 10 dollar bill' at time t, then C does." And the assumption that C controls the dollar bill implies: "If C selects 'pick up the 5 dollar bill at time t', then the 5 dollar bill will be held at some time between t and t+k", and also for the 10 dollar bill (for some k that is an upper bound of the time it takes for the dollar bill to be picked up). Together, these beliefs imply: "If I select 'pick up the 5 dollar bill' at time t, then the 5 dollar bill will be held at some time between t and t+k", and likewise for the 10 dollar bill. At this point, the agent's beliefs include ones quite similar to the ones in the non-naturalized case, and so subjective implication decision theory selects the 10 dollar bill.
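The chain of implications above can be sketched as function composition, with each falsifiable belief as one link; all names below are illustrative.

```python
# Sketch: composing the agent's beliefs in the naturalized 5-and-10 problem.
# Belief 1 ("I run on C"): my selected action is C's action.
def c_action(my_action):
    return my_action

# Belief 2: C's action determines which bill is held by some time t + k.
def bill_held(c_act):
    return {"pick_up_5": "5_dollar_bill", "pick_up_10": "10_dollar_bill"}[c_act]

# Composition: the agent's belief about the consequence of its own action.
def consequence(my_action):
    return bill_held(c_action(my_action))

reward = {"5_dollar_bill": 5, "10_dollar_bill": 10}
best = max(["pick_up_5", "pick_up_10"], key=lambda a: reward[consequence(a)])
print(best)  # pick_up_10
```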

## Twin prisoner's dilemma

Consider an agent that believes itself to run on computer C. It also believes there is another computer, C', which has identical initial state and dynamics to C.

Each computer will output an action; the agent will receive 10 reward if C' cooperates (0 if C' defects), plus 1 reward if C defects (0 if C cooperates).

As in 5 and 10, the agent believes: "If I cooperate, C cooperates. If I defect, C defects." However, this does not specify the behavior of C' as a function of the agent's action.

It can be noted at this point that, because the agent believes C' has identical initial state and dynamics to C, the agent believes (falsifiably) that C' must output the same actions as C on each time step, as long as C and C' receive identical observations. Since, in this setup, observations are assumed to be equal until C receives the reward (with C' perhaps receiving a different reward), these beliefs imply: "If I cooperate, C' cooperates. If I defect, C' defects".

In total we now have: "If I cooperate, C and C' both cooperate. If I defect, C and C' both defect". Thus the agent believes itself to be straightforwardly choosing between a total reward of 10 for cooperation, and a total of 1 reward for defection. And so subjective implication decision theory cooperates.
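The reasoning above can be sketched as follows, with the agent's mirroring belief about C' hard-coded as an assumption and the payoffs taken from the problem statement; the names are illustrative.

```python
# Sketch: twin prisoner's dilemma under SIDT. The agent believes C'
# mirrors C's action, since they share initial state and dynamics.

def total_reward(my_action):
    c_action = my_action        # belief: "I run on C"
    c_prime_action = c_action   # belief: identical program, identical inputs
    reward = 10 if c_prime_action == "cooperate" else 0
    reward += 1 if c_action == "defect" else 0
    return reward

best = max(["cooperate", "defect"], key=total_reward)
print(best)  # cooperate
```

A CDT-style intervention would hold `c_prime_action` fixed while varying `c_action`, which is exactly the divergence noted below.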

Note that this comes apart from the conventional interpretation of CDT, which considers interventions on C's action, rather than on "my action". CDT's hypothesized intervention updates C but not C', as C and C' are physically distinct.

## Newcomb's problem

This is very similar to the twin prisoner's dilemma. The agent may falsifiably believe: "The Predictor filled box A with \$1,000,000 if and only if I will choose only box A." From here it is straightforward to derive that the agent believes: "If I choose to take only box A, then I will have \$1,000,000. If I choose to take both boxes, then I will have \$1,000." Hence subjective implication decision theory selects only box A.

The usual dominance argument for selecting both boxes does not apply. The agent is not considering interventions on C's action, but rather on "my action", which is falsifiably predicted to be identical with C's action.
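As a sketch (illustrative names, payoffs from the standard problem statement), the agent's believed payoff function already folds the Predictor's behavior into the consequence of the agent's own choice, so the dominance argument never gets a grip:

```python
# Sketch: Newcomb's problem under SIDT. The agent believes the box
# contents are a logical consequence of its own choice.

def payoff(my_choice):
    # Belief: the Predictor filled box A with $1,000,000 iff I take only box A.
    box_a = 1_000_000 if my_choice == "one_box" else 0
    box_b = 1_000
    return box_a if my_choice == "one_box" else box_a + box_b

best = max(["one_box", "two_box"], key=payoff)
print(best)  # one_box
```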

## Counterfactual mugging

In this problem, a Predictor flips a coin; if the coin is heads, the Predictor asks the agent for \$10 (and the agent may or may not give it); if the coin is tails, the Predictor gives the agent \$1,000,000 iff the Predictor predicts the agent would have given \$10 in the heads case.

We run into a problem with translating this to a critical agential ontology. Since the two branches do not both happen in the same world, it is not possible to state the Predictor's accuracy as a falsifiable statement, as it relates two incompatible branches.

To avoid this problem, we will say that the Predictor predicts the agent's behavior ahead of time, before flipping the coin. This prediction is not told to the agent in the heads case.

Now, the agent falsifiably believes the following:

• If the coin is heads, then the Predictor's prediction is equal to my choice.
• If the coin is tails, then I get \$1,000,000 if the Predictor's prediction is that I'd give \$10, otherwise \$0.
• If the coin is heads, then I get \$0 if I don't give the predictor \$10, and -\$10 if I do give the predictor \$10.

From the last point, it is possible to show that, after the agent observes heads, the agent believes they get \$0 if they don't give \$10, and -\$10 if they do give \$10. So subjective implication decision theory doesn't pay.

This may present a dynamic inconsistency, in that the agent's decision does not agree with what they would previously have wished to decide. Let us examine this.

In a case where the agent chooses their action before the coin flip, the agent believes that, if they will pay up, the Predictor will predict this, and likewise for not paying up. Therefore, the agent believes they will get \$1,000,000 if they decide to pay up and then the coin comes up tails.

If the agent weights the heads/tails branches evenly, then the agent will decide to pay up. This presents a dynamic inconsistency.
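The inconsistency can be made concrete by comparing the two evaluation points. A minimal sketch, with the ex-ante evaluation weighting the heads/tails branches evenly as above (illustrative names, payoffs from the problem statement):

```python
# Sketch of the dynamic inconsistency in counterfactual mugging.

def ex_ante_value(policy_pays):
    """Expected value of a policy, evaluated before the coin flip."""
    heads = -10 if policy_pays else 0
    tails = 1_000_000 if policy_pays else 0
    return 0.5 * heads + 0.5 * tails

def ex_post_heads_value(pays_now):
    """Believed value of the act, evaluated after observing heads."""
    return -10 if pays_now else 0

# Ex ante, paying is preferred; ex post (on heads), not paying is preferred.
print(ex_ante_value(True), ex_ante_value(False))
print(ex_post_heads_value(True), ex_post_heads_value(False))
```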

My sense is that this inconsistency should be resolved by considering theories of identity other than closed individualism. That is, it seems possible that the abstraction of receiving an observation and taking an action on each time step, while having a linear lifetime, is not a good-enough fit for the counterfactual mugging problem to achieve dynamic consistency.

## Conclusion

It seems that subjective implication decision theory agrees with timeless decision theory and evidential decision theory on the problems considered, while diverging from causal decision theory and functional decision theory.

I consider this a major advance, in that the ontology is more cleanly defined than the ontology of timeless decision theory, which considers interventions on logical facts. It is not at all clear what it means to "intervene on a logical fact"; the ontology of logic does not natively contain the affordance of intervention. The motivation for considering logical interventions was the belief that the agent is identical with some computation, such that its actions are logical facts. Critical agential ontology, on the other hand, does not say the agent is identical with any computation, but rather that the agent effectively runs on some computer (which implements some computation), while still being metaphysically distinct. Thus, we need not consider "logical counterfactuals" directly; rather, we consider subjective implications, and consider whether these subjective implications are consistent with the agent effectively running on some computer.

To handle cases such as counterfactual mugging in a dynamically consistent way (similar to functional decision theory), I believe it will be necessary to consider agents outside the closed-individualist paradigm, in which one is assumed to have a linear lifetime with memory and observations/actions on each time step. However, I have not yet explored this direction.

[ED NOTE: After the time of writing I realized subjective implication decision theory, being very similar to proof-based UDT, has problems with spurious counterfactuals by default, but can similarly avoid these problems by "playing chicken with the universe", i.e. taking some action it has proven it will not take.]


I'm kind of tired right now, so I might be missing something obvious, but:

It seems that subjective implication decision theory agrees with timeless decision theory on the problems considered, while diverging from causal decision theory, evidential decision theory, and functional decision theory.

Why do you say that it diverges from evidential decision theory (EDT)? AFAICT on all problems listed it does the same thing as EDT, and the style of reasoning seems pretty similar. Would you mind saying what SIDT would do in XOR mugging? (I'd try to work this out myself but for the aforementioned tiredness and the fear that I don't quite understand SIDT well enough).

Looking back on this, it does seem quite similar to EDT. I'm actually, at this point, not clear on how EDT and TDT differ, except in that EDT has potential problems in cases where it's sure about its own action. I'll change the text so it notes the similarity to EDT.

On XOR blackmail, SIDT will indeed pay up.

I'm new to the whole decision theory thing (I'm a definite one-boxer with Newcomb's) - I've enjoyed this post and the discussions in comments.

It'll take me a while to process - but please let the discussions continue!

Note that there is no such thing as agency or free will in a world where perfect predictors exist. Thus a self-consistent question to ask is not "what decision should I make?" but "what decision am I programmed to make?" If your programming is to two-box, then that is what you will do; nothing you can think or "decide" will change that. In practice, there is (almost) always a way out of this, for example by using a quantum RNG to make decisions (unless the world is super-deterministic and the predictor has access to the initial condition that resulted in you using the RNG). In this case the best the predictor can do is to conclude that the subject "would use a quantum RNG", and to predict what probability distribution the subject would pick, i.e. the predictable features of the world. I wrote about it ages ago, but our discussion at the time waned without any kind of shared conclusion. Not surprising; after all, if we cannot make decisions, what's the point of decision theories?

Re counterfactual mugging, if you are an embedded agent, the best you can state is that

an agent who does not pay lives in the world where she has higher utility. It does not matter what Omega says or does, or what the 1000th digit of pi is.

Note that there is no such thing as agency or free will in the world where perfect predictors exist.

I am explicitly disputing this. I do not believe assuming there is both free will and perfect predictors runs into contradictions, as long as the agent cannot observe predictions of their future actions. See the first post on the hypothesis "I effectively run on computer C".

TAG:

I do not believe assuming there is both free will and perfect predictors runs into contradictions, as long as the agent cannot observe predictions of their future actions.

The objection is more that perfect predictors cannot operate without determinism, and determinism excludes free will.

I addressed this argument in the previous post.

We see, now, that if free will and determinism are compatible, it is due to limitations on the agent's knowledge. The agent, knowing it runs on C, cannot thereby determine what action it takes at time t, until a later time. And the initial attempt to provide this knowledge externally fails.

The way I interpret it is "my belief in free will is compatible with determinism", not "I have free will, defined as 'the ability to choose between different possible courses of action unimpeded,' if 'unimpeded' is interpreted as 'unimpeded by an algorithm that runs on C'." I have no objection to something like "my algorithm running on C includes subroutines that generate different possible worlds and evaluate their utility, thus giving me a perception of making choices." However, a perfect predictor would know what your algorithm will do before you are ever instantiated, whether by analyzing the algorithm, by running it on a "virtual Jessica machine" or through some combination of both. In that sense, you are not free to make decisions, but, after being run for some time as an algorithm on C, you get to learn what your algorithm is up to that moment.

My assertion is that, subjectively, at the time of the decision, it is not the case that this decision is "impeded" by the algorithm that C is running. There is no way that, at that time, one could even possibly "be blocked by" C. It is never the case, nor could it possibly be the case, that I try to take action A and am prevented somehow by C. The action taken by C at the time of decision is not even known, so there isn't a paradox created either.

In my view, the conflict between free will and determinism only comes about due to being uncareful about distinguishing different perspectives and paying attention to the ontology and knowledge appropriate to each perspective. E.g. confusing between my perspective and the perspective of a predictor outside me, such that it seems like I should believe my action is determined because the predictor should believe this. Perhaps the main point of my original post is to be very detail-oriented about what a perspective is, and how ontology relativizes to a perspective.

TAG:

E.g. confusing between my perspective and the perspective of a predictor outside me, such that it seems like I should believe my action is determined because the predictor should believe this.

If a predictor can predict your actions from a third person perspective, then you don't have libertarian free will, and all you are left with is the illusion that you do, based on not being able to predict your own decisions, as per Thou Art Physics. That notoriously isn't showing that free will as such is compatible with determinism, only that an illusion thereof is.

You're saying the third person perspective is "real" and the first person perspective is "illusory" and I am disputing that. This is a matter of reference frame like in special relativity. Maybe an outside observer believes I am moving slower than I believe I am moving but that doesn't mean either belief is wrong, they're relative to different reference frames.

TAG:

You’re saying the third person perspective is “real” and the first person perspective is “illusory”

I am not saying that the first person perspective is necessarily illusory purely because it is a first person perspective. I am saying that it is trumped by the third person perspective. If you feel like a million dollars, but don't have a million in the bank, you are not a millionaire. That's a standard, uncontentious, and usually unstated epistemological assumption behind most rational, science-based thinking.

and I am disputing that.

If you are going to invert the most basic principle of rationality -- "things aren't real just because they seem real to you" -- you should probably make that explicit. BTW, relativity doesn't prove relativism. There's a similarity in the name, that's all.

Interpreting relativism as "thinking something is real makes it real" is a strawman; no individual subject makes such a map-territory error. Relativism means that truth is a two+ place predicate.

Here's an analogy. Say it's a few days ago and I haven't made this post. At that time I believe that it has not yet been determined whether I will make the post.

Later I have made the post. Then I believe that it has now already been determined.

Are these beliefs incompatible? No. They're both true in the ordinary common-sense way. And this is possible because they're relative to different reference frames. All spatiotemporal references are indexical (i.e. in some way starting from a spatiotemporally local center of reference), most obviously explicitly indexical ones like "me" and "now".

The relativity/relativism thing isn't just nominal. Relativity means references are resolved differently depending on the time and place of the observer, and paying attention to these different resolutions is critical to avoid paradox.

There is no view from nowhere. All "third person perspectives" are first person perspectives from a different time and place.

TAG:

Interpreting relativism as “thinking something is real makes it real” is a strawman; no individual subject makes such a map-territory error.

People make all sorts of errors.

Relativism means that truth is a two+ place predicate.

Relativism is a family of claims. The specific claim depends on what is in the two places.

Here’s an analogy. Say it’s a few days ago and I haven’t made this post. At that time I believe that it has not yet been determined whether I will make the post.

Later I have made the post. Then I believe that it has now already been determined.

Are these beliefs incompatible?

Yes. "is determined" and "is not determined" are incompatible.

Determinism means that the whole future is determined at any point in time. That means that at time T1, your belief that your decision at time T2 is not determined is a false belief.

No. They’re both true in the ordinary common-sense way.

They can't both be true. But someone who doesn't understand determinism could hold both beliefs. That isn't very significant.

And this is possible because they’re relative to different reference frames. All spaciotemporal references are indexical

Says who? Where does it say that determinism is a "spatiotemporal reference"?

The relativity/relativism thing isn’t just nominal. Relativity means references are resolved differently depending on the time and place of the observer, and paying attention to these different resolutions is critical to avoid paradox.

Relativity does not assert that everything is relative to an observer, and in fact insists that some things, such as the speed of light, are not.

Since you don't have a basis for saying that everything is relative, you need a specific reason for asserting that determinism is relative.

There is no view from nowhere. All “third person perspectives” are first person perspectives from a different time and place.

Says who?

It was confusing that I used the word "determined" in the analogy. The meaning is clearer if I say it has "already happened" at the later time and not the previous time.

Since you don’t have a basis for saying that everything is relative, you need a specific reason for asserting that determinism is relative.

Burden of proof issue. I've explicated a metaphysics that claims that whether something has been determined (i.e. is independent of future choices) depends on the standpoint. You claim this metaphysics is wrong. Your argument was that the third-person view overrides the first-person view. My argument is that this is wrong because of, among other things, relativity. Now you're saying I need a "specific reason" for asserting that determinism is relative. But I've refuted your argument, which is that the third-person view in-general overrides the first-person view.

The positive case is the explication of the metaphysics! You have not located a contradiction in it. This doesn't prove it to be true but I have never claimed to have such a proof.

Says who?

It's relatively clear if you think physically (how could any observer even potentially imagine accessing a view that isn't from a place? Their imagining-accessing proceeds starting from their own time and place; see deixis). Besides this, see Nagel and Brian Cantwell Smith, who have more detailed arguments.

TAG:

It was confusing that I used the word “determined” in the analogy. The meaning is clearer if I say it has “already happened” at the later time and not the previous time.

That's clearly true, but it's harder to see the connection to determinism.

You claim this metaphysics is wrong.

I claim it's insufficiently supported. You have a version of the claim that applies to spatio-temporal objects, but determinism isn't a spatio-temporal object; it's a putative property of the universe as a whole.

Your argument was that the third-person view overrides the first-person view.

I point out that that is what this audience believes, so the burden is on you to argue otherwise.

Now you’re saying I need a “specific reason” for asserting that determinism is relative.

Relativity does not say that determinism is relative, so you need another argument.

But I’ve refuted your argument, which is that the third-person view in-general overrides the first-person view.

Disagreement is not refutation. If you had an argument that proved relativism to be true of everything whatsoever, then that would be a refutation -- but all the arguments you are resting on apply only to specific categories.

You have not located a contradiction in it.

Non contradiction is not a sufficient criterion of truth.

This doesn’t prove it to be true but I have never claimed to have such a proof.

So you agree that your theory is insufficiently supported?

It’s relatively clear if you think physically (how could any observer even potentially imagine accessing a view that isn’t from a place

Look at Google Earth. This is a solved problem.

It's entirely true that some kinds of objective, mathematical science require things to be indexed to an observer. But, so long as you are dealing with quantifiable physical properties, it is still possible to predict exactly how things will appear to an observer other than yourself. That kind of thing is much more objective than subjective.

On the other hand, there is another set of arguments, such as Nagel's "What is it like to be a bat?", which tend to the conclusion that subjective experience cannot be captured mathematically at all. You might be able to capture the XYZ coordinates of a bat, and its velocity and so on, but that tells you very little about its inner world, its subjective sensations and feelings.

Qualiaphilic arguments go much further in the direction of denying the "view from nowhere" than physics- or maths-based arguments. Relativity goes a little further than classical physics, because it holds a larger set of properties to be observer-dependent: energy, velocity, and momentum in addition to location. But that still falls very far short of full subjectivism.

But it's still not clear how Chalmers or Nagel style arguments, that deal with consciousness and subjectivity would relate to determinism.

Perhaps the main point of my original post is to be very detail-oriented about what a perspective is, and how ontology relativizes to a perspective.

I have no argument that in most circumstances this difference in perspectives is essential. However, if you are talking about decision theories, agents who do not believe that their actions are determined just because the predictor knows them (definitely knows, by definition, not just believes) end up two-boxing, because they believe that "their actions are not determined," and so two-boxing is the higher-utility choice. Unless I'm missing something in your argument again. But if not, then my point is that this relativization does not make a better decision-making algorithm.

The agents I'm considering one-box, as shown in this post. This is because the agent logically knows (as a consequence of their beliefs) that, if they take 1 box, they get \$1,000,000, and if they take both boxes, they get \$1,000. That is, the agent believes the contents of the box are a logical consequence of their own action.