This work was inspired by a question by Vanessa Kosoy, who also contributed several of the core ideas, as well as feedback and mentorship.

Abstract

We outline a computationalist interpretation of quantum mechanics, using the framework of infra-Bayesian physicalism. Some epistemic and normative aspects of this interpretation are illuminated by a number of examples and theorems.

1. Introduction

Infra-Bayesian physicalism was introduced as a framework to investigate the relationship between a belief about a joint computational-physical universe and a corresponding belief about which computations are realized in the physical world, in the context of "infra-beliefs". Although the framework is still somewhat tentative and the definitions are not set in stone, it is interesting to explore applications in the case of quantum mechanics.

1.1. Discussion of the results

Quantum mechanics has been notoriously difficult to interpret in a fully satisfactory manner. Investigating the question through the lens of computationalism, and more specifically in the setting of infra-Bayesian physicalism, provides a new perspective on some of these questions through its emphasis on formalizing aspects of metaphysics and its focus on a decision-theoretic approach. Naturally, some questions remain, and some new interesting questions are raised by this framework itself.

The toy setup can be described at a high level as follows (with details given in Sections 2 to 4). We have an "agent", which in this toy model simply consists of a policy and a memory tape to record observations. The agent interacts with a quantum mechanical "environment" by performing actions and making observations. We assume the entire agent-environment system evolves unitarily. We'll consider the agent having complete Knightian uncertainty over its own policy, and for each policy the agent's beliefs about the "universe" (the joint agent-environment system) are given by the Born rule for each observable, without any assumption on the correlation between observables (formally given by the free product). We can then use the key construction in infra-Bayesian physicalism, the bridge transform, to answer questions about the agent's corresponding beliefs about what copies of the agent (having made different observations) are instantiated in the given universe.

In light of the falsity of Claims 4.15 and 4.17, we can think of the infra-Bayesian physicalist setup as a form of many-worlds interpretation. However, unlike the traditional many-worlds interpretation, we have a meaningful way of assigning probabilities to (sets of) Everett branches, and Theorem 4.19 shows statistical consistency with the Copenhagen interpretation. In contrast with the Copenhagen interpretation, there is no "collapse", but we do assume a form of the Born rule as a basic ingredient in our setup. Finally, in contrast with the de Broglie–Bohm interpretation, the infra-Bayesian physicalist setup does not privilege particular observables, and is expected to extend naturally to relativistic settings. See also Section 8 for further discussion on properties that are specific to the toy setting and ones that are more inherent to the framework. It is worth pointing out that the author is not an expert in quantum interpretations, so a lot of opportunities are left open for making connections with the existing literature on the topic.

1.2. Outline

In Section 2 we describe the formal setup of a quantum mechanical agent-environment system. In Section 3 we recall some of the central constructions in infra-Bayesian physicalism, then in Section 4 we apply this framework to the agent-environment system. In Sections 4.2 and 4.3 we write down various statements relating quantities arising in the infra-Bayesian physicalist framework to the Copenhagen interpretation of quantum mechanics. While Section 4.2 focuses on "epistemic" statements, Section 4.3 is dedicated to the "normative" aspects. A general theme in both sections is that the stronger, "on the nose" relationships between the interpretations fail, while certain weaker "asymptotic" relationships hold. In Section 5.1 we construct counterexamples to the stronger claims, and in Sections 6 and 7 we prove the weaker claims relating the interpretations. In Section 8 we discuss which aspects of our setup are for the sake of simplicity in the toy model, and which are properties of the broader theory.

2. Setup

First, we'll describe a standard abstract setup for a simplified agent-environment joint system. We have the following ingredients:

A finite set A of possible actions of the agent.

A finite set O of possible observations of the agent.
We'll write E=O×A, the set of observation-action pairs.

For technical reasons it will be convenient to add a symbol 0 for "blank", and fix a bijection E+=E⊔{0}≅Z/N
preserving 0, where N=|O|⋅|A|+1.
We'll use this bijection to treat E+ as an abelian group implicitly.

A Hilbert space He corresponding to states of the environment.

Fix a finite time horizon[1] T∈N. A classical state of a cyclic, length T memory tape is a function τ:Z/T→E+. Let $\mathrm{Tp}_T$ be the set of all classical tape states (written Tp below).

A Hilbert space Hg with orthonormal basis ∣∣ψgτ⟩ for τ∈Tp, corresponding to the quantum state of the agent.

For each a∈A a unitary map of the environment Ua:He→He, describing the "result of the action".

A projection-valued measure P on O, valued in He (giving projections Po:He→He for each observation o∈O).

Let H=Hg⊗He be the state space of the joint agent-environment system.

Remark 2.1. It would be interesting to consider a setting where the agent is allowed to choose the observation in each step (e.g. have the projection-valued measure P depend on the action taken). For simplicity we'll work with a fixed observation as described above.

Definition 2.2. Let $O^{\le T}=\bigsqcup_{t\in\mathbb{N},\,t\le T}O^t$ and $E^{\le T}=\bigsqcup_{t\in\mathbb{N},\,t\le T}E^t$ be the sets of observation histories and observation-action histories respectively, i.e. finite strings of observations (resp. observation-action pairs) up to length T. There's a natural map $\mathrm{obs}:E^{\le T}\to O^{\le T}$, extracting the string of observations from a string of observation-action pairs. We'll call a function $\pi:O^{\le T}\to A$ a policy. For two histories h1,h2 (of either type), we'll sometimes write h1⊏h2 to mean h1 is a (not necessarily proper) prefix (i.e. initial substring) of h2.

Remark 2.3. We only consider deterministic policies here. It's not immediately clear how one would generalize Definition 2.7 to randomized policies. In fact, we can always (and is perhaps more principled to) think of our source of randomness for a randomized policy to be included in the environment, so we don't lose out on generality by only considering deterministic policies. For example, if the source of our randomness is a quantum coin flip, then our approach offers a convenient way of modeling this by including the coin as a factor of He, i.e. part of the environment subsystem.

Definition 2.4. For a tape state τ:Z/T→E+ and an observation-action pair ε∈E, let mem(τ,ε):Z/T→E+ be the state of the tape after writing the pair ε to the tape, defined by
$$\mathrm{mem}(\tau,\varepsilon)(n)=\begin{cases}\tau(n-1) & \text{if } n\neq 0,\\ \tau(-1)+\varepsilon & \text{if } n=0.\end{cases}$$

Remark 2.5. Choosing a group structure on E+ is in order to make the map mem(−,ε):Tp→Tp invertible, which in turn makes the map Uoπ in Definition 2.7 unitary.
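As a small sanity check of Remark 2.5, the following Python sketch models a tape as a tuple of residues mod N and verifies by brute force that mem(−,ε) is a bijection on the set of tape states. The tuple encoding, the toy sizes, and the helper names `mem` and `mem_inverse` are illustrative assumptions, not part of the formal setup.

```python
from itertools import product

def mem(tape, eps, N):
    """Write eps to a cyclic tape: shift every cell by one and put tau(-1) + eps (mod N) in cell 0."""
    return ((tape[-1] + eps) % N,) + tape[:-1]

def mem_inverse(tape, eps, N):
    """Undo mem: shift back and subtract eps (mod N) from the content of cell 0."""
    return tape[1:] + ((tape[0] - eps) % N,)

# Toy sizes (assumed for illustration): |O| = |A| = 2, so N = 5, with tape length T = 3.
N, T = 5, 3
tapes = list(product(range(N), repeat=T))

# For every nonzero symbol eps, mem(-, eps) hits each tape exactly once and is undone by mem_inverse.
all_bijective = all(
    len({mem(t, eps, N) for t in tapes}) == len(tapes)
    and all(mem_inverse(mem(t, eps, N), eps, N) == t for t in tapes)
    for eps in range(1, N)
)
print(all_bijective)
```

Without the group structure (for example, if writing simply overwrote cell 0), two tapes differing only in their last cell would be sent to the same tape, and the map would not be invertible.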

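Definition 2.6. Let the "history extraction" map $\mathrm{hist}:\mathrm{Tp}\to E^{\le T}$ be defined by $\mathrm{hist}(\tau)=(\tau(N-1),\dots,\tau(0))\in E^N$, where 0≤N≤T is the largest integer such that there's no 0≤n<N with τ(n)=0 (i.e. so that the [0,N) portion of the tape contains no blanks). Here N denotes the length of the recorded history, not the order of the group E+ from above.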
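To illustrate how mem and hist interact, here is a minimal Python sketch, assuming tapes are encoded as tuples of integers mod N with 0 as the blank symbol (the encoding and the concrete symbols are assumptions made for the example). It writes three symbols to an empty tape and reads the history back in chronological order.

```python
def mem(tape, eps, N):
    """Shift the cyclic tape by one cell and add eps (mod N) into cell 0."""
    return ((tape[-1] + eps) % N,) + tape[:-1]

def hist(tape):
    """Return the blank-free initial segment of the tape, oldest entry first."""
    n = 0
    while n < len(tape) and tape[n] != 0:  # 0 encodes the blank symbol
        n += 1
    return tuple(tape[k] for k in range(n - 1, -1, -1))

N, T = 5, 4
tape = (0,) * T          # empty memory tape
for eps in (2, 4, 1):    # three nonzero codes standing in for observation-action pairs
    tape = mem(tape, eps, N)

print(tape)        # newest entry sits in cell 0
print(hist(tape))  # entries in the order they were written
```

The most recent entry is stored in cell 0, which is why hist reads the cells in decreasing order.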

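Definition 2.7 (Time evolution of a policy). For each policy $\pi:O^{\le T}\to A$, we define the single time-step unitary evolution operator $U_\pi$ on H as the composite of an "observation" and an "action" operator, $U_\pi=U_{A,\pi}\circ U_{O,\pi}$, where
$$U_{O,\pi}\big(|\psi^g_\tau\rangle\otimes P_o|\psi^e\rangle\big)=|\psi^g_{\mathrm{mem}(\tau,(o,a))}\rangle\otimes P_o|\psi^e\rangle\quad\text{for all } o\in O,$$
$$U_{A,\pi}\big(|\psi^g_\tau\rangle\otimes|\psi^e\rangle\big)=|\psi^g_\tau\rangle\otimes U_a|\psi^e\rangle\quad\text{for } a=\pi(\mathrm{obs}(\mathrm{hist}(\tau))).$$
The time evolution after t∈N time-steps is given by $U^t_\pi=U_\pi\circ\dots\circ U_\pi$, i.e. $U_\pi$ composed with itself t times.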

Remark 2.8. As defined above, the first step in the evolution is an observation, so we never use the value of the policy on the empty observation string. In this respect it would be more natural to start with an action instead, but it would make some of the notation and the examples more cumbersome, so we sacrifice a bit of naturality for the sake of simplicity overall.

Lemma 2.9. The operator Uπ is unitary on H.

Proof. The operator UA,π is clearly unitary since each Ua is. We can see that UO,π is unitary as follows. Choose an orthonormal basis ∣∣ψeo,i⟩ of PoHe for each o∈O, so together they form an orthonormal basis for He (note that the range of i might vary for varying o). Then ∣∣ψgτ⟩⊗∣∣ψeo,i⟩ forms an orthonormal basis for H, and UO,π permutes this basis, hence is unitary.□

3. Prerequisites

We recall some definitions and lemmas within infra-Bayesianism in order to make the current article fairly self-contained. All the relevant notions here were introduced in [IBP], [BIMT] and [LBIMT]. In particular, we omit proofs in this section; they can be found in the articles listed.

3.1. Ultracontributions

First of all, we work with a notion of belief intended to incorporate a form of Knightian uncertainty. Formally, this means that we work with sets of distributions (or rather sets of "contributions", which turn out to be a more flexible tool).

Definition 3.1. Given a finite set X, a contribution μ is a non-negative measure on X, such that μ(X)≤1. We denote the set of contributions ΔcX. A contribution is a distribution if μ(X)=1, so we have ΔX⊂ΔcX.

There's a natural order on ΔcX, given by pointwise comparison.

Definition 3.2. We call a subset A⊂ΔcX downward closed if for μ∈A, ν≤μ implies ν∈A.

As a subspace of RX, the set ΔcX inherits a metric and a convex structure.

Definition 3.3. We call a closed, convex, downward closed subset Θ⊂ΔcX a homogeneous ultracontribution (HUC for short). We denote the set of HUCs by □X.

We'll work with HUCs as our central formal notion of belief in this article. The exact properties required (closed, convex and downward closed) should be illuminated by Lemma 3.6.

Definition 3.4. Given a HUC Θ∈□X, and a function f:X→[0,1], we define the expected value $E_\Theta[f]=\max_{\theta\in\Theta}E_\theta[f]=\max_{\theta\in\Theta}\sum_{x\in X}\theta(x)f(x)$.

Thinking of f as a loss function, this is a worst-case expected value, given Knightian uncertainty over the probabilities.
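The following Python sketch illustrates Definition 3.4 for a HUC presented by finitely many generators, i.e. the downward closure of the convex hull of a few contributions (this presentation, and the numbers used, are assumptions made for the example). Since f is non-negative and the expectation is linear in θ, the maximum is attained at one of the generators.

```python
def expect(theta, f):
    """Expectation of f under a single contribution theta (a dict of non-negative masses)."""
    return sum(mass * f[x] for x, mass in theta.items())

def huc_expect(generators, f):
    """Worst-case expectation over the HUC generated by the given contributions."""
    return max(expect(theta, f) for theta in generators)

# Two generating distributions on X = {x0, x1} (illustrative numbers).
generators = [{"x0": 0.5, "x1": 0.5}, {"x0": 0.9, "x1": 0.1}]

f = {"x0": 1.0, "x1": 0.0}   # indicator of x0
g = {"x0": 0.0, "x1": 1.0}   # indicator of the complement

e_f = huc_expect(generators, f)
e_g = huc_expect(generators, g)
print(e_f, e_g, e_f + e_g)
```

Here the worst-case probabilities of an event and of its complement add up to more than one, which is the typical behavior of a convex (rather than linear) expectation functional.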

Remark 3.5. It's worth mentioning that the prefix "infra" originates from the concept of infradistributions, which is the notion corresponding to ultracontributions, in the dual setup of utility functions instead of loss functions. We still often use the term "infra" in phrases such as infra-belief or infra-Bayesianism, but now simply carrying the connotation of a "weaker form" of belief etc., compared to the Bayesian analog.

Lemma 3.6. For Θ∈□X, the expected value defines a convex, monotone, homogeneous functional EΘ[−]:[0,1]X→[0,1].

Lemma 3.7. There is a duality Θ↦EΘ, between □X (i.e. closed, convex, and downward closed subsets of ΔcX) and convex, monotone, and homogeneous functionals [0,1]X→[0,1].

For a functional F:[0,1]X→[0,1], the inverse map in the duality is given by $F\mapsto\Theta_F=\bigcap_{f:X\to[0,1]}\{\theta\in\Delta^cX : E_\theta[f]\le F[f]\}$.

3.2. Some constructions

For the current article to be more self-contained, we spell out a few definitions used in this discussion.

Definition 3.8. Given a map of finite sets f:X→Y, we define the pushforward f∗:ΔcX→ΔcY to be given by the pushforward measure. We use the same notation to denote the pushforward on HUCs, f∗:□X→□Y, given by forward image, that is, f∗(Θ)={f∗(θ)∈ΔcY|θ∈Θ}. Equivalently, in terms of the expectation values, for g:Y→[0,1] we have $E_{f_*(\Theta)}[g]=E_\Theta[g\circ f]$.

Definition 3.9. Given a collection of finite sets Xi, and HUCs Θi∈□Xi, we define the free product ⋈iΘi∈□(∏iXi) as follows. For a contribution θ∈Δc(∏iXi) we have θ∈⋈iΘi if and only if for each j, (prj)∗(θ)∈Θj⊂ΔcXj, where prj:∏iXi→Xj is projection onto the jth factor.

The free product thus specifies the allowed marginal values, but puts no further restriction on the possible correlations.
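As a concrete illustration, the sketch below checks membership in a free product when each factor HUC is the downward closure of a single distribution, so that "marginal lies in Θj" becomes "marginal is pointwise at most the j-th distribution". That special case, the coin example, and the function names are assumptions made for the example.

```python
def marginal(theta, axis):
    """Marginal of a contribution on a product set, onto the given coordinate."""
    out = {}
    for point, mass in theta.items():
        out[point[axis]] = out.get(point[axis], 0.0) + mass
    return out

def in_free_product(theta, dists, tol=1e-12):
    """Check that every marginal of theta is pointwise dominated by the matching distribution."""
    for j, p in enumerate(dists):
        for value, mass in marginal(theta, j).items():
            if mass > p.get(value, 0.0) + tol:
                return False
    return True

coin = {"H": 0.5, "T": 0.5}
independent = {(a, b): 0.25 for a in "HT" for b in "HT"}
correlated = {("H", "H"): 0.5, ("T", "T"): 0.5}
biased = {("H", "H"): 0.7, ("T", "T"): 0.3}

print(in_free_product(independent, [coin, coin]))
print(in_free_product(correlated, [coin, coin]))
print(in_free_product(biased, [coin, coin]))
```

Both the independent and the perfectly correlated joint distributions are admitted, since only the marginals are constrained; the biased one is rejected because its marginals are wrong.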

Definition 3.10 (Total uncertainty). The state of total (Knightian) uncertainty ⊤X∈□X is defined as ⊤X=ΔcX, i.e. the subset of all contributions.

Definition 3.11 (Semidirect product). Given a map β:X→□Y, and an element Θ∈□X, we can define the semidirect product Θ⋉β∈□(X×Y). This is easier to write down in terms of the expectation functionals, as follows. For g:X×Y→[0,1], define $E_{\Theta\ltimes\beta}[g]=E_\Theta\big[E_{\beta(x)}[g(x,-)]\big]$. Here $E_{\beta(x)}[g(x,-)]$ is the function X→[0,1] whose value at x∈X is given by taking the expected value with respect to β(x)∈□Y of the function g(x,−):Y→[0,1].

As a subset of Δc(X×Y),
⊤X⋉β can be understood as the convex hull of the δx×θ for all x∈X and all θ∈β(x)⊂ΔcY. For Θ⋉β one needs to further restrict to contributions that project down into Θ⊂ΔcX.
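For the case Θ=⊤X used later in the article, the expectation functional simplifies: since ⊤X contains every point mass δx, the outer expectation is a maximum over x. The Python sketch below computes this, assuming each β(x) is given by finitely many generating contributions; the policy names and numbers are made up for illustration.

```python
def huc_expect(generators, f):
    """Worst-case expectation of f over a finitely generated HUC."""
    return max(sum(mass * f(y) for y, mass in theta.items()) for theta in generators)

def top_semidirect_expect(beta, g):
    """E_{top ⋉ beta}[g] = max over x of E_{beta(x)}[g(x, -)]."""
    return max(huc_expect(gens, lambda y, x=x: g(x, y)) for x, gens in beta.items())

# A kernel from two hypothetical policies to distributions over two outcomes.
beta = {
    "pi0": [{"up": 0.5, "down": 0.5}],
    "pi1": [{"up": 0.2, "down": 0.8}],
}

g_joint = lambda x, y: 1.0 if (x == "pi1" and y == "down") else 0.0
g_up = lambda x, y: 1.0 if y == "up" else 0.0

print(top_semidirect_expect(beta, g_joint))
print(top_semidirect_expect(beta, g_up))
```

The second value shows how total uncertainty over x picks the policy that makes the event most likely.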

3.3. The bridge transform

The key construction we'll be considering in infra-Bayesian physicalism is the bridge transform. This construction is aimed at answering the question "given a belief about the joint computational-physical universe, what should our corresponding belief be about which computations are realized in the physical universe?".

We'll discuss these notions in a bit more detail, but for now both the physical universe Φ and the computational universe Γ are just assumed to be finite sets.

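Definition 3.12. Given Θ∈□(Γ×Φ), the bridge transform of Θ, Br(Θ)∈□(Γ×2Γ×Φ), is defined as follows (cf. [IBP Definition 1.1]). For a contribution θ∈Δc(Γ×2Γ×Φ) we have θ∈Br(Θ) if and only if for any s:Γ→Γ, under the composite $\tilde s:\Delta^c(\Gamma\times 2^\Gamma\times\Phi)\to\Delta^c(\Gamma\times\Phi)$, obtained by pushing forward along $s\times\mathrm{id}$, restricting to $\mathrm{el}\Gamma\times\Phi$, and then projecting out the $2^\Gamma$ factor (compare the equivalent formulation in Lemma 5.1), we have $\tilde s(\theta)\in\Theta\subset\Delta^c(\Gamma\times\Phi)$.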

Remark 3.13. The use of all endomorphisms s:Γ→Γ in Definition 3.12, although concise, doesn't feel fully principled as of now. We would typically think of the computational universe Γ as the set of all possible assignments of outputs to programs, i.e. Γ=Σ^R, for a certain output alphabet Σ, and a set of programs R (see Definition 4.1). In this context, Γ^Γ feels somewhat unnatural. That being said, in the current discussion we mainly use the fact that Γ^Γ acts transitively on Γ, so it's possible that these results would survive in some form under a modified definition of the bridge transform.

For easy reference, we spell out [IBP Proposition 2.10]:

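Lemma 3.14 (Refinement). Given a mapping between physical universes f:Φ1→Φ2, the bridge transform is compatible with pushforward along f up to inclusion. That is, for a belief Θ∈□(Γ×Φ1) we have $(\mathrm{id}_{\mathrm{el}\Gamma}\times f)_*(\mathrm{Br}(\Theta))\subset\mathrm{Br}((\mathrm{id}_\Gamma\times f)_*(\Theta))$.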

4. An infra-Bayesian physicalist interpretation

We'll work with a certain specialized setup of [IBP].

Definition 4.1. Let the set of "programs" be $R=O^{\le T}$, the "output alphabet" be Σ=A, and let the set of "computational universe states" $\Gamma=\Sigma^R=A^{O^{\le T}}$ be the set of policies up to time horizon T. We'll write $\mathrm{el}\Gamma=\{(\pi,\alpha)\in\Gamma\times 2^\Gamma : \pi\in\alpha\}$.

Definition 4.2. Let a "universal observable" B be a triple (VB,QB,tB), where VB is a finite set (of "observation outcomes"), QB is a projection-valued measure on VB, valued in H (giving projections QB(v):H→H for each v∈VB), and $t_B\in\mathbb{N}_{<T}$ is an "observation time". Let U be the set of all universal observables, up to the natural notion of equivalence.

Remark: We use the term "universal observable" here to distinguish observables of the "universe" (i.e. the joint agent-environment system) from the observations of the environment by the agent.

Definition 4.3 (Initial state). Fix a normalized (norm 1) initial state $|\psi^e_0\rangle\in H_e$ of the environment, and let $|\psi^g_0\rangle$ be the state of the agent corresponding to an empty memory tape, i.e. τ:Z/T→E+ given by τ(n)=0 for all n. Let $|\psi_0\rangle=|\psi^g_0\rangle\otimes|\psi^e_0\rangle\in H$ be the initial state of the joint system.

Definition 4.4. For a policy π∈Γ, let the marginal distribution of the universal observable B be defined according to the Born rule:
$$\beta_B(v|\pi)=\big\lVert Q_B(v)\,U_\pi^{t_B}|\psi_0\rangle\big\rVert^2,$$
i.e. the norm square of the vector obtained by evolving the universe following policy π for $t_B$ time-steps from the initial state, and then projecting onto the observation subspace corresponding to the universal observation $v\in V_B$. So $\beta_B(-|\pi)\in\Delta V_B$.

Definition 4.5. Let ΦU=∏B∈UVB be the set of "all possible states of the universe" (more precisely the set of all possible outcomes of all observations on the joint agent-environment system). More generally, define ΦS analogously for any subset S⊂U.

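Definition 4.6. For a finite subset F⊂U, let $\beta_F(\pi)=\bowtie_{B\in F}\beta_B(-|\pi)\in\square\Phi_F$ be the free product of the $\beta_B$, as defined in Definition 3.9. For varying π this defines an ultrakernel $\beta_F:\Gamma\to\square\Phi_F$, and the associated semidirect product $\Theta_F=\top_\Gamma\ltimes\beta_F\in\square(\Gamma\times\Phi_F)$. Taking the bridge transform and projecting out the physical factor $\Phi_F$,
$$\square(\Gamma\times\Phi_F)\xrightarrow{\ \mathrm{Br}\ }\square(\Gamma\times 2^\Gamma\times\Phi_F)\xrightarrow{\ \mathrm{pr}_*\ }\square(\Gamma\times 2^\Gamma),$$
we get $\Theta^*_F=\mathrm{pr}_*(\mathrm{Br}(\Theta_F))\in\square(\Gamma\times 2^\Gamma)$.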

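If F1⊂F2⊂U, we have a natural "refinement" map $p:\Phi_{F_2}\to\Phi_{F_1}$, given by projecting out the additional factors in $\Phi_{F_2}$. By Lemma 3.14 applied to p, we have $(\mathrm{id}_{\mathrm{el}\Gamma}\times p)_*(\mathrm{Br}(\Theta_{F_2}))\subset\mathrm{Br}((\mathrm{id}_\Gamma\times p)_*(\Theta_{F_2}))$, and $(\mathrm{id}_\Gamma\times p)_*(\Theta_{F_2})=\Theta_{F_1}$, so $\Theta^*_{F_2}\subset\Theta^*_{F_1}$. Inspired by this, we have the following.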

Definition 4.7. Let Θ∗U=⋂F⊂UΘ∗F, where the intersection is over all finite subsets of U.

4.1. Copenhagen interpretation

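Definition 4.8. Let $h\in E^{\le T}$ be an observation-action history, and denote by $Q_h:H\to H$ the projection corresponding to the proposition "the memory tape recorded history h". More precisely, $Q_h=Q^g_h\otimes\mathrm{id}_{H_e}$, where
$$Q^g_h|\psi^g_\tau\rangle=\begin{cases}|\psi^g_\tau\rangle & \text{if } \mathrm{hist}(\tau)=h,\\ 0 & \text{otherwise.}\end{cases}$$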

Definition 4.9. Given a sequence of observation-action pairs h∈En, let h≤m∈E≤m denote the truncated history (i.e. the image under projecting out the last n−m components of En if n>m, and h itself if n≤m).

In the Copenhagen interpretation the "universe" (i.e. the joint system of the agent and the environment) collapses after each observation of the agent.

Definition 4.10. Given a policy π:O≤T→A, the initial state |ψ0⟩∈H, and a sequence of observation-action pairs h∈En, we can define |ψt⟩=Qh≤tUπ|ψt−1⟩ for t>0 recursively. Then according to the Copenhagen interpretation, the probability of observing h is Cop(h|π)=∥|ψn⟩∥2.

Lemma 4.11. Collapsing at each step is the same as collapsing at the end, that is |ψt⟩=Qh≤tUtπ|ψ0⟩.

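Proof. The claim is true for t=0,1 by definition. Assume it's true for t−1, so $|\psi_{t-1}\rangle=Q_{h_{\le t-1}}U^{t-1}_\pi|\psi_0\rangle$. Let's write $U^{t-1}_\pi|\psi_0\rangle=\sum_{\tau\in\mathrm{Tp}}|\psi^g_\tau\rangle\otimes|\varphi^e_\tau\rangle$, so
$$|\psi_{t-1}\rangle=\sum_{\substack{\tau\in\mathrm{Tp}\\ \mathrm{hist}(\tau)=h_{\le t-1}}}|\psi^g_\tau\rangle\otimes|\varphi^e_\tau\rangle.$$
Then if $a=\pi(\mathrm{obs}(h_{\le t-1}))\in A$, we have
$$|\psi_t\rangle=\sum_{\substack{\tau\in\mathrm{Tp}\\ \mathrm{hist}(\tau)=h_{\le t}}}|\psi^g_\tau\rangle\otimes P_{h(t)}U_a|\varphi^e_\tau\rangle,$$
while
$$U^t_\pi|\psi_0\rangle=\sum_{\substack{\tau\in\mathrm{Tp}\\ o\in O}}|\psi^g_{\mathrm{mem}(\tau,(o,a))}\rangle\otimes P_oU_{\pi(\mathrm{hist}(\tau))}|\varphi^e_\tau\rangle.$$
Now $Q_{h_{\le t}}|\psi^g_{\mathrm{mem}(\tau,(o,a))}\rangle=0$ unless $(o,a)=h(t)$ and $\mathrm{hist}(\tau)=h_{\le t-1}$, hence $Q_{h_{\le t}}U^t_\pi|\psi_0\rangle=|\psi_t\rangle$ as claimed.□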
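Lemma 4.11 can also be checked numerically on a small instance. The Python sketch below simulates a qubit environment measured in the computational basis, with two actions, and compares collapsing after every step with projecting once at the end. Several choices are assumptions made for this illustration only: the environment and the action unitaries (identity and Hadamard), the integer encoding of E+, and the reading that the action recorded by the observation operator is the policy's output on the observation string extended by the new observation (which matches the evolved states written out in Counterexample 5.3).

```python
import math

NUM_ACT = 2
N = 2 * NUM_ACT + 1    # |O|*|A| + 1 symbols in E+, with 0 as the blank
T = 2                  # time horizon and tape length
S = 1 / math.sqrt(2)
U = {0: [[1.0, 0.0], [0.0, 1.0]],   # action a0: identity on the qubit
     1: [[S, S], [S, -S]]}          # action a1: Hadamard (illustrative choice)

def encode(o, a):
    return 1 + o * NUM_ACT + a      # nonzero code for the pair (o, a)

def decode(eps):
    return divmod(eps - 1, NUM_ACT)

def mem(tape, eps):
    return ((tape[-1] + eps) % N,) + tape[:-1]

def hist(tape):
    n = 0
    while n < len(tape) and tape[n] != 0:
        n += 1
    return tuple(decode(tape[k]) for k in range(n - 1, -1, -1))

def step(state, policy):
    """Apply the observation operator, then the action operator, to {(tape, qubit): amplitude}."""
    new = {}
    for (tape, e), amp in state.items():
        o = e  # P_o projects onto |o>, so the basis state |e> is recorded as o = e
        a = policy[tuple(x for x, _ in hist(tape)) + (o,)]  # assumed reading of Definition 2.7
        tape2 = mem(tape, encode(o, a))
        for e2 in (0, 1):
            new[(tape2, e2)] = new.get((tape2, e2), 0.0) + U[a][e2][e] * amp
    return new

def project(state, h):
    return {k: v for k, v in state.items() if hist(k[0]) == tuple(h)}

def norm_sq(state):
    return sum(abs(v) ** 2 for v in state.values())

def cop_stepwise(h, policy, psi0):
    state = psi0
    for t in range(1, len(h) + 1):
        state = project(step(state, policy), h[:t])   # collapse after every observation
    return norm_sq(state)

def cop_at_end(h, policy, psi0):
    state = psi0
    for _ in range(len(h)):
        state = step(state, policy)                   # purely unitary evolution
    return norm_sq(project(state, h))                 # single projection at the end

policy = {(0,): 1, (1,): 0, (0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
psi0 = {((0,) * T, 0): S, ((0,) * T, 1): S}           # empty tape, qubit in |+>
h = ((0, 1), (1, 0))
print(cop_stepwise(h, policy, psi0), cop_at_end(h, policy, psi0))
```

The two numbers agree because the memory tape keeps the branches orthogonal: once a history has been written, later evolution never maps one recorded prefix onto another.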

4.2. Relating the two interpretations

Since Θ∗U∈□elΓ, we can take expectations of functions f:elΓ→[0,1], in particular indicator functions χq for q⊂elΓ.

Definition 4.12. For a policy π∈Γ, and a tuple of observations h∈On, define αh|π={γ∈Γ|γ(h)=π(h)}⊂Γ, and let qh|π={(π,α)∈elΓ|α⊂αh|π}⊂elΓ.

Remark 4.13. In what follows we'll assume |A|>1. This assures that the set of policies is richer than the set of histories (i.e. |Γ|>|O≤T|). Much of the following fails in the degenerate case |A|=1.
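To make Definition 4.12 concrete, the following Python sketch enumerates the policies in the smallest non-degenerate case (two observations, two actions, T=1) and computes αh|π for h=o0 and the constant policy π00. As in Section 5.1, the value of a policy on the empty input is ignored; that restriction and the naming scheme `pi<i><j>` are conventions assumed for the example.

```python
from itertools import product

O = ("o0", "o1")
A = ("a0", "a1")
T = 1

# Nonempty observation strings up to length T (the empty input is ignored, cf. Remark 2.8).
inputs = [h for t in range(1, T + 1) for h in product(O, repeat=t)]
policies = [dict(zip(inputs, outs)) for outs in product(A, repeat=len(inputs))]

def alpha(h, pi):
    """All policies that agree with pi on the observation string h."""
    return [g for g in policies if g[h] == pi[h]]

pi00 = {("o0",): "a0", ("o1",): "a0"}
alpha_h = alpha(("o0",), pi00)

# Name each policy by its outputs on o0 and o1.
names = sorted("pi" + g[("o0",)][1] + g[("o1",)][1] for g in alpha_h)
print(names)
print(len(policies), len(inputs))
```

The set αh|π only constrains the policy on h itself, so it contains every policy that answers a0 to o0 regardless of what it does on o1.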

When considering the infra-Bayesian physicalist interpretation of a quantum event h, we'll consider the expected value EΘ∗U[χπ(1−χqh|π)]. As defined in Definition 4.6, ΘU can be thought of as the infra-belief ⊤Γ⋉βU∈□(Γ×ΦU), which is a joint belief over the computational-physical world, with complete Knightian uncertainty over the policy of the agent (as a representation of "free will"), and for each policy the corresponding belief about the physical world is as given by the unitary quantum evolution of the agent-environment system under the given policy. The bridge transform Θ∗U∈□elΓ of ΘU then packages the relevant beliefs about which computational facts are manifest in the physical world. The subset αh|π corresponds to the proposition "the policy outputs action a=π(h) upon observing h", and hence qh|π corresponds to the belief "the physical world witnesses the output of the policy on h to be a=π(h) (which is to say there's a version of the agent instantiated in the physical world that observed history h, and acted a)". We'll be investigating various claims about the quantity EΘ∗U[χπ(1−χqh|π)], which is the ultraprobability (i.e. the highest probability for the given Knightian uncertainty) of the agent following policy π and h not being observed (i.e. no agent being instantiated acting on history h).

Remark 4.14. It might at first seem more natural to consider the complement instead, that is χπχqh|π, which corresponds to the agent following policy π, and history h being observed. However, it turns out that EΘ∗U[χπχqh|π]=1 always. This can be understood intuitively via refinement (see Lemma 3.14): we can always extend our model of the physical world to include a copy of the agent instantiated on history h, so the highest probability of h being observed will be 1. This is also related to the monotonicity principle discussed in [IBP]. Thus although at first glance this might seem less natural, in our setup it's more meaningful to study the ultraprobability of the complement, i.e. of h not being observed. Note that since we're working with convex instead of linear expectation functionals (see Lemma 3.7), the complementary ultraprobabilities will typically sum to something greater than one.

We first state Claims 4.15 and 4.17 relating the IBP and Copenhagen interpretations "on the nose", which both turn out to be false in general. Then we state the weaker Theorem 4.19, which is true, and establishes a form of asymptotic relationship between the two interpretations.

Claim 4.15. The two interpretations agree on the probability that a certain history is not realized given a policy. That is,
EΘ∗U[χπ(1−χqh|π)]=1−Cop(h|π).

This claim turns out to be false in general, and we give a counterexample in Counterexample 5.3. Note, however, that the claim seems to be true in the limit with many actions (i.e. |A|→∞), which would warrant further study. Now consider the following definition concerning two copies of the agent being instantiated.

Definition 4.16. For a policy π∈Γ, and two tuples of observations h1,h2∈On, define αh1,h2|π={γ∈Γ|γ(hi)=π(hi) for i=1,2}⊂Γ, and let qh1,h2|π={(π,α)∈elΓ|α⊂αh1,h2|π}⊂elΓ.

Claim 4.17. There is only one copy of the agent (i.e. the agent is not instantiated on multiple histories, there are no "many worlds"). That is, if neither of h1,h2∈On is a prefix of the other, then EΘ∗U[χπ(1−χqh1,h2|π)]=1.

This claim is the relative counterpart of Claim 4.15, and it fails in general as well (see Counterexample 5.5). Again, however, this claim might hold in the |A|→∞ limit.

Definition 4.18. An event is a subset of histories E⊂OT. We define the corresponding qE|π=⋃h∈Eqh|π⊂elΓ, and Cop(E|π)=∑h∈ECop(h|π).

Theorem 4.19. The ultraprobability of an agent not being instantiated on a certain event can be bounded via functions of the (Copenhagen) probability of the event. More precisely,
$$1-\sqrt{(2-\mathrm{Cop}(E|\pi))\,\mathrm{Cop}(E|\pi)}\;\le\;E_{\Theta^*_U}\big[\chi_\pi(1-\chi_{q_{E|\pi}})\big]\;\le\;1-\mathrm{Cop}(E|\pi).$$

Due to the failure of Claims 4.15 and 4.17, we can think of the infra-Bayesian physicalist setup as a form of many-worlds interpretation. However, since $\sqrt{(2-\mathrm{Cop}(E|\pi))\,\mathrm{Cop}(E|\pi)}\to0$ as $\mathrm{Cop}(E|\pi)\to0$, Theorem 4.19 shows statistical consistency with the Copenhagen interpretation in the sense that observations that are unlikely according to the Born rule have close to 1 ultraprobability of not being instantiated (while very likely observations have close to 0 ultraprobability of uninstantiation).
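To get a feel for how tight the sandwich in Theorem 4.19 is, the short Python sketch below evaluates both bounds for a few values of the Copenhagen probability. The function name `ibp_bounds` and the sample values are chosen for illustration only.

```python
import math

def ibp_bounds(cop):
    """Lower and upper bounds of Theorem 4.19 for a Copenhagen probability cop in [0, 1]."""
    lower = 1 - math.sqrt((2 - cop) * cop)
    upper = 1 - cop
    return lower, upper

for p in (0.0, 0.01, 0.5, 0.99, 1.0):
    lo, hi = ibp_bounds(p)
    print(f"Cop = {p:.2f}: {lo:.3f} <= ultraprobability <= {hi:.3f}")
```

The two bounds coincide at Cop=0 and Cop=1 and are furthest apart for intermediate probabilities, which is where the infra-Bayesian physicalist and Copenhagen readings can differ the most.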

Remark 4.20. For simplicity we assumed E only contains entire histories (i.e. ones of maximal length T). It's easy to modify the definitions to account for partial histories. The inequalities in Theorem 4.19 remain true even if E includes partial histories, and the proofs are easy to adjust. We avoid doing this here in order to keep the notation cleaner. However, it's worth noting some important points here. For a partial history h, let $H\subset O^T$ be the set of all completions of h, i.e. $H=\{\tilde h\in O^T : h\sqsubset\tilde h\}$. Then we have $\mathrm{Cop}(h|\pi)=\mathrm{Cop}(H|\pi)=\sum_{\tilde h\in H}\mathrm{Cop}(\tilde h|\pi)$. On the other hand, $q_{h|\pi}\neq q_{H|\pi}=\bigcup_{\tilde h\in H}q_{\tilde h|\pi}$, so there is an important difference here between the two interpretations, which would warrant further discussion. In particular, under the infra-Bayesian physicalist interpretation it can happen that $E_{\Theta^*_U}[\chi_\pi(1-\chi_{q_{H|\pi}})]>E_{\Theta^*_U}[\chi_\pi(1-\chi_{q_{h|\pi}})]$ for a partial history h and its set of completions H. This could be loosely interpreted as Everett branches "disappearing", as the ultraprobability of an agent not being instantiated on the partial history h is less than that of the agent not being instantiated on any completion of that history.

4.3. Decision theory

To shed more light on the way the infra-Bayesian physicalist interpretation functions, it is interesting to consider the decision theory of the framework, along with the epistemic considerations above.

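Definition 4.21. Consider a loss function $L:D\to\mathbb{R}_{\ge0}$, where $D=E^T$ is the set of destinies. We can then construct the physicalized loss function (cf. [IBP Definition 3.1]) $L_{\mathrm{phys}}:\mathrm{el}\Gamma\to\mathbb{R}_{\ge0}$, given by
$$L_{\mathrm{phys}}(\gamma,\alpha)=\min_{h\in X_\alpha}\ \max_{d\in D:\,h\sqsubset d}L(d),$$
where $X_\alpha$ is the set of histories witnessed by α, that is, $X_\alpha=\{h\in E^{\le T}\mid\forall ga\sqsubset h,\ \forall\tilde\gamma\in\alpha:\ \tilde\gamma(\mathrm{obs}(g))=a\}$. Note that in our simplified context, $L_{\mathrm{phys}}(\gamma,\alpha)$ doesn't depend on γ.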

Definition 4.22. We can define the worst-case expected physicalized loss associated to a policy π by LIBP(π)=EΘ∗U[χπ⋅Lphys]. Under the Copenhagen model, we would instead simply consider LCop(π)=ECop[L|π]=∑d∈DCop(d|π)L(d).

Remark 4.23. Given a policy π∈Γ, we can consider the set of "fair" counterfactuals (cf. [IBP Definition 1.5])
Cπfair={(γ,α)∈elΓ|∀h∈O≤T:(∀~γ∈α,∀~h⊏h:~γ(~h)=γ(~h))⟹γ(h)=π(h)}, i.e. where if α witnesses the history h, then γ agrees with π on that history. This definition is in contrast with the "naive" counterfactuals we considered above (when writing χπ):
Cπnaive={(γ,α)∈elΓ|γ=π}. In Definition 4.22 above, and generally whenever we use χπ, we could have used the indicator function of Cπfair instead. The choice of counterfactuals affects the various expected values, however, all of the theorems in this article remain true (and Claims 4.15 and 4.17 remain false) for both naive and fair counterfactuals. We thus work with naive counterfactuals for the sake of simplicity.

Similarly to Section 4.2, the "on the nose" claim relating the two interpretations fails, but we have an asymptotic relationship which holds.

Claim 4.24. The two interpretations agree on the loss of any policy:
LIBP(π)=LCop(π).

Again, this turns out to be false, and we give a simple counterexample in Counterexample 5.6.

To allow discussing the asymptotic behavior, assume now that we incur a loss at each timestep, given by $\ell:E=O\times A\to\mathbb{R}_{\ge0}$, and we consider the total loss $L=\sum_{t=1}^{T}\ell_t:D\to\mathbb{R}_{\ge0}$. We might hope that we could have at least the following.

Claim 4.25. The two interpretations agree on the loss of any policy asymptotically:
LIBP(π)∼LCop(π), i.e. the difference is bounded sublinearly in T.

This claim is still false in general, for essentially the same reason as Claim 4.24: certain policies might involve a one-off step that then affects the entire asymptotic loss. We give a detailed explanation in Counterexample 5.7. We do however have the following.

Theorem 4.26. If the resulting MDP is communicating (see Definition 7.8), then for any policy π we have LCop(π∗)−o(T)≤LIBP(π)≤LCop(π), where π∗ is a Copenhagen-optimal policy. In particular, optimal losses for the IBP and Copenhagen frameworks agree asymptotically.

We'll look at a few concrete examples in detail, firstly to gain some insight into how Claims 4.15 and 4.17 fail in general, and secondly to see how our framework operates in the famously puzzling Wigner's friend scenario.

5.1. Counterexamples

We'll construct simple counterexamples to Claims 4.15 and 4.17 in the smallest non-degenerate case, i.e. when |O|=2, |A|=2, and T=1. Let O={o0,o1} and A={a0,a1}. There are four policies in this case (ignoring the value of the policies on the empty input, which is irrelevant in our setting, see Remark 2.8), which we'll abbreviate as π00,π01,π10,π11, where πij(o0)=ai and πij(o1)=aj. Assume h=o0, and π=π00, so αh|π={π00,π01}.

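Lemma 5.1. For $\rho\in\Delta^c(\mathrm{el}\Gamma\times\Phi)$, we have $\rho\in\mathrm{Br}(\Theta)$ if and only if for each $s:\Gamma\to\Gamma$ and $g:\Gamma\times\Phi\to[0,1]$ we have $E_\rho[\tilde g]\le E_\Theta[g]$, where $\tilde g:\mathrm{el}\Gamma\times\Phi\to[0,1]$ is given by $(\gamma,\alpha,x)\mapsto\chi_{s(\gamma)\in\alpha}\cdot g(s(\gamma),x)$.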

Lemma 5.2. Let β:Γ→ΔΦ be a kernel, Θ=⊤Γ⋉β, and Θ∗=pr∗(Br(Θ)) as above. Then $E_{\Theta^*}\big[\chi_\pi(1-\chi_{q_{h|\pi}})\big]=E_{(\beta(\pi_{10})+\beta(\pi_{11}))\wedge\beta(\pi_{00})}[1]$.

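Proof. To obtain a lower bound (although we'll only use the upper bound for the counterexample), define the contribution $\rho\in\Delta^c(\mathrm{el}\Gamma\times\Phi)$ by $\rho=\delta_{\pi_{00},\{\pi_{00},\pi_{10}\}}\times\phi_{10}+\delta_{\pi_{00},\{\pi_{00},\pi_{11}\}}\times\phi_{11}$, where $\phi_{10},\phi_{11}\in\Delta^c\Phi$ are such that $\phi_{10}\le\beta(\pi_{10})$, $\phi_{11}\le\beta(\pi_{11})$, and $\phi_{10}+\phi_{11}=(\beta(\pi_{10})+\beta(\pi_{11}))\wedge\beta(\pi_{00})$. One possible such choice is $\phi_{10}=\beta(\pi_{10})\wedge\beta(\pi_{00})$ and $\phi_{11}=\beta(\pi_{11})\wedge(\beta(\pi_{00})-\phi_{10})$. Then it's easy to verify that $\rho\in\mathrm{Br}(\Theta)$, and $E_\rho[\chi_\pi(1-\chi_{q_{h|\pi}})]=E_{(\beta(\pi_{10})+\beta(\pi_{11}))\wedge\beta(\pi_{00})}[1]$.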
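To obtain an upper bound, fix $x_0\in\Phi$, and use Lemma 5.1 for constant $s=\pi_{00}$, and $g(\gamma,x)=\chi_{\gamma=\pi_{00}}\cdot\chi_{x=x_0}$. We have $\tilde g(\gamma,\alpha,x)=\chi_{\pi_{00}\in\alpha}\cdot g(\pi_{00},x)=\chi_{\pi_{00}\in\alpha}\cdot\chi_{x=x_0}$, and so
$$E_\rho[\chi_{\pi_{00}\in\alpha}\cdot\chi_{x=x_0}]=E_\rho[\tilde g]\le E_\Theta[\chi_{\gamma=\pi_{00}}\cdot\chi_{x=x_0}]=E_{\beta(\pi_{00})}[\chi_{x=x_0}].\tag{1}$$
Analogously for $\pi_{10}$ and $\pi_{11}$ we get
$$E_\rho[\chi_{\pi_{10}\in\alpha}\cdot\chi_{x=x_0}]\le E_\Theta[\chi_{\gamma=\pi_{10}}\cdot\chi_{x=x_0}]=E_{\beta(\pi_{10})}[\chi_{x=x_0}],\tag{2}$$
and
$$E_\rho[\chi_{\pi_{11}\in\alpha}\cdot\chi_{x=x_0}]\le E_\Theta[\chi_{\gamma=\pi_{11}}\cdot\chi_{x=x_0}]=E_{\beta(\pi_{11})}[\chi_{x=x_0}].\tag{3}$$

Now, $\chi_\pi(1-\chi_{q_{h|\pi}})\chi_{x=x_0}\le\chi_{\pi_{00}\in\alpha}\cdot\chi_{x=x_0}$, so by (1) we get
$$E_\rho[\chi_\pi(1-\chi_{q_{h|\pi}})\chi_{x=x_0}]\le E_{\beta(\pi_{00})}[\chi_{x=x_0}].\tag{4}$$

We also have $1-\chi_{q_{h|\pi}}\le\chi_{\pi_{10}\in\alpha}+\chi_{\pi_{11}\in\alpha}$, since $\pi_{10}\notin\alpha$ and $\pi_{11}\notin\alpha$ together would imply $\alpha\subset\alpha_{h|\pi_{00}}$. Thus
$$\chi_\pi(1-\chi_{q_{h|\pi}})\chi_{x=x_0}\le(\chi_{\pi_{10}\in\alpha}+\chi_{\pi_{11}\in\alpha})\cdot\chi_{x=x_0},$$
so adding (2) and (3), we obtain
$$E_\rho[\chi_\pi(1-\chi_{q_{h|\pi}})\chi_{x=x_0}]\le E_{\beta(\pi_{10})+\beta(\pi_{11})}[\chi_{x=x_0}].\tag{5}$$
Now, since both (4) and (5) hold, we get $E_\rho[\chi_\pi(1-\chi_{q_{h|\pi}})\chi_{x=x_0}]\le E_{(\beta(\pi_{10})+\beta(\pi_{11}))\wedge\beta(\pi_{00})}[\chi_{x=x_0}]$. Finally, summing over $x_0\in\Phi$ and maximizing over $\rho\in\mathrm{Br}(\Theta)$, we obtain the required upper bound $E_{\Theta^*}[\chi_\pi(1-\chi_{q_{h|\pi}})]\le E_{(\beta(\pi_{10})+\beta(\pi_{11}))\wedge\beta(\pi_{00})}[1]$, which together with the lower bound gives the claimed equality.□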
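The lower-bound construction in the proof relies on the pointwise identity ϕ10+ϕ11=(β(π10)+β(π11))∧β(π00) for the suggested choice of ϕ10 and ϕ11. The Python sketch below checks this identity, and evaluates the right-hand side of Lemma 5.2, for three made-up distributions on a three-point set; the numbers and function names are purely illustrative.

```python
def meet_mass(b00, b10, b11):
    """Total mass of (b10 + b11) ∧ b00, the right-hand side of Lemma 5.2."""
    return sum(min(b10[x] + b11[x], b00[x]) for x in b00)

def lower_bound_parts(b00, b10, b11):
    """The choice phi10 = b10 ∧ b00 and phi11 = b11 ∧ (b00 - phi10) from the proof."""
    phi10 = {x: min(b10[x], b00[x]) for x in b00}
    phi11 = {x: min(b11[x], b00[x] - phi10[x]) for x in b00}
    return phi10, phi11

# Illustrative distributions on Phi = {x, y, z}.
b00 = {"x": 0.6, "y": 0.3, "z": 0.1}
b10 = {"x": 0.2, "y": 0.2, "z": 0.6}
b11 = {"x": 0.1, "y": 0.5, "z": 0.4}

phi10, phi11 = lower_bound_parts(b00, b10, b11)
total = sum(phi10.values()) + sum(phi11.values())
print(total, meet_mass(b00, b10, b11))
```

Both printed numbers agree up to floating-point rounding, and the parts satisfy ϕ10≤β(π10) and ϕ11≤β(π11) pointwise, as required in the proof.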

Counterexample 5.3. Let $H_e$ be a qubit state space, and $|\psi^e_0\rangle=|+\rangle=\frac{1}{\sqrt2}(|0\rangle+|1\rangle)$. Let $U_{a_0}=U_{a_1}=\mathrm{id}_{H_e}$. Let the observation P correspond to measuring the qubit, so $P_{o_0},P_{o_1}$ are projections onto |0⟩ and |1⟩ respectively. Then Claim 4.15 fails in this setup.

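Proof. We have $|\psi_0\rangle=|\psi^g_0\rangle\otimes|\psi^e_0\rangle=|0\rangle\otimes\frac{1}{\sqrt2}(|0\rangle+|1\rangle)$, and so
$$U_{\pi_{00}}|\psi_0\rangle=\tfrac{1}{\sqrt2}\big(|o_0a_0\rangle\otimes|0\rangle+|o_1a_0\rangle\otimes|1\rangle\big),$$
$$U_{\pi_{10}}|\psi_0\rangle=\tfrac{1}{\sqrt2}\big(|o_0a_1\rangle\otimes|0\rangle+|o_1a_0\rangle\otimes|1\rangle\big),$$
$$U_{\pi_{11}}|\psi_0\rangle=\tfrac{1}{\sqrt2}\big(|o_0a_1\rangle\otimes|0\rangle+|o_1a_1\rangle\otimes|1\rangle\big).$$
Now consider the universal observable B which is measurement along the vector |v⟩ and its complement, where
$$|v\rangle=\tfrac{1}{2\sqrt3}\big(3|o_0a_0\rangle\otimes|0\rangle+|o_1a_0\rangle\otimes|1\rangle-|o_0a_1\rangle\otimes|0\rangle+|o_1a_1\rangle\otimes|1\rangle\big).$$
I.e. we have $V_B=\{v,v^\perp\}$, and $Q_B(v)=P_v$, $Q_B(v^\perp)=P_{v^\perp}$, where $P_v$, $P_{v^\perp}$ are projections in $H=H_g\otimes H_e$ onto |v⟩ and its ortho-complement respectively. Then we have the following values for $\beta_B$ for the various policies:

| | π00 | π10 | π11 |
|---|---|---|---|
| βB(v) | 2/3 | 0 | 0 |
| βB(v⊥) | 1/3 | 1 | 1 |

This can be seen by noticing that |v⟩ is perpendicular to both $U_{\pi_{10}}|\psi_0\rangle$ and $U_{\pi_{11}}|\psi_0\rangle$, while $\langle v|U_{\pi_{00}}\psi_0\rangle=\frac{2}{\sqrt6}$, so $\beta_B(v|\pi_{00})=|\langle v|U_{\pi_{00}}\psi_0\rangle|^2=\frac23$. This means that for this B we have $E_{(\beta_B(\pi_{10})+\beta_B(\pi_{11}))\wedge\beta_B(\pi_{00})}[1]=1/3$. If $F_B=\{B\}$, by Lemma 5.2 we have $E_{\Theta^*_{F_B}}[\chi_\pi(1-\chi_{q_{h|\pi}})]=E_{(\beta_B(\pi_{10})+\beta_B(\pi_{11}))\wedge\beta_B(\pi_{00})}[1]=1/3$. Now, by definition $\Theta^*_U\subset\Theta^*_{F_B}$, so we also have $E_{\Theta^*_U}[\chi_\pi(1-\chi_{q_{h|\pi}})]\le 1/3<1-\mathrm{Cop}(h|\pi)=\frac12$.□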
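The arithmetic in this proof is easy to verify numerically. The Python sketch below writes |v⟩ and the three evolved states in coordinates with respect to the four basis vectors that actually occur (the coordinate ordering is a convention chosen for the example), recomputes the table of Born probabilities, and evaluates the bound from Lemma 5.2.

```python
import math

# Coordinates in the basis (|o0a0>|0>, |o1a0>|1>, |o0a1>|0>, |o1a1>|1>).
v = [c / (2 * math.sqrt(3)) for c in (3, 1, -1, 1)]
s = 1 / math.sqrt(2)
evolved = {
    "pi00": [s, s, 0, 0],
    "pi10": [0, s, s, 0],
    "pi11": [0, 0, s, s],
}

def born(psi):
    """Born probabilities of the outcomes v and v_perp for a normalized real state psi."""
    p = sum(a * b for a, b in zip(v, psi)) ** 2
    return {"v": p, "v_perp": 1 - p}

beta = {name: born(psi) for name, psi in evolved.items()}

# Total mass of (beta(pi10) + beta(pi11)) ∧ beta(pi00), as in Lemma 5.2.
bound = sum(
    min(beta["pi10"][x] + beta["pi11"][x], beta["pi00"][x]) for x in ("v", "v_perp")
)
print(beta["pi00"], bound)
```

Up to rounding, the bound evaluates to 1/3, which is strictly below the Copenhagen value 1−Cop(h|π)=1/2.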

Although we won't need the exact value here, we remark to the interested reader that in the above setup of Counterexample 5.3, the ultraprobability attains the lower bound of Theorem 4.19, that is, $E_{\Theta^*_U}[\chi_\pi(1-\chi_{q_{h|\pi}})]=1-\sqrt{3/4}\approx0.134$.

We can extend the above counterexample to apply to Claim 4.17, via the following.

Lemma 5.4. Let β:Γ→ΔΦ be a kernel,
Θ=⊤Γ⋉β, and Θ∗=pr∗(Br(Θ)) as above. Then for h1=o0, h2=o1,
EΘ∗[χπ(1−χqh1,h2|π)]=E(β(π10)+β(π01)+β(π11))∧β(π00)[
