This proof section accompanies "Formalizing Newcombian problems with fuzzy infra-Bayesianism." We prove the following result.
Theorem [Alexander Appel (@Diffractor), Vanessa Kosoy (@Vanessa Kosoy)]:
Let be a Newcombian problem of horizon that satisfies pseudocausality. Let denote the associated supra-POMDP with infinite time horizon and time discount Then
Furthermore, if is a family of policies such that then
Proof: Let denote the empty history. Given a supracontribution , let denote the set of maximal extreme points of First we remark that for any supra-POMDP, without loss of generality, a set of copolicies can always be replaced by
Given an episode policy let denote the episode copolicy that initializes the state to i.e. Let denote the distribution over outcomes determined by the interaction of and Note that the expected loss with respect to is equal to the expected loss for the Newcombian problem, i.e.
Recall that throughout this sequence, we assume that is finite. By the remark at the beginning of the proof, the expected loss in one episode for the corresponding supra-POMDP can be written as a maximum expected loss over a finite set of -copolicies Namely,
Then
and thus for any episode policy
We now extend this analysis to the optimal loss over episodes for [1] Let denote the episode optimal loss for Let be an arbitrary policy for episodes of Then as before,
where the maximum is over a finite set of -episode copolicies By the single episode case,
and thus
It remains to show that the opposite inequality holds in the many-episode and limit.
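The reduction to a maximum over a finite set of copolicies, used in the first half of the proof, rests on a standard convexity fact: expected loss is linear in the outcome distribution, so its maximum over a convex hull is attained at one of the generating points. The following is a minimal numerical sketch of that fact only, not of the supra-POMDP construction itself. It uses ordinary probability distributions rather than supracontributions, and the names `generators`, `loss`, `expected_loss`, and `random_mixture` are hypothetical, chosen for illustration.

```python
import random


def expected_loss(dist, loss):
    """Expected loss of a finite distribution; linear in `dist`."""
    return sum(p * l for p, l in zip(dist, loss))


def random_mixture(points, rng):
    """Draw a random convex combination of the given points."""
    weights = [rng.random() for _ in points]
    total = sum(weights)
    weights = [w / total for w in weights]
    dim = len(points[0])
    return [sum(w * pt[i] for w, pt in zip(weights, points)) for i in range(dim)]


# Hypothetical outcome distributions over three outcomes (illustrative values).
generators = [
    [0.7, 0.2, 0.1],
    [0.1, 0.6, 0.3],
    [0.3, 0.3, 0.4],
]
# Hypothetical loss assigned to each outcome, with values in [0, 1].
loss = [0.0, 0.5, 1.0]

rng = random.Random(0)

# Maximum expected loss over the finitely many generating points.
vertex_max = max(expected_loss(g, loss) for g in generators)

# Maximum expected loss over many sampled points of the convex hull.
mixture_max = max(
    expected_loss(random_mixture(generators, rng), loss) for _ in range(10_000)
)

# No mixture exceeds the best generating point (up to float rounding).
assert mixture_max <= vertex_max + 1e-12
print(vertex_max, mixture_max)
```

Sampling is used here only as a sanity check. The inequality itself follows directly from linearity, since the expected loss of a mixture is the same mixture of the generators' expected losses.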
Recall that given we define
Recall that since satisfies pseudocausality, there exists a -optimal policy such that for all if then is also optimal for Consequently, for any episode copolicy either or To see this, suppose there exists an episode copolicy such that Then there exists a policy such that and . By pseudocausality, Thus
Define
By the remark at the beginning of the proof, the relevant set of copolicies in the definition of is finite, and thus is well-defined. If then Thus
Consider the iterated Newcombian problem over episodes. Let denote the multi-episode policy such that restricted to every episode is Let denote an arbitrary copolicy that interacts with Furthermore, let denote the number of episodes for which the episode-restriction of interacting with satisfies [2]
We have
Furthermore,
We leave it to the reader to verify that