Implementing CDT with optimal predictor systems

by Vanessa Kosoy
20th Dec 2015
AI Alignment Forum
15 min read
Comments
paulfchristiano:

I find these theorem statements quite hard to follow. It looks like this is doing roughly the right thing and would be better to work with than reflective oracles (i.e. to the extent that there were new complications, those complications would represent real and important phenomena that were simply covered up by the simplifying assumptions). In that case I am quite interested in AIXI-with-optimal-predictors (though probably I'm most interested in a more accessible presentation, and I really feel like it should be possible to have a cleaner formalism).

As an example, it seems like this result might be of broad interest to computer scientists, since e.g. there is no polynomial time algorithm that finds an approximate Nash equilibrium in general games, even apparently simple ones. An extremely natural open question is "so what should we actually expect to happen when extremely rational players play such games?" It looks like this approach may be able to give some kind of an answer via an appropriate notion of bounded optimality, but it is pretty hard to understand whether this approach does give a satisfying answer, and if so what it is.

I think that the relevant take-away is in Theorem 1. My impression is that you are holding the strategy sets constant while effectively taking a limit over computational resources of the players (and potentially the difficulty of understanding the payoff matrix). Is that right? In that case, I expect convergence to the N.E. to only occur once the computational resources are exponential in the size of the game.

(Of course that would still be a big improvement over uncomputability, I'm just trying to understand what is going on.)

Vanessa Kosoy:

AIXI-with-optimal-predictors: I believe this is relatively straightforward. However, my plan for the next step was adapting these results to a decision rule based on logical counterfactuals in a way which produces metathreat equilibria.

Bounded Nash equilibria: I don't think the concept is entirely novel. I've seen some papers which discuss Nash-like equilibria with computational resource bounds, although the area seems to remain largely unexplored. The particular setting I use here is not very relevant to what you're suggesting since finding Nash equilibria is non-polynomial in the number of strategies whereas here I keep the number of strategies constant. Instead, the complexity comes from the dependence of the payoff tensor on the parameter sampled from μk.

Your description of Theorem 1 is more or less correct except there's only a "payoff vector" here since this is a 1 player setting. The multiplayer setting is used in Corollary 2.

Regarding dependence on game size, it is not as bad as exponential. The Lipton-Markakis-Mehta algorithm finds ϵ-equilibria in time O(n^(log n / ϵ^2)).


We consider transparent games between bounded CDT agents ("transparent" meaning each player has a model of the other players). The agents compute the expected utility of each possible action by executing an optimal predictor of a causal counterfactual, i.e. an optimal predictor for a function that evaluates the other players and computes the utility for the selected action. Since the agents simultaneously attempt to predict each other, the optimal predictors form an optimal predictor system for the reflective system comprising the causal counterfactuals of all agents. We show that for strict maximizers, the resulting outcome is a bounded analogue of an approximate Nash equilibrium, i.e. a strategy which is an optimal response within certain resource constraints up to an asymptotically small error. For "thermalizers" (agents that choose an action with probability proportional to $2^{\frac{u}{T}}$), we get a similar result with the expected utility $\operatorname{E}_s[u]$ replaced by the "free utility" $\operatorname{E}_s[u]+T\operatorname{H}(s)$. Thus, such optimal predictor systems behave like bounded counterparts of reflective oracles.
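As a toy illustration of the two decision rules just described (and nothing more: the bischeme machinery, advice and error spaces are all omitted, and the names and numbers are invented), here is how a strict maximizer and a thermalizer turn a table of predicted utilities into an action or a mixed action, in Python:

import random

def strict_maximizer(predicted_utility):
    # Choose (arbitrarily among ties) an action with maximal predicted utility.
    best = max(predicted_utility.values())
    return random.choice([a for a, u in predicted_utility.items() if u == best])

def thermalizer_distribution(predicted_utility, T):
    # Each action gets probability proportional to 2^(u/T).
    weights = {a: 2.0 ** (u / T) for a, u in predicted_utility.items()}
    Z = sum(weights.values())
    return {a: w / Z for a, w in weights.items()}

predicted = {"cooperate": 0.62, "defect": 0.58}   # invented numbers
print(strict_maximizer(predicted))
print(thermalizer_distribution(predicted, T=0.1))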

Preliminaries

The proofs for this section are given in Appendix A.

We redefine E2(ll,ϕ) and E2(ll) to be somewhat smaller proto-error spaces which nevertheless yield the same existence theorems as before. This is thanks to Lemma A.1.

Construction 1

Given ϕ∈Φ, denote E2(ll,ϕ) the set of bounded functions δ:N2→R≥0 s.t.

$$\forall\psi\in\Phi:\psi\leq\phi\implies\operatorname{E}_{\lambda^k_\psi}[\delta(k,j)]=O(\psi(k)^{-1})$$

Denote

$$\mathcal{E}_2(\mathrm{ll}):=\bigcap_{\phi\in\Phi}\mathcal{E}_2(\mathrm{ll},\phi)$$

Proposition 1

If ϕ∈Φ is s.t. ∃n:liminfk→∞2−knϕ(k)=0, O(ϕ−1) is a proto-error space.

For any ϕ∈Φ, E2(ll,ϕ) is an ample proto-error space. When ϕ is non-decreasing, E2(ll,ϕ) is stable.

E2(ll) is a stable ample proto-error space.

Notation

We denote $O(\phi^{-\frac{1}{\infty}}):=O(\phi^{-1})^{\frac{1}{\infty}}$. We allow the same abuse of notation for this symbol as for usual big-O notation.

For any (poly,rlog)-bischeme ^Q=(Q,rQ,σQ) we use ^σkjQ to denote UrQ(k,j)×σkjQ.

For reflective systems R=(Σ,f,μ) we write indices in Σ as superscripts rather than subscripts as before.

Given $\pi\in\Pi^\Sigma$, we denote $U(\pi)^{nkj}:=U^{\pi^{nkj}_2}$ and $\pi^{nkj}(x,y):=\beta(\operatorname{ev}(\pi^{nkj}_1;x,y))$.

Results

The proofs for this section are given in Appendix B.

We start by showing that choosing an action by maximizing over optimally predicted utility is an optimal strategy in some sense.

Theorem 1

Fix a proto-error space $\mathcal{E}$. Consider $A$ a finite set, $\mu$ a word ensemble and $\{f_a:\operatorname{supp}\mu\to[0,1]\}_{a\in A}$. Suppose $\{\hat{P}_a\}_{a\in A}$ are $\mathcal{E}(\mathrm{poly},\mathrm{rlog})$-optimal predictors for $f$. Assume that $\sigma_{P_a}$ and $r_{P_a}$ don't depend on $a$ (this doesn't limit generality since we can replace $\sigma_{P_a}$ by $\prod_{b\in A}\sigma_{P_b}$ and $r_{P_a}$ by a polynomial $r_P\geq\max_{b\in A}r_{P_b}$). Consider $\hat{A}$ an $A$-valued $(\mathrm{poly},\mathrm{rlog})$-bischeme. Define $\hat{M}$ another $A$-valued $(\mathrm{poly},\mathrm{rlog})$-bischeme where $\hat{M}^{kj}(x)$ is computed by sampling $y$ from $\hat{\sigma}^{kj}_P$ and choosing arbitrarily from $a\in A$ s.t. $\hat{P}^{kj}_a(x,y)$ is maximal. Then, there is $\delta\in\mathcal{E}^{\frac{1}{2}}$ s.t.

$$\operatorname{E}_{\mu^k\times\hat{\sigma}^{kj}_M}[f_{\hat{M}^{kj}}]\geq\operatorname{E}_{\mu^k\times\hat{\sigma}^{kj}_A}[f_{\hat{A}^{kj}}]-\delta(k,j)$$
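The following minimal numerical sketch conveys the phenomenon behind Theorem 1 under a crude simplifying assumption: if every estimate is within ϵ of the true utility (standing in for optimal-predictor error, which the theorem controls only on average and asymptotically), then the argmax-of-estimates policy loses at most 2ϵ against any alternative. The names and numbers are invented; this is not the bischeme construction itself.

# Minimal numerical sketch of the phenomenon behind Theorem 1 (the actual
# resource bookkeeping and error spaces are not modeled): if each estimate
# P[a] is within eps of the true utility f[a], the argmax-of-estimates policy
# is within 2*eps of the best action, hence of any alternative policy.
import random

random.seed(0)
eps = 0.05
f = {a: random.random() for a in "abcd"}             # true utilities (invented)
P = {a: f[a] + random.uniform(-eps, eps) for a in f}  # noisy estimates

chosen = max(P, key=P.get)
best = max(f, key=f.get)
regret = f[best] - f[chosen]
assert regret <= 2 * eps
print(f"regret of argmax-of-estimates: {regret:.4f} (bound 2*eps = {2*eps})")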

Definition 1

Given suitable sets X,Y and ϕ∈Φ, a ϕ(poly,rlog)-scheme of signature X→Y is ^A=(A:N×X×{0,1}∗2→Y,rA:N→N,σA:N×{0,1}∗→[0,1]) s.t.

(i) The runtime of Ak is bounded by p(tϕ(k)) for some polynomial p.

(ii) |rA(k)|≤p′(tϕ(k)) for some polynomial p′.

(iii) σkA is a probability measure on {0,1}∗ and there is c>0 s.t. suppσkA⊆{0,1}≤c⌊log(k+2)ϕ(k)⌋.

A ϕ(poly,rlog)-scheme of signature {0,1}∗→Y is also called a Y-valued ϕ(poly,rlog)-scheme.

As usual, we sometimes use ^Akj as implicit notation in which A receives a value sampled from UrA(k) for the second argument and/or a value sampled from σkA for the third argument.

Corollary 1

Fix ϕ∈Φ. Consider A a finite set, μ a word ensemble and {fa:suppμ→[0,1]}a∈A. Suppose {^Pa}a∈A are E2(ll,ϕ)(poly,rlog)-optimal predictors for f. Consider ϵ∈(0,1) and ^A an A-valued ϕ1−ϵ(poly,rlog)-scheme. Define ^M as in Theorem 1. Then

$$\operatorname{E}_{\mu^k\times\lambda^k_\phi}\!\left[\operatorname{E}_{\hat{\sigma}^{kj}_M}[f_{\hat{M}^{kj}}]\right]\geq\operatorname{E}_{\mu^k\times\hat{\sigma}^k_A}[f_{\hat{A}^k}]-O(\phi^{-\frac{1}{\infty}})$$


We now introduce the game theoretical setting.

Definition 2

A distributional game is a quadruple G=(μ,P,A,u) where μ is a word ensemble, P is a finite set representing players, {An}n∈P are finite sets representing the possible actions of each player and {un:suppμ×∏n∈PAn→[0,1]}n∈P represent the utility functions.


For each $x\in\operatorname{supp}\mu$, $G$ defines an ordinary game in normal form. We think of $G$ as a single game where the strategies of player $n$ are (some class of) functions $s^n:\operatorname{supp}\mu\to A^n$ and the payoffs are $\operatorname{E}_\mu[u^n(\prod_{m\in P}s^m(x))]$.
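The following small Python sketch spells out the data of Definition 2 and the payoff formula above for an invented two-player, two-word example (all names and numbers are illustrative, not part of the formal development):

# Sketch of Definition 2's data: a distributional game over a two-word ensemble,
# with payoffs E_mu[u^n(joint action)] for pure strategies s^n: supp(mu) -> A^n.
mu = {"x0": 0.5, "x1": 0.5}                      # word ensemble (one fixed k)
players = ["row", "col"]
actions = {"row": ["U", "D"], "col": ["L", "R"]}  # the A component (for completeness)

def u(player, x, a):                              # utility functions u^n (invented)
    matched = (a["row"], a["col"]) in {("U", "L"), ("D", "R")}
    base = 1.0 if matched else 0.0
    return base if (player == "row") == (x == "x0") else 1.0 - base

def payoff(player, strategies):                   # strategies[n]: supp(mu) -> A^n
    return sum(p * u(player, x, {n: strategies[n](x) for n in players})
               for x, p in mu.items())

s = {"row": lambda x: "U" if x == "x0" else "D", "col": lambda x: "L"}
print({n: payoff(n, s) for n in players})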

Construction 2

Consider G=(μ,P,A,u) a distributional game and {ϕn∈Φ}n∈P. We construct the reflective system RϕG=(P,uϕ,μG).

$\mu_G$ is defined by

$$\mu^{nk}_G(k,x,a)=\frac{\mu^k(x)\chi_{A^n}(a)}{\#A^n}$$

Define $M^n_G:\mathbb{N}^2\times\operatorname{supp}\mu\times\{0,1\}^*\times\Pi^P\to\wp(A^n)$ by

$$M^{nkj}_G(x,y,\pi):=\operatorname*{argmax}_{a\in A^n}\pi^{nkj}((k,x,a),y)$$

Denote by $\Delta^n_G$ the space of mixed actions of player $n$, i.e. the space of probability measures on $A^n$.

Define $s^n_{G,\phi}:\mathbb{N}\times\operatorname{supp}\mu\times\Pi^P\to\Delta^n_G$ by

$$\Pr_{s^{nk}_{G,\phi}(x,\pi)}[a]:=\operatorname{E}_{\lambda^k_{\phi^n}}\!\left[\operatorname{E}_{U(\pi)^{nkj}}\!\left[\frac{\chi_{M^{nkj}_G(x,y,\pi)}(a)}{\#M^{nkj}_G(x,y,\pi)}\right]\right]$$

Define $s_{G,\phi}:\mathbb{N}\times\operatorname{supp}\mu\times\Pi^P\to\prod_{n\in P}\Delta^n_G$ by

$$s^k_{G,\phi}(x,\pi):=\prod_{n\in P}s^{nk}_{G,\phi}(x,\pi)$$

Define $\bar{s}^n_{G,\phi}:\mathbb{N}\times\operatorname{supp}\mu\times\Pi^P\to\prod_{m\in P\setminus n}\Delta^m_G$ by

$$\bar{s}^{nk}_{G,\phi}(x,\pi):=\prod_{m\in P\setminus n}s^{mk}_{G,\phi}(x,\pi)$$

Finally, define $u_\phi$ by

$$u^{nk}_\phi(x,a,\pi):=\operatorname{E}_{a\times\bar{s}^{nk}_{G,\phi}(x,\pi)}[u^n]$$


For any (poly,rlog)-predictor ^Q we will use the notation snkG,ϕ(x,^Q)∈ΔnG to mean

snkG,ϕ(x,^Q):=EσQ[snkG,ϕ(x,^Q[z])]

For a family {^Qn}n∈P of (poly,rlog)-predictors, we will use the notations skG,ϕ(x,^Q)∈∏n∈PΔnG and s¯nkG,ϕ(x,^Q)∈∏m∈P∖nΔmG to mean

skG(x,^Q):=∏n∈PsnkG,ϕ(x,^Qn)

s¯nkG(x,^Q):=∏m∈P∖nsmkG,ϕ(x,^Qm)
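As a toy rendering of Construction 2 (heavily simplified: the predictor system is replaced by a plain table of predicted counterfactual utilities, and the averaging over $j$ via $\lambda^k_{\phi^n}$ and the reflective structure are omitted), the strategy map reduces to "split probability uniformly over the argmax actions":

# Toy rendering of Construction 2's strict-maximizer strategy map; pred[n][x][a]
# is an invented table standing in for the predictor system, nothing more.
def mixed_action(pred_n, x):
    utilities = pred_n[x]
    best = max(utilities.values())
    argmax = [a for a, v in utilities.items() if v == best]
    return {a: (1.0 / len(argmax) if a in argmax else 0.0) for a in utilities}

pred = {
    "row": {"x0": {"U": 0.7, "D": 0.3}, "x1": {"U": 0.5, "D": 0.5}},
    "col": {"x0": {"L": 0.2, "R": 0.9}, "x1": {"L": 0.6, "R": 0.1}},
}
profile = {n: {x: mixed_action(pred[n], x) for x in pred[n]} for n in pred}
print(profile["row"]["x1"])   # ties are split uniformly: {'U': 0.5, 'D': 0.5}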

Corollary 2

Fix G=(μ,P,A,u) a distributional game, {ϕn∈Φ}n∈P and ^P an E∗2(ll,ϕ)(poly,rlog)-optimal predictor system for RϕG. Consider n∈P, ϵ∈(0,1) and ^A an An-valued (ϕn)1−ϵ(poly,rlog) scheme. Then

$$\operatorname{E}_{\mu^k}\!\left[\operatorname{E}_{s^k_{G,\phi}(x,\hat{P})}[u^n]\right]\geq\operatorname{E}_{\mu^k}\!\left[\operatorname{E}_{\hat{A}^k(x)\times\bar{s}^{nk}_{G,\phi}(x,\hat{P})}[u^n]\right]-O((\phi^n)^{-\frac{1}{\infty}})$$
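Corollary 2 says the resulting profile is optimal against resource-bounded unilateral deviations up to an asymptotically small error. The sketch below computes the analogous finite, unbounded quantity (the best deviation gain, sometimes called exploitability) for an invented matching-pennies profile; the resource bounds and error terms of the corollary are not modeled.

# Exploitability of a mixed profile in a finite game: the most any single
# player could gain by a unilateral deviation. Game and profile are invented.
import itertools

payoff = {  # matching pennies: payoff[(row action, col action)] = (u_row, u_col)
    ("H", "H"): (1.0, 0.0), ("H", "T"): (0.0, 1.0),
    ("T", "H"): (0.0, 1.0), ("T", "T"): (1.0, 0.0),
}
profile = {"row": {"H": 0.5, "T": 0.5}, "col": {"H": 0.5, "T": 0.5}}

def expected(prof):
    out = {"row": 0.0, "col": 0.0}
    for (ra, pa), (ca, pb) in itertools.product(prof["row"].items(), prof["col"].items()):
        ur, uc = payoff[(ra, ca)]
        out["row"] += pa * pb * ur
        out["col"] += pa * pb * uc
    return out

def exploitability(prof):
    base = expected(prof)
    gain = 0.0
    for player in ("row", "col"):
        for a in ("H", "T"):
            dev = dict(prof)
            dev[player] = {a: 1.0}
            gain = max(gain, expected(dev)[player] - base[player])
    return gain

print(exploitability(profile))   # 0.0: the uniform profile is an exact equilibrium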


We now move from studying strict maximizers to studying thermalizers.

Theorem 2

Fix a proto-error space E. Consider A a finite set, μ a word ensemble and {fa:suppμ→[0,1]}a∈A. Suppose {^Pa}a∈A are E(poly,rlog)-optimal predictors for f and T:N2→R>0 is some function. Assume that σPa and rPa don't depend on a. Consider ^A an A-valued (poly,rlog)-bischeme. Suppose ^MT is another A-valued (poly,rlog)-bischeme satisfying

$$\left|\Pr_{\hat{\sigma}_M}[M^{kj}_T(x)=a]-\operatorname{E}_{\hat{\sigma}_P}\!\left[Z^{kj}_T(x,y)^{-1}2^{\frac{P^{kj}_a(x,y)}{T^{kj}}}\right]\right|\leq\delta_r(k,j)$$

Here $y$ is sampled from $\hat{\sigma}_P$, $Z^{kj}_T(x,y):=\sum_{a\in A}2^{\frac{P^{kj}_a(x,y)}{T^{kj}}}$ is the normalization factor and $\delta_r,T\delta_r^{\frac{1}{2}}\in\mathcal{E}^{\frac{1}{4}}$. Then, there is $\delta\in\mathcal{E}^{\frac{1}{4}}$ s.t.

$$\operatorname{E}_{\mu^k}\!\left[\operatorname{E}_{\hat{\sigma}^{kj}_{M_T}}[f_{\hat{M}^{kj}_T(x)}(x)]+T^{kj}\operatorname{H}(\hat{M}^{kj}_T(x))\right]\geq\operatorname{E}_{\mu^k}\!\left[\operatorname{E}_{\hat{\sigma}^{kj}_A}[f_{\hat{A}^{kj}(x)}(x)]+T^{kj}\operatorname{H}(\hat{A}^{kj}(x))\right]-\delta(k,j)$$

Here $\operatorname{H}$ stands for the (base 2) Shannon entropy of a probability measure on $A$.
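The inequality in Theorem 2 is a bounded analogue of the classical fact that, when the utilities are known exactly, the distribution with probabilities proportional to $2^{\frac{u}{T}}$ maximizes the free utility $\operatorname{E}_s[u]+T\operatorname{H}(s)$ (the Gibbs variational principle, which also underlies Proposition B.2 below). A small numeric check of that classical fact, with invented numbers:

# Numeric check of the classical fact underlying Theorem 2 (not of the theorem
# itself): among all distributions s on A, free utility E_s[u] + T*H(s) is
# maximized by the distribution with s(a) proportional to 2^(u(a)/T).
import math, random

random.seed(1)
A = ["a", "b", "c"]
u = {a: random.random() for a in A}   # invented utilities
T = 0.3

def entropy(s):
    return -sum(p * math.log2(p) for p in s.values() if p > 0)

def free_utility(s):
    return sum(s[a] * u[a] for a in A) + T * entropy(s)

Z = sum(2 ** (u[a] / T) for a in A)
thermal = {a: 2 ** (u[a] / T) / Z for a in A}

best_on_grid = max(
    free_utility({"a": i / 50, "b": j / 50, "c": (50 - i - j) / 50})
    for i in range(51) for j in range(51 - i)
)
print(free_utility(thermal) >= best_on_grid - 1e-9)   # True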

Corollary 3

Fix ϕ∈Φ. Consider A a finite set, μ a word ensemble and {fa:suppμ→[0,1]}a∈A. Suppose {^Pa}a∈A are E2(ll,ϕ)(poly,rlog)-optimal predictors for f and T:N→R>0 a bounded function. Consider ϵ∈(0,1) and ^A an A-valued ϕ1−ϵ(poly,rlog)-scheme. Assume ^MT is as in Theorem 2. Then

$$\operatorname{E}_{\mu^k}\!\left[\operatorname{E}_{\lambda^k_\phi}\!\left[\operatorname{E}_{\hat{\sigma}^{kj}_{M_T}}[f_{\hat{M}^{kj}_T(x)}(x)]\right]+T^k\operatorname{H}\!\left(\operatorname{E}_{\lambda^k_\phi}[\hat{M}^{kj}_T(x)]\right)\right]\geq\operatorname{E}_{\mu^k}\!\left[\operatorname{E}_{\hat{\sigma}^k_A}[f_{\hat{A}^k(x)}(x)]+T^k\operatorname{H}(\hat{A}^k(x))\right]-O(\phi^{-\frac{1}{\infty}})$$

Here $\operatorname{E}_{\lambda^k_\phi}[\hat{M}^{kj}_T(x)]$ stands for averaging the probability measure $\hat{M}^{kj}_T(x)$ on $A$ with respect to the parameter $j$ using the probability distribution $\lambda^k_\phi$.

Construction 3

Consider G=(μ,P,A,u) a distributional game, {ϕn∈Φ}n∈P and T:P×N→R>0. We construct the reflective system Rϕ,TG=(P,uϕ,T,μG).

Define snG,ϕ,T:N×suppμ×ΠP→ΔnG by

$$\Pr_{s^{nk}_{G,\phi,T}(x,\pi)}[a]:=\operatorname{E}_{\lambda^k_{\phi^n}}\!\left[\operatorname{E}_{U(\pi)^{nkj}}\!\left[Z^{nkj}_{G,\phi,T}(x,y,\pi)^{-1}2^{(T^{nk})^{-1}\pi^{nkj}((k,x,a),y)}\right]\right]$$

Here Z is the normalization factor.

Define sG,ϕ,T:N×suppμ×ΠP→∏n∈PΔnG by

skG,ϕ,T(x,π):=∏n∈PsnkG,ϕ,T(x,π)

Define s¯nG,ϕ,T:N×suppμ×ΠP→∏m∈P∖nΔmG by

s¯nkG,ϕ,T(x,π):=∏m∈P∖nsmkG,ϕ,T(x,π)

Finally, define uϕ,T by

unkϕ,T(x,a,π):=Ea×s¯nkG,ϕ,T(x,π)[un]


We define the notations snkG,ϕ,T(x,^Q), skG,ϕ,T(x,^Q) and s¯nkG,ϕ,T(x,^Q) analogously to before.

Corollary 4

Fix G=(μ,P,A,u) a distributional game, {ϕn∈Φ}n∈P, T:P×N→R>0 bounded and ^P an E∗2(ll,ϕ)(poly,rlog)-optimal predictor system for Rϕ,TG. Consider n∈P, ϵ∈(0,1) and ^A an An-valued (ϕn)1−ϵ(poly,rlog) scheme. Then

Eμk[EskG,ϕ,T(x,^P)[un]+TkH(snkG,ϕ,T(x,^P))]≥Eμk[E^Ak(x)×s¯nkG,ϕ,T(x,^P)[un]+TkH(^Ak(x))]−O((ϕn)−1∞)
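A toy fixed-point computation in the spirit of Corollary 4, with exact utilities in place of optimal predictors (so none of the bounded machinery appears; the temperature, starting point and iteration scheme are invented): in matching pennies, iterating "thermalize against the opponent's current mixed strategy" converges, and at the resulting profile the row player (and by symmetry the column player) gains essentially nothing in free utility from any unilateral deviation.

# Toy thermalizer fixed point in matching pennies; exact utilities, invented T.
import math

T = 0.5
def entropy(p):
    return -sum(q * math.log2(q) for q in (p, 1 - p) if q > 0)

def thermal(u_H, u_T):
    zH, zT = 2 ** (u_H / T), 2 ** (u_T / T)
    return zH / (zH + zT)              # probability of playing H

p, q = 0.9, 0.1                        # Pr[row plays H], Pr[col plays H]
for _ in range(200):
    p, q = (thermal(q, 1 - q),         # row's thermal response to col
            thermal(1 - p, p))         # col's thermal response to row

def free_u_row(p_row, q_col):
    eu = p_row * q_col + (1 - p_row) * (1 - q_col)
    return eu + T * entropy(p_row)

grid = [i / 1000 for i in range(1001)]
gain = max(free_u_row(r, q) for r in grid) - free_u_row(p, q)
print(round(p, 3), round(q, 3), max(0.0, gain))   # both near 0.5; gain about 0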


Finally, we show that assuming ϕn doesn't depend on n and T doesn't fall to zero too fast, deterministic advice is sufficient to construct an optimal predictor system for Rϕ,TG.

Definition 3

Consider a reflective system $R=(\Sigma,f,\mu)$ and $\{\phi^n\in\Phi\}_{n\in\Sigma}$. $R$ is said to be $\phi$-Hoelder when there are $\{c_{nk}\in\mathbb{R}_{>0}\}_{n\in\Sigma,k\in\mathbb{N}}$, $\{\rho^n:\Sigma\to[0,1]\}_{n\in\Sigma}$, $\{\psi^{nm}\in\Phi\}_{n,m\in\Sigma}$, $\{\alpha_n\in(0,1]\}_{n\in\Sigma}$ and $\{\delta^n:\mathbb{N}\to\mathbb{R}_{\geq 0}\}_{n\in\Sigma}$ s.t.

(i) $\forall n\in\Sigma\,\exists\epsilon_n>0:c_{nk}=O(\phi^n(k)^{\frac{\alpha_n}{2}-\epsilon_n})$

(ii) $\sum_{m\in\Sigma}\rho^n(m)=1$

(iii) $\psi^{nm}\geq\phi^n$

(iv) $\delta^n(k)=O(\phi^n(k)^{-\frac{1}{\infty}})$

(v) For any $\pi,\tilde{\pi}\in\Pi^\Sigma$: $\operatorname{E}_{\mu^{nk}}[(f^n(x,\pi)-f^n(x,\tilde{\pi}))^2]\leq c_{nk}\operatorname{E}_{\rho^n}\!\left[\operatorname{E}_{\mu^{mk}\times\lambda^k_{\psi^{nm}}}\!\left[\operatorname{E}_{U(\pi)^{mkj}\times U(\tilde{\pi})^{mkj}}[(\pi^{mkj}(x,y)-\tilde{\pi}^{mkj}(x,y))^2]\right]\right]^{\alpha_n}+\delta^n(k)$

Theorem 3

Consider a finite set Σ, a reflective system R=(Σ,f,μ) and {ϕn∈Φ}n∈Σ. If R is ϕ-Hoelder then it has an E∗2(ll,ϕ)(poly,log)-optimal predictor system.

Theorem 4

Consider $G=(\mu,P,A,u)$ a distributional game, $\phi\in\Phi$ and $T:P\times\mathbb{N}\to\mathbb{R}_{>0}$. Assume $\frac{1}{T^{nk}}=O(\phi(k)^{\frac{1}{4}-\epsilon})$ for some $\epsilon>0$. Then $\mathcal{R}^{\phi,T}_G$ is $\phi$-Hoelder.
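Theorem 4 rests on the continuity chain used in its proof below (total variation, Pinsker's inequality, Proposition B.3): perturbing the utilities fed to a thermalizer changes the induced expected payoff by at most the square root of the summed perturbation divided by $2T$, a bound that degrades as $T\to 0$, which is why the hypothesis on $\frac{1}{T^{nk}}$ is needed. A numeric spot check of that chain with invented numbers:

# Check (dTV <= Pinsker <= Proposition B.3): the squared change in a
# thermalizer's expected payoff is at most sum|perturbation| / (2T).
import random

random.seed(2)
A = range(5)
T = 0.25
v = [random.random() for _ in A]                  # payoff being averaged, in [0,1]
u1 = [random.random() for _ in A]                 # predicted utilities (invented)
u2 = [x + random.uniform(-0.1, 0.1) for x in u1]  # perturbed predictions

def thermal(u):
    w = [2 ** (x / T) for x in u]
    Z = sum(w)
    return [x / Z for x in w]

def exp_payoff(u):
    return sum(p * v[a] for a, p in zip(A, thermal(u)))

lhs = (exp_payoff(u1) - exp_payoff(u2)) ** 2
rhs = sum(abs(a - b) for a, b in zip(u1, u2)) / (2 * T)
print(lhs <= rhs, lhs, rhs)                       # True, with plenty of slack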

Appendix A

Proposition A.1

Suppose h:R≥0→R≥0 is a non-decreasing function s.t. ∫∞0h(x)−1dx<∞. Define δh:N2→R≥0 by

δh(k,j):=min(loglog(k+3)h(loglog(j+3)),1)

Then, δh∈E2(ll).

Proof of Proposition A.1

Consider ϕ∈Φ. We have

limsupk→∞ϕ(k)Eλkϕ[δh(k,j)]=limsupk→∞ϕ(k)∑tϕ(k)−1j=2(loglog(j+1)−loglogj)δh(k,j)loglogtϕ(k)

limsupk→∞ϕ(k)Eλkϕ[δh(k,j)]=limsupk→∞ϕ(k)∫tϕ(k)x=2δh(k,⌊x⌋)d(loglogx)ϕ(k)loglogk

limsupk→∞ϕ(k)Eλkϕ[δh(k,j)]≤limsupk→∞∫tϕ(k)x=2h(loglog(⌊x⌋+3))−1d(loglogx)

limsupk→∞ϕ(k)Eλkϕ[δh(k,j)]≤limsupk→∞∫tϕ(k)x=2h(loglogx)−1d(loglogx)

limsupk→∞ϕ(k)Eλkϕ[δh(k,j)]≤∫∞0h(t)−1dt

Proof of Proposition 1

Mostly obvious modulo Proposition A.1. To see $\mathcal{E}_2(\mathrm{ll},\phi)$ is stable for non-decreasing $\phi$, consider a non-constant polynomial $p:\mathbb{N}\to\mathbb{N}$ and $\delta\in\mathcal{E}_2(\mathrm{ll},\phi)$. Define $\delta'(k,j):=\delta(p(k),j)$. To get the desired condition for $\delta'$ and $\psi\in\Phi$ with $\psi\leq\phi$, consider any $\psi'\in\Phi$ s.t. $\psi'\leq\phi$ and for sufficiently large $k$ we have $\psi'(p(k))=\psi(k)$. Such $\psi'$ exists since for $k$ sufficiently large $\phi(k)\leq\phi(p(k))$. We have

limsupk→∞ψ′(k)Eλkψ′[δ(k,j)]<∞

In particular

limsupk→∞ψ′(p(k))Eλp(k)ψ′[δ(p(k),j)]<∞

limsupk→∞ψ(k)Eλkψ[δ′(k,j)]<∞

Proposition A.2

Consider a polynomial q:N2→N. There is a function λq:N3→[0,1] s.t.

(i) ∀k,j∈N:∑i∈Nλq(k,j,i)=1

(ii) For any function ϵ:N2→[0,1] we have

ϵ(k,j)−∑i∈Nλq(k,j,i)ϵ(k,q(k,j)+i)∈E2(ll)

Proof of Proposition A.2

Given functions q1,q2:N2→N s.t. q1(k,j)≥q2(k,j) for k,j≫0, the proposition for q1 implies the proposition for q2 by setting

λq2(k,j,i):={λq1(k,j,i−q1(k,j)+q2(k,j))if i−q1(k,j)+q2(k,j)≥00if i−q1(k,j)+q2(k,j)<0

Therefore, it is enough to prove the proposition for functions of the form $q(k,j)=j^{m+n\log k}$ for $m>0$.

Consider any ϕ∈Φ. We have

limsupk→∞ϕ(k)loglogkϕ(k)loglogk<∞

limsupk→∞ϕ(k)log(m+nlogk)ϕ(k)loglogk<∞

limsupk→∞ϕ(k)2m+nlogk∫x=2d(loglogx)ϕ(k)loglogk<∞

Since ϵ takes values in [0,1]

limsupk→∞ϕ(k)2m+nlogk∫x=2ϵ(k,⌊x⌋)d(loglogx)ϕ(k)loglogk<∞

Similarly

limsupk→∞ϕ(k)tϕ(k)m+nlogk∫x=tϕ(k)ϵ(k,⌊x⌋)d(loglogx)ϕ(k)loglogk<∞

The last two equations imply that

limsupk→∞ϕ(k)tϕ(k)∫x=2ϵ(k,⌊x⌋)d(loglogx)−tϕ(k)m+nlogk∫x=2m+nlogkϵ(k,⌊x⌋)d(loglogx)ϕ(k)loglogk<∞

Raising x to a power is equivalent to adding a constant to loglogx, therefore

limsupk→∞ϕ(k)tϕ(k)∫x=2ϵ(k,⌊x⌋)d(loglogx)−tϕ(k)∫x=2ϵ(k,⌊xm+nlogk⌋)d(loglogx)ϕ(k)loglogk<∞

limsupk→∞ϕ(k)tϕ(k)∫x=2(ϵ(k,⌊x⌋)−ϵ(k,⌊xm+nlogk⌋))d(loglogx)ϕ(k)loglogk<∞

Since ⌊xm+nlogk⌋≥⌊x⌋m+nlogk we can choose λq satisfying condition (i) so that

j+1∫x=jϵ(k,⌊xm+nlogk⌋)d(loglogx)=(loglog(j+1)−loglogj)∑iλq(k,j,i)ϵ(k,jm+nlogk+i)

It follows that

j+1∫x=jϵ(k,⌊xm+nlogk⌋)d(loglogx)=j+1∫x=j∑iλq(k,⌊x⌋,i)ϵ(k,⌊x⌋m+nlogk+i)d(loglogx)

limsupk→∞ϕ(k)tϕ(k)∫x=2(ϵ(k,⌊x⌋)−∑iλq(k,⌊x⌋,i)ϵ(k,⌊x⌋m+nlogk+i))d(loglogx)ϕ(k)loglogk<∞

limsupk→∞ϕ(k)∑tϕ(k)−1j=2(loglog(j+1)−loglogj)(ϵ(k,j)−∑iλq(k,j,i)ϵ(k,jm+nlogk+i))ϕ(k)loglogk<∞

ϵ(k,j)−∑i∈Nλq(k,j,i)ϵ(k,q(k,j)+i)∈E2(ll)

Lemma A.1

Consider (f,μ) a distributional estimation problem, ^P, ^Q (poly,rlog)-predictors. Suppose p:N2→N a polynomial, ϕ∈Φ and δ∈E2(ll,ϕ) are s.t.

∀i,k,j∈N:E[(^Pk,p(k,j)+i−f)2]≤E[(^Qkj−f)2]+δ(k,j)

Then ∃δ′∈E2(ll,ϕ) s.t.

E[(^Pkj−f)2]≤E[(^Qkj−f)2]+δ′(k,j)


The proof of Lemma A.1 is analogous to before and we omit it.

Appendix B

The following is a slightly stronger version of one direction of the orthogonality lemma.

Lemma B.1

Consider $(f,\mu)$ a distributional estimation problem and $\hat{P}$ an $\mathcal{E}(\mathrm{poly},\mathrm{rlog})$-optimal predictor for $(f,\mu)$. Then there are $c_1,c_2\in\mathbb{R}$ and an $\mathcal{E}^{\frac{1}{2}}$-moderate function $\delta:\mathbb{N}^5\to[0,1]$ s.t. for any $k,j,r,l,t\in\mathbb{N}$, $\tau$ a probability measure on $\{0,1\}^{\leq l}$ and $Q:\operatorname{supp}\mu^k\times\{0,1\}^{r_P(k,j)}\times\{0,1\}^*\times\{0,1\}^r\times\{0,1\}^{\leq l}\xrightarrow{\mathrm{alg}}\mathbb{Q}$ that runs in time $t$ on any valid input

$$\left|\operatorname{E}_{\mu^k\times U^{r_P(k,j)}\times\sigma^{kj}_P\times U^r\times\tau}[Q(x,y,z,u,w)(P^{kj}(x,y,z)-f(x))]\right|\leq\left(c_1+c_2\operatorname{E}_{\mu^k\times U^{r_P(k,j)}\times\sigma^{kj}_P\times U^r\times\tau}[Q(x,y,z,u,w)^2]\right)\delta(k,j,t,2^l,2^{|Q|})$$


The proof is analogous to the original and we omit it.

Lemma B.2

Fix a proto-error space E. Consider A a finite set, μ a word ensemble and {fa:suppμ→[0,1]}a∈A. Suppose {^Pa}a∈A are E(poly,rlog)-optimal predictors for f. Assume that σPa and rPa don't depend on a. Consider ^A an A-valued (poly,rlog)-bischeme. Assume ^σA factors as ^σP×τ. Then

$$\left|\operatorname{E}_{\mu\times\hat{\sigma}_P\times\tau}[P_{A(x,y,z)}(x,y)-f_{A(x,y,z)}(x)]\right|\in\mathcal{E}^{\frac{1}{2}}$$

Proof of Lemma B.2

Eμ×^σP×τ[PA(x,y,z)(x,y)−fA(x,y,z)(x)]=∑a∈AEμ×^σP×τ[δA(x,y,z),a(Pa(x,y)−fa(x))]

Applying Lemma B.1 we get the desired result.

Proof of Theorem 1

Lemma B.2 implies

|Eμ×^σP×^σA[^P^A−f^A]|∈E12

|Eμ×^σP×τ[^P^M−f^M]|∈E12

Combining the two

|Eμ×^σP×τ[^P^M−f^M]|+|Eμ×^σP×^σA[^P^A−f^A]|∈E12

|Eμ×^σP×τ[^P^M−f^M]−Eμ×^σP×^σA[^P^A−f^A]|∈E12

|Eμ×^σP×τ×^σA[^P^M−^P^A]−Eμ×^σP×τ×^σA[f^M−f^A]|∈E12

The construction of ^M implies that for every x∈suppμk we have

E^σP×τ[PkjMkj(x,y,z)(x,y)]≥E^σP×^σA[PkjAkj(x,z)(x,y)]

Combining with the above we get the desired result.

Proposition B.1

Consider ϕ∈Φ, ϵ∈(0,1), ζ:N2→R bounded below and ξ:N→R bounded. Assume that ∀k,j∈N:j≥tϕ1−ϵ(k)⟹ζkj=ξk. Then Eλkϕ[ζkj]≥ξk−O(ϕ(k)−ϵ).

Proof of Proposition B.1

Eλkϕ[ζkj]=Prλkϕ[j<tϕ1−ϵ(k)]Eλkϕ[ζkj∣j<tϕ1−ϵ(k)]+Prλkϕ[j≥tϕ1−ϵ(k)]Eλkϕ[ζkj∣j≥tϕ1−ϵ(k)]

Eλkϕ[ζkj]≥loglogtϕ1−ϵ(k)loglogtϕ(k)infζ+loglogtϕ(k)−loglogtϕ1−ϵ(k)loglogtϕ(k)ξk

Eλkϕ[ζkj]≥O(ϕ(k)−ϵ)infζ+(1−O(ϕ(k)−ϵ))ξk

Eλkϕ[ζkj]≥ξk−O(ϕ(k)−ϵ)

Proof of Corollary 1

Choose some a0∈A. Define the A-valued (poly,rlog)-scheme ^B by

$$\hat{B}^{kj}(x):=\begin{cases}\hat{A}^k(x)&\text{if }j\geq t_{\phi^{1-\epsilon}}(k)\\a_0&\text{otherwise}\end{cases}$$

Applying Theorem 1 we get δ∈E122(ll,ϕ) s.t.

Eμk×^σkjM[f^Mkj]≥Eμk×^σkjB[f^Bkj]−δ(k,j)

Eμk×λkϕ[E^σkjM[f^Mkj]]≥Eμk×λkϕ[E^σkjB[f^Bkj]]−Eλkϕ[δ(k,j)]

Proposition B.1 implies

Eμk×λkϕ[E^σkjB[f^Bkj]]≥Eμk×^σkA[f^Ak]−O(ϕ(k)−ϵ)

Also

Eλkϕ[δ(k,j)]≤Eλkϕ[δ(k,j)2]12

Eλkϕ[δ(k,j)]=O(ϕ(k)−12)

Putting everything together we get the desired result.

Proof of Corollary 2

For each a∈An define ^Pkja(k,x):=^Pkj(k,x,a) and unϕ,a(k,x):=unkϕ(x,a,^P). It is easy to see ^Pa is an E∗2(ll,ϕ)(poly,rlog)-optimal predictor for uϕ,a. We have

EskG,ϕ(x,^P)[un]=Eλkϕn[E^σkjM[unkϕ,^Mkj(x)(x)]]±O(ϕn(k)−1)

Here ^Mkj(x,y,z) selects an element of MnkjG(x,y,^P[z]) uniformly at random up to a small rounding error which yields the O(ϕn(k)−1) term. Also

E^Ak(x)×s¯nkG,ϕ(x,^P)[un]=E^σkA[unkϕ,^Ak(x)(x)]

Applying Corollary 1 we get the desired result.

Proposition B.2

Consider A a finite set, u:A→R and T>0. Define ϑ:A→[0,1] by ϑ(a):=Z−12u(a)T where Z is the normalization constant ∑a∈A2u(a)T. Consider ρ:A→[0,1] with ∑a∈Aρ(a)=1. Then

Eϑ[u]+TH(ϑ)=Eρ[u]+TH(ρ)+TDKL(ρ∥ϑ)

Proof of Proposition B.2

DKL(ρ∥ϑ)=Eρ[logρϑ]

DKL(ρ∥ϑ)=Eρ[logρZ−12uT]

DKL(ρ∥ϑ)=Eρ[logρ+logZ−uT]

DKL(ρ∥ϑ)=−H(ρ)+logZ−Eρ[u]T

TDKL(ρ∥ϑ)=TlogZ−(Eρ[u]+TH(ρ))

On the other hand

H(ϑ)=−Eϑ[logϑ]

H(ϑ)=−Eϑ[log(Z−12uT)]

H(ϑ)=−Eϑ[−logZ+uT]

H(ϑ)=logZ−Eϑ[u]T

TlogZ=Eϑ[u]+TH(ϑ)

Combining the two we get the desired result.
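A numeric spot check of the identity just proven, with arbitrary invented numbers (all logarithms base 2):

# Spot check of Proposition B.2:
# E_theta[u] + T*H(theta) = E_rho[u] + T*H(rho) + T*D_KL(rho || theta),
# where theta(a) is proportional to 2^(u(a)/T). Numbers are invented.
import math

u = {"a": 0.3, "b": 0.9, "c": 0.5}
T = 0.7
rho = {"a": 0.2, "b": 0.5, "c": 0.3}

Z = sum(2 ** (u[a] / T) for a in u)
theta = {a: 2 ** (u[a] / T) / Z for a in u}

def H(p): return -sum(p[a] * math.log2(p[a]) for a in p if p[a] > 0)
def E(p, f): return sum(p[a] * f[a] for a in p)
def KL(p, q): return sum(p[a] * math.log2(p[a] / q[a]) for a in p if p[a] > 0)

lhs = E(theta, u) + T * H(theta)
rhs = E(rho, u) + T * H(rho) + T * KL(rho, theta)
print(abs(lhs - rhs) < 1e-12)   # True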

Proposition B.3

Consider A a finite set, u1,u2:A→R and T>0. Define the probability measures ϑ1,ϑ2:A→[0,1] by ϑi(a):=Z−1i2ui(a)T where Zi are normalization factors. Then

DKL(ϑ1∥ϑ2)≤T−1∑a∈A|u1(a)−u2(a)|

Proof of Proposition B.3

DKL(ϑ1∥ϑ2)=Eϑ1[logϑ1ϑ2]

DKL(ϑ1∥ϑ2)=Eϑ1[logZ−112u1TZ−122u2T]

DKL(ϑ1∥ϑ2)=Eϑ1[u1−u2T]+logZ2Z1

DKL(ϑ1∥ϑ2)=Eϑ1[u1−u2T]+log∑a∈A2u2(a)TZ1

DKL(ϑ1∥ϑ2)=Eϑ1[u1−u2T]+log∑a∈AZ−112u1(a)T2u2(a)−u1(a)T

DKL(ϑ1∥ϑ2)=Eϑ1[u1−u2T]+logEϑ1[2u2−u1T]

DKL(ϑ1∥ϑ2)≤T−1(maxa∈A(u1(a)−u2(a))−mina∈A(u1(a)−u2(a)))

DKL(ϑ1∥ϑ2)≤T−1∑a∈A|u1(a)−u2(a)|
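A numeric spot check of Proposition B.3's bound, again with invented numbers:

# Spot check of Proposition B.3: for theta_i(a) proportional to 2^(u_i(a)/T),
# the base-2 KL divergence is at most (1/T) * sum_a |u1(a) - u2(a)|.
import math, random

random.seed(3)
A = ["a", "b", "c", "d"]
T = 0.4
u1 = {a: random.random() for a in A}
u2 = {a: random.random() for a in A}

def thermal(u):
    Z = sum(2 ** (u[a] / T) for a in A)
    return {a: 2 ** (u[a] / T) / Z for a in A}

def KL(p, q):
    return sum(p[a] * math.log2(p[a] / q[a]) for a in A)

bound = sum(abs(u1[a] - u2[a]) for a in A) / T
print(KL(thermal(u1), thermal(u2)) <= bound)   # True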

Proposition B.4

Consider A a finite set. There is a c>0 s.t. for any ρ1,ρ2 probability measures on A we have

|H(ρ1)−H(ρ2)|≤cdTV(ρ1,ρ2)12

Here dTV(ρ1,ρ2):=12∑a∈A|ρ1(a)−ρ2(a)| is the total variation distance.

Proof of Proposition B.4

Given 0≤p≤q≤1 we have

|qlogq−plogp|=|∫qpd(xlogx)|

|qlogq−plogp|=|∫qp(logx+1ln2)dx|

|qlogq−plogp|=|∫qplog(ex)dx|

|qlogq−plogp|≤∫qp|log(ex)|dx

For some c′>0 we have |log(ex)|≤c′2x−12 for x∈(0,1] hence

|qlogq−plogp|≤∫qpc′2x−12dx

|qlogq−plogp|≤c′(q12−p12)

|qlogq−plogp|≤c′(q−p)12

|H(ρ1)−H(ρ2)|=|∑a∈Aρ1(a)logρ1(a)−ρ2(a)logρ2(a)|

|H(ρ1)−H(ρ2)|≤∑a∈A|ρ1(a)logρ1(a)−ρ2(a)logρ2(a)|

|H(ρ1)−H(ρ2)|≤∑a∈Ac′|ρ1(a)−ρ2(a)|12

Using the concavity of the square root we complete the proof.

Proof of Theorem 2

For any k,j∈N, x∈suppμk and y∈{0,1}∗2 define ϑkj(x) and ϑkj(x,y) probability measures on A by

Prϑkj(x)[a]:=Wkj(x)−12(Tkj)−1E^σP[^Pkja(x)]

Prϑkj(x,y)[a]:=ZkjT(x,y)−12(Tkj)−1Pkja(x,y)

Here W is the normalization factor. By the convexity of Kullback-Leibler divergence

DKL(E^σP[ϑkj(x,y)]∥ϑkj(x))≤E^σP[DKL(ϑkj(x,y)∥ϑkj(x))]

Applying Proposition B.3

DKL(E^σP[ϑkj(x,y)]∥ϑkj(x))≤E^σP[(Tkj)−1∑a∈A|Pkja(x,y)−E^σP[Pkja(x,y′)]|]

DKL(E^σP[ϑkj(x,y)]∥ϑkj(x))≤(Tkj)−1∑a∈AE^σP[(Pkja(x,y)−E^σP[Pkja(x,y′)])2]12

DKL(E^σP[ϑkj(x,y)]∥ϑkj(x))≤(Tkj)−1∑a∈AE^σP×^σP[(Pkja(x,y)−Pkja(x,y′))2]12

Applying the uniqueness theorem, we get

TkjEμk[DKL(E^σP[ϑkj(x,y)]∥ϑkj(x))]∈E14

We also have

dTV(^MkjT(x),E^σP[ϑkj(x,y)])≤c1δr(k,j)

for some c1>0. Applying Propositions B.2 and B.4 we get that

Tkj|DKL(^MkjT(x)∥ϑkj(x))−DKL(E^σP[ϑkj(x,y)]∥ϑkj(x))|≤c2δr(k,j)+c3Tkjδr(k,j)12

for some c2,c3>0. Combining with the above, we get

TkjEμk[DKL(^MkjT(x)∥ϑkj(x))]∈E14

In particular, for some δ∈E14

TkjEμk[DKL(^MkjT(x)∥ϑkj(x))]≤TkjEμk[DKL(^Akj(x)∥ϑkj(x))]+δ(k,j)

−TkjEμk[DKL(^MkjT(x)∥ϑkj(x))]≥−TkjEμk[DKL(^Akj(x)∥ϑkj(x))]−δ(k,j)

Applying Proposition B.2

Eμk[E^σkjMT[^P^MkjT(x)(x)]+TkjH(^MkjT(x))]≥Eμk[E^σkjA[^P^Akj(x)(x)]+TkjH(^Akj(x))]−δ(k,j)

Applying Lemma B.2 we get the desired result.

Proof of Corollary 3

Choose some a0∈A. Define the A-valued (poly,rlog)-scheme ^B by

$$\hat{B}^{kj}(x):=\begin{cases}\hat{A}^k(x)&\text{if }j\geq t_{\phi^{1-\epsilon}}(k)\\a_0&\text{otherwise}\end{cases}$$

Applying Theorem 2 we get δ∈E142(ll,ϕ) s.t.

Eμk[E^σkjMT[f^MkjT(x)(x)]+TkH(^MkjT(x))]≥Eμk[E^σkjB[f^Bkj(x)(x)]+TkH(^Bkj(x))]−δ(k,j)

Eμk[Eλkϕ[E^σkjMT[f^MkjT(x)(x)]]+TkEλkϕ[H(^MkjT(x))]]≥Eλkϕ×μk[E^σkjB[f^Bkj(x)(x)]+TkH(^Bkj(x))]−Eλkϕ[δ(k,j)]

Using the concavity of entropy and the property of δ

Eμk[Eλkϕ[E^σkjMT[f^MkjT(x)(x)]]+TkH(^MkjT(x))]≥Eλkϕ×μk[E^σkjB[f^Bkj(x)(x)]+TkH(^Bkj(x))]−O(ϕ(k)−14)

Applying Proposition B.1 we get the desired result.

Proof of Corollary 4

For each a∈An define ^Pkja(k,x):=^Pkj(k,x,a) and unϕ,T,a(k,x):=unkϕ,T(x,a,^P). It is easy to see ^Pa is an E∗2(ll,ϕ)(poly,rlog)-optimal predictor for uϕ,T,a. Construct the (poly,rlog)-bischeme ^MT to satisfy the condition of Theorem 2 with δr(k,j)≤2−j. We have (with the help of Proposition B.4)

EskG,ϕ,T(x,^P)[un]=Eλkϕn[E^σkjMT[unkϕ,T,^MkjT(x)(x)]]±O(ϕn(k)−1)

H(snkG,ϕ,T(x,^P))=H(Eλkϕn[^MkjT(x)])±O(ϕn(k)−12)

E^Ak(x)×s¯nkG,ϕ,T(x,^P)[un]=E^σkA[unkϕ,T,^Ak(x)(x)]

Applying Corollary 3 we get the desired result.

Proposition B.5

Consider a finite set Σ, R=(Σ,f,μ) a ϕ-Hoelder reflective system and two collections of (poly,rlog)-predictors {^Qn1}n∈Σ and {^Qn2}n∈Σ. Assume that ∀n∈Σ:^Qn1μn≃E122(ll)^Qn2. Then

$$\forall n\in\Sigma:\operatorname{E}_{\mu^{nk}}[(R[\hat{Q}_1]^n(x)-R[\hat{Q}_2]^n(x))^2]=O(\phi^n(k)^{-\frac{1}{\infty}})$$

Proof of Proposition B.5

$$\operatorname{E}_{\mu^{nk}}[(R[\hat{Q}_1]^n(x)-R[\hat{Q}_2]^n(x))^2]=\operatorname{E}_{\mu^{nk}}[(\operatorname{E}_{\sigma_1}[f^n(x,\hat{Q}_1[a])]-\operatorname{E}_{\sigma_2}[f^n(x,\hat{Q}_2[a])])^2]$$

$$\operatorname{E}_{\mu^{nk}}[(R[\hat{Q}_1]^n(x)-R[\hat{Q}_2]^n(x))^2]\leq\operatorname{E}_{\mu^{nk}\times\sigma_1\times\sigma_2}[(f^n(x,\hat{Q}_1[a_1])-f^n(x,\hat{Q}_2[a_2]))^2]$$

$$\operatorname{E}_{\mu^{nk}}[(R[\hat{Q}_1]^n(x)-R[\hat{Q}_2]^n(x))^2]\leq\operatorname{E}_{\sigma_1\times\sigma_2}\!\left[c_{nk}\operatorname{E}_{\rho^n}\!\left[\operatorname{E}_{\lambda^k_{\psi^{nm}}}\!\left[\operatorname{E}_{\mu^{mk}\times U^{r_1(k,j)}\times U^{r_2(k,j)}}[(Q_1^{mkj}(x,y,a_1)-Q_2^{mkj}(x,y,a_2))^2]\right]\right]^{\alpha_n}\right]+\delta^n(k)$$

$$\operatorname{E}_{\mu^{nk}}[(R[\hat{Q}_1]^n(x)-R[\hat{Q}_2]^n(x))^2]\leq c_{nk}\operatorname{E}_{\rho^n}\!\left[\operatorname{E}_{\lambda^k_{\psi^{nm}}}\!\left[\operatorname{E}_{\mu^{mk}\times U^{r_1(k,j)}\times U^{r_2(k,j)}\times\sigma_1\times\sigma_2}[(Q_1^{mkj}(x,y,a_1)-Q_2^{mkj}(x,y,a_2))^2]\right]\right]^{\alpha_n}+\delta^n(k)$$

Using the similarity of $\hat{Q}_1$ and $\hat{Q}_2$ there are $\{\tilde{\delta}^n:\mathbb{N}^2\to[0,1]\in\mathcal{E}_2(\mathrm{ll})\}_{n\in\Sigma}$ s.t.

$$\operatorname{E}_{\mu^{nk}}[(R[\hat{Q}_1]^n(x)-R[\hat{Q}_2]^n(x))^2]\leq c_{nk}\operatorname{E}_{\rho^n}\!\left[\operatorname{E}_{\lambda^k_{\psi^{nm}}}[\tilde{\delta}^m(k,j)^{\frac{1}{2}}]\right]^{\alpha_n}+\delta^n(k)$$

$$\operatorname{E}_{\mu^{nk}}[(R[\hat{Q}_1]^n(x)-R[\hat{Q}_2]^n(x))^2]\leq c_{nk}\operatorname{E}_{\rho^n}\!\left[\operatorname{E}_{\lambda^k_{\psi^{nm}}}[\tilde{\delta}^m(k,j)]\right]^{\frac{\alpha_n}{2}}+\delta^n(k)$$

$$\operatorname{E}_{\mu^{nk}}[(R[\hat{Q}_1]^n(x)-R[\hat{Q}_2]^n(x))^2]\leq O(\phi^n(k)^{\frac{\alpha_n}{2}-\epsilon_n})\operatorname{E}_{\rho^n}[O(\psi^{nm}(k)^{-1})]^{\frac{\alpha_n}{2}}+O(\phi^n(k)^{-\frac{1}{\infty}})$$

Using ψnm≥ϕn we get the desired result.

Proof of Theorem 3

Fix a finite set Σ and a collection {ϕn∈Φ}n∈Σ. Consider R a ϕ-Hoelder reflective system. By the general existence theorem, there is ^R an E2(ll)(poly,rlog)-optimal predictor system for R. For each n∈Σ we can choose ^Pn, an E2(ll)(poly,log)-optimal predictor for R[^R]n. By the uniqueness theorem, we have ^Pnμn≃E122(ll)^Rn. By Proposition B.5 this implies Eμnk[(R[^P]n(x)−R[^R]n(x))2]=O(ϕn(k)−1∞). This means ^Pn is an E∗2(ll,ϕn)(poly,log)-optimal predictor for R[^P]n and ^P is an E∗2(ll,ϕ)(poly,log)-optimal predictor system for R.

Proposition B.6

Fix a finite set Σ and a collection {ϕn∈Φ}n∈Σ. Consider R=(Σ,f,μ) a reflective system. Assume there are {cnmk∈R>0}n,m∈Σ,k∈N, {ψnm∈Φ}n,m∈Σ, {αn∈(0,1]}n∈Σ and {δnm:N→R≥0}n,m∈Σ s.t.

(i) ∀n,m∈Σ∃ϵnm>0:cnmk=O(ϕn(k)αn2−ϵnm)

(ii) ψnm≥ϕn

(iii) δnm(k)=O(ϕn(k)−1∞)

(iv) For any m∈Σ and π,~π∈ΠΣ s.t. ∀l∈Σ∖m,k,j∈N:πlkj=~πlkj Eμnk[(fn(x,π)−fn(x,~π))2]≤cnmkEμmk×λkψnm[EU(π)mkj×U(~π)mkj[(πmkj(x,y)−~πmkj(x,y))2]]αn+δnm(k)

Then, R is ϕ-Hoelder.

Proof of Proposition B.6

Identify $\Sigma$ with $\{n\in\mathbb{N}\mid 1\leq n\leq N\}$ where $N=\#\Sigma$. Consider any $\pi,\tilde{\pi}\in\Pi^\Sigma$. Define $\pi_n$ recursively for all $n\leq N$ by $\pi_0:=\pi$, $\pi_{n+1}^{(n+1)kj}:=\tilde{\pi}^{(n+1)kj}$ and $\forall m\in\Sigma\setminus\{n+1\}:\pi_{n+1}^{mkj}:=\pi_n^{mkj}$. In particular, it follows that $\pi_N=\tilde{\pi}$. We have

Eμnk[(fn(x,πm−1)−fn(x,πm))2]≤cnmkEμmk×λkψnm[EU(π)mkj×U(~π)mkj[(πmkj(x,y)−~πmkj(x,y))2]]αn+δnm(k)

1NN∑m=1Eμnk[(fn(x,πm−1)−fn(x,πm))2]≤1NN∑m=1(cnmkEμmk×λkψnm[EU(π)mkj×U(~π)mkj[(πmkj(x,y)−~πmkj(x,y))2]]αn+δnm(k))

Denoting $c_{nk}:=\max_{m\in\Sigma}c_{nmk}$, $\delta^n:=\frac{1}{N}\sum_{m=1}^N\delta^{nm}$ and taking $\rho^n$ to be the uniform distribution on $\Sigma$, we get

Eμnk[(1NN∑m=1(fn(x,πm−1)−fn(x,πm)))2]≤cnk(1NN∑m=1Eμmk×λkψnm[EU(π)mkj×U(~π)mkj[(πmkj(x,y)−~πmkj(x,y))2]])αn+δn(k)

Eμnk[(fn(x,π)−fn(x,~π))2]≤N2cnk(1NN∑m=1Eμmk×λkψnm[EU(π)mkj×U(~π)mkj[(πmkj(x,y)−~πmkj(x,y))2]])αn+N2δn(k)

Proof of Theorem 4

Consider n,m∈P and π,~π∈ΠP s.t. ∀l∈P∖m,k,j∈N:πlkj=~πlkj.

EμnkG[(unkϕ,T(x,a,π)−unkϕ,T(x,a,~π))2]≤Eμk[dTV(s¯nkG,ϕ,T(x,π),s¯nkG,ϕ,T(x,~π))2]

EμnkG[(unkϕ,T(x,a,π)−unkϕ,T(x,a,~π))2]≤Eμk[dTV(smkG,ϕ,T(x,π),smkG,ϕ,T(x,~π))2]

EμnkG[(unkϕ,T(x,a,π)−unkϕ,T(x,a,~π))2]≤Eμk[(12∑b∈Am|PrsmkG,ϕ,T(x,π)[b]−PrsmkG,ϕ,T(x,~π)[b]|)2]

Slightly abusing notation, define smG,ϕ,T:N2×suppμ×{0,1}∗×ΠP→ΔmG by

PrsmkjG,ϕ,T(x,y,π)[a]:=ZmkjG,ϕ,T(x,y,π)−12(Tmk)−1πmkj((k,x,a),y)

We have

smkG,ϕ,T(x,π)=Eλkϕ[EU(π)mkj[smkjG,ϕ,T(x,y,π)]]

EμnkG[(unkϕ,T(x,a,π)−unkϕ,T(x,a,~π))2]≤Eμk[(12∑b∈Am|Eλkϕ[EU(π)mkj[PrsmkjG,ϕ,T(x,y,π)[b]]−EU(~π)mkj[PrsmkjG,ϕ,T(x,y,~π)[b]]]|)2]

EμnkG[(unkϕ,T(x,a,π)−unkϕ,T(x,a,~π))2]≤Eμk[Eλkϕ[EU(π)mkj×U(~π)mkj[dTV(smkjG,ϕ,T(x,y,π),smkjG,ϕ,T(x,y,~π))]]2]

EμnkG[(unkϕ,T(x,a,π)−unkϕ,T(x,a,~π))2]≤Eμk×λkϕ[EU(π)mkj×U(~π)mkj[dTV(smkjG,ϕ,T(x,y,π),smkjG,ϕ,T(x,y,~π))2]]

Using Pinsker's inequality

EμnkG[(unkϕ,T(x,a,π)−unkϕ,T(x,a,~π))2]≤12Eμk×λkϕ[EU(π)mkj×U(~π)mkj[DKL(smkjG,ϕ,T(x,y,π)∥smkjG,ϕ,T(x,y,~π))]]

Using Proposition B.3

EμnkG[(unkϕ,T(x,a,π)−unkϕ,T(x,a,~π))2]≤12(Tmk)−1Eμk×λkϕ[EU(π)mkj×U(~π)mkj[∑b∈Am|πmkj((k,x,b),y)−~πmkj((k,x,b),y)|]]

EμnkG[(unkϕ,T(x,a,π)−unkϕ,T(x,a,~π))2]≤12#Am(Tmk)−1EμmkG×λkϕ[EU(π)mkj×U(~π)mkj[|πmkj((k,x,b),y)−~πmkj((k,x,b),y)|]]

EμnkG[(unkϕ,T(x,a,π)−unkϕ,T(x,a,~π))2]≤12#Am(Tmk)−1EμmkG×λkϕ[EU(π)mkj×U(~π)mkj[(πmkj((k,x,b),y)−~πmkj((k,x,b),y))2]]12

Applying Proposition B.6 we get the desired result.