Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

When EDT=CDT, ADT Does Well

25th Oct 2018

abramdemski


I don't currently know of any example where this achieves better performance than LIDT. The raw action-expectations of the logical inductor, used as an embedder, will always pass the reality filter and the CDT=EDT filter. Any embedder differing from those expectations would have to differ on the expected utility of an action other than the one it recommended, in order to pass the CDT=EDT filter. So the expected utility of alternate embedders, according to those embedders themselves, can only be lower than that of the LIDT-like embedder.

There seems to be a deep connection between the CDT=EDT assumption and LIDT-like reasoning. In the Bayes-net setting where I prove CDT=EDT, I assume self-knowledge of mixed strategy (aka mixed-strategy ratifiability) and I assume one can implement randomized strategies without interference (mixed-strategy implementability). LIDT has the first property, since a logical inductor will learn a calibrated estimate of the action probabilities in a given situation. In problems where the environment is entangled with the randomization the agent uses, like troll bridge, LIDT can do quite poorly. Part of the original attraction of the CDT=EDT philosophy for me was the way it seems to capture how logical induction naturally wants to think about things.

I don't think loss relative to the argmax agent for the true environment is a very good optimality notion. It is only helpful in so far as the argmax strategy on the true environment is the best you can do. Other agents may perform better, in general. For example, argmax behaviour will 2-box in Newcomb so long as the predictor can't predict the agent's exploration. (Say, if the predictor is predicting you with your same logical inductor.) Loss relative to other agents more generally (like in the original ADT write-up) seems more relevant.

Back in my first post about ADT, I conjectured that ADT might achieve sublinear regret relative to argmaxing for the true embedder, if some extra condition was imposed.

The Problem:

In the form of ADT stated in that post, the game of chicken is a counterexample to Conjecture 1. In short, the "delusion" embedder F (which models the opponent as an identical but independent ADT agent) has argmax converge to going straight (because going straight will, in expectation, yield 0.81 utility, while other strategies yield less). Further, if the agent uses the delusion embedder less than 10% of the time, then E(F(amF)) will start being greater than E(E(amE)) (where E is the copy embedder, which plugs the chosen action into both your action and your opponent's action), so the agent will start using the delusion embedder more often. Therefore, about 10% of the time, the ADT agent rams into itself because it uses the delusion embedder.
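To make the numbers concrete, here is a minimal sketch of the delusion embedder's argmax calculation. The payoff values are hypothetical (crash = 0, straight-vs-swerve = 0.9, swerve-vs-straight = 0.7, swerve-vs-swerve = 0.8; none of these appear in the original, and they are chosen only so that the estimate for going straight comes out to 0.81):

```python
# Toy chicken payoffs, U[(my_action, their_action)].
# Hypothetical values chosen so that the delusion embedder's
# estimate for going straight comes out to 0.81.
U = {
    ("straight", "straight"): 0.0,  # crash
    ("straight", "swerve"):   0.9,
    ("swerve",   "straight"): 0.7,
    ("swerve",   "swerve"):   0.8,
}

# The delusion embedder F models the opponent as an *independent*
# ADT agent that swerves 90% of the time.
opponent_dist = {"straight": 0.1, "swerve": 0.9}

def delusion_value(my_action):
    """Expected utility of my_action according to embedder F."""
    return sum(p * U[(my_action, their)] for their, p in opponent_dist.items())

am_F = max(["straight", "swerve"], key=delusion_value)
print(am_F, delusion_value(am_F))   # straight, with estimate ≈ 0.81
print(delusion_value("swerve"))     # ≈ 0.79, so argmax goes straight

# But the copy embedder E plugs my action in for both players,
# so going straight against a copy of myself actually scores:
print(U[("straight", "straight")])  # 0.0
```

The gap between F's rosy estimate (≈ 0.81) and what going straight actually delivers against a copy (0) is exactly the divergence the post goes on to discuss.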

Now, the reason why this embedder can't be removed is interesting. We can consider the agent to be reasoning as follows: "I don't know whether I'll pick the delusion embedder or the copy embedder. There's a 90% chance that an ADT agent will pick the copy embedder E, in which case F(amF) will score highly, because amF will just go straight, and amE picks swerving with 90% probability. Also, E(F(ADT)) is around 0.729, because about 90% of the time, my foe picks the copy embedder and so do I, and about 10% of the time, my foe picks the delusion embedder and so do I, and we ram into each other."

However, note that E(F(amF)) (approximately 0.81) systematically deviates from E(F(ADT)|ADT=amF) (approximately 0). This is pretty much Abram's criterion for CDT and EDT diverging. Plugging an action into the embedder tends to give a different result than plugging yourself into the embedder and conditioning on you taking that action. The second quantity is a more accurate estimate of the utility you can expect to get. However, you can't just use it in place of E(F(amF)), because the conditional may give nonsensical answers if you almost certainly don't take the action: conditionals on improbable events are less accurate. To rephrase this, using the conditional is vulnerable to the futarchy hack, where a trader with a lot of wealth bets on conditional contracts, driving down the value of F(ADT)|ADT=amF, so the ADT agent doesn't use embedder F, so no money is lost on the conditional contracts, and the wealthy trader can repeat the same move forevermore and unilaterally prevent embedder F from ever being tried out.

The Solution:

Is there some way to address this? Yes. As stated in the comments of the last post on ADT, it is possible to improve the reality filter by switching from

$$\left|\mathbb{E}_t(E_t(\mathrm{ADT}_t)) - \mathbb{E}_t(U_t)\right| < \delta_t$$

to (using $F^*_t$ as an abbreviation for $\mathrm{ADT}_t = \mathrm{am}F_t$, i.e. the event that embedder F was chosen)

$$\sum_{F\in\mathcal{E}} P_t(F^*_t)\cdot\left|\mathbb{E}_t(F_t(\mathrm{ADT}_t)\mid F^*_t) - \mathbb{E}_t(U_t\mid F^*_t)\right| < \delta_t$$

In short, the conditionals may be inaccurate, but if they are inaccurate, then Pt(F∗t) must be low, so the expected values being inaccurate is compensated for by being multiplied by a number close to zero.
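A toy sketch of this probability-weighted filter score (all numbers standing in for the inductor's estimates are made up) illustrates how a wildly wrong conditional is neutralized by a near-zero selection probability:

```python
def reality_filter_score(embedders):
    """Sum over embedders F of P(F chosen) * |E(F(ADT) | F chosen) - E(U | F chosen)|.

    `embedders` maps a name to a tuple (prob_chosen, predicted_utility,
    conditional_actual_utility) -- all hypothetical inductor estimates.
    """
    return sum(p * abs(pred - actual)
               for p, pred, actual in embedders.values())

embedders = {
    "copy":     (0.95, 0.80, 0.80),  # accurate conditional, high probability
    "delusion": (0.05, 0.81, 0.02),  # badly wrong conditional, rarely chosen
}
score = reality_filter_score(embedders)
print(score)  # ≈ 0.05 * 0.79 = 0.0395: small despite the bad conditional
```

Even though the delusion embedder's conditional is off by 0.79, its 5% selection probability keeps the total score well under a modest δ.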

It is possible to use the same sort of trick on the divergence between plugging in an action and plugging in yourself and conditioning on an action, and require

$$P_t(E^*_t)\cdot\left|\mathbb{E}_t(E_t(\mathrm{am}E_t)) - \mathbb{E}_t(E_t(\mathrm{ADT}_t)\mid E^*_t)\right| < \delta'_t$$

for the embedder to be considered. This is a CDT=EDT filter. Note that this condition isn't a hard constraint on all embedders, it just applies to embedders that are chosen with non-negligible probability. If a particular embedder is definitely not taken, the multiplication by a near-zero term will make the embedder pass the filter, and Et(Et(amEt)) (the CDT utility) will be used as the utility for the embedder, since there's no conditional utility to defer to. However, if an embedder has non-negligible probability of being taken, and passes this filter, then Et(Et(amEt))≃Et(Et(ADTt)|E∗t), which is pretty much equivalent to using the EDT utility.
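The CDT=EDT filter can be sketched the same way (again with purely hypothetical estimates): the constraint only bites for embedders chosen with non-negligible probability.

```python
def passes_cdt_edt_filter(p_chosen, cdt_utility, edt_utility, delta):
    """P(E chosen) * |E(E(am_E)) - E(E(ADT) | E chosen)| < delta."""
    return p_chosen * abs(cdt_utility - edt_utility) < delta

# An embedder that is almost never chosen passes even with wildly
# divergent CDT-style and EDT-style utilities...
print(passes_cdt_edt_filter(0.001, 0.81, 0.0, delta=0.01))  # True
# ...but a frequently-chosen embedder must have them nearly agree.
print(passes_cdt_edt_filter(0.9, 0.81, 0.0, delta=0.01))    # False
print(passes_cdt_edt_filter(0.9, 0.80, 0.80, delta=0.01))   # True
```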

With the improved reality-filter, and the CDT=EDT filter, we can arrive at a proof of Conjecture 1, modulo some extra conditions.

Required Conditions:

First, we must assume that the hypothesis space contains a "true environment" E, which is defined as follows. First, it passes the reality-filter (ie, the indicator for the reality-filter converges to 1). Second, it passes the CDT=EDT filter (whose indicator also converges to 1). Note that if the sequence of utilities seen is acquired by feeding E a uniformly random bitstring on each turn, it will pass the reality-filter by an adaptation of Theorem 1 from the last post, by Kelly-betting on the exploitable difference between Et(Ft(ADTt)|F∗t) and Et(Ut|F∗t), where the timely appearance of Ut and the actual action in the deductive process lets the conditional bets be resolved quickly. The second condition isn't acquired by default, but it appears that in the known toy games where CDT≠EDT, there is another, equally accurate environment in which the choice of action influences the opponent and CDT=EDT holds.

Second, it's possible that the environment could be a perverse environment with clustering behavior, so $\frac{1}{t}\sum_{i\le t}\left(\mathbb{E}_i(E_i(\mathrm{am}E_i)) - E_i(\mathrm{am}E_i)\right)$ doesn't limit to 0, because feedback on what E(amE) is may be quite sparse. Therefore, we have to assume that the true environment (whatever it may happen to be) is sufficiently "nice", in the sense that the time-average of the difference between the expectation of E(amE) and E(amE) itself limits to zero.
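As a toy illustration of this niceness condition: if the per-turn estimation errors shrink like 1/sqrt(i) (a purely hypothetical rate, chosen only for the example), the time-average does limit to zero:

```python
import math

# Time-average of toy estimation errors e_i standing in for
# E_i(E_i(am_E_i)) - E_i(am_E_i), with e_i = 1/sqrt(i).
def running_average_error(t):
    errors = [1 / math.sqrt(i) for i in range(1, t + 1)]
    return sum(errors) / t

print(running_average_error(10_000))  # ≈ 0.02, heading to zero as t grows
```

A "perverse" environment would instead cluster its feedback so that the errors don't thin out, and the average stays bounded away from zero.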

The third condition is very similar: that $\frac{1}{t}\sum_{i\le t}\left(\mathbb{E}_i(U_i) - U_i\right)$ limits to zero. It's more plausible than the first assumption, because (an interval containing) the true value of U appears in the deductive process in a timely manner by assumption, but I haven't found a way to implement Kelly-betting to exploit this, because the traders don't necessarily have access to the deductive process, or know their own worst-case value. It should be noted that the traders not knowing their own worst-case value is the property that necessitates emulating the entire logical inductor to implement budgeting, in the key ϵ-ROI lemma of the logical induction paper. Therefore, it makes sense to stipulate that traders are given their own worst-case value as an advice string, to streamline budgeting; if the logical inductor is augmented in this way, this condition is attained for free via a Kelly-betting argument.

Final Proof:

We will get a necessary and simple lemma out of the way first.

Lemma 1: When there is a finite set $\mathcal{E}$ of sequences of events, which are mutually exclusive, and a sequence of LUVs $U_t$, then

$$\mathbb{E}_t(U_t) \simeq_t \sum_{F\in\mathcal{E}} P_t(F^*_t)\cdot\mathbb{E}_t(U_t\mid F^*_t)$$

Proof: By the definition of $\mathbb{E}_t(U_t\mid F^*_t)$, this is equivalent to the statement that $\mathbb{E}_t(U_t) \simeq_t \sum_{F\in\mathcal{E}}\mathbb{E}_t(U_t\cdot[F^*_t])$, where the brackets are Iverson brackets: the indicator function that is 1 if the statement is true, and 0 if the statement is false. Now, notice that the bounded combination sequences $\sum_{0\le i<t}\frac{1}{t}\ulcorner U_t > \frac{i}{t}\urcorner$ and $\sum_{0\le i<t}\sum_{F\in\mathcal{E}}\frac{1}{t}\ulcorner U_t\cdot[F^*_t] > \frac{i}{t}\urcorner$ have provably equal values for all t, so by Theorem 4.5.4 in the logical induction paper (Affine Provability Induction), the prices on these sequences will limit to being equal. By the definition of expected value, the prices on these sequences are $\mathbb{E}_t(U_t)$ and $\sum_{F\in\mathcal{E}}\mathbb{E}_t(U_t\cdot[F^*_t])$, respectively. □
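Lemma 1 is the logical-inductor analogue of the law of total expectation over a finite partition of mutually exclusive events. A quick numeric check of the classical identity it mirrors, with arbitrary toy numbers:

```python
# Law of total expectation over a finite partition of mutually
# exclusive events F*:  E(U) = sum_F P(F*) * E(U | F*).
# Both P(F*) and E(U | F*) below are arbitrary toy values.
partition = {
    "F1": (0.5, 0.3),  # (P(F1 chosen), E(U | F1 chosen))
    "F2": (0.3, 0.9),
    "F3": (0.2, 0.6),
}
assert abs(sum(p for p, _ in partition.values()) - 1.0) < 1e-9  # exhaustive

total = sum(p * u for p, u in partition.values())
print(total)  # 0.5*0.3 + 0.3*0.9 + 0.2*0.6 = 0.54
```

In the lemma, the exact identity is weakened to the asymptotic equality $\simeq_t$, since the inductor's prices only converge to coherent values.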

Now to prove sublinear regret.

Theorem 1: Suppose there is an embedder E in the hypothesis class of an ADT agent, up against an environment U, such that:

1. $\frac{1}{t}\sum_{i\le t}\left(\mathbb{E}_i(U_i) - U_i\right) \simeq_t 0$ (utility is learned)
2. $\frac{1}{t}\sum_{i\le t}\left(\mathbb{E}_i(E_i(\mathrm{am}E_i)) - E_i(\mathrm{am}E_i)\right) \simeq_t 0$ (argmax behavior of E is learned)
3. $\mathrm{ind}_{\delta_t}\left(\sum_{F\in\mathcal{E}} P_t(F^*_t)\cdot\left|\mathbb{E}_t(F_t(\mathrm{ADT}_t)\mid F^*_t) - \mathbb{E}_t(U_t\mid F^*_t)\right| < \delta_t\right) \simeq_t 1$ (E passes the reality filter)
4. $\mathrm{ind}_{\delta'_t}\left(P_t(E^*_t)\cdot\left|\mathbb{E}_t(E_t(\mathrm{am}E_t)) - \mathbb{E}_t(E_t(\mathrm{ADT}_t)\mid E^*_t)\right| < \delta'_t\right) \simeq_t 1$ (E passes the CDT=EDT filter)

Then the ADT agent on the true environment U has sublinear regret relative to the argmax agent on E, ie, $\limsup_{t\to\infty}\frac{1}{t}\sum_{i\le t}\left(E_i(\mathrm{am}E_i) - U_i\right) \le 0$.

Proof: To begin with, by applying assumptions 1 and 2, we only need to prove that $\limsup_{t\to\infty}\frac{1}{t}\sum_{i\le t}\left(\mathbb{E}_i(E_i(\mathrm{am}E_i)) - \mathbb{E}_i(U_i)\right) \le 0$. In fact, we will prove a stronger result, that $\mathbb{E}_t(U_t) \gtrsim_t \mathbb{E}_t(E_t(\mathrm{am}E_t))$ (which immediately implies the desired result). Assume this is false, so infinitely often $\mathbb{E}_t(E_t(\mathrm{am}E_t)) - \epsilon > \mathbb{E}_t(U_t)$ for some $\epsilon$. Also, let $D^\epsilon_t$ be the set of embedders with greater than $\frac{\epsilon}{3|\mathcal{E}|}$ probability of being chosen on turn t. By Lemma 1,

$$\mathbb{E}_t(U_t) \simeq_t \sum_{F\in\mathcal{E}} P_t(F^*_t)\cdot\mathbb{E}_t(U_t\mid F^*_t)$$

$$\sum_{F\in\mathcal{E}} P_t(F^*_t)\cdot\mathbb{E}_t(U_t\mid F^*_t) \ge \sum_{F\in D^\epsilon_t} P_t(F^*_t)\cdot\mathbb{E}_t(U_t\mid F^*_t)$$

In case 1, ∃F∈Dϵt that clearly fails the filters (ie, one of the two filters has the quantity inside it above 3δt or 3δ′t, as the case may be), and Et(Et(amEt))>ϵ/3. Then it is possible for a trader to bet against F being selected, because clearly failing a filter can be picked out by a continuous indicator function. F won't be selected, because its value is 0 due to clearly failing a filter, while E passes both filters by conditions 3 and 4 and claims that it can attain value bounded away from 0. This bet allows the trader to pick up a free ϵ/(3|E|) dollars or more (the probability of F being selected is bounded away from 0, since F is in Dϵt). Because this is a logical inductor, this case can only occur finitely many times.

In case 2, ∃F∈Dϵt that clearly fails a filter, and Et(Et(amEt))≤ϵ/3. In that case the regret is at most ϵ/3. Therefore, all but finitely many of the times where Et(Et(amEt))−ϵ>Et(Ut) occur in case 3, where no F∈Dϵt clearly fails a filter.

Now, let D′ϵt⊆Dϵt be the set of plausible environments that don't clearly fail a filter; in case 3, these two sets are equal. Since the reality-filters are not clearly failed, and they enforce the approximate equality of Et(Ut|F∗t) and Et(Ft(ADTt)|F∗t) for all F∈D′ϵt (with δt limiting to 0), we get that

$$\sum_{F\in D'^\epsilon_t} P_t(F^*_t)\cdot\mathbb{E}_t(U_t\mid F^*_t) \simeq_t \sum_{F\in D'^\epsilon_t} P_t(F^*_t)\cdot\mathbb{E}_t(F_t(\mathrm{ADT}_t)\mid F^*_t)$$

Then, by the CDT=EDT filters not being clearly failed, which enforces the approximate equality of Et(Ft(ADTt)|F∗t) and Et(Ft(amFt)), we get that

$$\sum_{F\in D'^\epsilon_t} P_t(F^*_t)\cdot\mathbb{E}_t(F_t(\mathrm{ADT}_t)\mid F^*_t) \simeq_t \sum_{F\in D'^\epsilon_t} P_t(F^*_t)\cdot\mathbb{E}_t(F_t(\mathrm{am}F_t))$$

Now, by conditions 3 and 4, the filters for E must limit to 1, so on all but finitely many turns, for all F∈D′ϵt, Et(Ft(amFt))>Et(Et(amEt))−ϵ/3 (otherwise it is possible to bet against F being chosen and get free money, by the argument from case 1). Thus, we get that

$$\sum_{F\in D'^\epsilon_t} P_t(F^*_t)\cdot\mathbb{E}_t(F_t(\mathrm{am}F_t)) \gtrsim_t \sum_{F\in D'^\epsilon_t} P_t(F^*_t)\cdot\left(\mathbb{E}_t(E_t(\mathrm{am}E_t)) - \frac{\epsilon}{3}\right)$$

Because we are in case 3 where D′ϵt=Dϵt, and the probabilities of the embedders being selected approximately add up to 1, and the total probability mass outside of Dϵt is at most ϵ/3 by the definition of Dϵt, we get that

$$\sum_{F\in D'^\epsilon_t} P_t(F^*_t)\cdot\left(\mathbb{E}_t(E_t(\mathrm{am}E_t)) - \frac{\epsilon}{3}\right) \gtrsim_t \mathbb{E}_t(E_t(\mathrm{am}E_t)) - \frac{2\epsilon}{3}$$

And putting it all together, we get that $\mathbb{E}_t(U_t) \gtrsim_t \mathbb{E}_t(E_t(\mathrm{am}E_t)) - \frac{2\epsilon}{3}$, which contradicts the assumption that $\mathbb{E}_t(U_t) < \mathbb{E}_t(E_t(\mathrm{am}E_t)) - \epsilon$ infinitely often.

Therefore, $\mathbb{E}_t(U_t) \gtrsim_t \mathbb{E}_t(E_t(\mathrm{am}E_t))$ holds, which implies the theorem by assumptions 1 and 2. □

Additional Discussion and Generalizing to Expanding Hypothesis Spaces:

There's an interesting feature of this proof: only part of the reality-filter was actually used, and the part that was used can be combined with the CDT=EDT filter to yield a single filter that suffices to prove sublinear regret on the true environment:

$$P_t(E^*_t)\cdot\left|\mathbb{E}_t(E_t(\mathrm{am}E_t)) - \mathbb{E}_t(U_t\mid E^*_t)\right| < \delta'_t$$

However, this doesn't enforce that embedders accurately predict what happens when other embedders are selected by ADT, so it glosses over important information which can be used to rule out an embedder.

The obvious next question is whether it's possible to generalize the proof to a space of hypotheses that expands over time. We might hope to get a result that, for all computable environments that are "sufficiently nice", there is an ADT agent with a sufficiently slow-growing hypothesis space that will eventually learn to act in accordance with the true environment. This runs into several obstacles.

Obstacle 1: The proof of Lemma 1 fails, because the batch of statements of the form $\sum_{0\le i<t}\sum_{F\in\mathcal{E}}\frac{1}{t}\ulcorner U_t\cdot[F^*_t] > \frac{i}{t}\urcorner$ stops being a bounded combination sequence (all in all, the combination is composed of $|\mathcal{E}|$ shares in total, which isn't bounded when $\mathcal{E}$ grows over time).

Obstacle 2: The argument for case 1 occurring finitely often (that failure allows you to pump free money out, and if this recurs infinitely often you get infinite money) fails, because the amount of money you can pump out is $\frac{\epsilon}{3|\mathcal{E}|}$; when $|\mathcal{E}|$ grows over time, the money you can pump out lessens over time, in a way that may force convergence. I suspect it's possible to address this by shooting for a proof that, even when the hypothesis space grows over time, it's possible to exploit the market unless the total probability mass on embedders that clearly fail their filters limits to zero.

Obstacle 3: The asymptotic equality linking $\sum_{F\in D'^\epsilon_t} P_t(F^*_t)\cdot\mathbb{E}_t(U_t\mid F^*_t)$ and $\sum_{F\in D'^\epsilon_t} P_t(F^*_t)\cdot\mathbb{E}_t(F_t(\mathrm{am}F_t))$ holds because the difference between the two is bounded by $|\mathcal{E}|(\delta_t+\delta'_t)$, which limits to zero because $\delta_t$ and $\delta'_t$ limit to zero. When $|\mathcal{E}|$ grows over time, $\delta_t$ and $\delta'_t$ must shrink faster to get this to work out: if you're adding one new hypothesis each turn, then $\delta_t$ and $\delta'_t$ must shrink faster than $\frac{1}{t}$. And if they shrink too fast, then the true environment can't pass the reality filter either. So $|\mathcal{E}|$ can't grow too fast.

Obstacle 4: This appears to be the largest obstacle. It's really hard to guarantee that the true environment passes the reality-filter. Even with a finite hypothesis space, the reality-filter "score" does limit to zero (although I haven't checked the proof particularly well), but going for any result along the lines of "this rate of $\delta_t$ approaching zero/level of strictness for the reality-filter is passed by the true environment" causes the proof to fail; the required rate might even be uncomputably slow for all I know. As far as I can tell, this problem gets even worse when the size of the hypothesis class grows over time, and it might cause a failure of the original proof of the reality-filter "score" limiting to zero. It seems like the hard part is being shoved into the true environment being able to pass the filters in the first place. And limits on how fast $\delta_t$ can converge to zero translate directly into limits on how fast $|\mathcal{E}|$ can grow, by Obstacle 3.