This post is a follow-up to my “Alignment proposals and complexity classes” post. Thanks to Sam Eisenstat for helping with part of the proof here.
Previously, I proved that imitative amplification with weak HCH, approval-based amplification, and recursive reward modeling access PSPACE while AI safety via market making accesses EXP. At the time, I wasn't sure whether my market making proof would generalize to the others, so I just published it with the PSPACE proofs instead. However, I have since become convinced that the proof does generalize—and that it generalizes for all of the proposals I mentioned—such that imitative amplification with weak HCH, approval-based amplification, and recursive reward modeling all actually access EXP. This post attempts to prove that.
Updated list of proposals by complexity class
P: Imitation learning (trivial)
PSPACE: AI safety via debate (proof)
EXP: AI safety via market making (proof), Imitative amplification with weak HCH (proof below), Approval-based amplification (proof below), Recursive reward modeling (proof below)
NEXP: Debate with cross-examination (proof)
R: Imitative amplification with strong HCH (proof), AI safety via market making with pointers (proof)
Proofs
Imitative amplification with weak HCH accesses EXP
The proof here is similar in structure to my previous proof that weak HCH accesses PSPACE, so I'll only explain where this proof differs from that one.
First, since $l \in \text{EXP}$, we know that for any $x \in X$, $T_l(x)$ halts in $O(2^{\text{poly}(n)})$ steps, where $n = |x|$. Thus, we can construct a function $f_l(n) = c_1 + c_2 e^{c_3 n^{c_4}}$ such that for all $x \in X$, $T_l(x)$ halts in at most $f_l(|x|)$ steps, by picking $c_3, c_4$ large enough that they dominate all other terms in the polynomial for all $n \in \mathbb{N}$. Note that $f_l$ is then computable in time polynomial in $n$.
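As a quick illustration of why $f_l$ stays polynomial-time computable even though its value is exponential, here is a minimal sketch; the particular constants and the base-3 integer upper bound are assumptions for the example, not part of the proof:

```python
def f_l(n: int, c1: int = 1, c2: int = 1, c3: int = 1, c4: int = 2) -> int:
    """Hypothetical halting-step bound for T_l on inputs of length n."""
    # Since e < 3, 3**(c3 * n**c4) is an integer upper bound on
    # e**(c3 * n**c4). Written in binary the result has O(n**c4) digits,
    # i.e. polynomially many bits, so it is computable in poly(n) time.
    return c1 + c2 * 3 ** (c3 * n ** c4)
```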
Second, let $H$'s new strategy be as follows:

Given $p$, let $(s, x) = M(p:f_l(|x|))$. Then, return accept/reject based on whether $s$ is an accept or reject state (it will always be one or the other).

Given $p:0$, return $(s_0, x_0)$, where $s_0$ is the starting state and $x_0$ is the empty tape symbol.

Given $p:i$, let $(s, x) = M(p:i-1)$, $x_{\text{left}} = M(p:i-1:-1)$, and $x_{\text{right}} = M(p:i-1:1)$. Then, return $(s', x')$, where $s'$ is the next state of $T_l$ from state $s$ over symbol $x$ (where if we're already in an accept/reject state we just stay there) and $x'$ is the new tape symbol at the head (which can be determined given $s$, $x$, $x_{\text{left}}$, $x_{\text{right}}$).

Given $p:0:j$, return $x_0$.

Given $p:i:j$, let $(s, x_{\text{head}}) = M(p:i-1)$. Then, let $h \in \{-1, 0, 1\}$ be the amount by which $T_l$ on $(s, x_{\text{head}})$ moves the head. If $j + h = 0$, let $x'$ be the symbol that $T_l$ writes from state $s$ over symbol $x_{\text{head}}$; otherwise, let $x' = M(p:i-1:j+h)$. Return $x'$.
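To make the recursion concrete, here is a minimal Python sketch of this strategy, with the market $M$'s answers modeled as memoized recursive calls (at equilibrium $M$ agrees with $H$, so the fixed point is the same). Everything here — `make_H`, the `delta` transition table, and the demo machine — is hypothetical illustration rather than part of the proof, and, unlike the proof, the sketch loads the input onto the tape instead of folding it into $p$:

```python
from functools import lru_cache

BLANK = "_"
ACCEPT, REJECT = "accept", "reject"

def make_H(delta, s0, tape):
    """H's strategy for questions p:i and p:i:j (hypothetical names).

    delta maps (state, symbol) -> (new_state, written_symbol, head_move),
    with head_move in {-1, 0, 1}.
    """

    @lru_cache(maxsize=None)
    def M(i, j=None):
        if j is None:
            # Question p:i -- (state, symbol under head) after i steps.
            if i == 0:
                return s0, (tape[0] if tape else BLANK)
            s, x = M(i - 1)
            if s in (ACCEPT, REJECT):          # already halted: stay put
                return s, x
            s2, w, h = delta[s, x]
            # Symbol under the head after moving by h: the freshly
            # written symbol if h == 0, else a neighbor cell at step i-1.
            return s2, (w if h == 0 else M(i - 1, h))
        # Question p:i:j -- symbol at offset j from the head after i steps.
        if i == 0:
            return tape[j] if 0 <= j < len(tape) else BLANK
        s, x_head = M(i - 1)
        if s in (ACCEPT, REJECT):              # halted: tape is frozen
            return M(i - 1, j)
        _, w, h = delta[s, x_head]
        # Offset j from the new head is offset j+h from the old head;
        # if that is the old head cell itself, it now holds w.
        return w if j + h == 0 else M(i - 1, j + h)

    return M

def answer_p(M, step_bound):
    """The base question p: query out to the halting bound f_l(|x|)."""
    s, _ = M(step_bound)
    return s == ACCEPT

# Tiny demo machine: accept iff the first tape symbol is "1".
delta = {
    ("q0", "1"): (ACCEPT, "1", 0),
    ("q0", "0"): (REJECT, "0", 0),
    ("q0", BLANK): (REJECT, BLANK, 0),
}
print(answer_p(make_H(delta, "q0", "1"), step_bound=3))  # True
```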
Note that this strategy precisely replicates the strategy used in my proof that market making accesses EXP for inputs $p:i$ and $p:i:j$. Thus, I'll just defer to that proof for why the strategy works and is polynomial time on those inputs. Note that this is where $l \in \text{EXP}$ becomes essential, as it guarantees that $i$ and $j$ can be represented in polynomial space.
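To spell out the representation-size point: $i$ ranges over $0 \le i \le f_l(n)$, so writing it in binary takes

$$\lceil \log_2(f_l(n) + 1) \rceil = \left\lceil \log_2\!\big(c_1 + c_2 e^{c_3 n^{c_4}} + 1\big) \right\rceil = O(n^{c_4})$$

bits, which is polynomial in $n$; the same bound covers $j$, since the head can move at most one cell per step.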
Then, the only difference between this strategy and the market making one is for the base $p$ input. On $p$, given that $M(p:i)$ works, $M(p:f_l(|x|))$ will always return a state after $T_l$ has halted, which will always be an accept state if $T_l$ accepts and a reject state if it rejects. Furthermore, since $|M(p:f_l(|x|))| \in O(1)$ and $f_l$ is computable in polynomial time, the strategy for $p$ is polynomial, as desired.
Since this procedure works for all $l \in \text{EXP}$, we get that imitative amplification with weak HCH accesses EXP, as desired.
Approval-based amplification accesses EXP
The proof here is almost precisely the same as my previous proof that approval-based amplification accesses PSPACE; the only modification is that we need to verify that $|M(q)| + |H(q, M)|$ is still polynomial in size for the new imitative amplification strategy given above. However, the fact that $i$ and $j$ are bounded above by $t_p$ means that they are representable in polynomial space, as desired.
Recursive reward modeling accesses EXP
Again, the proof here is almost precisely the same as my previous proof that recursive reward modeling accesses PSPACE, with the only modification being the same as in the above proof that approval-based amplification accesses EXP.