It would also be interesting to investigate which of those protocols are guaranteed to eventually halt on all inputs.
I can see that in the last protocol (the relativised one for RE) a malicious prover can prevent the verifier from ever terminating, if the verifier is required to accept all valid move sequences / pointers / ... however long they might be. (That is a mathematical theorem and not fixed by choosing a clever encoding for numbers of arbitrary length.)
I think both protocols mentioned (MIP* = RE and the pointers one) already do what you want here. In the background the provers have to do unbounded work to prepare for the stuff they show the verifier, but the verifier's work is limited to a fixed polynomial in the input size.
And more strongly: in the pointer version where we have two competing provers, a malicious prover can't force an honest prover to do significantly more work than would be required in an honest case.
Summary: This post highlights the need for results in AI safety, such as debate or scalable oversight, to 'relativise', i.e. for the result to hold even when all parties are given access to a black box 'oracle' (the oracle might be a powerful problem solver, a random function, or a model of arbitrary human preferences). Requiring this relativisation can often significantly change the complexity class associated with a given protocol and therefore the promise of a given method. We present a few examples of this change and the solutions required to restore the methods back to full power.
This is a purely expositional post describing some of the intuition for why we need relativisation to apply interactive proof techniques to AI safety, and what that does to the necessary proof systems. No new research results, but I previously hadn’t written this story down in a clean way, so now I have something to point to.
An interactive proof system consists of a verifier and one or more provers, such that the provers are trying to convince the verifier of answers to questions (formally, usually membership of strings in some language L). The simplest proof system is the non-interactive one: a single prover produces a proof string, and the verifier checks the proof in polynomial time. This corresponds to the complexity class NP.
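As a concrete sketch of the non-interactive case (my own toy example, not from the post): for SAT, the proof string is a satisfying assignment, and the verifier checks it in time linear in the size of the formula.

```python
# Toy non-interactive proof system for SAT. The "proof" is a satisfying
# assignment; the verifier's work is polynomial in the input size.

def verify(clauses, assignment):
    """clauses: list of clauses, each a list of signed 1-based variable
    indices (negative = negated). assignment: dict var -> bool."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )

# (x1 or not x2) and (x2 or x3)
clauses = [[1, -2], [2, 3]]
proof = {1: True, 2: True, 3: False}
print(verify(clauses, proof))  # True
```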
There are more complicated proof systems, reaching much higher complexity classes. With polynomially many rounds of interaction, a single prover can reach the complexity class PSPACE, which is the celebrated result IP = PSPACE (IP = interactive proofs). However, this result requires re-coding the input problem as a statement about polynomials over finite fields, then using randomness to do polynomial identity testing in the finite field case. For example, if our problem is the alternating Boolean formula ∃x1 ∀x2 ∃x3 ⋯ C(x1, …, xn) where C is a circuit, we form an interactive proof by replacing ∃x with Σ_x, ∀x with Π_x, ∧ with multiplication, ¬x with 1 − x, etc. (and doing some trickery to lower degrees, which I will ignore for this sketch). The prover then makes claims about what the summary polynomials are if we marginalise over some suffix of the variables. If the finite field is large enough, a single randomly chosen point gives us a lot of evidence about a summary being correct over the entire space, via the Schwartz–Zippel lemma.
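The arithmetization step can be sketched in a few lines (a minimal illustration with my own small example formula; the field size and formula are arbitrary choices): Boolean connectives become polynomials over a finite field, a ∀ quantifier becomes a product, and a dishonest claim about the summary is caught at a random field point.

```python
import random

P = 2**31 - 1  # a large prime; we work in the finite field F_P

# Arithmetization: lift Boolean connectives to polynomials over F_P.
AND = lambda a, b: (a * b) % P
OR  = lambda a, b: (a + b - a * b) % P
NOT = lambda a: (1 - a) % P

# f(x1,x2,x3) = (x1 AND NOT x2) OR x3, as a polynomial.
def f(x1, x2, x3):
    return OR(AND(x1, NOT(x2)), x3)

# On Boolean inputs the polynomial agrees with the Boolean formula.
assert f(1, 0, 0) == 1 and f(1, 1, 0) == 0 and f(0, 0, 1) == 1

# "Forall x3" becomes a product over x3 in {0,1}: the summary polynomial
# in (x1, x2) that the prover would make claims about.
def forall_x3(x1, x2):
    return (f(x1, x2, 0) * f(x1, x2, 1)) % P

# Schwartz–Zippel: two distinct low-degree polynomials agree at few
# points, so one random evaluation exposes a lie about the summary.
r1, r2 = random.randrange(P), random.randrange(P)
honest_claim = forall_x3(r1, r2)
dishonest_claim = (honest_claim + 1) % P
print(honest_claim == dishonest_claim)  # False: the lie is caught
```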
Unfortunately, this proof breaks down if we give both verifier and prover access to a certain random oracle: in this case, IP^O ≠ PSPACE^O with probability one (w.r.t. the choice of oracle). The reason is that we can no longer use finite fields: after the encoding, the verifier would be required to answer questions about the formula evaluated at points corresponding to finite field elements, which requires evaluating the oracle at finite field points, and the oracle is not defined there. (A random oracle might be able to answer these questions too, but it is very unlikely that its answers will be consistent!) A sufficiently powerful prover can still do the summary calculations, but the interaction breaks down.
The reason this is relevant for safety is that our verifiers are humans, and humans look like oracles. We can ask humans questions, but we don’t see the human as a Boolean circuit, and certainly can’t lift the human up into finite field space. Thus, we arrive at this heuristic:

> Heuristic: Interactive proof systems applied to AI safety, with humans as (part of) the verifier, should relativise.
This is only a heuristic, not least because I haven’t defined what a “non-relativising result” is, exactly. It’s possible someone will come up with an application that breaks the heuristic, and indeed that could be good, as non-relativising techniques often correspond to stronger methods. The heuristic is also restricted to the “messy” setting where we use something like humans as the verifier: if stronger verifiers are available, such as in Amit et al., Models That Prove Their Own Correctness, we can use different techniques.
And even if the heuristic holds, it is only necessary, not sufficient. Using an oracle given to both verifier and prover models human intuition and limits to human intuition, but it does not model limits to machine intuition. Brown-Cohen and Irving, Debates, oracles, and obfuscated arguments describes how to model the latter, using two oracles instead of one.
If our verifier is limited to polynomial time and we want to reach the complexity class PSPACE with our interactive proof system, we can recover relativisation by going from one prover to two competing provers.
Theorem (debate = PSPACE): Interactive proofs with a polynomial-time verifier and two zero-sum competing provers correspond to the complexity class PSPACE, w.r.t. any oracle.
Trivial proof: TQBFs (totally quantified Boolean formulas) are a complete problem for PSPACE. These look like ∃x1 ∀y1 ∃x2 ∀y2 ⋯ φ(x1, y1, …), and thus correspond exactly to a two-prover game where one player chooses the x’s and the other chooses the y’s. This correspondence is unchanged if the formula φ makes calls to an oracle.
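The game correspondence is easy to make concrete (a toy sketch with my own example formula): the Exists-player picks the x’s to make φ true, the Forall-player picks the y’s to make it false, and the formula is true iff the Exists-player has a winning strategy.

```python
# Toy evaluation of a TQBF instance as a two-player zero-sum game.
# Quantifiers are a tuple of 'E' (exists-player moves) and 'A'
# (forall-player moves), applied to the variables in order.

def exists_wins(quantifiers, phi, chosen=()):
    if not quantifiers:
        return phi(*chosen)
    q, rest = quantifiers[0], quantifiers[1:]
    branches = (exists_wins(rest, phi, chosen + (b,)) for b in (False, True))
    return any(branches) if q == 'E' else all(branches)

# Example matrix (mine): phi(x1, y1, x2) = (x1 or y1) and (x2 != y1)
def phi(x1, y1, x2):
    return (x1 or y1) and (x2 != y1)

# Exists x1, Forall y1, Exists x2 : phi -- Exists wins by x1 = True,
# then answering y1 with x2 = not y1.
print(exists_wins(('E', 'A', 'E'), phi))  # True
```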
This proof is essentially just the definition of PSPACE thought of as an interactive proof system, and thus it relativises. In summary, if we want to reach PSPACE, we need just one prover if we don’t care about relativisation, and two if we care about relativisation.
TODO: Here we’ve used two competing provers playing a zero-sum game. Is it possible to do this with cooperating provers, in some way that doesn’t go all the way to NEXP? It seems unlikely, but we should figure this out.
Now say we’re being more ambitious, and want to get to NEXP (nondeterministic exponential time). The standard complete problem for NEXP is succinct circuit satisfiability: we write down a polynomial-size circuit that describes an exponentially large Boolean circuit, then ask whether the latter is satisfiable.
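To see what “succinct” means here, consider a toy instance (my own illustrative example, in CNF form rather than a general circuit): a small program describes an exponentially large formula clause by clause, so the full formula is never written down, yet any individual clause can be spot-checked.

```python
# Toy "succinct" SAT instance: a fixed polynomial-time rule generates
# the i-th clause of a huge 3-CNF formula on demand.

N = 2**20  # number of variables; the explicit formula would be huge

def clause(i):
    """The i-th clause as three signed 1-based variable indices,
    computed from i by a fixed polynomial-time rule (my example rule)."""
    a = i % N
    b = (3 * i + 1) % N
    c = (7 * i + 2) % N
    return (a + 1, b + 1, -(c + 1))  # x_a or x_b or not x_c

def assignment(v):
    """A succinctly described assignment: variable v is True iff v is even."""
    return v % 2 == 0

def clause_satisfied(i):
    return any(assignment(abs(lit) - 1) == (lit > 0) for lit in clause(i))

# Spot-check clauses without ever materialising the formula.
print(all(clause_satisfied(i) for i in range(1000)))  # True
```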
In the non-relativising case, Babai, Fortnow, Lund 1991 (following Ben-Or, Goldwasser, Kilian, and Wigderson 1988) showed that two potentially colluding provers able to communicate only via the verifier reach any problem in NEXP (MIP = NEXP). However, this again proceeds by encoding Boolean circuits as polynomials over finite fields, and fails to relativise as a result: w.r.t. a random oracle, MIP^O ≠ NEXP^O.
We can recover relativisation if we (1) make the provers compete and (2) give the provers and verifier oracle access to both provers. This corresponds to the cross-examination proposal of Barnes and Christiano 2020, and the proof that cross examination = NEXP is here. Relative to the MIP setting where the two provers collude, we are making a much stronger assumption of zero-sum optimal play. Relative to debate (which is already two competing provers), the oracle access allows one debater to ask a sequence of questions of the other debater, to which the other debater must give independent answers: they don’t see the answers they gave earlier in the sequence. This forces the debaters to precommit to a large object (in this case the full exponential-size solution to the expanded circuit), of which we show only a small component to the verifier.
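The consistency-forcing idea can be illustrated with a toy sketch (my own simplification, not the full cross-examination protocol): the verifier poses the same question to two independently run copies of a prover and rejects on any disagreement. A prover that precommits to its answers passes; one whose answers depend on the conversation so far is caught.

```python
import random

committed_table = {q: q * q % 1000 for q in range(100)}  # honest precommitment

def honest_prover(question, history):
    return committed_table[question]                 # ignores history

def adaptive_prover(question, history):
    return committed_table[question] + len(history)  # leaks the transcript

def cross_examine(prover, rounds=20):
    hist_a, hist_b = [], []
    for _ in range(rounds):
        q = random.randrange(100)
        a = prover(q, tuple(hist_a)); hist_a.append(q)
        b = prover(q, tuple(hist_b))   # copy B answers on a fresh transcript
        if a != b:
            return False               # inconsistent answers: reject
    return True

print(cross_examine(honest_prover))    # True
print(cross_examine(adaptive_prover))  # False: caught in the second round
```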
NEXP is enormous (though amusingly not big enough to find Gröbner bases; you need EXPSPACE for that). We can go all the way to the top, to anything computable in any amount of time with a polynomial-time verifier, using multiple interacting provers with an arbitrarily large number of entangled qubits as their only method of communicating (except through the verifier). This is the very silly result MIP* = RE of Ji et al. 2020 (silly in a good way), where RE is the class of recursively enumerable languages (any language a Turing machine can recognise, ignoring time and space complexity).
MIP* = RE does not relativise, since again it uses finite field encodings (I don’t know if it algebrises). However, you can replace the setup of two quantum-entangled provers with two competing provers with (1) the ability to commit to an arbitrarily large two-dimensional table, (2) the ability to share “pointers” into this table with each other and the verifier, and (3) the ability to increment pointers in both directions in a verifiable manner. The pointer idea is from Paul Christiano, and Evan has more discussion here. Note that the pointers necessarily require an arbitrary number of bits even to represent, so the verifier can’t be shown them directly. Pointers are simple as an idea, but still all of this is very silly: it will not in fact be the case that computers have an unboundedly large internal state.
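A minimal model of the pointer mechanism (illustrative only; the table rule is my own): the committed two-dimensional table is given by a rule rather than stored, so pointers can range far beyond anything that could be materialised, and the verifier sees pointer operations rather than the table itself.

```python
# Toy pointer mechanism over an implicitly defined committed table.

def table(i, j):
    # the prover's committed 2D table, defined implicitly (example rule)
    return (31 * i + 17 * j) % 97

class Pointer:
    def __init__(self, i=0, j=0):
        self.i, self.j = i, j
    def step(self, di, dj):
        return Pointer(self.i + di, self.j + dj)  # verifiable increment
    def deref(self):
        return table(self.i, self.j)

p = Pointer(10**100, 10**100)  # far larger than any physical memory
q = p.step(1, -1)
print(q.deref() == table(10**100 + 1, 10**100 - 1))  # True
```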
However, the point of this final silly section is to get one more row in a table of non-relativising vs. relativising interactive proof systems:
| Desired complexity class | Non-relativising requirements | Relativising requirements |
| --- | --- | --- |
| PSPACE | One (malicious?) prover | Two competing provers |
| NEXP | Two (colluding?) provers who can’t communicate | Two competing provers with oracle access to each other |
| RE | Two (colluding?) provers with shared qubits | Two competing provers with pointers to arbitrary-size state |
There are other rows one could fill in!
We'd be very excited to collaborate on further research. If you're interested in collaborating with UK AISI, you can express interest here. If you're a non-profit or academic, you can also apply for grants up to £200,000 directly here.