UK AISI Alignment Team: Debate Sequence
The need to relativise in debate

by Geoffrey Irving, Simon Marshall
26th Jun 2025
AI Alignment Forum
Previous:
Prover-Estimator Debate: A New Scalable Oversight Protocol
Comments

ProgramCrafter:
It would also be interesting to investigate which of those protocols are guaranteed to eventually halt on all inputs.

I can see that in the last protocol (relativised for RE) a malicious prover can prevent the verifier from ever terminating, if the verifier is required to accept all valid move sequences / pointers / ... however long they might be. (That is a mathematical theorem, and not fixed by choosing a clever encoding for numbers of arbitrary length.)

Geoffrey Irving (reply):

I think both protocols mentioned (MIP* = RE and the pointers one) already do what you want here. In the background the provers have to do unbounded work to prepare for the stuff they show the verifier, but the verifier's work is limited to a fixed polynomial in the input size.

And more strongly: in the pointer version where we have two competing provers, a malicious prover can't force an honest prover to do significantly more work than would be required in an honest case.

Summary: This post highlights the need for results in AI safety, such as debate or scalable oversight, to 'relativise', i.e. for the result to hold even when all parties are given access to a black box 'oracle' (the oracle might be a powerful problem solver, a random function, or a model of arbitrary human preferences). Requiring this relativisation can often significantly change the complexity class associated with a given protocol and therefore the promise of a given method. We present a few examples of this change and the solutions required to restore the methods back to full power.

Goal: self-contained intuition

This is a purely expositional post describing some of the intuition for why we need relativisation to apply interactive proof techniques to AI safety, and what that does to the necessary proof systems. No new research results, but I previously hadn’t written this story down in a clean way, so now I have something to point to.

Interactive proof systems and IP = PSPACE

An interactive proof system consists of a verifier V and one or more provers P, such that the provers are trying to convince the verifier of answers to questions (formally, usually membership of strings x in some language L). The simplest proof system is the non-interactive one: a single prover produces a proof string y=P(x), and the verifier checks the proof as V(x,y). This corresponds to the complexity class NP.
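As a concrete (toy) instance of the non-interactive pattern V(x, y): take the instance x to be a CNF formula and the prover's string y to be a claimed satisfying assignment, which the verifier checks in time polynomial in |x|. All names below are illustrative, not from any particular library.

```python
def verify_cnf(clauses, assignment):
    """The verifier V(x, y) for SAT, the canonical NP problem.

    clauses: list of clauses, each a list of signed ints (DIMACS-style:
             3 means variable 3 is true, -3 means variable 3 is false).
    assignment: the prover's certificate, a dict variable -> bool.
    Runs in time linear in the formula size."""
    for clause in clauses:
        # A clause is satisfied if at least one of its literals holds.
        if not any(assignment[abs(lit)] == (lit > 0) for lit in clause):
            return False
    return True

# (x1 or not x2) and (x2 or x3)
clauses = [[1, -2], [2, 3]]
assert verify_cnf(clauses, {1: True, 2: True, 3: False})
assert not verify_cnf(clauses, {1: False, 2: False, 3: False})
```

The prover may need exponential work to find the certificate, but checking it is cheap: that asymmetry is exactly NP.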

There are more complicated proof systems, reaching much higher complexity classes. With polynomially many rounds of interaction, a single prover can reach the complexity class PSPACE, which is the celebrated result IP = PSPACE (IP = interactive proofs). However, this result requires re-coding the input problem as a statement about polynomials over finite fields, then using randomness to do polynomial identity testing in the finite field case. For example, if our PSPACE problem is the alternating Boolean circuit ∃x0∀y0…∃xn∀yn ϕ(x,y) where ϕ(x,y) is a circuit, we form an interactive proof by replacing x∧y → xy, x∨y → 1−(1−x)(1−y), etc. (and doing some trickery to lower degrees, which I will ignore for this sketch). The prover then makes claims about what the summary polynomials are if we marginalise over some suffix of the variables. If the finite field is large enough, a single randomly chosen point gives us a lot of evidence about a summary being correct over the entire space, via the Schwartz-Zippel lemma.
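A minimal sketch of the arithmetisation step and the random-point test, under the stated encoding (x∧y → xy, x∨y → 1−(1−x)(1−y)); the prime and the example circuits are my own choices for illustration.

```python
import random

P = 2**31 - 1  # a prime field; by Schwartz-Zippel, two distinct polynomials
               # of degree d agree at a random point with probability <= d/P

def AND(a, b): return (a * b) % P                   # x ∧ y -> xy
def OR(a, b):  return (1 - (1 - a) * (1 - b)) % P   # x ∨ y -> 1-(1-x)(1-y)
def NOT(a):    return (1 - a) % P                   # ¬x    -> 1-x

# Two syntactically different circuits for the same Boolean function
# (De Morgan: not(x and y) == (not x) or (not y)). Their arithmetisations
# are the same polynomial, 1 - xy.
def lhs(x, y): return NOT(AND(x, y))
def rhs(x, y): return OR(NOT(x), NOT(y))

# Polynomial identity testing: a single random field point suffices to get
# strong evidence the two polynomials agree everywhere.
x, y = random.randrange(P), random.randrange(P)
assert lhs(x, y) == rhs(x, y)
```

Note the key move: the verifier evaluates at field points like x = 1729, far outside the Boolean cube, which is exactly what breaks once an oracle is involved (next section).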

Why we want relativisation (IP^O < PSPACE^O)

Unfortunately, this proof breaks down if we give both verifier and prover access to a certain random oracle; in this case, IP^O < PSPACE^O with probability one (w.r.t. the choice of oracle). The reason is that we can no longer use finite fields: after the encoding, the verifier would be required to answer questions about ϕ^O(x,y) evaluated at points corresponding to finite field elements, which requires evaluating the oracle at finite field points, and the oracle is not defined there. (A random oracle might be able to answer these questions too, but it is very unlikely that its answers will be consistent!) A sufficiently powerful prover can still do the summary calculations, but the interaction breaks down.
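One way to see the obstruction concretely: an oracle defined only on the Boolean cube does have a unique multilinear extension to field points, but evaluating that extension requires summing over all 2^n oracle values, which a polynomial-time verifier cannot do. The code below is a toy illustration of that cost, not part of any formal argument.

```python
import itertools, random

P = 2**31 - 1
n = 4
# Stand-in for a random oracle O: {0,1}^n -> {0,1}. It is defined ONLY at
# Boolean points; there is nothing to query at general field points.
oracle = {bits: random.randrange(2)
          for bits in itertools.product([0, 1], repeat=n)}

def multilinear_extension(point):
    """Evaluate the unique multilinear extension of `oracle` at a field
    point. Note the sum over all 2**n Boolean inputs: exponentially many
    oracle calls, far beyond a polynomial-time verifier's budget."""
    total = 0
    for bits, val in oracle.items():
        term = val
        for b, z in zip(bits, point):
            # Lagrange basis factor: z if b == 1, (1 - z) if b == 0.
            term = term * (z if b == 1 else (1 - z)) % P
        total = (total + term) % P
    return total

# At Boolean points the extension agrees with the oracle...
assert multilinear_extension((0, 1, 1, 0)) == oracle[(0, 1, 1, 0)]
# ...but at a generic field point there is no shortcut past the 2**n-term
# sum, and the oracle itself gives no help.
value_off_cube = multilinear_extension((5, 7, 11, 13))
```

The prover, with unbounded compute, can do this sum; the verifier cannot, and cannot trust the prover's claim about it without the very machinery that just broke.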

The reason this is relevant for safety is that our verifiers are humans, and humans look like oracles. We can ask humans questions, but we don’t see the human as a Boolean circuit, and certainly can’t lift the human up into finite field space. Thus, we arrive at this heuristic:

  • Relativisation heuristic: In order to be relevant for AI safety purposes, an interactive proof result must relativise. (This is mostly equivalent to saying we care about the behaviour of interactive proof systems w.r.t. a random or generic oracle.)

This is only a heuristic, not least because I haven’t defined what a “non-relativising result” is, exactly. It’s possible someone will come up with an application that breaks the heuristic, and indeed that could be good, since non-relativising techniques often correspond to stronger methods. The heuristic is also restricted to the “messy” setting where we use something like humans as the verifier: if stronger verifiers are available, such as Amit et al., Models That Prove Their Own Correctness, we can use different techniques.

And even if the heuristic holds, it is only necessary, not sufficient. Using an oracle given to both verifier and prover models human intuition and limits to human intuition, but it does not model limits to machine intuition. Brown-Cohen and Irving, Debates, oracles, and obfuscated arguments describes how to model the latter, using two oracles instead of one.

Relativisation requires more provers: DEBATE=PSPACE

If our verifier V is limited to polynomial time and we want to reach the complexity class PSPACE with our interactive proof system, we can recover relativisation by going from one prover to two competing provers.

Theorem (DEBATE = PSPACE): Interactive proofs with a polynomial-time verifier and two zero-sum competing provers correspond to the complexity class PSPACE, w.r.t. any oracle.

Trivial proof: TQBF (true quantified Boolean formulas) is a complete problem for PSPACE. These look like ∃x0∀y0…∃xn∀yn ϕ(x,y), and thus correspond exactly to a two-prover game where one player chooses the x’s and the other chooses the y’s. This correspondence is unchanged if ϕ = ϕ^O makes calls to an oracle.

This proof is essentially just the definition of PSPACE thought of as an interactive proof system, and thus it relativises. In summary, if we want to reach PSPACE, we need just one prover if we don’t care about relativisation, and two if we care about relativisation.
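The TQBF-as-game correspondence can be sketched directly: the ∃-prover maximises, the ∀-prover minimises, and the formula is true iff the ∃-prover has a winning strategy under optimal play. A toy evaluator (illustrative names, brute-force minimax):

```python
def tqbf_game(phi, n, x_choices=(), y_choices=()):
    """Evaluate the quantified formula ∃x0 ∀y0 ... ∃x_{n-1} ∀y_{n-1} phi(x, y)
    as a zero-sum game: the ∃-prover picks each x_i, then the ∀-prover
    responds with y_i. phi may internally call an oracle; the game
    structure is unchanged."""
    i = len(x_choices)
    if i == n:
        return bool(phi(x_choices, y_choices))
    # ∃-prover needs SOME x_i such that for EVERY ∀-response y_i the
    # rest of the game is still winning.
    return any(
        all(tqbf_game(phi, n, x_choices + (x,), y_choices + (y,))
            for y in (0, 1))
        for x in (0, 1)
    )

# ∃x0 ∀y0: (x0 or y0) is true: the ∃-prover plays x0 = 1.
assert tqbf_game(lambda xs, ys: xs[0] or ys[0], 1)
# ∃x0 ∀y0: (x0 == y0) is false: the ∀-prover can always mismatch.
assert not tqbf_game(lambda xs, ys: xs[0] == ys[0], 1)
```

Brute force here takes time 4^n, of course; the point of the theorem is only that the verifier's per-round work is polynomial, with the search burden pushed onto the optimal provers.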

Open question: Here we’ve used two competing provers playing a zero-sum game. Is it possible to do this with cooperating provers, in some way that doesn’t go all the way to NEXP? It seems unlikely but we should figure this out.

NEXP: From two cooperating provers to two competing provers

Now say we’re being more ambitious, and want to get to NEXP (nondeterministic exponential time). The standard complete problem for NEXP is Succinct-SAT: we write down a polynomial-size circuit that describes an exponentially large Boolean circuit, then ask whether it is satisfiable.
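To make "a polynomial-size circuit describing an exponentially large circuit" concrete, here is a toy version: a tiny function that defines clause i of a 2^n-clause CNF on demand. The clause rule is invented purely for illustration.

```python
n = 3  # the description below is O(1) size but defines a 2**n-clause formula

def clause(i):
    """A succinct description: given an index i, return clause i of the
    (exponentially large) CNF. Toy rule, illustrative only:
    clause i is (v_i or not v_{(i+1) mod N})."""
    N = 2**n
    return [(i, True), ((i + 1) % N, False)]  # (variable, required sign)

def expand():
    """The explicit formula: 2**n clauses, exponential in the size of the
    description. Succinct-SAT asks whether such a formula is satisfiable,
    and is NEXP-complete."""
    return [clause(i) for i in range(2**n)]

# Given an assignment, a verifier can spot-check any single clause cheaply
# without ever expanding the whole formula.
assignment = [True] * (2**n)
assert all(any(assignment[v] == sign for v, sign in clause(i))
           for i in range(2**n))
```

The gap between the description size (here, a few lines) and the expanded object (2^n clauses) is exactly why a certificate for Succinct-SAT is exponentially large, and why a single NP-style proof string no longer suffices.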

In the non-relativising case, Babai, Fortnow, Lund 1991 (following Ben-Or, Goldwasser, Kilian, and Wigderson 1988) showed that two potentially colluding provers able to communicate only via the verifier reach any NEXP problem (MIP = NEXP). However, this again proceeds by encoding Boolean circuits as polynomials over finite fields, and fails to relativise as a result: w.r.t. a random oracle, MIP^O < NEXP^O.

We can recover relativisation if we (1) make the provers compete and (2) give the provers and verifier oracle access to both provers. This corresponds to the cross-examination proposal of Barnes and Christiano 2020, and the proof that cross-examination = NEXP is here. Relative to the MIP setting where the two provers collude, we are making a much stronger assumption of zero-sum optimal play. Relative to debate (which is already two competing provers), the oracle access allows one debater to ask a sequence of questions of the other debater, to which the other debater must give independent answers: they don’t see the answers they gave earlier in the sequence. This forces the debaters to precommit to a large object (in this case the full exponential-size solution to the Succinct-SAT expanded circuit), of which we show only a small component to the verifier.
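A toy simulation of the consistency-checking idea behind cross-examination: pose the same questions to independent copies of a debater, where each copy cannot see the others' answers. A debater genuinely committed to one fixed large object answers consistently; one improvising per copy risks being caught. All of this is an illustrative sketch, not the actual protocol.

```python
def cross_examine(debater, questions, copies=2):
    """Ask each of `copies` independent copies of `debater` the same
    questions. `debater` is a callable question -> answer; repeated rounds
    of calls model copies that cannot see each other's transcripts.
    Returns (consistent?, first copy's answers)."""
    transcripts = [[debater(q) for q in questions] for _ in range(copies)]
    consistent = all(t == transcripts[0] for t in transcripts)
    return consistent, transcripts[0]

# An honest debater precommitted to a fixed table is always consistent.
table = {0: "yes", 1: "no", 2: "yes"}
ok, _ = cross_examine(lambda q: table[q], [0, 1, 2])
assert ok

class ImprovisingDebater:
    """A debater with no real precommitment: it answers 'yes' for the first
    copy's questions, then drifts. Cross-examination exposes it."""
    def __init__(self):
        self.calls = 0
    def __call__(self, q):
        self.calls += 1
        return "yes" if self.calls <= 3 else "no"

ok, _ = cross_examine(ImprovisingDebater(), [0, 1, 2])
assert not ok
```

The real protocol is adversarial rather than mechanical: it is the opposing debater who chooses which question sequences to probe, so inconsistency anywhere in the exponentially large committed object is a losing liability.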

Can we go further? (MIP∗=RE and pointer silliness)

NEXP is enormous (though amusingly not big enough to find Gröbner bases; you need EXPSPACE for that). We can go all the way to the top, to everything computable in any amount of time with a polynomial-time verifier, using multiple interacting provers with an arbitrarily large number of entangled qubits as their only method of communicating (except through the verifier). This is the very silly MIP∗ = RE result of Ji et al. 2020 (silly in a good way), where RE is the class of recursively enumerable languages (any language a Turing machine can accept, ignoring time and space complexity).

MIP* = RE does not relativise, since again it uses finite field encoding (I don’t know if it algebrises). However, you can replace the MIP∗ setup using two quantum-entangled provers with two competing provers with (1) the ability to commit to an arbitrarily large two-dimensional table, (2) the ability to share “pointers” into this table with each other and the verifier, and (3) the ability to increment pointers in both directions in a verifiable manner. The pointer idea is from Paul Christiano, and Evan has more discussion here. Note that the pointers necessarily require an arbitrarily large number of bits even to represent, so the verifier can’t be shown a pointer’s value directly. Pointers are simple as an idea, but still all of this is very silly: it will not in fact be the case that computers have an unboundedly large internal state.
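A very rough sketch of the pointer interface, just to fix intuitions: the committed table is arbitrarily large (here, defined lazily by a rule), the verifier only ever sees one cell per query, and pointers are unbounded integers that support increments in both directions. Everything here is illustrative; the real protocol's commitments and verifiable increments are far more involved.

```python
class PointerTable:
    """Toy interface to an arbitrarily large 2-D committed table. The
    verifier holds pointers (unbounded integers), can move them by +1/-1,
    and can ask for the single cell a pointer pair addresses -- it never
    sees the table, or even a pointer's full value, in bulk."""
    def __init__(self, cell_fn):
        self.cell_fn = cell_fn  # cell_fn(i, j): cell (i, j), computed on demand

    def lookup(self, row_ptr, col_ptr):
        # One cell revealed per query, however large the table is.
        return self.cell_fn(row_ptr, col_ptr)

# Pointers far too large for the verifier to enumerate the table around them;
# Python ints are arbitrary precision, so incrementing in both directions
# is trivial in this toy version.
table = PointerTable(lambda i, j: (i * j) % 7)
p, q = 10**30, 10**30 + 3
assert table.lookup(p, q) == (p * q) % 7
p += 1  # stand-in for a verifiable increment
assert table.lookup(p, q) == (p * q) % 7
```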

However, the point of this final silly section is to get one more row in a table of non-relativising vs. relativising interactive proof systems:

| Desired complexity class | Non-relativising requirements | Relativising requirements |
|---|---|---|
| PSPACE | One (malicious?) prover | Two competing provers |
| NEXP | Two (colluding?) provers who can't communicate | Two competing provers with oracle access to each other |
| RE | Two (colluding?) provers with shared qubits | Two competing provers with pointers to arbitrary-size state |

There are other rows one could fill in!

Collaborate with us

We'd be very excited to collaborate on further research. If you're interested in collaborating with UK AISI, you can express interest here. If you're a non-profit or academic, you can also apply for grants up to £200,000 directly here.