How to eliminate cut?

jessicata

The purpose of this post isn't to convince you that cut elimination is important. See, for example, the nLab article. Rather, the purpose of this post is to (semi-formally) prove cut elimination in a way that I at least find easy to understand. I have consulted existing sources (such as these lecture notes), but have found it hard to fill in all the details, given the sparsity of the usual presentations. I'll build on the previous post (on Gödel's Completeness Theorem) and show cut elimination in the first-order sequent calculus defined in that post. Recall that the cut rule states:

$\frac{Γ ⊢ Δ, P \quad Γ, P ⊢ Δ}{Γ ⊢ Δ}$

We can think of $Γ$ as the assumptions, $Δ$ as the conclusion, and P as a lemma. Intuitively, this states that, if it's possible to prove the conclusion or the lemma from the assumptions, and it's possible to prove the conclusion from the assumptions and the lemma, then it's possible to prove the conclusion from the assumptions. Cut elimination, then, is the automated elimination of lemmas from a sequent proof. (As stated in the previous post, this presentation of the cut rule is somewhat nonstandard, but it can be shown equivalent to the standard form using weakening and contraction.)

Throughout the post, I will use the notion of the depth of a sentence, and the cut rank of a proof. The depth of a sentence is the depth of nesting of compound sentences; in particular, the depth of an atomic sentence is 0, the depth of a negation is one plus the depth of its inner sentence, the depth of a conjunction is one plus the maximum depth of the inner sentences, and the depth of a universal is one plus the depth of the inner sentence. The cut rank of a proof is a mapping $f : \mathbb{N} \to \mathbb{N}$, where $f(i)$ is the number of times a cut is performed on a sentence of depth i; note that $f(i) = 0$ for all but finitely many i. We compare cut ranks lexicographically, with entries for greater depths counting more: $f < g$ if $f(i) < g(i)$ at the largest i where f and g differ.
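To make the two measures concrete, here is a minimal Python sketch. The sentence classes (`Atom`, `Not`, `And`, `Forall`) and the sparse dictionary encoding of a cut rank are illustrative assumptions, not notation from the post; terms are ignored because they do not affect depth.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union

# Hypothetical sentence representation, for illustration only.
@dataclass(frozen=True)
class Atom:
    pred: str
    args: Tuple = ()

@dataclass(frozen=True)
class Not:
    body: "Sentence"

@dataclass(frozen=True)
class And:
    left: "Sentence"
    right: "Sentence"

@dataclass(frozen=True)
class Forall:
    var: str
    body: "Sentence"

Sentence = Union[Atom, Not, And, Forall]

def depth(s: Sentence) -> int:
    """Depth of nesting of compound sentences; atoms have depth 0."""
    if isinstance(s, Atom):
        return 0
    if isinstance(s, (Not, Forall)):
        return 1 + depth(s.body)
    return 1 + max(depth(s.left), depth(s.right))

# A cut rank stored sparsely: {depth: number of cuts on sentences of that depth}.
CutRank = Dict[int, int]

def rank_less(f: CutRank, g: CutRank) -> bool:
    """f < g iff f(i) < g(i) at the largest depth i where they differ."""
    for i in sorted(set(f) | set(g), reverse=True):
        fi, gi = f.get(i, 0), g.get(i, 0)
        if fi != gi:
            return fi < gi
    return False  # the ranks are equal
```

For example, `rank_less({0: 100}, {1: 1})` is `True`: a hundred cuts on atoms still rank below a single cut on a sentence of depth 1.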

Constant substitution

As a preliminary, we will show that constants can be substituted with terms in proofs without changing the proof structure (in particular, cut rank stays the same). As notation, if P is a term, sentence, or set of sentences, let P[t/c] indicate replacing the constant c with the term t anywhere in P. Suppose we have a proof of the judgment $Γ ⊢ Δ$. We wish to show that there is a proof of $Γ [t / c] ⊢ Δ [t / c]$ with the same cut rank as the original proof.
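As a minimal sketch of the substitution operation itself, assuming a hypothetical dataclass encoding in which constants, variables, and function applications are separate term classes, and assuming t is a closed term, P[t/c] is a plain structural recursion:

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple, Union

# Hypothetical term and sentence classes, for illustration only.
@dataclass(frozen=True)
class Const:
    name: str

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class App:
    fn: str
    args: Tuple["Term", ...]

Term = Union[Const, Var, App]

@dataclass(frozen=True)
class Atom:
    pred: str
    args: Tuple[Term, ...]

@dataclass(frozen=True)
class Not:
    body: "Sentence"

@dataclass(frozen=True)
class And:
    left: "Sentence"
    right: "Sentence"

@dataclass(frozen=True)
class Forall:
    var: str
    body: "Sentence"

Sentence = Union[Atom, Not, And, Forall]

def subst_term(u: Term, t: Term, c: str) -> Term:
    """u[t/c]: replace every occurrence of the constant named c with t."""
    if isinstance(u, Const):
        return t if u.name == c else u
    if isinstance(u, App):
        return App(u.fn, tuple(subst_term(a, t, c) for a in u.args))
    return u  # variables are left alone

def subst(s: Sentence, t: Term, c: str) -> Sentence:
    """s[t/c]. Constants are never bound, so if t is closed no capture can occur."""
    if isinstance(s, Atom):
        return Atom(s.pred, tuple(subst_term(a, t, c) for a in s.args))
    if isinstance(s, Not):
        return Not(subst(s.body, t, c))
    if isinstance(s, And):
        return And(subst(s.left, t, c), subst(s.right, t, c))
    return Forall(s.var, subst(s.body, t, c))

def subst_set(ss: FrozenSet[Sentence], t: Term, c: str) -> FrozenSet[Sentence]:
    """Γ[t/c] for a set of sentences Γ."""
    return frozenset(subst(s, t, c) for s in ss)
```

Because `subst` only rebuilds sentences in place, applying it to every judgment of a proof leaves the shape of the proof tree untouched, which is the intuition behind the claim that the cut rank is preserved.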

Call the height of a sequent proof the number of rule applications on its longest path from a leaf to the root. I will show by induction that, for all natural $n \geq 1$, constant substitution holds for every proof of height at most n, producing a proof of the same height.
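The height measure can be sketched as follows; the `Node` class (a rule name plus premise sub-proofs, with judgments omitted) is a hypothetical encoding chosen only to illustrate the definition.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Node:
    """Hypothetical proof-tree node: a rule name and the sub-proofs of its premises."""
    rule: str
    premises: Tuple["Node", ...] = ()

def height(p: Node) -> int:
    """Number of rule applications on the longest path from a leaf to the root."""
    return 1 + max((height(q) for q in p.premises), default=0)
```

Under this definition a proof of height 1 is a single application of the assumption rule, matching the base case.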

In the base case, the only rule is the assumption rule. Then $Γ$ and $Δ$ both contain some sentence P. So $Γ [t / c]$ and $Δ [t / c]$ both contain $P [t / c]$. So the assumption rule also shows $Γ [t / c] ⊢ Δ [t / c]$.

In the inductive case, we consider different cases for the bottom-most rule. Suppose the bottom-most rule in the proof is the weakening rule. Then the proof looks like:

$\frac{Γ ⊢ Δ}{Γ \cup Σ ⊢ Δ \cup Π}$

By the inductive assumption, we have a proof of $Γ [t / c] ⊢ Δ [t / c]$. Then we straightforwardly show $Γ [t / c] \cup Σ [t / c] ⊢ Δ [t / c] \cup Π [t / c]$ using weakening.

Suppose the bottom-most rule in the proof is the cut rule. Then the proof looks like:

$\frac{Γ ⊢ Δ, P \quad Γ, P ⊢ Δ}{Γ ⊢ Δ}$

By the inductive assumption, we have proofs of $Γ [t / c] ⊢ Δ [t / c], P [t / c]$ and $Γ [t / c], P [t / c] ⊢ Δ [t / c]$. Now we cut on $P [t / c]$ to get the result.

Suppose the bottom-most rule in the proof is the left negation rule. Then the proof looks like:

$\frac{Γ ⊢ Δ, P}{Γ, \neg P ⊢ Δ}$

By the inductive assumption, we have a proof of $Γ [t / c] ⊢ Δ [t / c], P [t / c]$. We apply the left negation rule on $P [t / c]$ to get a proof of $Γ [t / c], \neg P [t / c] ⊢ Δ [t / c]$.

Most of the remaining rules are similar, so I will skip them. I will consider the non-trivial case of the right universal rule. In this case, the proof looks like this:

$\frac{Γ ⊢ Δ, ϕ [d]}{Γ ⊢ Δ, (\forall x, ϕ [x])}$

where d is a constant not appearing in $Γ$, $Δ$, or $ϕ$. Let d' be a constant not appearing in $Γ$, $Δ$, $ϕ$, or t, and not equal to c. First we apply the inductive assumption to get a proof of $Γ [d^{'} / d] ⊢ Δ [d^{'} / d], ϕ [d] [d^{'} / d]$, or equivalently $Γ ⊢ Δ, ϕ [d^{'}]$. This proof has the same height as the original proof of the premise, so we can apply the inductive assumption again to get a proof of $Γ [t / c] ⊢ Δ [t / c], ϕ [d^{'}] [t / c]$. Since d' does not appear in t and is unequal to c, we can swap the substitution order to get a proof of $Γ [t / c] ⊢ Δ [t / c], ϕ [t / c] [d^{'}]$. At this point, since d' does not appear in $Γ [t / c]$, $Δ [t / c]$, or $ϕ [t / c]$, we can apply the right universal rule to get a proof of $Γ [t / c] ⊢ Δ [t / c], (\forall x, ϕ [x]) [t / c]$.

Eliminating weakening

Cut elimination is easier to show in a logic without weakening, so it is convenient to eliminate weakening first. As a side benefit, the final proof will contain neither weakening nor cut. Recall the weakening rule:

$\frac{Γ ⊢ Δ}{Γ \cup Σ ⊢ Δ \cup Π}$

I will show by induction that, for all natural $n \geq 1$, weakening can be eliminated from a proof whose height is n+1 and whose last step is weakening.

Let's consider the base case. If the proof has height 2, and the bottom-most rule is weakening, then the top-most rule must be the assumption rule. In this case, the assumption rule could have been applied directly to the weakened judgment, since weakening keeps the sentence shared by both sides.

Let's consider the inductive case. Suppose weakening can be eliminated from any proof whose height is at most n and whose last step is weakening. We now consider showing weakening can be eliminated from a proof whose height is n+1 and whose last step is weakening.

We do this by cases on the second-to-last rule. We have no need to handle the assumption rule, as that would make the height 2 (the base case).

Suppose the second-to-last rule is weakening. Then the two weakenings can be combined into one weakening. This reduces the height of the proof by one, so weakening can be eliminated inductively.

Suppose the second-to-last rule is cut. Then the proof looks like this:

$\frac{\frac{Γ ⊢ Δ, P \quad Γ, P ⊢ Δ}{Γ ⊢ Δ}}{Γ \cup Σ ⊢ Δ \cup Π}$

Call the proof of the top-left judgment X and the proof of the top-right judgment Y. Then X and Y have height at most n-1. Now we consider re-writing the proof to put weakening higher:

$\frac{\frac{Γ ⊢ Δ, P}{Γ \cup Σ ⊢ Δ \cup Π, P} \quad \frac{Γ, P ⊢ Δ}{Γ \cup Σ, P ⊢ Δ \cup Π}}{Γ \cup Σ ⊢ Δ \cup Π}$

The left proof of $Γ \cup Σ ⊢ Δ \cup Π, P$ has height at most n, and the right proof of $Γ \cup Σ, P ⊢ Δ \cup Π$ has height at most n. So weakening can be eliminated from both sides (using the inductive assumption).

Suppose the second-to-last rule is left negation. Then the proof looks like this:

$\frac{\frac{Γ ⊢ Δ, P}{Γ, \neg P ⊢ Δ}}{Γ \cup Σ, \neg P ⊢ Δ \cup Π}$

As before, we re-write to move weakening higher:

$\frac{\frac{Γ ⊢ Δ, P}{Γ \cup Σ ⊢ Δ \cup Π, P}}{Γ \cup Σ, \neg P ⊢ Δ \cup Π}$

Observe that the height of the proof with weakening at the bottom is now at most n, so weakening can be eliminated from it inductively.

I will skip most of the rules, as they are similar. The only nontrivial case is the right universal rule. The proof would look like this:

$\frac{\frac{Γ ⊢ Δ, ϕ [c]}{Γ ⊢ Δ, (\forall x, ϕ [x])}}{Γ \cup Σ ⊢ Δ \cup Π, (\forall x, ϕ [x])}$

where c does not appear in $Γ$, $Δ$, or $ϕ$. Now we find a constant d which does not appear in $Γ$, $Σ$, $Δ$, $Π$, or $ϕ$. We move weakening up:

$\frac{\frac{Γ ⊢ Δ, ϕ [d]}{Γ \cup Σ ⊢ Δ \cup Π, ϕ [d]}}{Γ \cup Σ ⊢ Δ \cup Π, (\forall x, ϕ [x])}$

We can convert the original proof of $Γ ⊢ Δ, ϕ [c]$ to one of equal height and cut rank proving $Γ ⊢ Δ, ϕ [d]$ using constant substitution. Now weakening can be eliminated from this proof using the inductive assumption.

Note that throughout this process, the structure of cuts has not been changed; the same cuts are applied to the same sentences. As such, the cut rank is the same.
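Here is a rough Python sketch of the effect of weakening elimination, restricted to the propositional rules (so the constant renaming of the universal case is not needed). It is a swapped-in formulation rather than the post's induction on height: instead of moving one weakening up a step at a time, it adds the weakened sentences to every judgment above the weakening in one pass, which is essentially where repeatedly applying the moves above leads. The `Proof` class and the rule names are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical encoding: atoms are strings; ("not", s) and ("and", a, b) are compound.
@dataclass(frozen=True)
class Proof:
    rule: str                           # "assume", "weaken", "cut", "notL", ...
    left: frozenset                     # sentences left of ⊢
    right: frozenset                    # sentences right of ⊢
    premises: Tuple["Proof", ...] = ()
    principal: object = None            # main sentence of the rule, if any

def add_everywhere(p: Proof, sigma: frozenset, pi: frozenset) -> Proof:
    """Add Σ on the left and Π on the right of every judgment in p.
    Propositional rule instances stay valid, and assumption leaves still
    share a sentence, so no weakening step is needed."""
    return Proof(p.rule, p.left | sigma, p.right | pi,
                 tuple(add_everywhere(q, sigma, pi) for q in p.premises),
                 p.principal)

def eliminate_weakening(p: Proof) -> Proof:
    """Return a weakening-free proof of the same judgment with the same cuts."""
    premises = tuple(eliminate_weakening(q) for q in p.premises)
    if p.rule == "weaken":
        (q,) = premises
        return add_everywhere(q, p.left - q.left, p.right - q.right)
    return Proof(p.rule, p.left, p.right, premises, p.principal)
```

Cut nodes are carried over unchanged, matching the observation that weakening elimination preserves the cut rank.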

As a corollary of weakening elimination, we can transform proofs so that, if a rule application is of the form

$\frac{Σ ⊢ Π}{Γ ⊢ Δ}$

then $Γ \subseteq Σ$ and $Δ \subseteq Π$. This works because the non-weakening rules, such as the negation rules, have "implicit contraction": nothing requires a premise to omit the rule's principal sentence, so a premise may as well contain every sentence of its conclusion. Adding these extra sentences to a premise amounts to weakening its proof, and that weakening can then be eliminated. I will call this transformation "redundant contraction". Note also that this does not change the cut rank of the proof.
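The following sketch makes implicit contraction and redundant contraction concrete for the propositional rules. It assumes sets of sentences, a hypothetical `Proof` encoding that records each rule's principal sentence, and weakening-free proofs. `valid` accepts a rule instance when the premise is the conclusion's context plus the rule's new sentences, where that context is allowed (but not required) to still contain the principal sentence.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical encoding: atoms are strings; ("not", s) and ("and", a, b) are compound.
@dataclass(frozen=True)
class Proof:
    rule: str                           # "assume", "cut", "notL", "notR", "andL", "andR"
    left: frozenset
    right: frozenset
    premises: Tuple["Proof", ...] = ()
    principal: object = None            # main sentence (the cut sentence, for "cut")

EMPTY = frozenset()

def side_ok(concl, prem, principal, new):
    """Is there a context G with concl == G | {principal} and prem == G | new?
    G may still contain the principal sentence: this is implicit contraction."""
    if principal is None:               # the rule leaves this side alone
        return prem == concl | new
    return principal in concl and (concl - {principal}) | new <= prem <= concl | new

def valid(p: Proof) -> bool:
    """Check every rule instance in p for the propositional rules."""
    L, R, ps, s = p.left, p.right, p.premises, p.principal
    if p.rule == "assume":
        ok = not ps and bool(L & R)
    elif p.rule == "cut" and len(ps) == 2:
        ok = (side_ok(L, ps[0].left, None, EMPTY) and side_ok(R, ps[0].right, None, frozenset({s}))
              and side_ok(L, ps[1].left, None, frozenset({s})) and side_ok(R, ps[1].right, None, EMPTY))
    elif p.rule == "notL" and len(ps) == 1:
        ok = side_ok(L, ps[0].left, s, EMPTY) and side_ok(R, ps[0].right, None, frozenset({s[1]}))
    elif p.rule == "notR" and len(ps) == 1:
        ok = side_ok(L, ps[0].left, None, frozenset({s[1]})) and side_ok(R, ps[0].right, s, EMPTY)
    elif p.rule == "andL" and len(ps) == 1:
        ok = side_ok(L, ps[0].left, s, frozenset(s[1:])) and side_ok(R, ps[0].right, None, EMPTY)
    elif p.rule == "andR" and len(ps) == 2:
        ok = (all(side_ok(L, q.left, None, EMPTY) for q in ps)
              and side_ok(R, ps[0].right, s, frozenset({s[1]}))
              and side_ok(R, ps[1].right, s, frozenset({s[2]})))
    else:
        ok = False
    return ok and all(valid(q) for q in ps)

def redundant_contraction(p: Proof, extra_left=EMPTY, extra_right=EMPTY) -> Proof:
    """Add every sentence of each conclusion to the premises above it."""
    left, right = p.left | extra_left, p.right | extra_right
    return Proof(p.rule, left, right,
                 tuple(redundant_contraction(q, left, right) for q in p.premises),
                 p.principal)
```

Running `valid` before and after `redundant_contraction` is a cheap way to see that the transformation keeps every rule instance legal.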

Making the assumption rule only apply to atoms

Recall that an atomic sentence is a predicate applied to some terms. The assumption rule may apply to arbitrary sentences. We would like to transform sequent proofs to ones that only apply the assumption rule to atomic sentences.

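To do this, we will consider proving judgments of the form $Γ, P ⊢ Δ, P$ without using the assumption rule except on atomic sentences. We will do this by induction on the structure of P. Since every application of the assumption rule proves a judgment of this form, replacing each such leaf with the resulting derivation gives a proof that only applies the assumption rule to atoms, without adding any cuts or weakenings.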

Now we consider what form P could take. If P is atomic, we simply apply the assumption rule. Suppose P is $\neg Q$. Then we prove the judgment as follows:

$\frac{\frac{Γ, Q ⊢ Δ, Q}{Γ ⊢ Δ, Q, \neg Q}}{Γ, \neg Q ⊢ Δ, \neg Q}$

applying right negation and then left negation, with the top judgment proven by the inductive assumption.

Suppose P is $Q \land R$. Then we prove the judgment as follows:

$\frac{\frac{Γ, Q, R ⊢ Δ, Q \quad Γ, Q, R ⊢ Δ, R}{Γ, Q, R ⊢ Δ, Q \land R}}{Γ, Q \land R ⊢ Δ, Q \land R}$

with the top judgments proven by the inductive assumption.

Suppose P is $(\forall x, ϕ [x])$. Then we prove the judgment as follows:

$\frac{\frac{Γ, ϕ [c] ⊢ Δ, ϕ [c]}{Γ, (\forall x, ϕ [x]) ⊢ Δ, ϕ [c]}}{Γ, (\forall x, ϕ [x]) ⊢ Δ, (\forall x, ϕ [x])}$

with the top judgment proven by the inductive assumption, and where c is a constant not appearing in $Γ$, $Δ$, or $ϕ$.
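For the propositional cases, the construction can be sketched as a recursive function. The `Proof` encoding and the `identity` and `leaves` helpers are hypothetical; the universal case is omitted because it needs a supply of fresh constants. Note that the negation case takes two rule applications (right negation, then left negation).

```python
from dataclasses import dataclass
from typing import Iterator, Tuple

# Hypothetical encoding: atoms are strings; ("not", s) and ("and", a, b) are compound.
@dataclass(frozen=True)
class Proof:
    rule: str
    left: frozenset
    right: frozenset
    premises: Tuple["Proof", ...] = ()
    principal: object = None

def identity(P, gamma=frozenset(), delta=frozenset()) -> Proof:
    """A proof of Γ, P ⊢ Δ, P using the assumption rule only on atoms."""
    lhs, rhs = gamma | {P}, delta | {P}
    if isinstance(P, str):                                      # atomic
        return Proof("assume", lhs, rhs)
    if P[0] == "not":                                           # P = ¬Q
        Q = P[1]
        top = identity(Q, gamma, delta)                         # Γ, Q ⊢ Δ, Q
        mid = Proof("notR", gamma, delta | {Q, P}, (top,), P)   # Γ ⊢ Δ, Q, ¬Q
        return Proof("notL", lhs, rhs, (mid,), P)               # Γ, ¬Q ⊢ Δ, ¬Q
    if P[0] == "and":                                           # P = Q ∧ R
        Q, R = P[1], P[2]
        top_q = identity(Q, gamma | {R}, delta)                 # Γ, Q, R ⊢ Δ, Q
        top_r = identity(R, gamma | {Q}, delta)                 # Γ, Q, R ⊢ Δ, R
        mid = Proof("andR", gamma | {Q, R}, delta | {P}, (top_q, top_r), P)
        return Proof("andL", lhs, rhs, (mid,), P)               # Γ, Q ∧ R ⊢ Δ, Q ∧ R
    raise ValueError(f"unsupported sentence: {P!r}")

def leaves(p: Proof) -> Iterator[Proof]:
    """All leaves of p, left to right."""
    if not p.premises:
        yield p
    for q in p.premises:
        yield from leaves(q)
```

For example, `identity(("and", ("not", "A"), "B"))` has exactly two leaves, one closed by the atom A and one by the atom B.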

The inversion lemma

The rules for compound sentences are, for the most part, invertible, in that if the bottom judgment is provable with no cuts, so is the top judgment. I will show invertibility for these rules, assuming no weakening and that the assumption rule only applies to atoms.

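In general, these proofs will work by applying redundant contraction to the proof of the bottom judgment and observing that the proof steps work for a modified version of the judgments, except for certain rule applications. Leaves convert automatically, since the assumption rule only applies to atoms and so never relies on the compound sentence being converted. If some rule application needs that sentence in one of its premises (for example, right negation applied to $\neg \neg P$ needs $\neg P$ on the left of its premise), the converted premise no longer contains it; we add it back with weakening, apply the rule as before, and then eliminate the weakening. Note that we intentionally omit the left universal rule, as it is not invertible like the others. It will instead be handled manually later.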

A property that will be true throughout is that, if the original proof has no cuts, neither does the inverted proof.

Left negation

Consider the left negation rule:

$\frac{Γ ⊢ Δ, P}{Γ, \neg P ⊢ Δ}$

Suppose the bottom judgment is provable. Apply redundant contraction to the proof. We will do induction over the proof to show that each sub-proof of a judgment can be converted to one of a converted form of the judgment, where $\neg P$ is removed from the left and P is added to the right. Every step in the proof will convert automatically except for instances of the left negation rule applied to $\neg P$. Those cases originally look like

$\frac{Σ, \neg P ⊢ Π, P}{Σ, \neg P ⊢ Π}$

and in the conversion we are trying to show $Σ ⊢ Π, P$. We can prove this by inductively converting the proof of $Σ, \neg P ⊢ Π, P$.

Overall, the converted proof proves $Γ ⊢ Δ, P$. And if the original proof has no cuts, neither does the converted proof.

Right negation

Consider the right negation rule:

$\frac{Γ, P ⊢ Δ}{Γ ⊢ Δ, \neg P}$

Suppose the bottom judgment is provable. By an argument symmetric to the left negation case, we convert the proof to a proof of $Γ, P ⊢ Δ$. And if the original proof has no cuts, neither does the converted proof.

Left conjunction

Consider the left conjunction rule:

$\frac{Γ, P, Q ⊢ Δ}{Γ, P \land Q ⊢ Δ}$

Suppose the bottom judgment is provable. Apply redundant contraction to the proof. We will do induction over the proof to show that each sub-proof of a judgment can be converted to one of a converted form of the judgment, where $P \land Q$ is removed from the left and P and Q are added to it. Every step in the proof will convert automatically except for when left conjunction is applied to $P \land Q$. Those cases look like:

$\frac{Σ, P \land Q, P, Q ⊢ Π}{Σ, P \land Q ⊢ Π}$

and in the conversion we are trying to show $Σ, P, Q ⊢ Π$. We can prove this by inductively converting the proof of $Σ, P \land Q, P, Q ⊢ Π$.

Overall, the converted proof proves $Γ, P, Q ⊢ Δ$, as desired. And if the original proof has no cuts, neither does the converted proof.

Right conjunction

Consider the right conjunction rule:

$\frac{Γ ⊢ Δ, P \quad Γ ⊢ Δ, Q}{Γ ⊢ Δ, P \land Q}$

We will consider proofs of $Γ ⊢ Δ, P$ and $Γ ⊢ Δ, Q$ separately.

First consider $Γ ⊢ Δ, P$. Suppose the bottom judgment is provable. Apply redundant contraction to the proof. We will do induction over this proof to show that each sub-proof of a judgment can be converted to one of a converted form of the judgment, where $P \land Q$ is removed from the right side and P is added to it. Each step of the proof will convert automatically except for applications of the right conjunction rule to $P \land Q$. Those cases look like:

$\frac{Σ ⊢ Π, P \land Q, P \quad Σ ⊢ Π, P \land Q, Q}{Σ ⊢ Π, P \land Q}$

and in the conversion we are trying to show $Σ ⊢ Π, P$. We prove this by inductively converting the proof of $Σ ⊢ Π, P \land Q, P$.

Overall, the converted proof proves $Γ ⊢ Δ, P$, as desired.

Now consider $Γ ⊢ Δ, Q$. This is symmetric to the previous case, yielding a converted proof.

In both cases, if the original proof has no cuts, neither does the converted proof.

Right universal

Consider the right universal rule:

$\frac{Γ ⊢ Δ, ϕ [c]}{Γ ⊢ Δ, (\forall x, ϕ [x])}$

where c does not appear in $Γ, Δ, (\forall x, ϕ [x])$. Suppose the bottom judgment is provable. Apply redundant contraction to this proof. We will do induction over the proof to show that each sub-proof of a judgment can be converted to one of a converted form of the judgment, where $(\forall x, ϕ [x])$ is removed from the right and $ϕ [c^{'}]$ is added to it, with c' a constant appearing nowhere in the proof. Every step will convert automatically except for applications of the right universal rule to $(\forall x, ϕ [x])$. Those cases look like:

$\frac{Σ ⊢ Π, (\forall x, ϕ [x]), ϕ [d]}{Σ ⊢ Π, (\forall x, ϕ [x])}$

where d is a constant not appearing in $Σ, Π, (\forall x, ϕ [x])$, and in the conversion we are trying to show $Σ ⊢ Π, ϕ [c^{'}]$. We inductively convert the proof of $Σ ⊢ Π, (\forall x, ϕ [x]), ϕ [d]$ to get a proof of $Σ ⊢ Π, ϕ [d], ϕ [c^{'}]$. Then we apply constant substitution to this proof, replacing d with c', to get a proof of $Σ ⊢ Π, ϕ [c^{'}]$.

Overall, the converted proof proves $Γ ⊢ Δ, ϕ [c^{'}]$. Now we apply constant substitution again to get a proof of $Γ ⊢ Δ, ϕ [c]$. And if the original proof has no cuts, neither does the converted proof.

Showing cut elimination

We are now ready to eliminate cut from an arbitrary proof. Assume the proof has no weakening and that the assumption rule is only used on atoms (we have already shown how to convert a proof to one of this form). An instance of the cut rule looks like this:

$\frac{Γ ⊢ Δ, P \quad Γ, P ⊢ Δ}{Γ ⊢ Δ}$

We consider different forms P could take in turn. Each time, we eliminate one instance of cut from the proof (a "cut reduction"), in a way that reduces the cut rank of the overall proof. We only eliminate cuts where the proofs of the premises do not themselves have any cuts; if the proof has at least one cut, then some cut has cut-free premise proofs (for example, any cut with no other cut above it), so this is not an obstacle to the algorithm.

Atomic sentences

Suppose P is atomic. Assume the proofs of $Γ ⊢ Δ, P$ and $Γ, P ⊢ Δ$ are cut-free. Apply redundant contraction to the first proof. Each leaf of this proof now uses the assumption rule to prove $Σ ⊢ Π, P$ where $Γ \subseteq Σ$ and $Δ \subseteq Π$. Now we consider eliminating P from the right-hand side of every judgment in this proof (so the converted "proof" now "proves" $Γ ⊢ Δ$). Since P is atomic, it is never the principal sentence of a non-assumption rule, so every non-assumption rule can still be applied; the one wrinkle is a rule that needs P on the right of a premise, such as left negation applied to $\neg P$, where we add P back to the converted premise with weakening and eliminate the weakening afterward. However, some of the leaves will now fail to be proven with the assumption rule. In those cases, when the judgment of the leaf is $Σ ⊢ Π$, we know $P \in Σ$, as the elimination of P from the right caused a failure of the assumption rule. In those cases, it is sufficient to show $Γ, P ⊢ Δ$, by weakening elimination (since $Γ \cup \{P\} \subseteq Σ$ and $Δ \subseteq Π$). But we already have a cut-free proof of this: the original cut-free proof of $Γ, P ⊢ Δ$. By repairing the leaves, we now have a cut-free proof of $Γ ⊢ Δ$.
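A rough sketch of this leaf-repair step for the propositional rules follows, assuming a hypothetical `Proof` encoding (sets of sentences, with each rule's principal sentence recorded), that the first proof has already been through redundant contraction, and that $P \notin Δ$. Besides repairing leaves, it weakens P back into any premise that needs it; for an atom, that means left negation on $\neg P$ or right conjunction with P as a conjunct. The helper names are illustrative, not from the post.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical encoding: atoms are strings; ("not", s) and ("and", a, b) are compound.
@dataclass(frozen=True)
class Proof:
    rule: str
    left: frozenset
    right: frozenset
    premises: Tuple["Proof", ...] = ()
    principal: object = None

def add_everywhere(p: Proof, sigma: frozenset, pi: frozenset) -> Proof:
    """Weaken every judgment of p by Σ (left) and Π (right), with no weakening step."""
    return Proof(p.rule, p.left | sigma, p.right | pi,
                 tuple(add_everywhere(q, sigma, pi) for q in p.premises), p.principal)

def needs_atom_right(node: Proof, i: int, P) -> bool:
    """Does premise i of this rule instance need the atom P on its right?"""
    if node.rule == "notL":
        return node.principal == ("not", P)
    if node.rule == "andR":
        return node.principal[1 + i] == P
    return False

def eliminate_atomic_cut(pi1: Proof, pi2: Proof, P, gamma, delta) -> Proof:
    """pi1: cut-free, redundantly contracted proof of Γ ⊢ Δ, P.
    pi2: cut-free proof of Γ, P ⊢ Δ.  Returns a cut-free proof of Γ ⊢ Δ."""
    def convert(node: Proof) -> Proof:
        right = node.right - {P}
        if node.rule == "assume":
            if node.left & right:           # the leaf still closes without P
                return Proof("assume", node.left, right)
            # P was the only shared sentence, so P is on the left: reuse pi2, weakened.
            return add_everywhere(pi2, node.left - gamma - {P}, right - delta)
        premises = []
        for i, q in enumerate(node.premises):
            q2 = convert(q)
            if needs_atom_right(node, i, P):
                q2 = add_everywhere(q2, frozenset(), frozenset({P}))  # weaken P back in
            premises.append(q2)
        return Proof(node.rule, node.left, right, tuple(premises), node.principal)
    return convert(pi1)
```

Since neither input contains a cut and the function only removes P, copies `pi2`, and adds sentences, the result contains no cut either.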

Negations

Suppose $P = \neg Q$. Then the premises of the cut rule imply we have proofs of $Γ ⊢ Δ, \neg Q$ and $Γ, \neg Q ⊢ Δ$. Assume these proofs are cut-free. Using invertibility, we can get cut-free proofs of $Γ, Q ⊢ Δ$ and $Γ ⊢ Δ, Q$. Then apply cut on Q:

$\frac{Γ ⊢ Δ, Q \quad Γ, Q ⊢ Δ}{Γ ⊢ Δ}$

This reduces the cut rank because cut is applied to a simpler sentence.

Conjunctions

Suppose $P = Q \land R$. Then the premises of the cut rule imply we have proofs of $Γ ⊢ Δ, Q \land R$ and $Γ, Q \land R ⊢ Δ$. Assume these proofs are cut-free. Using invertibility we can get cut-free proofs of $Γ ⊢ Δ, Q$, $Γ ⊢ Δ, R$, and $Γ, Q, R ⊢ Δ$. Then apply cut twice:

$\frac{Γ ⊢ Δ, Q \quad \frac{Γ, Q ⊢ Δ, R \quad Γ, Q, R ⊢ Δ}{Γ, Q ⊢ Δ}}{Γ ⊢ Δ}$

This reduces the cut rank because cut is applied to simpler sentences: the cut on $Q \land R$ is replaced by cuts on Q and R, both of lower depth. Note that we can convert the proof of $Γ ⊢ Δ, R$ to one of $Γ, Q ⊢ Δ, R$ using weakening elimination.
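As a worked instance (an illustration, not an example from the original post), take $Q = \neg A$ and $R = B$ with A and B atomic, so Q has depth 1, R has depth 0, and $Q \land R$ has depth 2. Writing the entries of the cut rank for depths 0, 1, and 2 as a triple (entries for greater depths are unchanged), the reduction changes the cut rank as follows:

$(f(0), f(1), f(2)) \mapsto (f(0) + 1, f(1) + 1, f(2) - 1)$

Depth 2 is the most significant entry that changes, and it decreases, so the new cut rank is smaller even though the entries for depths 0 and 1 grow.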

Universals

Suppose $P = (\forall x, ϕ [x])$. Then the premises of the cut rule imply that we have proofs of $Γ ⊢ Δ, (\forall x, ϕ [x])$ and $Γ, (\forall x, ϕ [x]) ⊢ Δ$. Assume both these proofs are cut-free, and apply redundant contraction to the second. Using invertibility on the first proof, we can get a cut-free proof of $Γ ⊢ Δ, ϕ [c]$ where c is a constant not appearing in $Γ, Δ, (\forall x, ϕ [x])$.

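We will do induction over the proof of $Γ, (\forall x, ϕ [x]) ⊢ Δ$ to show that each sub-proof of a judgment can be converted to one of a converted form of the judgment, where $(\forall x, ϕ [x])$ is removed from the left, and where we only introduce cuts on sentences of the form $ϕ [t]$. (If a rule needs $(\forall x, ϕ [x])$ in one of its premises, as right negation applied to $\neg (\forall x, ϕ [x])$ does, we add the sentence back to the converted premise with weakening and eliminate the weakening afterward.) Each other step of the proof will convert automatically except for applications of the left universal rule to $(\forall x, ϕ [x])$, of the form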

$\frac{Σ, (\forall x, ϕ [x]), ϕ [t] ⊢ Π}{Σ, (\forall x, ϕ [x]) ⊢ Π}$

where $Γ \subseteq Σ$ and $Δ \subseteq Π$. In the converted proof, we are instead trying to show $Σ ⊢ Π$. We can prove this by inductively converting the proof of $Σ, (\forall x, ϕ [x]), ϕ [t] ⊢ Π$ to one of $Σ, ϕ [t] ⊢ Π$, and then applying cut:

$\frac{Σ ⊢ Π, ϕ [t] \quad Σ, ϕ [t] ⊢ Π}{Σ ⊢ Π}$

We can show $Σ ⊢ Π, ϕ [t]$ by applying constant substitution to our cut-free proof of $Γ ⊢ Δ, ϕ [c]$ to get a cut-free proof of $Γ ⊢ Δ, ϕ [t]$, and then applying weakening elimination.

While we introduce more cuts into the proof, these all apply to sentences of the form $ϕ [t]$, which have lower depth than the original universal $(\forall x, ϕ [x])$; so, however many such cuts there are, the cut rank still decreases.

Summary

To summarize, we first modify our proof to have no weakening and to only apply the assumption rule to atoms. Then we find an instance of cut where the proofs of the premises are cut-free. Depending on what sentence is cut, we find a way to remove this cut, only replacing it with cuts on sentences with lower depth. Overall, this succeeds in reducing the cut rank of the proof. Since the set of cut ranks (functions that are zero almost everywhere) is well-ordered under the lexicographic order (its order type is $\omega^\omega$), this iterative process will eventually eliminate all cuts from the proof.
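The termination argument can be simulated in a few lines of Python. This is only a model under stated assumptions: a cut rank is a hypothetical sparse dictionary, and each reduction removes one cut at some depth d and adds a random number (at most `max_new`, an arbitrary bound) of cuts at smaller depths, as the negation, conjunction, and universal cases do; atomic cuts add none. Because only cuts whose premise proofs are cut-free are reduced, no existing cut is ever copied, which is why it is safe to model a reduction as touching only the reduced cut and cuts of smaller depth.

```python
import random
from typing import Dict, List

CutRank = Dict[int, int]  # depth -> number of cuts on sentences of that depth

def rank_less(f: CutRank, g: CutRank) -> bool:
    """f < g iff f(i) < g(i) at the largest depth i where they differ."""
    for i in sorted(set(f) | set(g), reverse=True):
        if f.get(i, 0) != g.get(i, 0):
            return f.get(i, 0) < g.get(i, 0)
    return False

def reduce_once(f: CutRank, rng: random.Random, max_new: int = 3) -> CutRank:
    """Model one cut reduction at an arbitrarily chosen depth that still has cuts."""
    d = rng.choice(sorted(f))               # depth of the cut being reduced
    g = dict(f)
    g[d] -= 1
    if d > 0:                               # atomic cuts (d == 0) add no new cuts
        for _ in range(rng.randint(0, max_new)):
            k = rng.randrange(d)            # each new cut has smaller depth
            g[k] = g.get(k, 0) + 1
    return {i: n for i, n in g.items() if n > 0}

def run(f: CutRank, seed: int = 0) -> List[CutRank]:
    """Reduce until no cuts remain (f must list only positive counts)."""
    rng = random.Random(seed)
    history = [f]
    while f:
        f = reduce_once(f, rng)
        history.append(f)
    return history
```

Whatever choices the random generator makes, the loop stops and every step strictly lowers the cut rank. The bound `max_new` is not needed for termination in principle (any finite number of new lower-depth cuts would do); it only keeps the simulation's choices finite.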

Conclusion

Cut elimination is a fundamental theorem of formal logic. I have shown cut elimination for the first-order sequent calculus described in the post on Gödel's completeness theorem, which is a simplified form of system LK. Compared to the explanations of cut elimination I have found in the literature, this proof is fairly complete given its simplicity. It helps me, at least, understand how cut elimination can proceed in an algorithmic, syntactic manner on the proof tree. While applications of cut elimination are beyond the scope of this post, understanding the actual proof might help to understand how these applications work.

LESSWRONG