# 4

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

The waterfall-based approaches to the Löbian obstacle offer a way around the finitely-terminating sequences of trust which we get by adding towers of soundness schemas to $\mathsf{PA}$. This creates a kind of illusion of self-trust by way of a non-well-founded chain of trust.

Another familiar situation where we are normally faced with the ability to construct arbitrarily high towers but not a single self-referential system is that of truth predicates. Tarski's undefinability theorem blocks the existence of a full truth predicate within the same language as the one which it describes. Perhaps a similar waterfall construction can be applied, to get an infinite descending chain of languages.

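Extend the language of $\mathsf{PA}$ with a family of truth predicates $T_n$. A Tarski-style approach would assert a T-schema $T_n(\ulcorner \varphi \urcorner) \leftrightarrow \varphi$ for $\varphi$ which contain only truth predicates indexed strictly lower than $n$. ($\ulcorner \varphi \urcorner$ is the Gödel number of $\varphi$.) Here, we wish to flip this, and assert a T-schema which allows only truth predicates indexed strictly higher than $n$.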
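To make the contrast concrete, the two schemas can be written side by side. This is only a sketch: the symbols $T_n$, $\varphi$, and the index $m$ are my reading of the notation, and the only difference between the two lines is the direction of the side condition.

```latex
% Tarski-style hierarchy: T_n only speaks about strictly lower levels.
\[
  T_n(\ulcorner \varphi \urcorner) \leftrightarrow \varphi
  \qquad \text{for } \varphi \text{ containing only } T_m \text{ with } m < n
\]
% Flipped (waterfall-style) hierarchy: T_n only speaks about strictly higher levels.
\[
  T_n(\ulcorner \varphi \urcorner) \leftrightarrow \varphi
  \qquad \text{for } \varphi \text{ containing only } T_m \text{ with } m > n
\]
```

In the first case the chain of languages bottoms out at level $0$; in the second it descends forever, which is what makes the construction non-well-founded.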

This brings to mind Yablo's Paradox. A contradiction can likely be worked out in a way resembling that, but instead I'll note that this theory implies the naive soundness waterfall, in which we construct a sequence of theories where each theory asserts a soundness schema for the next. This is because we can use the truth predicate $T_n$ to carry out a proof of soundness for the axioms and inference rules, with the exception of instances of the T-schema involving $T_n$ itself. This gives us the naive soundness waterfall, which we know to be inconsistent. (Note that I have not checked this in detail, however.)

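My idea for fixing the T-schema, then, is to introduce the same predicate $\psi(n)$ used in the consistent soundness waterfall, which asserts that $n$ is not the Gödel number of a proof of contradiction in a fixed background theory. We make the new schema:

• $\psi(n) \to \big(T_n(\ulcorner \varphi \urcorner) \leftrightarrow \varphi\big)$, where $\varphi$ contains only $T_m$ with $m > n$.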

Because we can prove any particular $\psi(n)$, we can still apply the schema in specific cases. It seems likely that we can still carry out soundness arguments as well, constructing the consistent version of the soundness waterfall. Even so, the theory ends up being unsound.
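Applying the schema in a specific case amounts to discharging the guard by a finite check. A minimal sketch, assuming the guarded schema has the form $\psi(n) \to (T_n(\ulcorner \varphi \urcorner) \leftrightarrow \varphi)$, that $\bar{n}$ denotes the numeral for a particular $n$, and that $n$ really is not the code of a proof of contradiction:

```latex
% Requires amsmath for align*.
\begin{align*}
  &\vdash \psi(\bar{n})
    && \text{(checkable by direct computation for this particular } n\text{)} \\
  &\vdash \psi(\bar{n}) \to \big(T_n(\ulcorner \varphi \urcorner) \leftrightarrow \varphi\big)
    && \text{(instance of the guarded schema)} \\
  &\vdash T_n(\ulcorner \varphi \urcorner) \leftrightarrow \varphi
    && \text{(modus ponens)}
\end{align*}
```

What the theory presumably cannot do is prove $\forall n.\, \psi(n)$, so the unguarded schema is never available uniformly in $n$.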

Here's a proof that the theory is unsound. Consider again Yablo's paradox. We construct an infinite sequence of statements, each of which asserts that the subsequent statements in the sequence are all false. Specifically:

$$S_n \leftrightarrow \forall m > n.\ \neg T_m(\ulcorner S_m \urcorner)$$

Considering any particular $S_n$, we see that it implies $\neg T_{n+1}(\ulcorner S_{n+1} \urcorner)$, and also $S_{n+1}$. By an application of the T-schema, however, these two statements are just the negation of each other. Therefore, the theory proves $\neg S_n$. The choice of $n$ was generic, so we can see that the system eventually proves every sentence in the sequence false. From outside, however, we can see that this implies that each of them is true.
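Spelled out step by step, the argument for a fixed $n$ looks roughly like this. It is a sketch under assumptions: the Yablo sentences are taken to satisfy $S_n \leftrightarrow \forall m > n.\, \neg T_m(\ulcorner S_m \urcorner)$, and the guard $\psi(n+1)$ is taken to have already been discharged for this particular $n$.

```latex
% Requires amsmath for align*.
\begin{align*}
  &(1)\quad S_n \to \neg T_{n+1}(\ulcorner S_{n+1} \urcorner)
    && \text{(instantiate } m = n+1\text{)} \\
  &(2)\quad S_n \to \forall m > n+1.\ \neg T_m(\ulcorner S_m \urcorner),
    \text{ i.e. } S_n \to S_{n+1}
    && \text{(restrict the quantifier)} \\
  &(3)\quad T_{n+1}(\ulcorner S_{n+1} \urcorner) \leftrightarrow S_{n+1}
    && \text{(T-schema; } S_{n+1} \text{ mentions only } T_m,\ m > n+1\text{)} \\
  &(4)\quad S_n \to \neg S_{n+1}
    && \text{(from (1) and (3))} \\
  &(5)\quad \neg S_n
    && \text{(from (2) and (4))}
\end{align*}
```

Each $\neg S_n$ is obtained separately for each numeral $n$. Turning these into a single internal proof of $\forall n.\, \neg S_n$ is the extra step that would yield inconsistency rather than mere unsoundness.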

If the system proposed here can also carry out that reasoning, then it will be inconsistent.

Even if it is consistent, it's still unsound, so it's unlikely to be very useful. It would be interesting if truth-predicate versions of the other solutions to the Löbian obstacle could be constructed. (My intuition is that this won't be possible for the consistency waterfall, but is likely possible for model polymorphism.)