Building on the very bad Gödel anti-AI argument (computers's are formal and can't prove their own Gödel sentence, hence no AI), it occurred to me that you could make a strong case that humans could never recognise a human Gödel sentence. The argument goes like this:

  1. Humans have a meta-proof that all Gödel sentences are true.
  2. If humans could recognise a human Gödel sentence G as being a Gödel sentence, we would therefore prove it was true.
  3. This contradicts the definition of G, which humans should never be able to prove.
  4. Hence humans could never recognise that G was a human Gödel sentence.

Now, the more usual way of dealing with human Gödel sentences is to say that humans are inconsistent, but that the inconsistency doesn't blow up our reasoning system because we use something akin to relevance logic.

But, if we do assume humans are consistent (or can become consistent), then it does seem we will never knowingly encounter our own Gödel sentences. As to where this G could hide and we could never find it? My guess would be somewhere in the larger ordinals, up where our understanding starts to get flaky.

33 comments, sorted by Click to highlight new comments since: Today at 2:23 AM
New Comment

Maaay-be, but I'm not at all convinced this is the right way to think about this. Suppose that we can become consistent, and suppose that there is some program P that could in principle be written down that simulates a humanity that has become consistent, and human mathematicians are going through all possible theorems in lexicographical order, and someone is writing down in a simple format every theorem humanity has found to be true -- so simple a format that given the output of program P, you can easily find in it the list of theorems found so far. Thus, there is a relatively simple recursive enumeration of all theorems this version of humanity will ever find -- just run P and output a theorem as soon as it appears on the list.

Now suppose humanity does figure out that it's living in a simulation, and figures out the source code of P. Then it knows its own Gödel sentence. Will this lead to a logical contradiction?

Well, if humanity does write down "our Gödel sentence is true" on the List, then that sentence will be false (because it says something equivalent to "this sentence does not appear on the List"). That actually doesn't make humanity inconsistent, but it makes it wrong (unsound), which we'll take to be just as bad (and we'll get inconsistent if we also write down all logical consequences of this statement + the Peano axioms). So what if humanity does not write it down on the list? Then the statement is true, and it's obvious to us that it's true. Not a logical contradiction -- but it does mean that humanity recognizes some things as true it isn't writing down on its list.

Paul Christiano has written some posts related to this (which he's kindly pointed out to me when I was talking about some related things with him :-)). One way I think about this is by the following thought experiment: Suppose you're taking part in an experiment supposedly testing your ability to understand simple electric circuits. You're put in a room with two buttons, red and green, and above them two LEDs, red and green. When you press one of the buttons, one of the lights will light up; your task is to predict which light will light up and press the button of the same color. The circuit is wired as follows: Pressing the red button makes the green light light up; pressing the green button makes the red light light up. I maintain that you can be able to understand perfectly how the mechanism works, and yet not be able to "prove" this by completing the task.

So, I'd say that "what humanity wrote on its list" isn't a correct formalization of "what humanity understands to be true". Now, it does seem possible that there still is a formal predicate that captures the meaning of "humanity understands proposition A to be true". If so, I'd expect the problem with recognizing it not to be that it talks about something like large ordinals, but that it talks about the individual neurons in every human mind, and is far too long for humans to process sensibly. (We might be able to recognize that it says lots of things about human neurons, but not that the exact thing it says is "proposition A will at some point be understood by humanity to be true".) If you try to include "human" minds scaled to arbitrarily large sizes (like 3^^^3 and so on), so that any finite-length statement can be comprehended by some large-enough mind, my guess would be that there is no finite-length predicate that can accurately reflect "mind M understands proposition A to be true" for arbitrarily large "human" minds M, though I won't claim strong confidence in this guess.

You shouldn't include things we know only by experience as part of our theoretical system, for the purpose of "the human Godel sentence." At best learning a theorem from experience would add an axiom, but then our Godel sentence changes. So if we knew our Godel sentence it would become something else.

Now suppose humanity does figure out that it's living in a simulation, and figures out the source code of P. Then it knows its own Gödel sentence.

Wait, why does that follow?

I postulated that "given the output of program P, you can easily find in it the list of theorems found so far" -- by which I meant that it's easy to write a program that takes the output of P until step t, and returns everything written on the list up to time t (was the confusion that it wasn't clear that this was what I meant?). If you also know the source of P, you have a program that for every t returns the list up to time t, so it's easy to write down the predicate L(n) of PA that says "there is some time t such that the proposition with Gödel number n appears on the list at time t." By the diagonal lemma, there is a sentence G such that

PA |- G <-> not L(the Gödel number of G).

G is humanity's Gödel sentence, and there is no trouble in writing it down inside the simulation, if you know the source of P and the source of the program that reads the list from P's output.

(Well, technically, G is one Gödel sentence, and there could be other ways to write it that are harder to recognize, but one recognizable Gödel sentence should be enough for the "no AI" proof to go through if it were a well-formed argument at all, and I don't think Stuart's claim is just that there are some obfuscated ways to write a Gödel sentence that are unrecognizable.)

Everyone seems to be taking the phrase "human Gödel sentence" (and, for that matter, "the Gödel sentence of a turing machine") as if its widely understood, so perhaps it's a piece of jargon I'm not familiar with. I know what the Gödel sentence of a computably enumerable theory is, which is the usual formulation. And I know how to get from a computably enumerable theory to the Turing machine which outputs the statements of that theory. But not every Turing machine is of this form, so I don't know what it means to talk about the Gödel sentence of an arbitrary Turing machine. For instance, what is the Gödel sentence of a universal Turing machine?

Some posters seem to be taking the human Gödel number to mean something like the Gödel number of the collection of things that person will ever believe, but the collection of things a person will ever believe has absolutely no need to be consistent, since people can (and should!) sometimes change their mind.

(This is primarily an issue with the original anti-AI argument; I don't know how defenders of that argument clarify their definitions.)

The thing is that the proof for Gödel's theorem is constructive. We have an algorithm to construct Gödel sentences from Axioms. So basically the only way we can be unable to recognize our Gödel sentences is being unable to recognize our axioms.

We won't be able to recognise the human Gödel sentence... and remain strictly human.

Fixed it for you! :)

[-][anonymous]10y 4

I think it's perfectly reasonable to encounter your Gödel sentence and say you don't know it's true. You just have to think "this is only true if the logic I use is consistent. I'd like to think it is, but I don't actually have an ironclad proof of that (Luckily for me - if I did have such a proof, I'd know I was inconsistent). So I can't actually prove this statement"

That would only work if you also say "there must be an error somewhere in Gödel's proof".

[-][anonymous]10y 3

Gödel's proof assumes the consistency of the theory that G is being created for. It's not an error in the proof, but it's not an assumption you're allowed to make when talking about your own logical system.

"Stuart Armstrong does not believe this sentence."

Yeah, I think that you got it right. The Gödel sentence of a person must be a thought that that person can think, and it will refer to that person's thought processes. So something like "You don't believe this" or "You don't understand this idea".

"Stuart Armstrong does not believe this sentence."

Aw, I happen to have a bit of difficulty in figuring out what proposition that desugars to in the language of Peano Arithmetic, could you help me out? :-)

(The serious point being, we know that you can write self-contradictory statements in English and we don't expect to be able to assign consistent truth-values to them, but the statements of PA or the question whether a given Turing machine halts seem to us to have well-defined meaning, and if human-level intelligence is computable, it seems at least at first as if we should be able to encode "Stuart Armstrong believes proposition A" as a statement of PA. But the result won't be anywhere as easily recognizable to him as what you wrote.)

But that sentence isn't self-contradictory like "This is a lie", it is just self-referential, like "This sentence has five words". It does have a well-defined meaning and is decidable for all hypothetical consistent people other than hypothetical consitentified Stuart Armstrong.

You're right, I didn't think that one through, thanks!

I still think the interesting thing is the potential for writing down a mathematical statement humanity can't decide, not an English one that we can't decide even though it is meaningful, but I'll shut up about the question for now.

Doesn't that allow for Stuart Armstrong to have a false belief?

"Stuart Armstrong cannot justify a belief in this sentence" might be a stronger form since it implicitly refers to any logical or rational reasoning that might be used to justify the belief.

There are some obvious weaknesses to human Godel sentences. Stuart Armstrong is not a permanent name. Additionally our biology changes over time so we probably can't have permanent personal Godel sentences.

Theorem 1: Stuart Armstrong cannot prove this sentence is true. (G-SA)

Proof (by contradiction): Suppose not-G-SA is true, that is, Stuart Armstrong can prove G-SA. Then by G-SA, we would have a contradiction. So not-G-SA is false (or our logic contains a contradiction, in which case the principle of explosion applies).

Theorem 2: Cyan cannot prove this sentence is true. (G-Cyan)

Proof left as an exercise for the reader.

Actually human Godel sentences are quite easy to construct.

For example, I can't prove that I'm not an idiot.

If I'm not an idiot, then I can perhaps make an argument that I'm not an idiot that seems reasonable to me, and that may persuade that I'm not an idiot.

However, if I am an idiot, then I can still perhaps make an argument that I'm not an idiot that seems reasonable to me.

Therefore any argument that I might make on whether I'm an idiot or not does not determine which of the two above states is the case. Whether I'm an idiot or not is therefore unprovable under my system.

You can't even help me. You might choose to inform me that I am / am not an idiot. I still have to decide whether you are a reasonable authority to decide the matter, and that question runs into the same problem - if I decide that you are, I may have decided so as an idiot, and therefore still have no definitive answer.

You cannot win, you can only say "I am what I am" and forget about it.

Doesn't this argument presume some sort of Platonism? If F is a formal system that fully captures the computational resources of our brains, then a Godel sentence for F, G, will be true in some models of the axioms of F and false in others. When you say that we have a meta-proof that G is true, you must mean that we have a meta-proof that G is true in the intended model. But how do we pick out the intended model? After all, our intentions are presumably also beholden to the computational limitations of our brains. How could we possibly distinguish between the different models of F in order to say "I'm talking about this model, not that one"?

If you're a Platonist, perhaps you can get around this by positing that only one of the many models of F actually obtains, in the sense that only the numbers that appear in this model actually exist. So while we can't pick out this model as the intended model from the inside, perhaps there's some externalist story about how this model is picked out and other models (in which G is false) are not. Maybe there's a quasi-causal connection between our brain and the numbers which makes it the case that our mathematical statements are about those numbers and not others. But Platonism, especially the sort of Platonism proposed here, is implausible.

Is there any other way to make sense of the claim that our mathematical claims are about a model in which G is true, even though there is no computational procedure by which we could pick out this model?

If we can model the standard natural numbers, then it seems we're fine - the number godel-encoding a proof would actually correspond to a proof, we don't need to worry further about models.

If we can't pick out the standard natural numbers, we can't say that any Godel sentences are true, even for very simple formal systems. All we can say is that they are unprovable from within that system.

If my brain is a Turing machine, doesn't it pretty much follow that I can't pick out the standard model? How would I do that?

I've never got a fully satisfactory answer to this. Basically the natural numbers are (informally) a minimal model of peano arithmetic - you can never have any model "smaller" than them.

And it may be possible to fomarlise this. Take the second order peano axioms. Their model is entirely dependent on the model of set theory.

Let M be a model of set theory. Then I wonder whether there can be models M' and N of set theory, such that: there exists a function mapping every set of M to set in M' that preserves the set theoretic properties. This function is an object in N.

Then the (unique) model of the second order peano axioms in M' must be contained inside the image of model in M. This allows us to give an inclusion relationship between second order peano models in different models of set theory. Then it might be that the standard natural numbers are the unique minimal element in this inclusion relationship. If that's the case, then we can isolate them.

Then it might be that the standard natural numbers are the unique minimal element in this inclusion relationship.

Why would we care about the smallest model? Then, we'd end up doing weird things like rejecting the axiom of choice in order to end up with fewer sets. Set theorists often actually do the opposite.

Generally speaking, the model of Peano arithmetic will get smaller as the model of set theory gets larger.

And the point is not to prefer smaller or larger models; the point is to see if there is a unique definition of the natural numbers.

[-][anonymous]10y 1

Humans have a meta-proof that all Gödel sentences are true.

That's not true. Gödel sentences are true for (most?) consistent systems, and false for all inconsistent systems.

Truth and falsity in an inconsistent system is a bit of a weird concept.

To be more specific: Gödel sentences are true within inconsistent systems (because of explosion) but if we give them a meta-interpretation then they are false about inconsistent systems.

[-][anonymous]10y 0

Truth within a consistent system is a bit of a weird concept as well. Is the continuum hypothesis true within ZFC? The question isn't actually meaningful; the continuum hypothesis is true within some models of ZFC, and false within others.

Humans have a meta-proof that all Gödel sentences are true.

I thought the whole point was that our "proof" of the (supposed) truth of the Gödel statement is via a different (weaker?) notion of proof than the logical system in which it's stated. So we can have a meta-proof of it without having an arithmetic proof, and regard that as "good enough". (Likewise, computers can have stronger and weaker thresholds for believing statements.)

With that said, I've long been interested in the topic of whether there are pseudo-Gödel inputs that can make humans "crash".

Here- have a general case Gödel sentence.

"This sentence cannot be proven true within any sound system which can express it."

To prove that false, it is necessary and sufficient to show that there is a sound system which can prove that it is true.

There is nothing like the Goedel theorem inside a finite world, in which we operate/live.

There is nothing like the Goedel theorem inside a finite world, in which we operate/live.

This assumes the universe is finite. But aside from that it has two serious problems:

First, finiteness doesn't save you from undecidability.

Second, if in fact the world is finite we get even worse situations. Let for example (n) be your favorite fast growing computable function, say f(n)=A(n,n) where A is the Ackermann function. Consider a question like "does f(10)+2 have an even number of distinct prime factors"? It is likely then that this question is not answerable in our universe even though it is essentially a trivial question from the standpoint of what questions can be answered in Peano Arithmetic.

New to LessWrong?