No License To Be Human




Eliezer_Yudkowsky

Followup to: You Provably Can't Trust Yourself

Yesterday I discussed the difference between:

  • A system that believes—is moved by—any specific chain of deductions from the axioms of Peano Arithmetic.  (PA, Type 1 calculator)
  • A system that believes PA, plus explicitly asserts the general proposition that PA is sound.  (PA+1, meta-1-calculator that calculates the output of Type 1 calculator)
  • A system that believes PA, plus explicitly asserts its own soundness.  (Self-PA, Type 2 calculator)

These systems are formally distinct.  PA+1 can prove things that PA cannot.  Self-PA is inconsistent, and can prove anything via Löb's Theorem.
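For readers who want the three systems side by side, here is a rough formal sketch. The notation is my own assumption and not from the original discussion: \Box_T P abbreviates "P is provable in T" as encoded by a standard provability predicate, and T_1 and T_S are just labels for PA+1 and Self-PA.

```latex
\begin{align*}
\text{L\"ob's Theorem (for recursively axiomatized } T \supseteq \mathrm{PA}\text{):}\quad
  & \text{if } T \vdash \Box_T P \rightarrow P, \text{ then } T \vdash P. \\[4pt]
\text{PA+1:}\quad
  & T_1 = \mathrm{PA} + \{\, \Box_{\mathrm{PA}} P \rightarrow P : P \text{ a sentence} \,\} \\[4pt]
\text{Self-PA:}\quad
  & T_S \vdash \Box_{T_S} P \rightarrow P \text{ for every sentence } P \\
  & \Rightarrow\; T_S \vdash P \text{ for every } P, \text{ including } P \equiv (0 = 1).
\end{align*}
```

PA+1 asserts the soundness of PA, a different and weaker system, so Löb's Theorem does not collapse it. Self-PA asserts the soundness of its own proofs, and that is exactly the premise Löb's Theorem turns into a proof of everything.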

With these distinctions in mind, I hope my intent will be clearer, when I say that although I am human and have a human-ish moral framework, I do not think that the fact of acting in a human-ish way licenses anything.

I am a self-renormalizing moral system, but I do not think there is any general license to be a self-renormalizing moral system.

And while we're on the subject, I am an epistemologically incoherent creature, trying to modify his ways of thinking in accordance with his current conclusions; but I do not think that reflective coherence implies correctness.

Let me take these issues in reverse order, starting with the general unlicensure of epistemological reflective coherence. 

If five different people go out and investigate a city, and draw five different street maps, we should expect the maps to be (mostly, roughly) consistent with each other.  Accurate maps are necessarily consistent with one another and internally, there being only one reality.  But if I sit in my living room with my blinds closed, I can draw up one street map from my imagination and then make four copies: these five maps will be consistent among themselves, but not accurate.  Accuracy implies consistency but not the other way around.

In Where Recursive Justification Hits Bottom, I talked about whether "I believe that induction will work on the next occasion, because it's usually worked before" is legitimate reasoning, or "I trust Occam's Razor because the simplest explanation for why Occam's Razor often works is that we live in a highly ordered universe".  Though we actually formalized the idea of scientific induction, starting from an inductive instinct; we modified our intuitive understanding of Occam's Razor (Maxwell's Equations are in fact simpler than Thor, as an explanation for lightning) based on the simple idea that "the universe runs on equations, not heroic mythology".  So we did not automatically and unthinkingly confirm our assumptions, but rather, used our intuitions to correct them—seeking reflective coherence.

But I also remarked:

"And what about trusting reflective coherence in general?  Wouldn't most possible minds, randomly generated and allowed to settle into a state of reflective coherence, be incorrect?  Ah, but we evolved by natural selection; we were not generated randomly."

So you are not, in general, safe if you reflect on yourself and achieve internal coherence.  The Anti-Inductors who compute that the probability of the coin coming up heads on the next occasion, decreases each time they see the coin come up heads, may defend their anti-induction by saying:  "But it's never worked before!"
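To make the Anti-Inductors concrete, here is a minimal sketch. It assumes the inductor uses Laplace's rule of succession and that the anti-inductor uses its mirror image; both choices, and all the function names, are illustrative assumptions and not anything specified above.

```python
from fractions import Fraction


def inductor_p_heads(heads: int, tails: int) -> Fraction:
    """Laplace's rule of succession: the more heads seen, the likelier heads is next."""
    return Fraction(heads + 1, heads + tails + 2)


def anti_inductor_p_heads(heads: int, tails: int) -> Fraction:
    """Hypothetical mirror image: the more heads seen, the less likely heads is next."""
    return Fraction(tails + 1, heads + tails + 2)


def track(rule, flips: str) -> list[Fraction]:
    """Probability the rule assigns to heads before each flip in a string like 'HHT'."""
    heads = tails = 0
    predictions = []
    for flip in flips:
        predictions.append(rule(heads, tails))
        if flip == "H":
            heads += 1
        else:
            tails += 1
    return predictions


print(track(inductor_p_heads, "HHH"))       # 1/2, then 2/3, then 3/4
print(track(anti_inductor_p_heads, "HHH"))  # 1/2, then 1/3, then 1/4
```

Both rules are equally simple and equally self-consistent. Nothing inside the code says which one to prefer; only the coin does.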

The only reason why our human reflection works, is that we are good enough to make ourselves better—that we had a core instinct of induction, a core instinct of simplicity, that wasn't sophisticated or exactly right, but worked well enough.

A mind that was completely wrong to start with, would have no seed of truth from which to heal itself.  (It can't forget everything and become a mind of pure emptiness that would mysteriously do induction correctly.)

So it's not that reflective coherence is licensed in general, but that it's a good idea if you start out with a core of truth or correctness or good priors.  Ah, but who is deciding whether I possess good priors?  I am!  By reflecting on them!  The inescapability of this strange loop is why a broken mind can't heal itself—because there is no jumping outside of all systems.

I can only plead that, in evolving to perform induction rather than anti-induction, in evolving a flawed but not absolutely wrong instinct for simplicity, I have been blessed with an epistemic gift.

I can only plead that self-renormalization works when I do it, even though it wouldn't work for Anti-Inductors.  I can only plead that when I look over my flawed mind and see a core of useful reasoning, that I am really right, even though a completely broken mind might mistakenly perceive a core of useful truth.

Reflective coherence isn't licensed for all minds.  It works for me, because I started out with an epistemic gift.

It doesn't matter if the Anti-Inductors look over themselves and decide that their anti-induction also constitutes an epistemic gift; they're wrong, I'm right.

And if that sounds philosophically indefensible, I beg you to step back from philosophy, and consider whether what I have just said is really truly true.

(Using your own concepts of induction and simplicity to do so, of course.)

Does this sound a little less indefensible, if I mention that PA trusts only proofs from the PA axioms, not proofs from every possible set of axioms?  To the extent that I trust things like induction and Occam's Razor, then of course I don't trust anti-induction or anti-Occamian priors—they wouldn't start working just because I adopted them.

What I trust isn't a ghostly variable-framework from which I arbitrarily picked one possibility, so that picking any other would have worked as well so long as I renormalized it.  What I trust is induction and Occam's Razor, which is why I use them to think about induction and Occam's Razor.

(Hopefully I have not just licensed myself to trust myself; only licensed being moved by both implicit and explicit appeals to induction and Occam's Razor.  Hopefully this makes me PA+1, not Self-PA.)

So there is no general, epistemological license to be a self-renormalizing factual reasoning system.

The reason my system works is because it started out fairly inductive—not because of the naked meta-fact that it's trying to renormalize itself using any system; only induction counts.  The license—no, the actual usefulness—comes from the inductive-ness, not from mere reflective-ness.  Though I'm an inductor who says so!

And, sort-of similarly, but not exactly analogously:

There is no general moral license to be a self-renormalizing decision system.  Self-consistency in your decision algorithms is not that-which-is-right.

The Pebblesorters place the entire meaning of their lives in assembling correct heaps of pebbles and scattering incorrect ones; they don't know what makes a heap correct or incorrect, but they know it when they see it.  It turns out that prime heaps are correct, but determining primality is not an easy problem for their brains.  Like PA and unlike PA+1, the Pebblesorters are moved by particular and specific arguments tending to show that a heap is correct or incorrect (that is, prime or composite) but they have no explicit notion of "prime heaps are correct" or even "Pebblesorting People can tell which heaps are correct or incorrect". They just know (some) correct heaps when they see them, and can try to figure out the others.
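One way to picture a "particular and specific argument" that a heap is incorrect is a rectangle of pebbles: if the heap can be laid out in rows and columns with more than one of each, it is composite. The sketch below uses plain trial division, which is my own stand-in and not a claim about how Pebblesorter brains work; the function names are hypothetical.

```python
from math import isqrt


def rectangle(n: int):
    """Return (rows, cols) with rows * cols == n and 1 < rows <= cols, or None."""
    if n < 4:
        return None  # heaps this small cannot form a rectangle with both sides > 1
    for rows in range(2, isqrt(n) + 1):
        if n % rows == 0:
            return rows, n // rows
    return None


def is_correct_heap(n: int) -> bool:
    """A heap is 'correct' (prime) if it has at least 2 pebbles and no rectangle."""
    return n >= 2 and rectangle(n) is None


print(rectangle(91))        # (7, 13)
print(is_correct_heap(91))  # False
print(is_correct_heap(13))  # True
```

A Pebblesorter who is shown the rectangle is moved by it directly, without needing any explicit belief of the form "prime heaps are correct".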

Let us suppose by way of supposition, that when the Pebblesorters are presented with the essence of their decision system—that is, the primality test—they recognize it with a great leap of relief and satisfaction.  We can spin other scenarios—Peano Arithmetic, when presented with itself, does not prove itself correct.  But let's suppose that the Pebblesorters recognize a wonderful method of systematically producing correct pebble heaps.  Or maybe they don't endorse Adleman's test as being the essence of correctness—any more than Peano Arithmetic proves that what PA proves is true—but they do recognize that Adleman's test is a wonderful way of producing correct heaps.

Then the Pebblesorters have a reflectively coherent decision system.

But this does not constitute a disagreement between them and humans about what is right, any more than humans, in scattering a heap of 3 pebbles, are disagreeing with the Pebblesorters about which numbers are prime!

The Pebblesorters are moved by arguments like "Look at this row of 13 pebbles, and this row of 7 pebbles, arranged at right angles to each other; how can you see that, and still say that a heap of 91 pebbles is correct?"

Human beings are moved by arguments like "Hatred leads people to play purely negative-sum games, sacrificing themselves and hurting themselves to make others hurt still more" or "If there is not the threat of retaliation, carried out even when retaliation is profitless, there is no credible deterrent against those who can hurt us greatly for a small benefit to themselves".

This is not a minor difference of flavors.  When you reflect on the kind of arguments involved here, you are likely to conclude that the Pebblesorters really are talking about primality, whereas the humans really are arguing about what's right.  And I agree with this, since I am not a moral relativist.  I don't think that morality being moral implies any ontologically basic physical rightness attribute of objects; and conversely, I don't think the lack of such a basic attribute is a reason to panic.

I may have contributed to the confusion here by labeling the Pebblesorters' decisions "p-right".  But what they are talking about is not a different brand of "right".  What they're talking about is prime numbers.  There is no general rule that reflectively coherent decision systems are right; the Pebblesorters, in merely happening to implement a reflectively coherent decision system, are not yet talking about morality!

It's been suggested that I should have spoken of "p-right" and "h-right", not "p-right" and "right".

But of course I made a very deliberate decision not to speak of "h-right".  That sounds like there is a general license to be human.

It sounds like being human is the essence of rightness.  It sounds like the justification framework is "this is what humans do" and not "this is what saves lives, makes people happy, gives us control over our own lives, involves us with others and prevents us from collapsing into total self-absorption, keeps life complex and non-repeating and aesthetic and interesting, dot dot dot etcetera etcetera".

It's possible that the above value list, or your equivalent value list, may not sound like a compelling notion unto you.  Perhaps you are only moved to perform particular acts that make people happy—not caring all that much yet about this general, explicit, verbal notion of "making people happy is a value".  Listing out your values may not seem very valuable to you.  (And I'm not even arguing with that judgment, in terms of everyday life; but a Friendly AI researcher has to know the metaethical score, and you may have to judge whether funding a Friendly AI project will make your children happy.)  Which is just to say that you're behaving like PA, not PA+1.

And as for that value framework being valuable because it's human—why, it's just the other way around: humans have received a moral gift, which Pebblesorters lack, in that we started out interested in things like happiness instead of just prime pebble heaps.

Now this is not actually a case of someone reaching in from outside with a gift-wrapped box; any more than the "moral miracle" of blood-soaked natural selection producing Gandhi, is a real miracle.

It is only when you look out from within the perspective of morality, that it seems like a great wonder that natural selection could produce true friendship.  And it is only when you look out from within the perspective of morality, that it seems like a great blessing that there are humans around to colonize the galaxies and do something interesting with them.  From a purely causal perspective, nothing unlawful has happened.

But from a moral perspective, the wonder is that there are these human brains around that happen to want to help each other—a great wonder indeed, since human brains don't define rightness, any more than natural selection defines rightness.

And that's why I object to the term "h-right".  I am not trying to do what's human.  I am not even trying to do what is reflectively coherent for me.  I am trying to do what's right.

It may be that humans argue about what's right, and Pebblesorters do what's prime.  But this doesn't change what's right, and it doesn't make what's right vary from planet to planet, and it doesn't mean that the things we do are right in mere virtue of our deciding on them—any more than Pebblesorters make a heap prime or not prime by deciding that it's "correct".

The Pebblesorters aren't trying to do what's p-prime any more than humans are trying to do what's h-right.  The Pebblesorters are trying to do what's prime.  And the humans are arguing about, and occasionally even really trying to do, what's right.

The Pebblesorters are not trying to create heaps of the sort that a Pebblesorter would create (note circularity).  The Pebblesorters don't think that Pebblesorting thoughts have a special and supernatural influence on whether heaps are prime.  The Pebblesorters aren't trying to do anything explicitly related to Pebblesorters—just like PA isn't trying to prove anything explicitly related to proof.  PA just talks about numbers; it took a special and additional effort to encode any notions of proof in PA, to make PA talk about itself.

PA doesn't ask explicitly whether a theorem is provable in PA, before accepting it—indeed PA wouldn't care if it did prove that an encoded theorem was provable in PA.  Pebblesorters don't care what's p-prime, just what's prime.  And I don't give a damn about this "h-rightness" stuff: there's no license to be human, and it doesn't justify anything.

 

Part of The Metaethics Sequence

Next post: "Invisible Frameworks"

Previous post: "You Provably Can't Trust Yourself"