The Thing That I Protect


Eliezer_Yudkowsky

Followup to: Something to Protect, Value is Fragile

"Something to Protect" discoursed on the idea of wielding rationality in the service of something other than "rationality".  Not just that rationalists ought to pick out a Noble Cause as a hobby to keep them busy; but rather, that rationality itself is generated by having something that you care about more than your current ritual of cognition.

So what is it, then, that I protect?

I quite deliberately did not discuss that in "Something to Protect", leaving it only as a hanging implication.  In the unlikely event that we ever run into aliens, I don't expect their version of Bayes's Theorem to be mathematically different from ours, even if they generated it in the course of protecting different and incompatible values.  Among humans, the idiom of having "something to protect" is not bound to any one cause, and therefore, to mention my own cause in that post would have harmed its integrity.  Causes are dangerous things, whatever their true importance; I have written somewhat on this, and will write more about it.

But still - what is it, then, the thing that I protect?

Friendly AI?  No - a thousand times no - a thousand times not anymore.  It's not thinking of the AI that gives me strength to carry on even in the face of inconvenience.

I would be a strange and dangerous AI wannabe if that were my cause - the image in my mind of a perfected being, an existence greater than humankind.  Maybe someday I'll be able to imagine such a child and try to build one, but for now I'm too young to be a father.

Those of you who've been following along with recent discussions, particularly "Value is Fragile", might have noticed something else that I might, perhaps, hold precious.  Smart agents want to protect the physical representation of their utility function for almost the same reason that male organisms are built to be protective of their testicles.  From the standpoint of the alien god, natural selection, losing the germline - the gene-carrier that propagates the pattern into the next generation - means losing almost everything that natural selection cares about.  Unless you already have children to protect, can protect relatives, etcetera - few are the absolute and unqualified statements that can be made in evolutionary biology - but still, if you happen to be a male human, you will find yourself rather protective of your testicles; that one, centralized vulnerability is why a kick in the testicles hurts more than being hit on the head.

To lose the pattern of human value - which, for now, is physically embodied only in the human brains that care about those values - would be to lose the Future itself; if there's no agent with those values, there's nothing to shape a valuable Future.

And this pattern, this one most vulnerable and precious pattern, is indeed at risk of being distorted or destroyed.  Growing up is a hard problem either way, whether you try to edit existing brains, or build de novo Artificial Intelligence that mirrors human values.  If something more powerful than humans, and not sharing human values, comes into existence - whether by de novo AI gone wrong, or augmented humans gone wrong - then we can expect to lose, hard.  And value is fragile; losing just one dimension of human value can destroy nearly all of the utility we expect from the future.

So is that, then, the thing that I protect?

If it were - then what inspired me when times got tough would be, say, thinking of people being nice to each other.  Or thinking of people laughing, and contemplating how humor probably exists among only an infinitesimal fraction of evolved intelligent species and their descendants.  I would marvel at the power of sympathy to make us feel what others feel -

But that's not quite it either.

I once attended a small gathering whose theme was "This I Believe".  You could interpret that phrase in a number of ways; I chose "What do you believe that most other people don't believe which makes a corresponding difference in your behavior?"  And it seemed to me that most of how I behaved differently from other people boiled down to two unusual beliefs.  The first belief could be summarized as "intelligence is a manifestation of order rather than chaos"; this accounts both for my attempts to master rationality, and my attempt to wield the power of AI.

And the second unusual belief could be summarized as:  "Humanity's future can be a WHOLE LOT better than its past."

Not desperately Darwinian robots surging out to eat as much of the cosmos as possible, mostly ignoring their own internal values to try and grab as many stars as possible, with most of the remaining matter going into making paperclips.

Not some bittersweet ending where you and I fade away on Earth while the inscrutable robots ride off into the unknowable sunset, having grown beyond such merely human values as love or sympathy.

Screw bittersweet.  To hell with that melancholy-tinged crap.  Why leave anyone behind?  Why surrender a single thing that's precious?

(And the compromise-futures are all fake anyway; at this difficulty level, you steer precisely or you crash.)

The pattern of fun is also lawful.  And, though I do not know all the law - I do think that written in humanity's value-patterns is the implicit potential of a happy future.  A seriously goddamn FUN future.  A genuinely GOOD outcome.  Not something you'd accept with a sigh of resignation for nothing better being possible.  Something that would make you go "WOOHOO!"

In the sequence on Fun Theory, I have given you, I hope, some small reason to believe that such a possibility might be consistently describable, if only it could be made real.  How to read that potential out of humans and project it into reality... might or might not be as simple as "superpose our extrapolated reflected equilibria".  But that's one way of looking at what I'm trying to do - to reach the potential of the GOOD outcome, not the melancholy bittersweet compromise.  Why settle for less?

To really have something to protect, it has to be able to bring tears to your eyes.  That, generally, requires something concrete to visualize - not just abstract laws.  Reading the Laws of Fun doesn't bring tears to my eyes.  I can visualize a possibility or two that makes sense to me, but I don't know if it would make sense to others the same way.

What does bring tears to my eyes?  Imagining a future where humanity has its act together.  Imagining children who grow up never knowing our world, who don't even understand it.  Imagining the rescue of those now in sorrow, the end of nightmares great and small.  Seeing in reality the real sorrows that happen now, so many of which are unnecessary even now.  Seeing in reality the signs of progress toward a humanity that's at least trying to get its act together and become something more - even if the signs are mostly just symbolic: a space shuttle launch, a march that protests a war.

(And of course these are not the only things that move me.  Not everything that moves me has to be a Cause.  When I'm listening to e.g. Bach's Jesu, Joy of Man's Desiring, I don't think about how every extant copy might be vaporized if things go wrong.  That may be true, but it's not the point.  It would be as bad as refusing to listen to that melody because it was once inspired by belief in the supernatural.)

To really have something to protect, you have to be able to protect it, not just value it.  My battleground for that better Future is, indeed, the fragile pattern of value.  Not to keep it in stasis, but to keep it improving under its own criteria rather than randomly losing information.  And then to project that through more powerful optimization, to materialize the valuable future.  Without surrendering a single thing that's precious, because losing a single dimension of value could lose it all.

There's no easy way to do this, whether by de novo AI or by editing brains.  But with a de novo AI, cleanly and correctly designed, I think it should at least be possible to get it truly right and win completely.  It seems, for all its danger, the safest and easiest and shortest way (yes, the alternatives really are that bad).  And so that is my project.

That, then, is the service in which I wield rationality.  To protect the Future, on the battleground of the physical representation of value.  And my weapon, if I can master it, is the ultimate hidden technique of Bayescraft - to explicitly and fully know the structure of rationality, to such an extent that you can shape the pure form outside yourself - what some call "Artificial General Intelligence" and I call "Friendly AI".  Which is, itself, a major unsolved research problem, and so it calls into play the more informal methods of merely human rationality.  That is the purpose of my art and the wellspring of my art.

That's pretty much all I wanted to say here about this Singularity business...

...except for one last thing; so after tomorrow, I plan to go back to posting about plain old rationality on Monday.