Why safety is not safe

June 14, 3009

Twilight still hung in the sky, yet the Pole Star was visible above the trees, for it was a perfect cloudless evening.

"We can stop here for a few minutes," remarked the librarian as he fumbled to light the lamp. "There's a stream just ahead."

The driver grunted assent as he pulled the cart to a halt and unhitched the thirsty horse to drink its fill.

It was said that in the Age of Legends, there had been horseless carriages that drank the black blood of the earth, long since drained dry. But then, it was said that in the Age of Legends, men had flown to the moon on a pillar of fire. Who took such stories seriously?

The librarian did. In his visit to the University archive, he had studied the crumbling pages of a rare book in Old English, itself a copy a mere few centuries old, of a text from the Age of Legends itself; a book that laid out a generation's hopes and dreams, of building cities in the sky, of setting sail for the very stars. Something had gone wrong - but what? That civilization's capabilities had been so far beyond those of his own people. Its destruction should have taken a global apocalypse of the kind that would leave unmistakable record both historical and archaeological, and yet there was no trace. Nobody had anything better than mutually contradictory guesses as to what had happened. The librarian intended to discover the truth.

Forty years later he died in bed, his question still unanswered.

The earth continued to circle its parent star, whose increasing energy output could no longer be compensated by falling atmospheric carbon dioxide concentration. Glaciers advanced, then retreated for the last time; as life struggled to adapt to changing conditions, the ecosystems of yesteryear were replaced by others new and strange - and impoverished. All the while, the environment drifted further from that which had given rise to Homo sapiens, and in due course one more species joined the billions-long roll of the dead. For what was by some standards a little while, eyes still looked up at the lifeless stars, but there were no more minds to wonder what might have been.



Were I to submit the above to a science fiction magazine, it would be instantly rejected. It lacks a satisfying climax in which the hero strikes down the villain, for it has neither hero nor villain. Yet I ask your indulgence for a short time, for it may yet possess one important virtue: realism.

The reason we relate to stories with villains is easy enough to understand. In our ancestral environment, if a leopard or an enemy tribesman escaped your attention, failure to pass on your genes was likely. Violence may or may not have been the primary cause of death, depending on time and place; but it was the primary cause that you could do something about. You might die of malaria, you might die of old age, but there was little and nothing respectively that you could do to avoid these fates, so there was no selection pressure to be sensitive to them. There was certainly no selection pressure to be good at explaining the distant past or predicting the distant future.

Looked at that way, it's a miracle we possess as much general intelligence as we do; and certainly our minds have achieved a great deal, and promise more. Yet the question lurks in the background: are there phenomena, not in distant galaxies or inside the atomic nucleus beyond the reach of our eyes but in our world at the same scale we inhabit, nonetheless invisible to us because our minds are not so constructed as to perceive them?

In search of an answer to that question, we may ask another one: why is this document written in English instead of Chinese?

As late as the 15th century, this was by no means predictable. The great civilizations of Europe and China were roughly on par, the former having almost caught up over the previous few centuries; yet Chinese oceangoing ships were arguably still better than anything Europe could build. Fleets under Admiral Zheng He reached as far as East Africa. Perhaps China might have reached the Americas before Europeans did, and the shape of the world might have been very different.

The centuries had brought a share of disasters to both continents. War had ravaged the lands, laying waste whole cities. Plague had struck, killing millions, men, women and children buried in mass graves. Shifts of global air currents brought the specter of famine. Civilization had endured; more, it had flourished.

The force that put an end to the Chinese arc of progress was deadlier by far than all of these together, yet seemingly intangible as metaphysics. By the 16th century, the fleets had vanished, the proximate cause political; to this day there is no consensus on the underlying factors. It seems what saved Europe was its political disunity. Why was that lost in China? Some writers have blamed flat terrain, which others have disputed; some have blamed rice agriculture and its need for irrigation systems. Likely there were factors nobody has yet understood; perhaps we never will.

An entire future that might have been, was snuffed out by some terrible force compared to which war, plague and famine were mere pinpricks - and yet even with the benefit of hindsight, we still don't truly understand what it was.

Nor is this an isolated case. From the collapse of classical Mediterranean civilization to the divergent fates of the US and Argentina, whose prospects looked so similar as recently as the early 20th century, we find that more terrible than any war or ordinary disaster are forces which operate unseen in plain sight and are only dimly understood even after the fact.

The saving grace has always been the outside: when one nation, one civilization, faltered, another picked up the torch and carried on; but with the march of globalization, there may soon be no more outside.

Unless of course we create a new one. Within this century, if we continue to make progress as quickly as possible, we may develop the technology to break our confinement, to colonize first the solar system and then the galaxy. And then our kind may truly be immortal, beyond the longest reach of the Grim Reaper, and love and joy and laughter be not outlived by the stars themselves.

If we continue to make progress as quickly as possible.

Yet at every turn, when risks are discussed, ten voices cry loudly about the violence that may be done with new technology for every one voice that quietly observes that we cannot afford to be without it, and we may not have as much time as we think we have. It is not that anyone is being intentionally selfish or dishonest. The critics believe what they are saying. It is that to the human mind, the dangers of progress are vivid even when imaginary; the dangers of its lack are scarcely perceptible even when real.

There are many reasons why we need more advanced technology, and we need it as soon as possible. Every year, more than fifty million people die for its lack, most in appalling suffering. But the one reason above all others is that the window of opportunity we are currently given may be the last step in the Great Filter, and that we cannot know when it will close or, if it does, whether it will ever open again.

Less Wrong is about bias, and the errors to which it leads us. I present then what may be the most lethal of all our biases: that we react instantly to the lesser death that comes in blood and fire, but the greater death that comes in the dust of time, is to our minds invisible.

And I ask that you remember, next time you contemplate alleged dangers of technology.

 

97 comments

I hardly think many here would object to love, joy, and laughter being not outlived by the stars themselves: as you say, the critics are not dishonest. As steven points out, any disagreement would seem to stem from differing assessments of the probabilities of stagnation risk and existential risk. If the future is going to be dominated by a hard takeoff Singularity, then it is incredibly important to make sure to get that first AGI exactly, perfectly right at the expense of all else. If the future is to be one of "traditional" space colonization and catastrophic risk from AI, MNT, &c. is negligible, then it's incredibly important to develop techs as quickly as possible. While the future does depend on what "we" decide to do now (bearing in mind that there is no unitary we), this is largely an empirical issue: how does the tech tree actually look? What does it take to colonize the stars? Is hard takeoff possible, and what would that take? &c. I think that these are the sorts of questions we need to be asking and trying to answer, rather than pledging ourselves to the "pro-safety" or "pro-technology" side. Since we all want more-or-less the same thing, it's in all of our best interests to try to reach the most accurate conclusions possible.

So basically you're saying that when Leo Szilard wanted to hide the true neutron cross section of purified graphite and Enrico Fermi wanted to publish it, you'd have published it.

I think rwallace is saying both men were right to continue their research.

Would you have hidden it?

You cannot hide the truth forever. Nuclear weapons were an inevitable technology. Likewise, whether or not Eurisko was genuine, someone will eventually cobble together an AGI. Especially if Eurisko was genuine, and the task really is that easy. The fact that you seem persuaded of the possibility of Lenat having danced on the edge of creating hard takeoff gives me more interest than ever before in a re-implementation.

Reading "value is fragile" almost had me persuaded that blindly pursuing AGI is wrong, but shortly after, "Safety is not Safe" reverted me back to my usual position: stagnation is as real and immediate a threat as ever there was, vastly dwarfing any hypothetical existential risks from rogue AI.

For instance, bloat and out-of-control accidental complexity have essentially halted all basic progress in computer software. I believe that the lack of quality programming systems will lead (and may already have led) directly to stagnation in other fields, such as computational biology. The near-term future appears to resemble Windows Vista rather than HAL. Engelbart's Intelligence Amplification dream has been lost in the noise. I thus expect civilization to succumb to Natural Stupidity in the near term future, unless a drastic reversal in these trends takes place.

Would you have hidden it?

I hope so. It was the right decision in hindsight, since the Nazi nuclear weapons program shut down when the Allies, at cost of some civilian lives, destroyed their source of deuterium. If they'd known they could've used purified graphite... well, they probably still wouldn't have gotten nuclear weapons in this Everett branch but they might have somewhere else.

Before 2001 I would probably have been on Fermi's side, but that's when I still believed deep down that no true harm could come to someone who was only faithfully trying to do science. (I.e. supervised universe thinking.)

stagnation is as real and immediate a threat as ever there was, vastly dwarfing any hypothetical existential risks from rogue AI.

How is blindly looking for AGI in a vast search space better than stagnation?

How does working on FAI qualify as "stagnation"?

How is blindly looking for AGI in a vast search space better than stagnation?

No amount of aimless blundering beats deliberate caution and moderation (see 15th century China example) for maintaining technological stagnation.

How does working on FAI qualify as "stagnation"?

It is a distraction from doing things which are actually useful in the creation of our successors.

You are trying to invent the circuit breaker before discovering electricity; the airbag before the horseless carriage. I firmly believe that all of the effort currently put into "Friendly AI" is wasted. The bored teenager who finally puts together an AGI in his parents' basement will not have read any of these deep philosophical tracts.

The bored teenager who finally puts together an AGI in his parents' basement will not have read any of these deep philosophical tracts.

AGI is a really hard problem. If it ever gets accomplished, it's going to be by a team of geniuses who have been working on the project for years. Will they be so immersed in the math that they won't have read the deep philosophical tracts?---maybe. But your bored teenager scenario makes no sense.

AGI is a really hard problem

It has successfully resisted solution thus far, but I suspect that it will seem laughably easy in retrospect when it finally falls.

If it ever gets accomplished, it's going to be by a team of geniuses who have been working on the project for years

This is not how truly fundamental breakthroughs are made.

Will they be so immersed in the math that they won't have read the deep philosophical tracts?

Here is where I agree with you - anyone both qualified and motivated to work on AGI will have no time or inclination to pontificate regarding some nebulous Friendliness.

But your bored teenager scenario makes no sense.

Why do you assume that AGI lies beyond the capabilities of any single intelligent person armed with a modern computer and a sufficiently unorthodox idea?

This is not how truly fundamental breakthroughs are made.

Hmm---now that you mention it, I realize my domain knowledge here is weak. How are truly fundamental breakthroughs made? I would guess that it depends on the kind of breakthrough---that there are some things that can be solved by a relatively small number of core insights (think Albert Einstein in the patent office) and some things that are big collective endeavors (think Human Genome Project). I would guess furthermore that in many ways AGI is more like the latter than the former, see below.

Why do you assume that AGI lies beyond the capabilities of any single intelligent person armed with a modern computer and a sufficiently unorthodox idea?

Only about two percent of the Linux kernel was personally written by Linus Torvalds. Building a mind seems like it ought to be more difficult than building an operating system. In either case, it takes more than an unorthodox idea.

Only about two percent of the Linux kernel was personally written by Linus Torvalds. Building a mind seems like it ought to be more difficult than building an operating system.

There is no law of Nature that says the consequences must be commensurate with their cause. We live in an unsupervised universe where a movement of a butterfly's wings can determine the future of nations. You can't conclude that simply because the effect is expected to be vast, the cause ought to be at least prominent. This knowledge may only be found by a more mechanistic route.

You're right in the sense that I shouldn't have used the words ought to be, but I think the example is still good. If other software engineering projects take more than one person, then it seems likely that AGI will too. Even if you suppose the AI does a lot of the work up to the foom, you still have to get the AI up to the point where it can recursively self-improve.

How are truly fundamental breakthroughs made?

Usually by accident, by one or a few people. This is a fine example.

ought to be more difficult than building an operating system

I personally suspect that the creation of the first artificial mind will be more akin to a mathematician's "aha!" moment than to a vast pyramid-building campaign. This is simply my educated guess, however, and my sole justification for it is that a number of pyramid-style AGI projects of heroic proportions have been attempted and all failed miserably. I disagree with Lenat's dictum that "intelligence is ten million rules." I suspect that the legendary missing "key" to AGI is something which could ultimately fit on a t-shirt.

I personally suspect that the creation of the first artificial mind will be more akin to a mathematician's "aha!" moment than to a vast pyramid-building campaign. [...] my sole justification [...] is that a number of pyramid-style AGI projects of heroic proportions have been attempted and failed miserably.

"Reversed Stupidity is Not Intelligence." If AGI takes deep insight and a pyramid, then we would expect those projects to fail.

The bored teenager who finally puts together an AGI in his parents' basement will not have read any of these deep philosophical tracts.

That truly would be a sad day.

Are you seriously suggesting hypothetical AGIs built by bored teenagers in basements are "things which are actually useful in the creation of our successors"?

Is that your plan against intelligence stagnation?

Is that your plan against intelligence stagnation?

I'll bet on the bored teenager over a sclerotic NASA-like bureaucracy any day. Especially if a computer is all that's required to play.

This is an answer to a different question. A plan is something implemented to achieve a goal, not something that is just more likely to work (especially against you).

I view the teenager's success as simultaneously more probable and more desirable than that of a centralized bureaucracy. I should have made that more clear. And my "goal" in this case is simply the creation of superintelligence. I believe the entire notion of pre-AGI-discovery Friendliness research to be absurd, as I already explained in other comments.

You are using the wrong terminology here. If the consequences of whatever AGI got developed are seen as positive, if you are not dead as a result, it is already almost FAI; that is how it's defined: that the effect is positive. Deeper questions play on what it means for the effect to be positive, and how one can be wrong about considering a certain effect positive even though it's not, but let's leave that aside for the moment.

If the teenager implemented something that has a good effect, it's FAI. The argument is not that whatever ad-hoc tinkering leads to is not within a strange concept of "Friendly AI", but that ad-hoc tinkering is expected to lead to disaster, whatever you call it.

if you are not dead as a result

I am profoundly skeptical of the link between Hard Takeoff and "everybody dies instantly."

ad-hoc tinkering is expected to lead to disaster

This is the assumption which I question. I also question the other major assumption of Friendly AI advocates: that all of their philosophizing and (thankfully half-hearted and ineffective) campaign to prevent the "premature" development of AGI will lead to a future containing Friendly AI, rather than no AI plus an earthbound human race dead from natural causes.

Ad-hoc tinkering has given us the seed of essentially every other technology. The major disasters usually wait until large-scale application of the technology by hordes of people following received rules (rather than an ab initio understanding of how it works) begins.

ad-hoc tinkering is expected to lead to disaster

This is the assumption which I question.

To discuss it, you need to address it explicitly. You might want to start from here, here and here.

I also question the other assumption of Friendly AI advocates: that all of their philosophizing and (thankfully half-hearted and ineffective) campaign to prevent the "premature" development of AGI will lead to a future containing Friendly AI, rather than no AI plus an earthbound human race dead from natural causes.

That's a wrong way to see it: the argument is simply that lack of disaster is better than a disaster (note that the scope of this category is separate from the first issue you raised, that is if it's shown that ad-hoc AGI is not disastrous, by all means go ahead and do it). Suicide is worse than pending death from "natural" causes. That's all. Whether it's likely that a better way out will be found, or even possible, is almost irrelevant to this position. But we ought to try to do it, even if it seems impossible, even if it is quite improbable.

Ad-hoc tinkering has given us the seed of essentially every other technology.

True, but if you expect a failure to kill civilization, the trial-and-error methodology must be avoided, even if it's otherwise convenient and almost indispensable, and has proven itself over the centuries.

You consider the creation of an unFriendly superintelligence a step on the road to understanding Friendliness?

As a Usenet discussion grows longer, the probability of a comparison involving Nazis or Hitler approaches 1.

Huh. Looking back, it actually seems that I already wrote up the complete reply to this post and it is "Raised in Technophilia" (Sep 17 '08).

My position is that yes, technology can kill us all. Also, lack of technology can get us all killed.

The ability to create intelligence is one we need on the long scale. I don't think ultra safe self-improving AI is a coherent concept, but I understand my thinking might be wrong. My problem is how can we move on from following that path if it is a dead end?

Here it is again: there is no such requirement that FAI needs to be "ultra safe" and that if it's not, it's unacceptable. This is a strawman. The requirement is that there needs to be any chance at all that the outcome is good (preferably a greater chance). Then, there is a separate conjecture that to have any chance at all, AI needs to be deeply understood.

If you think that being careful is unnecessary, that an ad-hoc approach is ready to be used positively, you are not disputing the need for Friendliness in AGI. You are disputing the conjecture that Friendliness requires care. This is not a normative question, this is a factual question.

The normative question is whether to think about consequences of your actions, which is largely decided against or rather dismissed as trivial by far too many people who think they are working on AGI.

I got the impression from "do the impossible" that Eliezer was going for definitely safe AI, and that "might be safe" was not good enough. Edit: Oh, and the sequence on fun theory suggested that scenarios where humanity just survived were not good enough either.

I think we are so far away from having the right intellectual framework for creating AI or even thinking about its likely impact on the future, that the ad hoc approach might be valuable for pushing us in the right direction or telling us what the important structure in the human brain is going to look like.

I got the impression from "do the impossible" that Eliezer was going for definitely safe AI, and that "might be safe" was not good enough.

The hypothesis here is that if you are unsure whether AGI is safe, it's not, and when you are sure it is, it's still probably not. Therefore, to have any chance of success, you have to be sure that you understand how the success is achieved. This is a question of human bias, not of the actual probability of success. See also: Possibility, Antiprediction.

I also thought that ad-hoc brings insight, but after learning more I changed my mind.

The hypothesis here is that if you are unsure whether AGI is safe, it's not, and when you are sure it is, it's still probably not.

I really didn't get that impression... Why worry about whether the AI will separate humanity if you think it might fail anyway? Surely spend more time making sure it doesn't fail...

I wish people were more scared of the dangers that can't yet be measured, like the chance a very large gamma ray could hit Earth for a short time then be aimed somewhere else. How do we know major extinctions in the past weren't related to unknown behaviors of spacetime from outside where we measure? Or maybe the "constants" in the wave equations of physics sometimes vary. Is it really a good deal to let individual businesses hold the pieces of this knowledge to themselves instead of putting all our knowledge together to figure out what's possible?

I wish people were more scared of the dangers that can't yet be measured

Why?

The map is not the territory, but most people are happy to take the lack of dangers on their map as evidence of the safety of the territory, so they don't update their maps.

It's nice to hear a quote from Wittgenstein. I hope we can get around to discussing the deeper meaning of this, which applies to all kinds of things... most especially, the process by which each kind of creature (bats, fish, Homo sapiens, and potential embodied artifactual (n.1) minds, and also minds not embodied in the contemporaneously most often used sense of the term -- Watson was not embodied in that sense) *constructs its own ontology* (or ought to, by virtue of being imbued with the right sort of architecture).

That latter sense, and the incommensurability of competing ontologies in competing creatures (where 'creature' is defined as a hybrid, an N-tuple, of cultural legacy constructs, endemic evolutionarily bequeathed physiological sensorium, its individual autobiographical experience...), but not (in my view, in the theory I am developing) opaque to enlightened translatability -- though the conceptual scaffolding for translation involves the nature of, purpose of, and boundaries, both logical and temporal, of the "specious present", the quantum Zeno effect, and other considerations, so it is more subtle than meets the eye... is more of what Wittgenstein was thinking about, considering Kant's answer to skepticism, and lots of other issues.

Your more straightforward point bears merit, however. Most of us have spent a good deal of our lives battling not issue opacity so much as human opacity to new, expanded, revised, or unconventional ideas.

Note 1: By the way, I occasionally write 'artifactual' as opposed to 'artificial' because, as products of nature, everything we do -- including building AIs -- is, ipso facto, a product of nature, and hence 'artificial' is an adjective we should be careful about.

most people are happy to take the lack of dangers on their map as evidence of the safety of the territory

I believe they are mostly correct in that. What other evidence should they consider?

so they don't update their maps

That's a non sequitur. There are strong natural selection forces against this kind of behaviour.

I don't agree with your conclusion or the connection to AI research. But the segment about civilizations collapsing for unknown reasons is brilliant and well written, and really stands on its own.

Actually, there is a science fiction story very similar to your opening section. I'm putting author and title in rot13 because the story got much of its effect because I read it under normal science fiction protocols. Surely humanity would eventually get into space-- but it doesn't and dies out. Well, then aliens will manage-- but there aren't any.

On the other hand, that's just one story in a large field, and I think it's only been reprinted once.

Zhecul'f Unyy ol Cbhy Naqrefba.

I agree that groups/societies get stuck for fairly long periods of time and that independence and competition between groups/societies is often beneficial. But I think stagnation is unlikely unless we end up with a totalitarian world government. See Bryan Caplan's essay

This seems to apply more to the space program than any field where progress is impeded by safety concerns, or such impediments are advocated here. Yes, we should have Mars colonies by now. That's not a shocking revelation, it's a near-universal belief among nerdy types, including LWers. But, since we don't, we need to minimize existential risks to life on Earth until we do, and we will always need to minimize existential risks capable of crossing interplanetary distances (i.e. uFAI, maybe nanotech or memetic plagues, although we don't know enough to even know if those are a danger.)