Continuation of: Hard Takeoff

The analysis given in the last two days permits more than one possible AI trajectory:

  1. Programmers, smarter than evolution at finding tricks that work, but operating without fundamental insight or with only partial insight, create a mind that is dumber than the researchers but performs lower-quality operations much faster.  This mind reaches k > 1, cascades up to the level of a very smart human, itself achieves insight into intelligence, and undergoes the really fast part of the FOOM, to superintelligence.  This would be the major nightmare scenario for the origin of an unFriendly AI.
  2. Programmers operating with partial insight, create a mind that performs a number of tasks very well, but can't really handle self-modification let alone AI theory.  A mind like this might progress with something like smoothness, pushed along by the researchers rather than itself, even all the way up to average-human capability - not having the insight into its own workings to push itself any further.  We also suppose that the mind is either already using huge amounts of available hardware, or scales very poorly, so it cannot go FOOM just as a result of adding a hundred times as much hardware.  This scenario seems less likely to my eyes, but it is not ruled out by any effect I can see.
  3. Programmers operating with strong insight into intelligence, directly create along an efficient and planned pathway, a mind capable of modifying itself with deterministic precision - provably correct or provably noncatastrophic self-modifications.  This is the only way I can see to achieve narrow enough targeting to create a Friendly AI.  The "natural" trajectory of such an agent would be slowed by the requirements of precision, and sped up by the presence of insight; but because this is a Friendly AI, notions like "You can't yet improve yourself this far, your goal system isn't verified enough" would play a role.

So these are some things that I think are permitted to happen, albeit that case 2 would count as a hit against me to some degree because it does seem unlikely.

Here are some things that shouldn't happen, on my analysis:

  • An ad-hoc self-modifying AI as in (1) undergoes a cycle of self-improvement, starting from stupidity, that carries it up to the level of a very smart human - and then stops, unable to progress any further.  (The upward slope in this region is supposed to be very steep!)
  • A mostly non-self-modifying AI as in (2) is pushed by its programmers up to a roughly human level... then to the level of a very smart human... then to the level of a mild transhuman... but the mind still does not achieve insight into its own workings and still does not undergo an intelligence explosion - just continues to increase smoothly in intelligence from there.

And I also don't think this is allowed:

  • The "scenario that Robin Hanson seems to think is the line-of-maximum-probability for AI as heard and summarized by Eliezer Yudkowsky":
    • No one AI that does everything humans do, but rather a large, diverse population of AIs.  These AIs have various domain-specific competencies that are "human+ level" - not just in the sense of Deep Blue beating Kasparov, but in the sense that in these domains, the AIs seem to have good "common sense" and can e.g. recognize, comprehend and handle situations that weren't in their original programming.  But only in the special domains for which that AI was crafted/trained.  Collectively, these AIs may be strictly more competent than any one human, but no individual AI is more competent than any one human.
    • Knowledge and even skills are widely traded in this economy of AI systems.
    • In concert, these AIs, and their human owners, and the economy that surrounds them, undergo a collective FOOM of self-improvement.  No local agent is capable of doing all this work, only the collective system.
    • The FOOM's benefits are distributed through a whole global economy of trade partners and suppliers, including existing humans and corporations, though existing humans and corporations may form an increasingly small fraction of the New Economy.
    • This FOOM looks like an exponential curve of compound interest, like the modern world but with a substantially shorter doubling time.

Mostly, Robin seems to think that uploads will come first, but that's a whole 'nother story.  So far as AI goes, this looks like Robin's maximum line of probability - and if I got this mostly wrong or all wrong, that's no surprise.  Robin Hanson did the same to me when summarizing what he thought were my own positions.  I have never thought, in prosecuting this Disagreement, that we were starting out with a mostly good understanding of what the Other was thinking; and this seems like an important thing to have always in mind.

So - bearing in mind that I may well be criticizing a straw misrepresentation, and that I know this full well, but I am just trying to guess my best - here's what I see as wrong with the elements of this scenario:

• The abilities we call "human" are the final products of an economy of mind - not in the sense that there are selfish agents in it, but in the sense that there are production lines; and I would even expect evolution to enforce something approaching fitness as a common unit of currency.  (Enough selection pressure to create an adaptation from scratch should be enough to fine-tune the resource curves involved.)  It's the production lines, though, that are the main point - that your brain has specialized parts and the specialized parts pass information around.  All of this goes on behind the scenes, but it's what finally adds up to any single human ability.

In other words, trying to get humanlike performance in just one domain, is divorcing a final product of that economy from all the work that stands behind it.  It's like having a global economy that can only manufacture toasters, but not dishwashers or light bulbs.  You can have something like Deep Blue that beats humans at chess in an inhuman, specialized way; but I don't think it would be easy to get humanish performance at, say, biology R&D, without a whole mind and architecture standing behind it, that would also be able to accomplish other things.  Tasks that draw on our cross-domain-ness, or our long-range real-world strategizing, or our ability to formulate new hypotheses, or our ability to use very high-level abstractions - I don't think that you would be able to replace a human in just that one job, without also having something that would be able to learn many different jobs.

I think it is a fair analogy to the idea that you shouldn't see a global economy that can manufacture toasters but not manufacture anything else.

This is why I don't think we'll see a system of AIs that are diverse, individually highly specialized, and only collectively able to do anything a human can do.

• Trading cognitive content around between diverse AIs is more difficult and less likely than it might sound.  Consider the field of AI as it works today.  Is there any standard database of cognitive content that you buy off the shelf and plug into your amazing new system, whether it be a chessplayer or a new data-mining algorithm?  If it's a chess-playing program, there are databases of stored games - but that's not the same as having databases of preprocessed cognitive content.

So far as I can tell, the diversity of cognitive architectures acts as a tremendous barrier to trading around cognitive content.  If you have many AIs around that are all built on the same architecture by the same programmers, they might, with a fair amount of work, be able to pass around learned cognitive content.  Even this is less trivial than it sounds.  If two AIs both see an apple for the first time, and they both independently form concepts about that apple, and they both independently build some new cognitive content around those concepts, then their thoughts are effectively written in a different language.  By seeing a single apple at the same time, they could identify a concept they both have in mind, and in this way build up a common language...

...the point being that even when two separated minds are running literally the same source code, it is still difficult for them to trade new knowledge as raw cognitive content without having a special language designed just for sharing knowledge.

Now suppose the two AIs are built around different architectures.

The barrier this poses to a true, cross-agent, literal "economy of mind", is so strong, that in the vast majority of AI applications you set out to write today, you will not bother to import any standardized preprocessed cognitive content.  It will be easier for your AI application to start with some standard examples - databases of that sort of thing do exist, in some fields anyway - and redo all the cognitive work of learning on its own.

That's how things stand today.

And I have to say that looking over the diversity of architectures proposed at any AGI conference I've attended, it is very hard to imagine directly trading cognitive content between any two of them.  It would be an immense amount of work just to set up a language in which they could communicate what they take to be facts about the world - never mind preprocessed cognitive content.

This is a force for localization: unless the condition I have just described changes drastically, it means that agents will be able to do their own cognitive labor, rather than needing to get their brain content manufactured elsewhere, or even being able to get their brain content manufactured elsewhere.  I can imagine there being an exception to this for non-diverse agents that are deliberately designed to carry out this kind of trading within their code-clade.  (And in the long run, difficulties of translation seem less likely to stop superintelligences.)

But in today's world, it seems to be the rule that when you write a new AI program, you can sometimes get preprocessed raw data, but you will not buy any preprocessed cognitive content - the internal content of your program will come from within your program.

And it actually does seem to me that AI would have to get very sophisticated before it got over the "hump" of increased sophistication making sharing harder instead of easier.  I'm not sure this is pre-takeoff sophistication we're talking about, here.  And the cheaper computing power is, the easier it is to just share the data and do the learning on your own.

Again - in today's world, sharing of cognitive content between diverse AIs doesn't happen, even though there are lots of machine learning algorithms out there doing various jobs.  You could say things would happen differently in the future, but it'd be up to you to make that case.

• Understanding the difficulty of interfacing diverse AIs, is the next step toward understanding why it's likely to be a single coherent cognitive system that goes FOOM via recursive self-improvement.  The same sort of barriers that apply to trading direct cognitive content, would also apply to trading changes in cognitive source code.

It's a whole lot easier to modify the source code in the interior of your own mind, than to take that modification, and sell it to a friend who happens to be written on different source code.

Certain kinds of abstract insights would be more tradeable, among sufficiently sophisticated minds; and the major insights might be well worth selling - like, if you invented a new general algorithm at some subtask that many minds perform.  But if you again look at the modern state of the field, then you find that it is only a few algorithms that get any sort of general uptake.

And if you hypothesize minds that understand these algorithms, and the improvements to them, and what these algorithms are for, and how to implement and engineer them - then these are already very sophisticated minds; at this point, they are AIs that can do their own AI theory.  So the hard takeoff must not have started yet at this point, where there are many AIs around that can do AI theory.  If they can't do AI theory, diverse AIs are likely to experience great difficulties trading code improvements among themselves.

This is another localizing force.  It means that the improvements you make to yourself, and the compound interest earned on those improvements, are likely to stay local.

If the scenario with an AI takeoff is anything at all like the modern world in which all the attempted AGI projects have completely incommensurable architectures, then any self-improvements will definitely stay put, not spread.

• But suppose that the situation did change drastically from today, and that you had a community of diverse AIs which were sophisticated enough to share cognitive content, code changes, and even insights.  And suppose even that this is true at the start of the FOOM - that is, the community of diverse AIs got all the way up to that level, without yet using a FOOM or starting a FOOM at a time when it would still be localized.

We can even suppose that most of the code improvements, algorithmic insights, and cognitive content driving any particular AI, are coming from outside that AI - sold or shared - so that the improvements the AI makes to itself, do not dominate its total velocity.

Fine.  The humans are not out of the woods.

Even if we're talking about uploads, it will be immensely more difficult to apply any of the algorithmic insights that are tradeable between AIs, to the undocumented human brain, that is a huge mass of spaghetti code, that was never designed to be upgraded, that is not end-user-modifiable, that is not hot-swappable, that is written for a completely different architecture than what runs efficiently on modern processors...

And biological humans?  Their neurons just go on doing whatever neurons do, at 100 cycles per second (tops).

So this FOOM that follows from recursive self-improvement, the cascade effect of using your increased intelligence to rewrite your code and make yourself even smarter -

The barriers to sharing cognitive improvements among diversely designed AIs, are large; the barriers to sharing with uploaded humans, are incredibly huge; the barrier to sharing with biological humans, is essentially absolute.  (Barring a (benevolent) superintelligence with nanotechnology, but if one of those is around, you have already won.)

In this hypothetical global economy of mind, the humans are like a country that no one can invest in, that cannot adopt any of the new technologies coming down the line.

I once observed that Ricardo's Law of Comparative Advantage is the theorem that unemployment should not exist.  The gotcha being that if someone is sufficiently unreliable, there is a cost to you to train them, a cost to stand over their shoulders and monitor them, a cost to check their results for accuracy - the existence of unemployment in our world is a combination of transaction costs like taxes, regulatory barriers like minimum wage, and above all, lack of trust.  There are a dozen things I would pay someone else to do for me - if I wasn't paying taxes on the transaction, and if I could trust a stranger as much as I trust myself (both in terms of their honesty and of acceptable quality of output).  Heck, I'd as soon have some formerly unemployed person walk in and spoon food into my mouth while I kept on typing at the computer - if there were no transaction costs, and I trusted them.

If high-quality thought drops into a speed closer to computer time by a few orders of magnitude, no one is going to take a subjective year to explain to a biological human an idea that they will be barely able to grasp, in exchange for an even slower guess at an answer that is probably going to be wrong anyway.

Even uploads could easily end up doomed by this effect, not just because of the immense overhead cost and slowdown of running their minds, but because of the continuing error-proneness of the human architecture.  Who's going to trust a giant messy undocumented neural network, any more than you'd run right out and hire some unemployed guy off the street to come into your house and do your cooking?

This FOOM leaves humans behind -

- unless you go the route of Friendly AI, and make a superintelligence that simply wants to help humans, not for any economic value that humans provide to it, but because that is its nature.

And just to be clear on something - which really should be clear by now, from all my other writing, but maybe you're just wandering in - it's not that having squishy things running around on two legs is the ultimate height of existence.  But if you roll up a random AI with a random utility function, it just ends up turning the universe into patterns we would not find very eudaimonic - turning the galaxies into paperclips.  If you try a haphazard attempt at making a "nice" AI, the sort of not-even-half-baked theories I see people coming up with on the spot and occasionally writing whole books about, like using reinforcement learning on pictures of smiling humans to train the AI to value happiness, yes this was a book, then the AI just transforms the galaxy into tiny molecular smileyfaces...

It's not some small, mean desire to survive for myself at the price of greater possible futures, that motivates me.  The thing is - those greater possible futures, they don't happen automatically.  There are stakes on the table that are so much an invisible background of your existence that it would never occur to you they could be lost; and these things will be shattered by default, if not specifically preserved.

• And as for the idea that the whole thing would happen slowly enough for humans to have plenty of time to react to things - a smooth exponential shifted into a shorter doubling time - of that, I spoke yesterday.  Progress seems to be exponential now, more or less, or at least accelerating, and that's with constant human brains.  If you take a nonrecursive accelerating function and fold it in on itself, you are going to get superexponential progress.  "If computing power doubles every eighteen months, what happens when computers are doing the research" should not just be a faster doubling time.  (Though, that said, on any sufficiently short timescale, progress might well locally approximate an exponential because investments will shift in such fashion that the marginal returns on investment balance, even in the interior of a single mind; interest rates consistent over a timespan imply smooth exponential growth over that timespan.)
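
One toy way to see the difference between "a faster doubling time" and "folding the function in on itself" is sketched below.  It assumes, purely for illustration, that ordinary progress follows dy/dt = r*y and that the folded version makes the rate constant itself proportional to the current level, giving dy/dt = r*y^2.  The function names and the choice of equations are assumptions of this sketch, not a model taken from the argument above.

```python
import math


def exponential(y0, r, t):
    """Closed-form solution of dy/dt = r*y (fixed doubling time)."""
    return y0 * math.exp(r * t)


def folded(y0, r, t):
    """Closed-form solution of dy/dt = r*y**2 (assumed 'folded-in' growth).

    The solution y0 / (1 - r*y0*t) diverges at t = 1 / (r*y0).
    """
    blowup = 1.0 / (r * y0)
    if t >= blowup:
        return math.inf
    return y0 / (1.0 - r * y0 * t)


def doubling_times(y0, r, n):
    """Durations of the first n successive doublings of the folded curve."""
    # The folded curve reaches y0 * 2**k at t_k = (1 - 2**-k) / (r*y0).
    times = [(1.0 - 2.0 ** -k) / (r * y0) for k in range(n + 1)]
    return [later - earlier for earlier, later in zip(times, times[1:])]
```

With y0 = 1 and r = 1, the first three doublings of the folded curve take 0.5, 0.25, and 0.125 time units, and the curve diverges at t = 1, whereas the exponential curve doubles at the same interval forever.  Under these assumptions each doubling takes half as long as the one before, which is a different shape of curve and not just a shorter doubling time.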

You can't count on warning, or time to react.  If an accident sends a sphere of plutonium not merely critical but prompt critical, neutron output can double in a tenth of a second even with k = 1.0006.  It can deliver a killing dose of radiation or blow the top off a nuclear reactor before you have time to draw a breath.  Computers, like neutrons, already run on a timescale much faster than human thinking.  We are already past the world where we can definitely count on having time to react.
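
The arithmetic behind a figure like "a tenth of a second" can be sketched with a simple geometric model, in which each neutron generation multiplies the population by k.  The generation times used in the usage note are assumptions chosen for illustration (real values depend heavily on the assembly), and the function names are hypothetical.

```python
import math


def generations_to_double(k):
    """Generations needed for a population multiplied by k each generation to double."""
    if k <= 1.0:
        raise ValueError("population only grows when k > 1")
    return math.log(2) / math.log(k)


def doubling_time_s(k, generation_time_s):
    """Doubling time in seconds, given an assumed time per generation."""
    return generations_to_double(k) * generation_time_s
```

With k = 1.0006, doubling takes roughly 1,156 generations.  If each generation is assumed to take 0.0001 seconds, that is about 0.12 seconds; if each generation is instead assumed to take 0.1 seconds, the same k doubles in about two minutes.  A multiplication factor barely above 1 is not slow when the generations are fast.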

When you move into the transhuman realm, you also move into the realm of adult problems.  To wield great power carries a price in great precision.  You can build a nuclear reactor but you can't ad-lib it.  On the problems of this scale, if you want the universe to end up a worthwhile place, you can't just throw things into the air and trust to luck and later correction.  That might work in childhood, but not on adult problems where the price of one mistake can be instant death.

Making it into the future is an adult problem.  That's not a death sentence.  I think.  It's not the inevitable end of the world.  I hope.  But if you want humankind to survive, and the future to be a worthwhile place, then this will take careful crafting of the first superintelligence - not just letting economics or whatever take its easy, natural course.  The easy, natural course is fatal - not just to ourselves but to all our hopes.

That, itself, is natural.  It is only to be expected.  To hit a narrow target you must aim; to reach a good destination you must steer; to win, you must make an extra-ordinary effort.

Progress seems to be exponential now, more or less, or at least accelerating, and that's with constant human brains.

Except that, functionally speaking, human computing capabilities are nowhere near constant, due to:

  • Culture. Humans download almost all their thinking software from the pool of human culture. That's the main thing that separates a modern human from a Neanderthal man. New thinking tools are arising all the time now, and we download them from the internet - thus souping up our brains with new and improved software.

  • Machines. Human brains are functionally augmented by machines. In addition to vastly expanding our sensory and motor channels, they intelligently preprocess our sensory inputs and post-process our motor outputs. The machines compensate for our weaknesses - having abilities at serial deterministic calculations that we are poor at - and having huge, reliable memories.

The net result of these two effects is an enormous increase in human capability.


Eliezer, what are you going to do next?

I designed, with a co-worker, a cognitive infrastructure for DARPA that is supposed to let AIs share code. I intended to have cognitive modules be web services (at present, they're just software agents). Every representation used was to be evaluated using a subset of Prolog, so that expressions could be automatically converted between representations. (This was never implemented; nor was ontology mapping, which is really hard and would also be needed to translate content.) Unfortunately, my former employer didn't let me publish anything on it. Also, it works only with symbolic AI.

It wouldn't change the picture Eliezer is drawing much even if it worked perfectly, though.

on any sufficiently short timescale, progress should locally approximate an exponential because of competition between interest rates (even in the interior of a single mind).

That looks so... dim. (But sadly, it sounds too true.) So I ask too: what to do next? Hack AI and... become "death, destroyer of worlds"? Or think about FAI without doing anything specific? And doing that not just using that "just for fun" curiosity, which is needed (or so it seems) for every big scientific discovery. (Or is it just me who thinks it that way?)

Anyway... Do we have any information about what the human brain is capable of without additional downloaded "software"? (Or has the co-evolution of the brain and the "software" played such an important role that certain parts of it need some "drivers" to be useful at all?)


I haven't read this entire post but would you consider all the different natural intelligences in the world today different architectures? Would you consider different humans as having the same hardware (with exceptions) but different software? Can intelligence be created or exist as an individual entity independent of the existence of any other intelligent entity? It seems likely to me that it could only exist as a social artifact even if only as a set of clones interacting, learning, and diverging independently, or only in relation to its human creators. It also seems human selection pressure would be weak compared to natural pressures and lack of pressure would translate to a lack of advancement.

There are plenty of things you can prove about specific, chosen computations. You just can't prove them about arbitrary computations.


What the heck does the theory of relativity have to do with AI? The theory of relativity's only worth is that it gives an accurate description of how the universe works.

Your system might have some common mathematical grounds with relativity, but I don't think that makes it "based on the Theory of Relativity". Or at least, the validity of the theory of relativity doesn't say anything about how applicable the mathematical models are to your domain. Sounds like marketese hype to me.

(Sorry for the somewhat aggressive diverging issue; I don't have much to say about the main post beyond "I want to see more about that", or "I'm curious as to what Robin will answer")

"It's a whole lot easier to modify the source code in the interior of your own mind, then to take that modification, and sell it to a friend who happens to be written on different source code"

If I understand the sentence correctly, "then" should be "than".

It depends on what he was trying to say. As it is it looks like a sequence, "A, then B, and C". But I think his point would be better made and stronger by using "than" rather then "then" and eliminating the commas.

I just made a similar goof - it should have been "rather than" not "rather then".

Eliezer, "changes in my programming that seem to result in improvements" are sufficiently arbitrary that you may still have to face the halting problem, i.e. if you are programming an intelligent being, it is going to be sufficiently complicated that you will never prove that there are no bugs in your original programming, i.e. even ones that may show no effect until it has improved itself 1,000,000 times, and by then it will be too late.

Apart from this, no intelligent entity can predict its own actions, i.e. it will always have a feeling of "free will." This is necessary because whenever it looks at a choice between A and B, it will always say, "I could do A, if I thought it was better," and "I could also do B, if I thought it was better." So its own actions are surely unpredictable to it, it can't predict the choice until it actually makes the choice, just like us. But this implies that "insight into intelligence" may be impossible, or at least full insight into one's own intelligence, and that is enough to imply that your whole project may be impossible, or at least that it may go very slowly, so Robin will turn out to be right.

Once again, that is not the halting problem. The halting problem has to do with what can be said about an arbitrary program. A well-constructed self-improving intelligence is a decidedly non-arbitrary program.

The halting problem's undecidability prevents you from writing a program that is guaranteed to prove whether other arbitrary programs halt or not. You can't write a halting-checker program and then feed it the source code to your AGI and have it say "this halts" or "this doesn't halt".

You can prove that some specific programs halt. If you're trying to prove it about a program that you are writing at the time, it can even be easy. I have written many programs that halt on all input sequences, and then proved this fact about them.

Here's a super-easy example: print "hello world" provably always halts.
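
A slightly less trivial sketch of the same point, in Python: Euclid's algorithm halts on every pair of non-negative integers, and the proof is a one-line argument about a loop variant.  The function name and the recorded `variants` list are illustrative additions, included only to make the argument visible.

```python
def gcd_steps(a: int, b: int):
    """Euclid's algorithm, also returning the value of b at each iteration."""
    if a < 0 or b < 0:
        raise ValueError("non-negative integers only")
    variants = []
    while b != 0:
        variants.append(b)
        # 0 <= a % b < b, so b strictly decreases and can never go below 0;
        # a strictly decreasing sequence of non-negative integers is finite.
        a, b = b, a % b
    return a, variants
```

For example, `gcd_steps(48, 18)` returns `(6, [18, 12, 6])`: the recorded values of b fall strictly at every step, which is the whole termination proof.  Nothing here needs a general halting-checker, because the claim is about one specific, chosen program.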

Rereading this post some years later I found a number of things were explained:

"Localization" usually has to do with being spatially adjacent. This post apparently means something quite different by the term - something to do with not doing "sharing of cognitive content between diverse AIs".

That apparently explains some of the confusing comments in the RSI post - in particular this:

I haven't yet touched on the issue of localization (though the basic issue is obvious: the initial recursive cascade of an intelligence explosion can't race through human brains because human brains are not modifiable until the AI is already superintelligent).

...which just seems like crazy talk under the usual meaning of the term "localization".

A more conventional vision involves development being spread across labs and data centres distributed throughout the planet. That is not terribly "local" in ordinary 3D space - but may be "local" in whatever space is being talked about in this post.

The post contains this:

I can imagine there being an exception to this for non-diverse agents that are deliberately designed to carry out this kind of trading within their code-clade.

Exactly. So: not "local" at all.

I recently read the wiki article on criticality accidents, and it seems relevant here. "A criticality accident, sometimes referred to as an excursion or a power excursion, is the unintentional assembly of a critical mass of a given fissile material, such as enriched uranium or plutonium, in an unprotected environment."

Assuming Eliezer's analysis is correct, we cannot afford even 1 of these in the domain of self-improving AI. Thankfully, it's harder to accidentally create a self-improving AI than it is to drop a brick in the wrong place at the wrong time.

Here are some things that shouldn't happen, on my analysis: An ad-hoc self-modifying AI as in (1) undergoes a cycle of self-improvement, starting from stupidity, that carries it up to the level of a very smart human - and then stops, unable to progress any further.

I'm sure this has been discussed elsewhere, but to me it seems possible that progress may stop when the mind becomes too complex to make working changes to.

I used to think that a self-improving AI would foom because as it gets smarter, it gets easier for it to improve itself. But it may get harder for it to improve itself, because as it self-improves it may turn itself into more and more of an unmaintainable mess.

What if creating unmaintainable messes is the only way that intelligences up to very-smart-human-level know how to create intelligences up to very-smart-human level? That would make that level a hard upper limit on a self-improving AI.

  2. Programmers operating with partial insight, create a mind that performs a number of tasks very well, but can't really handle self-modification let alone AI theory [...] This scenario seems less likely to my eyes, but it is not ruled out by any effect I can see.

Twelve and a half years later, does new evidence for the scaling hypothesis make this scenario more plausible? If we're in the position of being able to create increasingly capable systems without really understanding how they work by throwing lots of compute at gradient descent, then won't those systems themselves likely also be in the position of not understanding themselves enough to "close the loop" on recursive self-improvement?