or: Why our universe has already had its one and only foom
In the late 1980s, I added half a megabyte of RAM to my Amiga 500. A few months ago, I added 2048 megabytes of RAM to my Dell PC. The latter upgrade was four thousand times larger, yet subjectively the two felt about the same, and in practice they conferred about the same benefit. Why? Because each was a factor-of-two increase, and it is a general rule that each doubling tends to bring about the same increase in capability.
That's a pretty important rule, so let's test it by looking at some more examples.
How does the performance of a chess program vary with the amount of computing power you can apply to the task? The answer is that each doubling of computing power adds roughly the same number of ELO rating points. The curve must flatten off eventually (after all, the computation required to fully solve chess is finite, albeit large), yet it remains surprisingly constant over a surprisingly wide range.
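As a rough sketch of what a constant Elo gain per doubling means in game terms (the 70-points-per-doubling figure below is an illustrative placeholder, not a measured value), the standard Elo expected-score formula converts rating gains into win probabilities:

```python
import math

def win_probability(elo_diff):
    """Standard Elo expected score for a player rated elo_diff points above the opponent."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

# Hypothetical: each doubling of compute buys ~70 Elo over the baseline program.
ELO_PER_DOUBLING = 70

for doublings in (1, 2, 4, 8):
    gain = doublings * ELO_PER_DOUBLING
    p = win_probability(gain)
    print(f"{doublings} doublings -> +{gain} Elo -> {p:.2f} win probability vs. baseline")
```

Note that each successive doubling buys a smaller increment in win probability against the original baseline, even though the Elo increment stays constant: linear rating gain for exponential resource input.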
Is that idiosyncratic to chess? Let's look at Go, a more difficult game that must be solved by different methods: the alpha-beta minimax algorithm that served chess so well breaks down there. For a long time, the curve of capability also broke down: in the 90s and early 00s, the strongest Go programs were based on hand-coded knowledge, such that some of them literally did not know what to do with extra computing power; additional CPU speed resulted in zero improvement.
The breakthrough came in the second half of last decade, with Monte Carlo tree search algorithms. It wasn't just that they provided a performance improvement, it was that they were scalable. Computer Go is now on the same curve of capability as computer chess: whether measured on the ELO or the kyu/dan scale, each doubling of power gives a roughly constant rating improvement.
Where do these doublings come from? Moore's Law is driven by improvements in a number of technologies, one of which is chip design. Each generation of computers is used, among other things, to design the next generation. Each generation needs twice the computing power of the last generation to design in a given amount of time.
Looking away from computers to one of the other big success stories of 20th-century technology, space travel: from Goddard's first crude liquid-fuel rockets, to the V2, to Sputnik, to the half a million people who worked on Apollo, we again find that successive qualitative improvements in capability required order-of-magnitude after order-of-magnitude increases in the energy a rocket could deliver to its payload, with corresponding increases in labor input.
What about the nuclear bomb? Surely that at least was discontinuous?
At the simplest physical level it was: nuclear explosives have six orders of magnitude more energy density than chemical explosives. But what about the effects? Those are what we care about, after all.
The death tolls from the bombings of Hiroshima and Nagasaki have been estimated at 90,000-166,000 and 60,000-80,000 respectively. That from the firebombing of Hamburg in 1943 has been estimated at 42,600; that from the firebombing of Tokyo on the 10th of March 1945 alone has been estimated at over 100,000. So the actual effects were in the same league as other major bombing raids of World War II. To be sure, the destruction was now being carried out with single bombs, but what of it? The production of those bombs took the labor of 130,000 people, the industrial infrastructure of the world's most powerful nation, and $2 billion of investment in 1945 dollars; nor did even that investment at that time gain the US the ability to produce additional nuclear weapons in large numbers at short notice. The construction of the massive nuclear arsenals of the later Cold War took additional decades.
(To digress for a moment from the curve of capability itself, we may also note that destructive power, unlike constructive power, is purely relative. The death toll from the Mongol sack of Baghdad in 1258 was several hundred thousand; the total from the Mongol invasions was several tens of millions. The raw numbers, of course, do not fully capture the effect on a world whose population was much smaller than today's.)
Does the same pattern apply to software as it does to hardware? Indeed it does. There's a significant difference between the capability of a program you can write in one day versus two days. On a larger scale, there's a significant difference between the capability of a program you can write in one year versus two years. But there is no significant difference between the capability of a program you can write in 365 days versus 366 days. Looking away from programming to the task of writing an essay or a short story, a textbook or a novel, the rule holds true: each significant increase in capability requires a doubling, not a mere linear addition. And if we look at pure science, continued progress over the last few centuries has been driven by exponentially greater inputs, both in the number of trained human minds applied and in the capabilities of the tools used.
If this is such a general law, should it not apply outside human endeavor? Indeed it does. From protozoa which pack a minimal learning mechanism into a single cell, to C. elegans with hundreds of neurons, to insects with thousands, to vertebrates with millions and then billions, each increase in capability takes an exponential increase in brain size, not the mere addition of a constant number of neurons.
But, some readers are probably thinking at this point, what about...
... what about the elephant at the dining table? The one exception that so spectacularly broke the law?
Over the last five or six million years, our lineage upgraded computing power (brain size) by about a factor of three, and upgraded firmware to an extent that is unknown but was surely more like a percentage than an order of magnitude. The result was not a corresponding improvement in capability. It was a jump from almost no to fully general symbolic intelligence, which took us from a small niche to mastery of the world. How? Why?
To answer that question, consider what an extraordinary thing is a chimpanzee. In raw computing power, it leaves our greatest supercomputers in the dust; in perception, motor control, spatial and social reasoning, it has performance our engineers can only dream about. Yet even chimpanzees trained in sign language cannot parse a sentence as well as the Infocom text adventures that ran on the Commodore 64. They are incapable of arithmetic that would be trivial with an abacus let alone an early pocket calculator.
The solution to the paradox is that a chimpanzee could make an almost discontinuous jump to human level intelligence because it wasn't developing across the board. It was filling in a missing capability - symbolic intelligence - in an otherwise already very highly developed system. In other words, its starting point was staggeringly lopsided.
(Is there an explanation why this state of affairs came about in the first place? I think there is - in a nutshell, most conscious observers should expect to live in a universe where it happens exactly once - but that would require a digression into philosophy and anthropic reasoning, so it really belongs in another post; let me know if there's interest, and I'll have a go at writing that post.)
Can such a thing happen again? In particular, is it possible for AI to go foom the way humanity did?
If such lopsidedness were to repeat itself... well even then, the answer is probably no. After all, an essential part of what we mean by foom in the first place - why it's so scarily attractive - is that it involves a small group accelerating in power away from the rest of the world. But the reason why that happened in human evolution is that genetic innovations mostly don't transfer across species. The dolphins couldn't say hey, these apes are on to something, let's snarf the code for this symbolic intelligence thing, oh and the hands too, we're going to need manipulators for the toolmaking application, or maybe octopus tentacles would work better in the marine environment. Human engineers carry out exactly this sort of technology transfer on a routine basis.
But it doesn't matter, because the lopsidedness is not occurring. Obviously computer technology hasn't lagged in symbol processing - quite the contrary. Nor has it really lagged in areas like vision and pattern matching - a lot of work has gone into those, and our best efforts aren't clearly worse than would be expected given the available development effort and computing power. And some of us are making progress on actually developing AGI - very slow, as would be expected if the theory outlined here is correct, but progress nonetheless.
The only way to create the conditions for any sort of foom would be to shun a key area completely for a long time, so that ultimately it could be rapidly plugged into a system that is very highly developed in other ways. Hitherto no such shunning has occurred: every even slightly promising path has had people working on it. I advocate continuing to make progress across the board as rapidly as possible, because every year that drips away may be an irreplaceable loss; but if you believe there is a potential threat from unfriendly AI, then such continued progress becomes the one reliable safeguard.
While it is true that exponential improvements in computer speed and memory often have the sort of limited impact you are describing, algorithmic improvements are frequently much more helpful. When RSA-129 was published as a factoring challenge, it was estimated that even assuming Moore's law it would take a very long time to factor (the classic estimate was that it would take on the order of 10^15 years, assuming one could do modular arithmetic operations at one per nanosecond; assuming steady progress of Moore's law, one got an estimate of hundreds of years at minimum). However, it was factored only a few years later, because new algorithms made factoring much, much easier. In particular, the quadratic sieve and the number field sieve were both subexponential. The analogy here is roughly to the jump in Go programs that occurred when the new Monte Carlo methods were introduced.
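For concreteness, the standard heuristic complexity estimates for the two sieves can be compared numerically. The sketch below uses the textbook L-notation formulas (asymptotic ballparks with the usual constants; o(1) terms dropped, so treat the outputs as orders of magnitude, not operation counts):

```python
import math

def qs_ops(digits):
    """Heuristic op count for the quadratic sieve: L_n[1/2, 1] = exp(sqrt(ln n * ln ln n))."""
    ln_n = digits * math.log(10)
    return math.exp(math.sqrt(ln_n * math.log(ln_n)))

def gnfs_ops(digits):
    """Heuristic op count for the general number field sieve: L_n[1/3, (64/9)^(1/3)]."""
    ln_n = digits * math.log(10)
    c = (64 / 9) ** (1 / 3)
    return math.exp(c * ln_n ** (1 / 3) * math.log(ln_n) ** (2 / 3))

for d in (100, 129, 200, 300):
    print(f"{d} digits: QS ~10^{math.log10(qs_ops(d)):.0f}, GNFS ~10^{math.log10(gnfs_ops(d)):.0f}")
```

Both grow far slower than the exponential cost implied by the original 10^15-year estimate, and the number field sieve pulls further ahead as the numbers grow: an algorithmic jump, not a hardware one.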
An AI that is a very good mathematician, and can come up with lots of good algorithms, might plausibly go FOOM. For example, if it has internet access and finds a practical polynomial-time factoring algorithm, it will control much of the internet quite quickly. This is not the only example... (read more)
Sure, you can always cook up some ad hoc problem such that my perfect 100% solution to problem A is just a measly 5% subcomponent of problem B . That doesn't change the fact that I've solved problem A, and all the ramifications that come along with it. You're just relabeling things to automate a moving goal post. Luckily, an algorithm by any other name would still smell as sweet.
This is an interesting case, and reason enough to form your hypothesis, but I don't think observation backs up the hypothesis:
The difference in intelligence between the smartest academics and even another academic is phenomenal, to say nothing of the difference between the smartest academics and an average person. Nonetheless, the brain size of all these people is more or less the same. The difference in effectiveness is similar to the gulf between men and apes. The accomplishments of the smartest people are beyond the reach of average people. There are things smart people can do that average people couldn't, regardless of their numbers or resources.
Such as, for instance, the fact that our brains aren't likely optimal computing machines, and could be greatly accelerated on silicon.
Forget about recursive FOOMs for a minute. Do you not think a greatly accelerated human would be orders of magnitude more useful (more powerful) than a regular human?
Correlation between IQ and effectiveness does break down at higher IQs, you're right. Nonetheless, there doesn't appear to be any sharp limit to effectiveness itself. This suggests to me that it is IQ that is breaking down, rather than us reaching some point of diminishing returns.
My point here was that human minds are lopsided, to use your terminology. They are sorely lacking in certain hardware optimizations that could render them thousands or millions of times faster (this is contentious, but I think reasonable). Exposing human minds to Moore's Law doesn't just give them the continued benefit of exponential growth, it gives them a huge one-off explosion in capability.
For all intents and purposes, an uploaded IQ 150 person accelerated a million times might as well be a FOOM in terms of capability. Likewise an artificially constructed AI with similar abilities.
(Edit: To be clear, I'm skeptical of true recursive FOOMs as well. However, I don't think something that powerful is needed in practice for a hard take off to occur, and think arguments for FAI carry through just as well even if self modifying AIs hit a ceiling after the first or second round of self modification.)
I found this an interesting rule, and thought that after the examples you would establish some firmer theoretical basis for why it might work the way it does. But you didn't do that, and instead jumped to talking about AGI. It feels like you're trying to apply a rule before having established how and why it works, which raises "possible reasoning by surface analogy" warning bells in my head. The tone in your post is a lot more confident than it should be.
This rule smells awfully like a product of positive/confirmation bias to me.
Did you pre-select a wide range of various endeavours and projects and afterwards analysed their rate of progress, thus isolating this common pattern;
or did you notice a similarity between a couple of diminishing-returns phenomena, and then kept coming up with more examples?
My guess is strongly on the latter.
I'm open to the usual kinds of Bayesian evidence. Let's see. H is "there will be no more FOOMs". What do you have in mind as a good E? Hmm, let's see. How will the world be observably different if you are right, from how it will look if you are wrong?
Point out such an E, and then observe it, and you may sway me to your side.
Removing my tongue from my cheek, I will make an observation. I'm sure that you have heard the statement "Extraordinary claims require extraordinary evidence." Well, there is another kind of claim that requires extraordinary evidence. Claims of the form "We don't have to worry about that, anymore."
It seems like you're entirely ignoring feedback effects from more and better intelligence being better at creating more and better intelligence, as argued in Yudkowsky's side of the FOOM debate.
And hardware overhang (faster computers developed before general cognitive algorithms, first AGI taking over all the supercomputers on the Internet) and fast infrastructure (molecular nanotechnology) and many other inconvenient ideas.
Also if you strip away the talk about "imbalance" what it works out to is that there's a self-contained functioning creature, the chimpanzee, and natural selection burps into it a percentage more complexity and quadruple the computing power, and it makes a huge jump in capability. Nothing is offered to support the assertion that this is the only such jump which exists, except the bare assertion itself. Chimpanzees were not "lopsided", they were complete packages designed for an environment; it turned out there were things that could be done which created a huge increase in optimization power (calling this "symbolic processing" assumes a particular theory of mind, and I think it is mistaken) and perhaps there are yet more things like that, such as, oh, say, self-modification of code.
Interesting. Can you elaborate or link to something?
I'm not Eliezer, but will try to guess what he'd have answered. The awesome powers of your mind only feel like they're about "symbols", because symbols are available to the surface layer of your mind, while most of the real (difficult) processing is hidden. Relevant posts: Detached Lever Fallacy, Words as Mental Paintbrush Handles.
It would be better if you waited until you had made somewhat of a solid argument before you resorted to that appeal. Even Robin's "Trust me, I'm an Economist!" is more persuasive.
The Bottom Line is one of the earliest posts in Eliezer's own rationality sequences and describes approximately this objection. You'll note that he added an Addendum:
But barely. ;)
You would not believe how little that would impress me. Well, I suppose you would - I've been talking with XiXi about Ben, after all. I wouldn't exactly say that your status incentives promote neutral reasoning on this position - or Robin on the same. It is also slightly outside of the core of your expertise, which is exactly where the judgement of experts is notoriously demonstrated to be poor.
You are trying to create AGI without friendliness and you would like to believe it will go foom? And this is supposed to make us trust your judgement with respect to AI risks?
Incidentally, 'the bottom line' accusation here was yours, not the other way around. The reference was to question its premature use as a fully general counterargument.
We are talking here about pred... (read more)
When you make a sound carry 10 times as much energy, it only sounds a bit louder.
If your unit of measure already compensates for huge leaps in underlying power, then you'll tend to ignore that the leaps in power are huge.
How you feel about a RAM upgrade is one such measure, because you don't feel everything that happens inside your computer. You're measuring benefit by how it works today vs. yesterday, instead of what "it" is doing today vs. 20 years ago.
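The sound example is exactly how decibels work: perceived loudness tracks the logarithm of power, so a tenfold energy increase registers as a modest step. A minimal illustration:

```python
import math

def decibels(power_ratio):
    """Convert a power (energy) ratio to decibels: 10 * log10(ratio)."""
    return 10 * math.log10(power_ratio)

print(decibels(10))    # a 10x energy increase is only +10 dB
print(decibels(1000))  # a 1000x increase is still only +30 dB
```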
Isn't that lopsidedness the state computers are currently in? So the first computer that gets the 'general intelligence' thing will have a huge advantage even before any fancy self-modification.
You seem to be arguing from the claim that there are exponentially diminishing returns within a certain range, to the claim that there are no phase transitions outside that range. You explain away the one phase transition you've noticed (ape to human) in an unconvincing manner, and ignored three other discontinuous phase transitions: agriculture, industrialization, and computerization.
Also, this statement would require extraordinary evidence that is not present:
It seems to me that the first part of your post, that examines PC memory and such, simply states that (1) there are some algorithms (real-world process patterns) that take exponential amount of resources, and (2) if capability and resources are roughly the same thing, then we can unpack the concept of "significant improvement in capability" as "roughly a doubling in capability", or, given that resources are similar to capability in such cases, "roughly a doubling in resources".
Yes, such things exist. There are also other kinds of things.
I'm not convinced; but it's interesting. A lot hinges on the next-to-last paragraph, which is dubious and handwavy.
One weakness is that, when you say that chimpanzees looked like they were well-developed creatures, but really they had this huge unknown gap in their capabilities, which we filled in, I don't read that as evidence that now we are fully-balanced creatures with no gaps. I wonder where the next gap is. (EDIT: See jimrandomh's excellent comment below.)
What if an AI invents quantum computing? Or, I don't know, is rational?
Another weakness is the assumption that the various scales you measure things on, like Go ratings, are "linear". Go ratings, at least, are not. A decrease of 1 kyu is supposed to mean an increase in likelihood ratio of winning by a factor of 3. Also, by your logic, it should take twice as long to go from 29 kyu to 28, as from 30 to 29; no one should ever reach 10 kyu.... (read more)
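The odds-factor claim is easy to spell out. A minimal sketch, taking the factor of 3 per rank as given purely for illustration:

```python
def win_probability(rank_gap, odds_factor=3.0):
    """Win probability for the stronger player, assuming each rank of
    difference multiplies the win odds by odds_factor."""
    odds = odds_factor ** rank_gap
    return odds / (1.0 + odds)

for gap in range(0, 4):
    print(f"{gap} ranks stronger -> {win_probability(gap):.2f} win probability")
```

So one rank already means winning three games out of four; the scale compresses large underlying skill differences into small-looking rating steps, which is the point being made.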
You seem to have completely glossed over the idea of recursive self-improvement.
Whenever you have a process that requires multiple steps to complete, you can't go any faster than the slowest step. Unless Intel's R&D department actually does nothing but press the "make a new CPU design" button every few months, I think the limiting factor is still the step that involves unimproved human brains.
Elsewhere in this thread you talk about other bottlenecks, but as far as I know FOOM was never meant to imply unbounded speed of progress, only fast enough that humans have no hope of keeping up.
Even if AGI had only a very small comparative advantage (in skill and ability to recursively self-improve) over humans supported by the then-best available computer technology, and thus over their ability to self-improve, it would eventually, probably even then quite fast, overpower humans totally and utterly. And it seems fairly likely that eventually you could build a fully artificial agent that was strictly superior to humans (or could recursively self-update to be one). This intuition is fairly likely given that humans are not ultimately designed to be the singularity-su... (read more)
There is no law of nature that requires consequences to be commensurate with their causes. One can build a doom machine that is activated by pressing a single button. A mouse exerts gravitational attraction on all of the galaxies in the future light cone.
What has capability that's super-logarithmic?
Physical understanding. One thing just leads to another thing, which leads to another thing... Multiple definitions of capability here, but what I'm thinking of is the fact that there are
You're reaching. There is no such general law. There is just the observation that whatever is at the top of the competition in any area, is probably subject to some diminishing returns from that local optimum. This is an interesting generalization and could provide insight... (read more)
This, to me, could be a much more compelling argument, if presented well. So there's definitely a lot of interest from me.
This one is a little bit like my "The Intelligence Explosion Is Happening Now" essay.
Surely my essay's perspective is a more realistic one, though.
Essentially, this is an argument that a FOOM would be a black swan event in the history of optimization power. That provides some evidence against overconfident Singularity forecasts, but doesn't give us enough reason to dismiss FOOM as an existential risk.
The ELO rating scheme is calculated on a logistic curve - and so includes an exponent - see details here. It gets harder to climb up the ratings the higher you get.
It's the same with traditional go kyu/dan ratings - 9 kyu to 8 kyu is easy, 8 dan to 9 dan is very difficult.
This is a very interesting proposal, but as a programmer and with my knowledge of statistical analysis I have to disagree:
The increase in computing power that you regrettably cannot observe at user level is a fairly minor quirk of the development; pray tell, do the user interfaces of your Amiga system's OS and your Dell system's OS look alike? The reason why modern computers don't feel faster is because the programs we run on them are wasteful and gimmicky. However, in terms of raw mathematics, we have 3D games with millions and millions of polygons, we ... (read more)
This post is missing the part where the observations made support the conclusion to any significant degree.
Computer chips used to assist practically identical human brains to tweak computer chips is a superficial similarity to a potential foom at the very best.
Well, ELO rating is a logarithmic scale, after all.
A very familiar class of discontinuities in engineering refers to functionality being successfully implemented for the first time. This is a moment where a design finally comes together, all its components adjusted to fit with each other, the show-stopping bugs removed from the system. And then it just works, where it didn't before.
Following the logic of the article: our current mind-designs, both humans and real AIs, are lopsided in that they can't effectively modify themselves. There is a theory of mind, but as far as maps go, this one is plenty far from the territory. Once this obstacle is dealt with, there might be another FOOM.
I got some questions regarding self-improvement:
I'm asking because I sense that those questions might be important in regard to the argument in the original post that each significant increase in capability requires a doubling of resources.
The program that the AI is writing is itself, so the second half of those two years takes less than one year - as determined by the factor of "significant difference". And if there's a significant difference from 1 to 2, there ought to be a significant difference from 2 to 4 as well, no? But the time taken to get from 2 to 4 is not two years; it is 2 years divided by the square of whatever integer you care to represent "significa... (read more)
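The arithmetic sketched in this comment is a geometric series: if each capability doubling divides the time needed for the next one by a constant factor, the total time converges to a finite limit. A minimal illustration (the factor-of-two speedup per doubling is an assumption for the example, not a claim about real AI):

```python
def total_time_to_n_doublings(first_doubling_time, speedup_per_doubling, n):
    """Total wall-clock time for n capability doublings, where each doubling
    divides the time required for the next one by speedup_per_doubling."""
    t, total = first_doubling_time, 0.0
    for _ in range(n):
        total += t
        t /= speedup_per_doubling
    return total

# With any speedup factor > 1 the series converges: here the sum
# 2 + 1 + 0.5 + ... approaches 4 years no matter how many doublings occur.
print(total_time_to_n_doublings(2.0, 2.0, 10))
```

Whether the speedup factor really stays above 1 round after round is, of course, exactly what the curve-of-capability argument disputes.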
Could the current computing power we have be obtained from enough Commodore 64 machines?
Maybe, but we would need a power supply many orders of magnitude bigger for that. And this is just the beginning. A cell phone based on those processors would weigh a ton.
In other words, an impossibility from a whole bunch of reasons.
We already had a foom stretched over the last 30 years. Not the first and not the last one, if we are going to proceed as planned.
What is the Relationship between Language, Analogy, and Cognition?... (read more)
Side issue: should the military effectiveness of bombs be measured by the death toll?
I tentatively suggest that atomic and nuclear bombs are of a different kind than chemical explosives, as shown by the former changing the world politically.
I'm not sure exactly why-- it may have been the shock of novelty.
Atomic and nuclear bombs combine explosion, fire, and poison, but I see no reason to think there would have been the same sort of widespread revulsion against chemical explosives; the world outlawed poison gas in WWI and just kept on going with chemical explosives.
Chemical explosives changed the world politically, just longer ago. Particularly when they put the chemicals in a confined area and put lead pellets on top...
The neglected missing piece is understanding of intelligence. (Not understanding how to solve a Rubik's cube, but understanding the generalized process that, upon seeing a Rubik's cube for the first time and hearing the goal of the puzzle, figures out how to solve it.)
Do you have a citation for the value of the Go performance improvement per hardware doubling?
Chimpanzee brains wer... (read more)
Snagged from a thread that's gone under a fold:
One thing people do that neither chimps nor computers have managed is invent symbolic logic.
Maybe it's in the sequences somewhere, but what does it take to notice gaps in one's models and oddities that might be systematizable?
 If I'm going to avoid P=0, then I'll say it's slightly more likely that chimps have done significant intellectual invention than computers.