Kaj Sotala, Seán ÓhÉigeartaigh and I recently submitted a paper entitled "The errors, insights and lessons of famous AI predictions and what they mean for the future" to the conference proceedings of the AGI12/AGI Impacts Winter Intelligence conference. Sharp deadlines prevented us from following the ideal procedure of first presenting it here and getting feedback; instead, we'll present it here after the fact.
As this is the first case study, it will also introduce the paper's prediction classification schemas.
Taxonomy of predictions
There will never be a bigger plane built.
Boeing engineer on the 247, a twin engine plane that held ten people.
A fortune teller talking about celebrity couples, a scientist predicting the outcome of an experiment, an economist pronouncing on next year's GDP figures - these are canonical examples of predictions. There are other types of predictions, though. Conditional statements - if X happens, then so will Y - are also valid, narrower, predictions. Impossibility results are also a form of prediction. For instance, the law of conservation of energy gives a very broad prediction about every single perpetual motion machine ever made: to wit, that they will never work.
The common thread is that all these predictions constrain expectations of the future. If one takes the prediction to be true, one expects to see different outcomes than if one takes it to be false. This is closely related to Popper's notion of falsifiability (Pop). This paper will take every falsifiable statement about future AI to be a prediction.
For the present analysis, predictions about AI will be divided into four types:
- Timelines and outcome predictions. These are the traditional types of predictions, giving the dates of specific AI milestones. Examples: An AI will pass the Turing test by 2000 (Tur50); Within a decade, AIs will be replacing scientists and other thinking professions (Hal11).
- Scenarios. These are a type of conditional prediction, claiming that if the conditions of the scenario are met, then certain types of outcomes will follow. Example: If someone builds a human-level AI that is easy to copy and cheap to run, this will cause mass unemployment among ordinary humans (Han94).
- Plans. These are a specific type of conditional prediction, claiming that if someone decides to implement a specific plan, then they will be successful in achieving a particular goal. Example: AI can be built by scanning a human brain and simulating the scan on a computer (San08).
- Issues and metastatements. This category covers relevant problems with (some or all) approaches to AI (including sheer impossibility results), and metastatements about the whole field. Examples: an AI cannot be built without a fundamental new understanding of epistemology (Deu12); Generic AIs will have certain (potentially dangerous) behaviours (Omo08).
There will inevitably be some overlap between the categories, but the division is natural enough for this paper.
Just as there are many types of predictions, there are many ways of arriving at them - crystal balls, consulting experts, constructing elaborate models. An initial review of various AI predictions throughout the literature suggests the following loose schema for prediction methods (as with any such schema, the purpose is to bring clarity to the analysis, not to force every prediction into a particular box, so it should not be seen as the definitive decomposition of prediction methods):
- Causal models
- Non-causal models
- The outside view
- Philosophical arguments
- Expert judgement
- Non-expert judgement
Causal models are a staple of physics and the harder sciences: given certain facts about the situation under consideration (momentum, energy, charge, etc.) a conclusion is reached about what the ultimate state will be. If the facts were different, the end situation would be different.
Outside of the hard sciences, however, causal models are often a luxury, as the underlying causes are not well understood. Some success can be achieved with non-causal models: without understanding what influences what, one can extrapolate trends into the future. Moore's law is a highly successful non-causal model (Moo65).
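As a rough illustration of what a non-causal model looks like in practice, the sketch below fits a straight line to the logarithm of a quantity and extrapolates it forward, without any account of why the quantity grows. The series is synthetic (a made-up quantity that doubles every two years), and the function names are hypothetical; nothing here is taken from the paper or from Moore's original data.

```python
import math

def fit_exponential_trend(years, values):
    """Least-squares fit of log2(value) = intercept + slope * year."""
    logs = [math.log2(v) for v in values]
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def extrapolate(year, slope, intercept):
    """Project the fitted trend to a year outside the observed range."""
    return 2 ** (intercept + slope * year)

# Synthetic, illustrative series (NOT real data): a quantity that starts
# at 100 and doubles every 2 years, observed at years 0, 2, 4, 6, 8.
years = [0, 2, 4, 6, 8]
values = [100 * 2 ** (y / 2) for y in years]

slope, intercept = fit_exponential_trend(years, values)
doubling_time = 1 / slope  # years per doubling implied by the fit

print(f"doubling time: {doubling_time:.2f} years")
print(f"projected value at year 12: {extrapolate(12, slope, intercept):.0f}")
```

The fit says nothing about what influences what; it only assumes that the past trend continues, which is exactly the strength and the weakness of this kind of model.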
In the outside view, specific examples are grouped together and claimed to be examples of the same underlying trend. This trend is used to give further predictions. For instance, one could notice the many analogues of Moore's law across the spectrum of computing (e.g. in numbers of transistors, size of hard drives, network capacity, pixels per dollar), note that AI is in the same category, and hence argue that AI development must follow a similarly exponential curve (Kur99). Note that the use of the outside view is often implicit rather than explicit: rarely is it justified why these examples are grouped together, beyond general plausibility or similarity arguments. Hence detecting uses of the outside view will be part of the task of revealing hidden assumptions. There is evidence that the use of the outside view provides improved prediction accuracy, at least in some domains (KL93).
Philosophical arguments are common in the field of AI. Some are simple impossibility statements: AI is decreed to be impossible, using arguments of varying plausibility. More thoughtful philosophical arguments highlight problems that need to be resolved in order to achieve AI, interesting approaches for doing so, and potential issues that might emerge if AIs were to be built.
Many of the predictions made by AI experts aren't logically complete: not every premise is unarguable, not every deduction is fully rigorous. In many cases, the argument relies on the expert's judgement to bridge these gaps. This doesn't mean that the prediction is unreliable: in a field as challenging as AI, judgement, honed by years of related work, may be the best tool available. Non-experts cannot easily develop a good feel for the field and its subtleties, so should not confidently reject expert judgement out of hand. Relying on expert judgement has its pitfalls, however.
Finally, some predictions rely on the judgement of non-experts, or of experts making claims outside their domain of expertise. Prominent journalists, authors, CEOs, historians, physicists and mathematicians will generally be no more accurate than anyone else when talking about AI, no matter how stellar they are in their own field (Kah11).
Predictions often use a combination of these methods, as will be seen in the various case studies - expert judgement, for instance, is a common feature in all of them.
In the beginning, Dartmouth created the AI and the hype...
- Classification: plan, using expert judgement and the outside view.
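The two schemas introduced earlier (four prediction types, six methods) can be written down as a small data structure, with the classification above as one record. This is only a sketch of one possible encoding; the class and field names are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from enum import Enum

class PredictionType(Enum):
    TIMELINE_OR_OUTCOME = "timelines and outcome predictions"
    SCENARIO = "scenarios"
    PLAN = "plans"
    ISSUE_OR_METASTATEMENT = "issues and metastatements"

class Method(Enum):
    CAUSAL_MODEL = "causal models"
    NON_CAUSAL_MODEL = "non-causal models"
    OUTSIDE_VIEW = "the outside view"
    PHILOSOPHICAL_ARGUMENT = "philosophical arguments"
    EXPERT_JUDGEMENT = "expert judgement"
    NON_EXPERT_JUDGEMENT = "non-expert judgement"

@dataclass(frozen=True)
class Classification:
    label: str                       # free-text name of the case study
    prediction_type: PredictionType  # exactly one type per prediction
    methods: frozenset               # one or more Method values

# The classification given above: a plan, backed by expert judgement
# and the outside view.
dartmouth = Classification(
    label="Dartmouth proposal",
    prediction_type=PredictionType.PLAN,
    methods=frozenset({Method.EXPERT_JUDGEMENT, Method.OUTSIDE_VIEW}),
)

print(dartmouth.prediction_type.value)
print(sorted(m.value for m in dartmouth.methods))
```

A set is used for the methods because predictions often combine several of them, while each prediction is assigned a single type.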
Hindsight bias is very strong and misleading (Fis75). Humans are often convinced that past events couldn't have unfolded differently than how they did, and that the people at the time should have realised this. Even worse, people unconsciously edit their own memories so that they misremember themselves as being right even when they got their past predictions wrong (one of the reasons that it is important to pay attention only to the actual prediction as written at the time, and not to the author's subsequent justifications or clarifications). Hence when assessing past predictions, one must cast aside all knowledge of subsequent events, and try to assess the claims given the knowledge available at the time. This is an invaluable exercise to undertake before turning attention to predictions whose timelines have not come to pass.
The 1956 Dartmouth Summer Research Project on Artificial Intelligence was a major conference, credited with introducing the term ''Artificial Intelligence'' and starting the research in many of its different subfields. The conference proposal, written in 1955, sets out what the organisers thought could be achieved. Its first paragraph reads:
''We propose that a 2 month, 10 man study of artificial intelligence be carried out during the summer of 1956 at Dartmouth College in Hanover, New Hampshire. The study is to proceed on the basis of the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it. An attempt will be made to find how to make machines use language, form abstractions and concepts, solve kinds of problems now reserved for humans, and improve themselves. We think that a significant advance can be made in one or more of these problems if a carefully selected group of scientists work on it together for a summer.''
This can be classified as a plan. Its main backing would have been expert judgement. The conference organisers were John McCarthy (a mathematician with experience in the mathematical nature of the thought process), Marvin Minsky (Harvard Junior Fellow in Mathematics and Neurology, and prolific user of neural nets), Nathaniel Rochester (Manager of Information Research, IBM, designer of the IBM 701, the first general purpose, mass-produced computer, and designer of the first symbolic assembler) and Claude Shannon (the ''father of information theory''). These were individuals who had been involved in a lot of related theoretical and practical work, some of whom had built functioning computers or programming languages - so one can expect them all to have had direct feedback about what was and wasn't doable in computing. If anyone could be considered experts in AI, in a field dedicated to an as yet non-existent machine, then they could. What implicit and explicit assumptions could they have used to predict that AI would be easy?
Reading the full proposal doesn't give the impression of excessive optimism or overconfidence. The very first paragraph hints at the rigour of their ambitions - they realised that precisely describing the features of intelligence is a major step in simulating it. Their research plan is well decomposed, and different aspects of the problem of artificial intelligence are touched upon. The authors are well aware of the inefficiency of exhaustive search methods, of the differences between informal and formal languages, and of the need for encoding creativity. They talk about the need to design machines that can work with unreliable components, and that can cope with randomness and small errors in a robust way. They propose some simple models of some of these challenges (such as forming abstractions, or dealing with more complex environments), point to previous successful work, and outline how further improvements can be made.
Reading through, the implicit reasons for their confidence seem to become apparent (as with any exercise in trying to identify implicit assumptions, this process is somewhat subjective. It is not meant to suggest that the authors were thinking along these lines, merely to point out factors that could explain their confidence - factors, moreover, that could have led dispassionate analytical observers to agree with them). These were experts, some of whom had been working with computers from early days, who had a long track record of taking complex problems, creating simple (and then more complicated) models to deal with them. These models they used to generate useful insights or functioning machines. So this was an implicit use of the outside view - they were used to solving certain problems, these looked like the problems they could solve, hence they assumed they could solve them. To modern eyes, informal languages are hugely complicated, but this may not have been obvious at the time. Computers were doing tasks, such as complicated mathematical manipulations, that were considered high-skill, something only impressive humans had been capable of. Moravec's paradox had not yet been realised (this is the principle that high-level reasoning requires very little computation, but low-level sensorimotor skills require enormous computational resources - sometimes informally expressed as ''everything easy [for a human] is hard [for a computer], everything hard is easy''). The human intuition about the relative difficulty of tasks was taken as accurate: there was no reason to suspect that parsing English was much harder than the impressive feats computers could already perform. Moreover, great progress had been made in logic, in semantics, in information theory, giving new understanding to old concepts: there was no reason to suspect that further progress wouldn't be both forthcoming and dramatic.
Even at the time, though, one could criticise their overconfidence. Philosophers, for one, had a long track record of pointing out the complexities and subtleties of the human mind. It might have seemed plausible in 1955 that further progress in logic and information theory would end up solving all these problems - but it could have been equally plausible to suppose that the success of formal models had been on low-hanging fruit, and that further progress would become much harder. Furthermore, the computers at the time were much simpler than the human brain (e.g. the IBM 701, with 73728 bits of memory), so any assumption that AIs could be built was also an assumption that most of the human brain's processing was wasted. This implicit assumption was not obviously wrong, but neither was it obviously right.
Hence the whole conference project would have seemed ideal, had it merely added more humility and qualifiers in the text, expressing uncertainty as to whether a particular aspect of the program might turn out to be hard or easy. After all, in 1955, there were no solid grounds for arguing that such tasks were unfeasible for a computer.
Nowadays, it is obvious that the paper's predictions were very wrong. All the tasks mentioned were much harder to accomplish than they claimed at the time, and haven't been successfully completed even today. Rarely have such plausible predictions turned out to be so wrong; so what can be learned from this?
The most general lesson is perhaps on the complexity of language and the danger of using human-understandable informal concepts in the field of AI. The Dartmouth group seemed convinced that because they informally understood certain concepts and could begin to capture some of this understanding in a formal model, then it must be possible to capture all this understanding in a formal model. In this, they were wrong. Similarities of features do not make the models similar to reality, and using human terms - such as 'culture' and 'informal' - in these models concealed huge complexity and gave an illusion of understanding. Today's AI developers have a much better understanding of how complex cognition can be, and have realised that programming simple-seeming concepts into computers can be very difficult. So the main lesson to draw is that reasoning about AI using human concepts (or anthropomorphising the AIs by projecting human features onto them) is a very poor guide to the nature of the problem and the time and effort required to solve it.
- [Arm] Stuart Armstrong. General purpose intelligence: arguing the orthogonality thesis. In preparation.
- [ASB12] Stuart Armstrong, Anders Sandberg, and Nick Bostrom. Thinking inside the box: Controlling and using an oracle AI. Minds and Machines, 22:299-324, 2012.
- [BBJ+03] S. Bleich, B. Bandelow, K. Javaheripour, A. Muller, D. Degner, J. Wilhelm, U. Havemann-Reinecke, W. Sperling, E. Ruther, and J. Kornhuber. Hyperhomocysteinemia as a new risk factor for brain shrinkage in patients with alcoholism. Neuroscience Letters, 335:179-182, 2003.
- [Bos13] Nick Bostrom. The superintelligent will: Motivation and instrumental rationality in advanced artificial agents. forthcoming in Minds and Machines, 2013.
- [Cre93] Daniel Crevier. AI: The Tumultuous Search for Artificial Intelligence. BasicBooks, New York, 1993.
- [Den91] Daniel Dennett. Consciousness Explained. Little, Brown and Co., 1991.
- [Deu12] D. Deutsch. The very laws of physics imply that artificial intelligence must be possible. What's holding us up? Aeon, 2012.
- [Dre65] Hubert Dreyfus. Alchemy and AI. RAND Corporation, 1965.
- [eli66] Joseph Weizenbaum. ELIZA - a computer program for the study of natural language communication between man and machine. Communications of the ACM, 9:36-45, 1966.
- [Fis75] Baruch Fischhoff. Hindsight is not equal to foresight: The effect of outcome knowledge on judgment under uncertainty. Journal of Experimental Psychology: Human Perception and Performance, 1:288-299, 1975.
- [Gui11] Erico Guizzo. IBM's Watson jeopardy computer shuts down humans in final game. IEEE Spectrum, 17, 2011.
- [Hal11] J. Hall. Further reflections on the timescale of AI. In Solomonoff 85th Memorial Conference, 2011.
- [Han94] R. Hanson. What if uploads come first: The crack of a future dawn. Extropy, 6(2), 1994.
- [Har01] S. Harnad. What's wrong and right about Searle's Chinese room argument? In M. Bishop and J. Preston, editors, Essays on Searle's Chinese Room Argument. Oxford University Press, 2001.
- [Hau85] John Haugeland. Artificial Intelligence: The Very Idea. MIT Press, Cambridge, Mass., 1985.
- [Hof62] Richard Hofstadter. Anti-intellectualism in American Life. 1962.
- [Kah11] D. Kahneman. Thinking, Fast and Slow. Farrar, Straus and Giroux, 2011.
- [KL93] Daniel Kahneman and Dan Lovallo. Timid choices and bold forecasts: A cognitive perspective on risk taking. Management science, 39:17-31, 1993.
- [Kur99] R. Kurzweil. The Age of Spiritual Machines: When Computers Exceed Human Intelligence. Viking Adult, 1999.
- [McC79] J. McCarthy. Ascribing mental qualities to machines. In M. Ringle, editor, Philosophical Perspectives in Artificial Intelligence. Harvester Press, 1979.
- [McC04] Pamela McCorduck. Machines Who Think. A. K. Peters, Ltd., Natick, MA, 2004.
- [Min84] Marvin Minsky. Afterword to Vernor Vinge's novel, "True Names." Unpublished manuscript. 1984.
- [Moo65] G. Moore. Cramming more components onto integrated circuits. Electronics, 38(8), 1965.
- [Omo08] Stephen M. Omohundro. The basic AI drives. Frontiers in Artificial Intelligence and Applications, 171:483-492, 2008.
- [Pop] Karl Popper. The Logic of Scientific Discovery. Mohr Siebeck.
- [Rey86] G. Rey. What's really going on in Searle's "Chinese room". Philosophical Studies, 50:169-185, 1986.
- [Riv12] William Halse Rivers. The disappearance of useful arts. Helsingfors, 1912.
- [San08] A. Sandberg. Whole brain emulations: a roadmap. Future of Humanity Institute Technical Report, 2008-3, 2008.
- [Sea80] J. Searle. Minds, brains and programs. Behavioral and Brain Sciences, 3(3):417-457, 1980.
- [Sea90] John Searle. Is the brain's mind a computer program? Scientific American, 262:26-31, 1990.
- [Sim55] H.A. Simon. A behavioral model of rational choice. The quarterly journal of economics, 69:99-118, 1955.
- [Tur50] A. Turing. Computing machinery and intelligence. Mind, 59:433-460, 1950.
- [vNM44] John von Neumann and Oskar Morgenstern. Theory of Games and Economic Behavior. Princeton, NJ, Princeton University Press, 1944.
- [Wal05] Chip Walter. Kryder's law. Scientific American, 293:32-33, 2005.
- [Win71] Terry Winograd. Procedures as a representation for data in a computer program for understanding natural language. MIT AI Technical Report, 235, 1971.
- [Yam12] Roman V. Yampolskiy. Leakproofing the singularity: artificial intelligence confinement problem. Journal of Consciousness Studies, 19:194-214, 2012.
- [Yud08] Eliezer Yudkowsky. Artificial intelligence as a positive and negative factor in global risk. In Nick Bostrom and Milan M. Ćirković, editors, Global catastrophic risks, pages 308-345, New York, 2008. Oxford University Press.
Just wanted to note that the case under consideration was quite unlike your garden-variety predictions, but more like an industrial project estimate. It gave quite concrete scope, effort and schedule estimates, something rarely done by long-term forecasters. So the relevant reference class would be ambitious scientific and industrial projects. One example is the failed Fifth Generation Computer Systems project. Another was the Manhattan project (not sure if there are circa 1941 estimates available for it). Not even the Moon landing project was in the same reference class, because it was clearly "just" an issue of (extremely massive) scaling up, rather than solving unknown problems. Not sure about controlled nuclear fusion; maybe there are some predictions from the 1950s available. Consider looking into Grand Challenges for further examples of the same reference class. Some other projects which come to mind: Feynman gives an account of Wheeler promising to present quantized gravity for the next seminar (because EM was just successfully quantized and gravity is just another field theory, with a spin-2 carrier instead of a spin-1 one).
By the way, there are quite specific reasons why one should, in retrospect, have assigned a very low probability to the summer AI project succeeding. They are common to most estimates that massage the data to fit the deadlines and the available budget, rather than working upwards from the scope and risk analysis.
— Tyler Hamilton, Mad Like Tesla, p. 32
Thanks! I like that reference class.
I think this is a great lesson to draw. I think another lesson is that the Dartmouth folks either hadn't noticed, or thought they could get around, the fact that much of what they were trying to do is covered by statistics, and statistics is difficult. In fact, there turned out to be no royal road for learning from data.
Here's my attempt to translate these lessons for folks who worry about foom:
(a) Taboo informal discussions of powerful AI and/or implications of such. If you can't discuss it in math terms, it's probably not worth discussing.
(b) Pay attention to where related fields are stuck. If e.g. coordination problems are hard, or getting optimization processes (corps, governments, etc.) to do what we want is hard, this is food for thought as far as getting a constructed optimization process to do what we want.
I'd add "initial progress in a field does not give a good baseline for estimating ultimate success".
I'm not sure how this follows from the previous lesson. Analysing the impact of a new technology seems mostly distinct from the research needed to develop it.
For example, suppose somebody looked at progress in chemistry and declared that soon the dreams of alchemy will be realized and we'd be able to easily synthesize any element we wanted out of any other. I'd call this a similar error to the one made by the Dartmouth group, but I don't think it then follows that we can't discuss what the impacts would be of being able to easily synthesize any element out of any other.
It might be good advice nonetheless, but I don't think it follows from the lesson.
This isn't emphasized enough. The difference spans many orders of magnitude. As the first example that springs to mind, Moravec estimated in 1997 a capacity of 100 million MIPS for simulating human behavior (that is, 1e14 instructions per second). (A brain simulation on a biological or physical level would probably take much more.) Wikipedia lists the IPS ratings of many chips; a top of the line Intel CPU from 2011 achieves 177 thousand MIPS = 1.77e11 instructions per second.
The IBM 704 computer released in 1954 achieved 4000 instructions per second. (That's 4e3, no millions involved.) The IBM 701 mentioned in the post was slower (Wikipedia doesn't specify how much). Furthermore, there was no reason in 1956 to anticipate the high speed of Moore's Law.
Given this huge difference in scale, they can't have merely assumed the brain wastes most of its capacity; they would have had to assume the brain wastes all but a negligibly tiny fraction of its capacity. Do you know what they thought? Did they explicitly discuss this question?
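To make the scale concrete, here is a small back-of-the-envelope calculation using only the figures quoted above (Moravec's 1e14 instructions per second, the 2011 CPU at 1.77e11, the IBM 704 at 4e3). The variable and function names are made up for illustration, and the "implied fraction" rests on the simplifying assumption that a 704-class machine running at full speed would have been enough for human-level behaviour.

```python
import math

# Figures quoted in the comment above, in instructions per second.
MORAVEC_1997_ESTIMATE_IPS = 100e6 * 1e6  # 100 million MIPS = 1e14
INTEL_2011_CPU_IPS = 177e3 * 1e6         # 177 thousand MIPS = 1.77e11
IBM_704_IPS = 4e3                        # 4000 instructions per second

def orders_of_magnitude(larger, smaller):
    """Base-10 logarithm of the ratio between two rates."""
    return math.log10(larger / smaller)

gap_1950s = orders_of_magnitude(MORAVEC_1997_ESTIMATE_IPS, IBM_704_IPS)
gap_2011 = orders_of_magnitude(MORAVEC_1997_ESTIMATE_IPS, INTEL_2011_CPU_IPS)

# Assumption for illustration: if a 704-class machine sufficed, the brain
# would be putting only this fraction of Moravec's estimate to use.
implied_useful_fraction = IBM_704_IPS / MORAVEC_1997_ESTIMATE_IPS

print(f"IBM 704 vs Moravec estimate: about {gap_1950s:.1f} orders of magnitude")
print(f"2011 CPU vs Moravec estimate: about {gap_2011:.1f} orders of magnitude")
print(f"implied useful fraction of the brain: {implied_useful_fraction:.0e}")
```

On these numbers the 1950s gap is a bit over ten orders of magnitude, and even the 2011 chip falls short by almost three.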
Biology was pre-genetics, but they would have known that the brain must be doing something useful. A big brain is metabolically expensive. A big head makes childbirth dangerous. Humans have recently evolved much bigger brains than any other primate (even without correcting for body size), at the same time as they evolved intelligence and culture.
My guesses as to what they may have thought:
Maybe they assumed that each macroscopic region of the brain was essentially made of few simple neural circuits replicated over and over again to provide signal strength, much like a muscle is made of a few types of muscle fibers replicated over and over again.
Just like you don't need hundreds of billions of hydraulic cylinders to replicate the functionality of a muscle, they may have thought that you didn't need hundreds of billions of processing components to replicate the functionality of the brain.
Was this a reasonable hypothesis? I don't know if a neuroscientist of the time would have agreed, but it seems to me that it may not have been too far fetched for the Dartmouth Conference people.
I suppose that with the observation techniques of the time, the brain looked quite homogeneous below the level of macroscopic regions. The Dartmouth group also lacked theoretical insight into complex patterns of connectivity.
Moreover, computers of the time equaled or vastly surpassed humans at many tasks that were previously thought to require great intelligence, such as numerical computation.
I think you're making a jump here, or at least making an insinuation about 'waste' which is unjustified: Moravec's estimate in 1997 is not a good indicator of how much computing power they estimated the human brain equated to back in the 1950s: an estimate in 1997, besides benefiting from decades and countless breakthroughs in neuroscience, is one made with the benefit of hindsight and Moravec's own paradox commenting on the Dartmouth-style failure. Given the general reason for the optimism - the striking success of the small early efforts in logic, chess-playing etc - I would personally expect them to have assigned very small estimates of the brain's computing power since that's what they had observed so far. A case in point: IIRC, Alan Turing seems to have been an extreme pessimist in that he put off the development of human-level AI to ~2000 - because that's when he calculated gigabyte-sized memories would become available!
They probably just thought it takes that much brain matter to calculate the answer non-digitally, and brains didn't have a choice in substrate or approach: it was neurons or nothing.
So you're saying they didn't expect to learn from how the human brain did things, or to emulate it. They just thought human-equivalent AI was inherently a simple problem, both algorithmically and in terms of the digital processing power needed.
I wonder if part of the reason they thought this was that not only was AI a very young field, but so was all of modern computer science. It had been developing very quickly because everyone had been working on very low-hanging fruit that hadn't been interesting even fifteen years before, because no computers had existed before then.
So all their salient examples were of quick progress in new fields. When confronted with a completely new research field, they assigned much higher priors to making rapid progress than we would in 2013, regardless of what that field was. That seems reasonable - AI happened to be one of the least tractable new fields, but they couldn't know that in advance. Looking back and saying they got their predictions amazingly wrong demonstrates some hindsight/selection bias.
That might be it. Recall Dijkstra on computers thinking and submarines swimming, and how very few of our technologies are biomimetic; note that the perceptron algorithm - never mind multilayer or backpropagation neural nets - dates to 1957.
But suppose they did have neural nets, and someone asked them - "hey guys, maybe we're wrong that the brain is applying all these billions of neurons because neurons just aren't very good and you need billions if you're going to do anything remotely intelligent? If so, we're being wildly overoptimistic, since a quick calculation says that if neurons are as complex and powerful as they could be, then we won't get human-equivalent computers past, gosh, 2000 or worse! Let's take our best chess-playing alpha-beta-pruning LISP-1 code and see how it does against a trained perceptron with a few hundred nodes."
Now, I haven't actually written a chess-playing program or perceptrons or compared them head-to-head, but I'm guessing the comparison would end with the GOFAI program crushing the perceptron in both chess skill and resources used up, possibly by orders of magnitude on both counts (since even now, with perceptrons long obsolete and all sorts of fancy new tools like deep learning networks, neural networks are still rarely used in chess-playing).
"So, that was a reasonable hypothesis, sure, but it looks like it just doesn't pan out: we put the 'powerful neurons' to the test and they failed. And it's early days yet! We've barely scratched the surface of computer chess!"
Thanks, corrected now!