I'm currently working on a research project for MIRI, and I would welcome feedback on my research as I proceed. In this post, I describe the project.
As a part of an effort to steel-man objections to MIRI's mission, MIRI Executive Director Luke Muehlhauser has asked me to develop the following objection:
"Even if AI is somewhat likely to arrive during the latter half of this century, how on earth can we know what to do about it now, so far in advance?"
In Luke's initial email to me, he wrote:
I think there are plausibly many weak arguments and historical examples suggesting that P: "it's very hard to nudge specific distant events in a positive direction through highly targeted actions or policies undertaken today." Targeted actions might have no lasting effect, or they might completely miss their mark, or they might backfire.
If P is true, this would weigh against the view that a highly targeted intervention today (e.g. Yudkowsky's Friendly AI math research) is likely to positively affect the future creation of AI, and might instead weigh in favor of the view that all we can do about AGI from this distance is to engage in broad interventions likely to improve our odds of wisely handling future crises in general — e.g. improving decision-making institutions, spreading rationality, etc.
I'm interested in abstract arguments for P, but I'm even more interested in historical data. What can we learn from seemingly analogous cases, and are those cases analogous in the relevant ways? What sorts of counterfactual history can we do to clarify our picture?
Luke and I brainstormed a list of potential historical examples of people predicting the future 10+ years out, and using the predictions to inform their actions. We came up with the following potential examples, which I've listed in chronological order by approximate year:
- 1896: Svante Arrhenius's prediction of anthropogenic climate change.
- 1935: Leo Szilard's attempts to keep his patent on the atomic bomb secret from Germany.
- 1950-1980: Efforts to win the Cold War decades later, such as increasing education for gifted children.
- 1960: Norbert Wiener highlighting the dangers of artificial intelligence.
- 1972: The circle of ideas and actions around The Limits to Growth, a book about the consequences of unchecked population growth and economic growth.
- 1975: The WASH-1400 reactor safety study, which attempted to assess the risks associated with nuclear reactors.
- 1975: The Asilomar Conference on Recombinant DNA, which set up guidelines to ensure the safety of recombinant DNA technology.
- 1978: China's one-child policy to reduce population growth.
- 1980: The Ford Foundation setting up a policy think tank in India that helped India recover from its 1991 financial crisis.
- 1988: Early climate change mitigation efforts
- 1992+: Asteroid strike deflection efforts
- ???: Possible deliberate long term efforts to produce revolutionary scientific technologies.
- ????: Long term computer security research
- The Signal and the Noise: Why So Many Predictions Fail — but Some Don't by Nate Silver
- Expert Political Judgment: How Good Is It? How Can We Know? by Philip Tetlock
Many of these are known because of their prescience or impact. So the ratio of success to failure in the set is almost meaningless, although systematic differences between the failures and successes could be interesting.
ETA: I don't mean to imply that you were going to count them as though they were independent of outcome, just to raise the fact that we can't.
Meaningless seems too strong – you seem to be assuming a very strong selection effect – what selection effect are you assuming?
In any case, we're not simply counting successes and failures.
We're looking at this.
First, this seems like an excellent issue to tackle, so I hope you get somewhere. This "fog of the future" objection is what stops me from taking MIRI more seriously. The obvious pattern matching of the UFAI with other apocalyptic scenarios does not help, either.
Second, when I ask myself "what argument/logic/experiment would convince me to take the AGI x-risk seriously enough to personally try to do something about it?", I come up with nothing. Well, maybe a broad consensus among the AI researchers based on some solid experimental data, similarly to the current situation with anthropogenic climate change.
Just to make an extra step toward MIRI, suppose it had a convincing argument that without the FAI research the odds of human extinction due to UFAI are at least 10% (with high confidence), and that the FAI research can reduce the odds to, say, 1% (again, with high confidence), then I would possibly reevaluate my attitude.
I don't see how any of the mentioned historical examples can do that. And definitely not any kind of counterfactual history scenarios, those have too low confidence to be taken seriously.
I don't think the hypothetical is true (by a large margin the expected impact is too big), but why only "possibly"? A high confidence intervention to avert a 9% risk of human extinction (for far less than 9% of world GDP) would be ludicrously good by normal standards.
Do you mean that "high confidence" is only conditional on the "convincing" argument, but "convincing" corresponds to relatively low confidence in the argument itself? What is the hypothetical here?
"A large margin" which way?
I'd have to reevaluate the odds again, the confidence and my confidence in my confidence (probably no more meta than that) before actually changing my behavior based on that
compare with other potential x-risks prevention measures which can pop up at the same level of surprise when evaluated as thoroughly and at the same level
even if convinced that yes, AI indeed has a 10% or more chance of wiping out the human race as we know it AND would not replace it with something "better" in some sense of the word, AND that yes, MIRI can reduce this chance to mere 1%, AND no, other x-risk prevention efforts are not nearly as effective in improving the humans' odds of surviving (in some form) the next century or millennium, I would also have to convince myself whether donating to MIRI and/or advocating for it, and/or volunteering and/or doing pro bono research for it would be an effective strategy.
Not sure I follow the question... I am no Bayesian, to me the argument being convincing is a statement about the odds of the argument being true, while the confidence in the predicted outcomes depends on how narrow the distribution the argument produces is, provided it's true.
9% is far too high.
I see. I thought you were more in tune with Eliezer on this issue. I was simply trying to see what would make me take the MIRI research much more seriously. I am fascinated by the mathematical side of it, which is hopefully of high enough quality to attract expert attention, but I am currently much more skeptical of its effects on the odds of humanity surviving the next century or two.
I changed specifics to variables because I was interested more in the broader point than the specific case.
Asteroid tracking involved spending ~$100MM to eliminate most of the expected losses from civilization-wrecking asteroids. Generously, it might have eliminated as much as a 10^-6 extinction risk (if we had found a dinosaur-killer on course our civilization would have mobilized to divert it). At the same tradeoff, getting rid of a 9% extinction risk would seem to be worth $9T or more. Billions are spent on biodefense and nuclear nonproliferation programs each year.
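The arithmetic behind the $9T figure can be checked with a short sketch. It uses only the numbers quoted above and assumes, as the comparison itself does, that willingness to pay scales linearly with the amount of risk removed. The function and variable names are illustrative.

```python
# Back-of-the-envelope check of the asteroid-tracking comparison.
# Assumption: spending per unit of extinction risk removed scales linearly.

def implied_budget(reference_cost_usd, reference_risk_removed, target_risk_removed):
    """Scale a reference spend to a different amount of risk reduction."""
    cost_per_unit_risk = reference_cost_usd / reference_risk_removed
    return cost_per_unit_risk * target_risk_removed

ASTEROID_COST_USD = 100e6      # ~$100MM spent on asteroid tracking
ASTEROID_RISK_REMOVED = 1e-6   # generous estimate of extinction risk eliminated
AI_RISK_REMOVED = 0.09         # hypothetical 10% -> 1% reduction from the thread

budget = implied_budget(ASTEROID_COST_USD, ASTEROID_RISK_REMOVED, AI_RISK_REMOVED)
print(f"Implied budget: ${budget / 1e12:.1f} trillion")
```

At that tradeoff, each 10^-6 of extinction risk is priced at $100MM, so a 9% reduction comes to roughly $9 trillion.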
So it seems to me that a 9% figure 'overshoots' the relevant thresholds in other areas: a much lower believed cost per increment of existential risk reduction would seem to suffice for more-than-adequate support (e.g. national governments, large foundations, and plenty of scientific talent would step in before that, based on experiences with nuclear weapons, climate change, cancer research, etc).
For comparison, consider someone who says that she will donate to malaria relief iff there is solidly convincing proof that at least 1000 cases of malaria affecting current people will be averted per dollar in the short-term. This is irrelevant in a world with a Gates Foundation, GiveWell, and so on: she will never get the chance as those with less stringent thresholds act.
I was trying to clarify whether you were using an extreme example to make the point in principle, or were saying that your threshold for action would actually be in that vicinity.
Is your position
or something else?
You and I might be on the same page here. How broadly are you defining "FAI research" ?
There are potentially promising interventions that are less targeted than the FAI research that MIRI is currently doing (e.g. lobbying for government regulations on AI research).
Can you clarify what sorts of counterfactual history scenarios you have in mind?
I don't have a well defended position. All I have is an estimate of confidence that my action or inaction would affect the hypothetical AGI x-risk in a known way. And that confidence is too low to be worth acting upon.
Any research included in such an argument, in any area. Really, anything that provides some certainty.
I have extremely low confidence that these interventions can affect the hypothetical AGI x-risk in the desired direction.
I can't imagine anything convincing. Similarly, I don't find an argument "if one of the Hitler assassination attempts were successful, would be avoided" compelling. Not to say that one should not have tried to assassinate him at the time, given the information available. But a valid reason to carry out such an assassination attempt would have to be something near-term and high-confidence, like reducing the odds of further poor military decisions or something.
This is close to my current position, but I would update if I learned that there's a non-negligible chance of AGI within the next 20 years.
This is the issue under investigation
What about policies to reduce chlorofluorocarbon emissions that would otherwise deplete the ozone layer?
Well, there is no need for any fancy counterfactual history there, the link was confirmed experimentally with high confidence.
Yes, the Montreal Protocol, an extremely successful international treaty.
By the way, do I know you personally? Feel free to email me at firstname.lastname@example.org if you'd like to correspond.
I doubt it. And I don't think I have much to contribute to any genuine AGI/risk research.
Some possible sources of examples, many I haven't checked and many lacking precise details:
There are some mundane examples that fit your description but I don't think are what you are asking for. E.g., people working on buildings that take more than 10 years to complete, people buying something for a young daughter and saving it for when she gets married (I guess this used to be a thing), people getting mortgages, people saving for retirement, people studying anatomy when 16 in hopes of becoming a doctor, making a trust fund for your kid.
From Leverage Research's website:
Miller predicted the return of Jesus in 21 years, not a century. (Added: I don't mean to imply that this makes him uninteresting.)
I would be very interested to learn if anyone has ever predicted the supernatural end of the world on that time scale.
You mean, that it would end some time after they were already dead? Well, I remember speaking with a family of creationists who were moderately confident (no man may know the hour) that the resurrection would end up being the midpoint of time, with the >1000 years of the Tribulation included in that figure, so we could expect that to begin in around 500 years.
Fixed. Thanks for catching it.
Thanks Nick. I didn't give enough detail as to what I'm looking for in my original post. My initial reactions
Yes, Luke and I talked about this — I forgot to list it.
I think that I'm looking for things involving more speculation. Also, we don't know what the impact of asteroid tracking efforts will be.
Here there's a clear historical precedent that people are using to inform their decision making.
How many people?
I'll brood on this. Intuitively, it doesn't seem relevant, but I have trouble placing my finger on why.
I'm looking at this as a part of the Limits to Growth investigation.
Did they have any impact? Could they plausibly have?
These seem like they fall into the category "general scientific research that could have humanitarian value" rather than being driven by specific predictions and being aimed at influencing specific outcomes.
Here there's a clear historical precedent that people are using to inform their decision making.
I recall happening upon an ancient book on biology in a library, and glancing at what someone in the field thought about eugenics a hundred years ago. The author was as racist as one would expect for his time, but he considered eugenics an absolute waste of time on the basis that actually having a significant impact would require interventions on a vastly larger scale than anybody was really imagining; in practice, he thought it obvious that any deliberate selection effects would be minuscule and completely swamped by the ongoing effects of normal human mate selection practices. I don't know if his view was widespread, but it does seem to be true of eugenics on the scale it was usually discussed or attempted by anyone except the Nazis, and something that at least some people had figured out even before eugenics went out of favor for other reasons.
Interesting point, although it has the problem that we haven’t actually observed eugenics being useless because of the vast scale of intervention required. (Since the Nazis were stopped, and the other reasons for eugenics not working—e.g., having become a dirty word—could well explain its apparent uselessness even if the “vast scale” was not a problem.)
Your “as one would expect” comment reminded me of something obvious that doesn’t seem to be mentioned. One of the things SF does is try to anticipate the future. I’ve read last year some rather old SF (early Harlan Ellison and the like), and the contrast between what “sounds weird” now and didn’t then, and vice-versa, suggests that we could “extract predictions” from such stories.
The example that came to mind was that the “weird future thing” in one of the stories (which, incidentally, I think was set around now) was that “in the future” only women were doing some cool job (I think piloting), almost never men, because women were better at it. The claim would sound normal now, but was probably a daring prediction for the time.
So, the trick for finding good predictions would be to read old SF, notice things that you wouldn’t notice around you, but that would be daringly avant-garde at the time. (For failed predictions, look for things that still sound weird.)
Although, of course, 90% of everything is mostly crap, including SF, even the good SF isn’t always applicable, and that “historical precedent that people are using to inform their decision making” part is harder. I heard (within the last decade or so) of some part of the US government explicitly asking for predictions from SF writers, but I doubt that would have happened much fifty years ago, much less actually taking the advice.
In Starship Troopers, women were the pilots-- iirc, because of better reflexes.
Not Heinlein's best moment, forecasting-wise. That males have better visuospatial skills and faster reaction times are stalwarts of the gender differences literature.
True, though... do you really think that was an actual forecast in any meaningful way? I mean, other than "the future will be different, and women are better than men at other things than housekeeping".
Well, even if it's not a forecast, it's still not a great example because anyone familiar with the facts (the reaction time literature and gender differences go back to the 1800s, for example) will dismiss it annoyedly ('no, that's not how it works. Also, explosions in space don't produce any sound!')
Still, flying was sufficiently new that most people probably wouldn’t be justified in reaching very high confidence about what abilities are needed and in what combination, especially for future aircraft, just by knowing all the literature existing up to then. (Also, if you live in a world where women almost never are trained and then work for years at some task X, it’s more or less impossible to compare (with statistical significance) how good experienced men and women are at X, because it would take years to obtain female candidates.)
Now that I think of it, that too. But I’m pretty sure it was something/someone else. Strange that two authors would use that particular speculation, in what I think were very different kinds of stories.
I couldn't find the answer on the Wikipedia page, but it sounds like it was a big deal. One data point is this (from the Wikipedia page):
Another is that Miller spawned the Seventh-Day Adventists who have approximately 17M members today.
They certainly thought they could have a big impact. There was no precedent of them having the impact they wanted. And it was a major movement. I have no sense of how feasible their strategies should have looked at the time.
I'm not a military historian (I'm not any kind of historian), but it strikes me that there are probably lots of examples of military planning which turned out to be for the wrong war or wrong technological environment. Like putting rams on ships in the late 19th Century:
On the other hand, Dwight Eisenhower said, "in preparing for battle I have always found that plans are useless, but planning is indispensable." So, it might make sense for MIRI to think about what interventions would be useful -- even if nothing they think of is directly useful.
It isn't a bad thing to make predictions, but the likelihood of success is small. Beware lists of successful predictions - that's like saying "how amazing! I flipped coins ten times, and all came up heads" after having chosen ten particular coin flips.
The value of thought exercises about - for example - friendly AI - is not that you'll be right on the mark; this rarely happens. The value is that you'll gain a bit of knowledge, as will thousands or millions of others via other predictions and experiments, and those many bits of knowledge will combine in interesting ways, some of them fruitful ten or twenty years in the future. You cannot test hypotheses which have not even been proposed.
FWIW, I'd expect any such list to overrepresent correct predictions relative to incorrect ones, since the correct ones will be associated with better-known issues.
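The selection effect described in the last few comments can be made concrete with a minimal sketch. All three rates below are invented for illustration and are not estimates of anything in this thread; the point is only that if correct predictions are more likely to be remembered, the success rate on a remembered list will exceed the true one.

```python
# Illustration of survivorship bias in lists of famous predictions.
# All rates are hypothetical placeholders, not empirical estimates.

def observed_success_rate(p_correct, p_remembered_if_correct, p_remembered_if_wrong):
    """Fraction of *remembered* predictions that were correct."""
    remembered_hits = p_correct * p_remembered_if_correct
    remembered_misses = (1 - p_correct) * p_remembered_if_wrong
    return remembered_hits / (remembered_hits + remembered_misses)

TRUE_RATE = 0.2  # hypothetical: 20% of long-range predictions pan out
observed = observed_success_rate(
    p_correct=TRUE_RATE,
    p_remembered_if_correct=0.5,   # hypothetical: hits are often remembered
    p_remembered_if_wrong=0.05,    # hypothetical: misses are mostly forgotten
)
print(f"true success rate: {TRUE_RATE:.0%}, rate among remembered cases: {observed:.0%}")
```

With these made-up numbers, the remembered list shows a success rate of about 71% against a true rate of 20%. When hits and misses are remembered equally often, the two rates coincide, which is why the size of the selection effect matters more than the raw count.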
1896, and Svante.
Some random examples:
Doris Lessing's "Report on the Threatened City" (I found it unreadable, so this is not a recommendation) points out that Californians live with the constant threat of a major earthquake. A big enough quake could kill millions, although a quake of that magnitude would be rather infrequent. In general, seismic, volcanic, and weather events are a matter of when, not whether, so perhaps this is not quite in the right reference class.
Albania spent billions of dollars on useless bunkers in case of an invasion.
Many countries now or in the past have banned human cloning. There are a number of justifications for this, but some of them center around speculative risks.
Can you give a reference for discussion of speculative risks?
Here, for example:
All of these are speculative.
Do you have a sense for the size of the threat that Y2K presented?
Building regulations may count, though I think that the historical precedent of somewhat frequent large earthquakes in California makes the case disanalogous to the issue of AI risk, which involves an event that has never happened before.
Can you give a reference? Who did they anticipate potential invasion from?
Some competing cost estimates. I tend towards the "fix it when it fails" side of things, but that is a tendency not a rule.
And a related issue
Bunkers - invasion from the US or the USSR; cost was twice the Maginot Line, which Wikipedia elsewhere describes as 3 billion French Francs.
this is the conversion to 2012 Euros
Constantinople managed to keep the secret of Greek Fire.
Roman emperor Constantine the Great picked Constantinople as his new capital because it could be easily defended against barbarians. He did this although (I think) he faced no barbarian threat. This worked out very well.
Although Japan had been a very closed society, when the Japanese saw in 1853 how much more advanced the American navy was than their own, they quickly modernized, and Japan became one of the few Asian countries to escape Western colonization (until after WWII at least).
Silk was kept secret for centuries, although not forever.
I think (am not sure) that these examples are too old to have significant relevance to the question of planning for AI. Do you see a connection?
Maybe I should clarify that we're looking for examples of people pushing for more specific outcomes (preventing climate change, preventing overpopulation, etc.) than improving the economy in general.
How long ago something happened shouldn't be relevant if you are looking to see if our species is capable of implementing certain types of long-term plans.
Making Constantinople the capital of the Eastern Empire and building its defenses represents perhaps the most successful example in all of human history of someone nudging "specific distant events in a positive direction through highly targeted actions or policies."
I was under the impression that EY wants to keep some of what he discovered a secret. Greek Fire represents an historical example of successfully keeping a tech secret, despite that secret having enormous military value.
It seems to me that keeping secrets has gotten much harder since then. The US government, for example, seems to be having enormous difficulties keeping its diplomatic, intelligence, and technology (including military technology) secrets secret. Do you have a different impression?
I would have agreed with you until the very recent Snowden revelations. Snowden seems to have revealed secrets that a huge number of people had access to (over a million?), showing it's possible for a vast number of people to keep secrets. I have a much higher probability that 1,000 or so people could keep a really good secret than I did before.
Snowden was a contract sysadmin for NSA. Surely there aren't anywhere near a million such people? Where are you getting that number from? Are you talking about the 4 million people having “top secret” security clearance? I'm pretty sure the vast majority of them did not have access to the particular secrets that Snowden is revealing, i.e., there are other controls besides the clearance level that prevented them from accessing those secrets.
If there really were a huge number of people who had access to the secrets, it seems likely that foreign intelligence agencies already knew them. Do you have a reason to think otherwise? (In other words, given that we don't know whether they really were kept secret rather than merely not publicly known, why are you updating towards secrets being more easily kept?)
I've heard the million number in the media but I'm not sure about it, hence the "?".
Yes, "knowledgeable people" are saying that Snowden has damaged U.S. security.
I think what they meant is that the typical terrorist did not know about the NSA programs, not that foreign intelligence agencies didn't know. (If they actually had good evidence that foreign intelligence agencies did not know, that would also reflect badly on people's abilities to keep secrets in general.)
Thanks, I'll investigate these things more.
Obligatory non-standard formatting grumble.
Do you know how I can fix it?
Solution: You can compose posts in Markdown, which is more readable than HTML. When done, use the Markdown Dingus to convert this into clean HTML, and paste this into LW's HTML editor. That's what I do.
Does this explanation help?
US support for the White Army against the Bolsheviks predates those measures. Still, I'd be hesitant to consider that prescient since it was a self-fulfilling prediction to some extent.
Was this based on an anticipation of things happening 10+ years later?
I don't know. They were anti-communist, so I guess it was an immediate impulse. At the same time, they also probably knew they wouldn't get along with a communist country in the future. Either way, I don't think Cold War actions in the 1950s count as prescient since the relationship had soured long ago.
How did Arrhenius's prediction inform his actions? Wikipedia says he believed global warming would be beneficial. Did he do something to bring it about?
ETA: sorry, didn't realize there was already a separate discussion about this. Also, on second reading I realize that you meant other people may have used his prediction to inform their actions.
I'm concerned that your earliest example is in 1896. People have been thinking about the future for more than 100 years. Here you indicate that you believe early examples are less relevant than recent examples. I disagree. Looking at a single timeframe reduces the robustness of your investigation, and I believe that greater familiarity with the recent past often biases people to underestimate the relevance of the far past.
Your arguments would be much more convincing if you showed results from actual code. In engineering fields, including control theory and computer science, papers that contain mathematical arguments but no test data are much more likely to have errors than papers that include test data, and most highly-cited papers include test data. In less polite language, you appear to be doing philosophy instead of science (science requires experimental data, while philosophy does not).
I imagine you have not actually written code because it seems too hard to do anything useful -- after 50 years of Moore's law, computers will execute roughly 30 million times as many operations per unit time as present-day computers. That is, a 2063 computer will do in 1 second what my 2013 computer can do in 1 year. You can close some of this gap by using time on a high-powered computing cluster and running for longer times. At minimum, I would like to see you try to test your theories by examining the actual performance of real-world computer systems, such as search engines, as they perform tasks analogous to making high-level ethical decisions.
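The "30 million" figure and the "1 second versus 1 year" comparison can be checked with a quick sketch. The comment does not state a doubling period; the two-year period used here is an assumption, chosen because it reproduces the quoted figure.

```python
# Rough check of the Moore's-law extrapolation quoted above.
# Assumption: performance doubles every 2 years (not stated in the comment).

SECONDS_PER_YEAR = 365.25 * 24 * 3600  # about 31.6 million

def moores_law_speedup(years, doubling_period_years=2.0):
    """Multiplicative speedup after `years` of steady doubling."""
    return 2 ** (years / doubling_period_years)

speedup = moores_law_speedup(50)  # 2**25 under the two-year assumption
print(f"speedup after 50 years: {speedup:,.0f}x")
print(f"seconds in a year:      {SECONDS_PER_YEAR:,.0f}")
```

Under that assumption the speedup is 2^25, about 33.6 million, which is close to the number of seconds in a year. A shorter doubling period would give a much larger factor, so the estimate is sensitive to that choice.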
Your examples about predicting the future are only useful if you can identify trends by also considering past predictions that turned out to be inaccurate. The most exciting predictions about the future tend to be wrong, and the biggest advances tend to be unexpected.
I agree that this seems like an important area of research, though I can't confidently speculate about when human-level general AI will appear. As far as background reading, I enjoyed Marshall Brain's "Robotic Nation", an easy-to-read story intended to popularize the societal changes that expert systems will cause. I share his vision of a world where the increased productivity is used to deliver a very high minimum standard of living to everyone.
It appears that as technology improves, human lives become better and safer. I expect this trend to continue. I am not convinced that AI is fundamentally different -- in current societies, individuals with greatly differing intellectual capabilities and conflicting goals already coexist, and liberal democracy seems to work well for maintaining order and allowing incremental progress. If current trends continue, I would expect competing AIs to become unimaginably wealthy, while non-enhanced humans enjoy increasing welfare benefits. The failure mode I am most concerned about is a unified government turning evil (in other words, evolution stopping because the entire population becomes one unchanging organism), but it appears that this risk is minimized by existing antitrust laws (which provide a political barrier to a unified government) and by the high likelihood of space colonization occurring before superhuman AI appears (which provides a spatial barrier to a unified government).
What would you want this code to do? What code (short of a full-functioning AGI) would be at all useful here?
Can you expand on this, possibly with example tasks, because I'm not sure what you are requesting here.
This is a trenchant critique, but it ultimately isn't that strong: having trouble predicting should, if anything, be a reason to be more worried rather than less.
This is missing the primary concern of people at MIRI and elsewhere. The concern isn't anything like gradually more and more competing AIs coming online that are slightly smarter than baseline humans. The concern is that the first true AGI will modify itself to become far smarter and more capable of controlling the environment around it than anything else. In that scenario, issues like antitrust or economics aren't relevant. It is true that on balance human lives have become better and safer, but that isn't by itself a strong reason to think that trend will continue, especially when considering hypothetical threats such as the AGI threat, whose actions are fundamentally discontinuous with prior human trends for standards of living.
Thanks for the thoughtful reply!
Possible experiments could include:
Simulate Prisoner's Dilemma agents that can run each others' code. Add features to the competition (e.g. group identification, resource gathering, paying a cost to improve intelligence) to better model a mix of humans and AIs in a society. Try to simulate what happens when some agents gain much more processing power than others, and what conditions make this a winning strategy. If possible, match results to real-world examples (e.g. competition between people with different education backgrounds). Based on these results, make a prediction of the returns to increasing intelligence for AIs.
Create an algorithm for a person to follow recommendations from information systems -- in other words, write a flowchart that would guide a person's daily life, including steps for looking up new information on the Internet and adding to the flowchart. Try using it. Compare the effectiveness of this approach with a similar approach using information systems from 10 years ago, and from 100 years ago (e.g. books). Based on these results, make a prediction for how quickly machine intelligence will become more powerful over time.
Identify currently-used measures of machine intelligence, including tests normally used to measure humans. Use Moore's Law and other data to predict the rate of intelligence increase using these measures. Make a prediction for how machine intelligence changes with time.
Write an expert system for making philosophical statements about itself.
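As a starting point for the first experiment in the list above, here is a minimal sketch of Prisoner's Dilemma agents that can run each other's code. It is only one possible setup, not an established design: the payoff values are the conventional textbook ones, the agent names are made up, and the `depth` budget is a crude, assumed stand-in for processing power.

```python
# Toy Prisoner's Dilemma where each agent receives its opponent as a callable
# and may simulate it. `depth` caps recursion (an assumed proxy for compute).

COOPERATE, DEFECT = "C", "D"

# Conventional payoffs: (first player's payoff, second player's payoff)
PAYOFFS = {
    (COOPERATE, COOPERATE): (3, 3),
    (COOPERATE, DEFECT): (0, 5),
    (DEFECT, COOPERATE): (5, 0),
    (DEFECT, DEFECT): (1, 1),
}

def always_defect(opponent, depth):
    return DEFECT

def always_cooperate(opponent, depth):
    return COOPERATE

def mirror(opponent, depth):
    """Run the opponent against this agent and copy the move it makes."""
    if depth == 0:
        return COOPERATE  # out of budget: fall back to an optimistic guess
    return opponent(mirror, depth - 1)

def play(agent_a, agent_b, depth=3):
    """One round: each agent picks a move after inspecting the other."""
    move_a = agent_a(agent_b, depth)
    move_b = agent_b(agent_a, depth)
    return PAYOFFS[(move_a, move_b)]

for a, b in [(mirror, mirror), (mirror, always_defect), (always_cooperate, always_defect)]:
    print(f"{a.__name__} vs {b.__name__}: {play(a, b)}")
```

In this sketch `mirror` cooperates with itself and with `always_cooperate`, but cannot be exploited by `always_defect`. The other features mentioned above (group identification, resource gathering, paying to improve intelligence) would need to be layered on top, for example by giving agents different `depth` budgets.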
In general, when presenting a new method or applied theory, it is good practice to provide the most convincing data possible -- ideally experimental data or at least simulation data of a simple application.
You're right -- I am worried about the future, and I want to make accurate predictions, but it's a hard problem, which is no excuse. I hope you succeed in predicting the future. I assume your goal is to make a general prediction theory to accurately assign probabilities to future events, e.g. a totalitarian AI appearing. I'm trying to say that your theory will need to accurately model past false predictions as well as past true predictions.
I agree that is a possible outcome. I expect multiple AIs with comparable strength to appear at the same time, because I imagine the power of an AI depends primarily on its technology level and its access to resources. I expect multiple AIs (or a mix of AIs and humans) will cooperate to prevent one agent from obtaining a monopoly and destroying all others, as human societies have often done (especially recently, but not always). I also expect AIs will stay at the same technology level because it's much easier to steal a technology than to initially discover it.
Make it scientific articles instead. Thus MIRI will get more publications. :D
You can also make different expert systems compete with each other by trying to get the most publications and citations.
That sounds exciting too. I don't know enough about this field to get into a debate about whether to save the metaphorical whales or the metaphorical pandas first. Both approaches are complicated. I am glad that MIRI exists, and I wish the researchers good luck.
My main point re: "steel-manning" the MIRI mission is that you need to make testable predictions and then test them or else you're just doing philosophy and/or politics.
I suspect that either would be of sufficient interest that if well done it could get published. Also, there's a danger in going down research avenues simply because they are more publishable.
So instead of paper clip maximizers we end up with a world turned into researchpapertronium?
(This last bit is a joke; I think your basic idea is sound.)