I'm putting together a list of short and sweet introductions to the dangers of artificial superintelligence.
My target audience is intelligent, broadly philosophical narrative thinkers, who can evaluate arguments well but who don't know a lot of the relevant background or jargon.
My method is to construct a Sequence mix tape — a collection of short and enlightening texts, meant to be read in a specified order. I've chosen them for their persuasive and pedagogical punchiness, and for their flow in the list. I'll also (separately) list somewhat longer or less essential follow-up texts below that are still meant to be accessible to astute visitors and laypeople.
The first half focuses on intelligence, answering 'What is Artificial General Intelligence (AGI)?'. The second half focuses on friendliness, answering 'How can we make AGI safe, and why does it matter?'. Since the topics of some posts aren't obvious from their titles, I've summarized them using questions they address.
Part I. Building intelligence.
1. Power of Intelligence. Why is intelligence important?
2. Ghosts in the Machine. Is building an intelligence from scratch like talking to a person?
3. Artificial Addition. What can we conclude about the nature of intelligence from the fact that we don't yet understand it?
4. Adaptation-Executers, not Fitness-Maximizers. How do human goals relate to the 'goals' of evolution?
5. The Blue-Minimizing Robot. What are the shortcomings of thinking of things as 'agents', 'intelligences', or 'optimizers' with defined values/goals/preferences?
Part II. Intelligence explosion.
6. Optimization and the Singularity. What is optimization? As optimization processes, how do evolution, humans, and self-modifying AGI differ?
7. Efficient Cross-Domain Optimization. What is intelligence?
8. The Design Space of Minds-In-General. What else is universally true of intelligences?
9. Plenty of Room Above Us. Why should we expect self-improving AGI to quickly become superintelligent?
Part III. AI risk.
10. The True Prisoner's Dilemma. What kind of jerk would Defect even knowing the other side Cooperated?
11. Basic AI drives. Why are AGIs dangerous even when they're indifferent to us?
12. Anthropomorphic Optimism. Why do we think things we hope happen are likelier?
13. The Hidden Complexity of Wishes. How hard is it to directly program an alien intelligence to enact my values?
14. Magical Categories. How hard is it to program an alien intelligence to reconstruct my values from observed patterns?
15. The AI Problem, with Solutions. How hard is it to give AGI predictable values of any sort? More generally, why does AGI risk matter so much?
Part IV. Ends.
16. Could Anything Be Right? What do we mean by 'good', or 'valuable', or 'moral'?
17. Morality as Fixed Computation. Is it enough to have an AGI improve the fit between my preferences and the world?
18. Serious Stories. What would a true utopia be like?
19. Value is Fragile. If we just sit back and let the universe do its thing, will it still produce value? If we don't take charge of our future, won't it still turn out interesting and beautiful on some deeper level?
20. The Gift We Give To Tomorrow. In explaining value, are we explaining it away? Are we making our goals less important?
Summary: Five theses, two lemmas, and a couple of strategic implications.
All of the above were written by Eliezer Yudkowsky, with the exception of The Blue-Minimizing Robot (by Yvain), Plenty of Room Above Us and The AI Problem (by Luke Muehlhauser), and Basic AI Drives (a wiki collaboration). Seeking a powerful conclusion, I ended up making a compromise between Eliezer's original The Gift We Give To Tomorrow and Raymond Arnold's Solstice Ritual Book version. It's on the wiki, so you can further improve it with edits.
- Three Worlds Collide (Normal), by Eliezer Yudkowsky
  - a short story vividly illustrating how alien values can evolve.
- So You Want to Save the World, by Luke Muehlhauser
  - an introduction to the open problems in Friendly Artificial Intelligence.
- Intelligence Explosion FAQ, by Luke Muehlhauser
  - a broad overview of likely misconceptions about AI risk.
- The Singularity: A Philosophical Analysis, by David Chalmers
  - a detailed but non-technical argument for expecting intelligence explosion, with an assessment of the moral significance of synthetic human and non-human intelligence.
I'm posting this to get more feedback for improving it, to isolate topics for which we don't yet have high-quality, non-technical stand-alone introductions, and to reintroduce LessWrongers to exceptionally useful posts I haven't seen sufficiently discussed, linked, or upvoted. I'd especially like feedback on how the list I provided flows as a unit, and what inferential gaps it fails to address. My goals are:
A. Via lucid and anti-anthropomorphic vignettes, to explain AGI in a way that encourages clear thought.
B. Via the Five Theses, to demonstrate the importance of Friendly AI research.
C. Via down-to-earth meta-ethics, humanistic poetry, and pragmatic strategizing, to combat any nihilisms, relativisms, and defeatisms that might be triggered by recognizing the possibility (or probability) of Unfriendly AI.
D. Via an accessible, substantive, entertaining presentation, to introduce the raison d'être of LessWrong to sophisticated newcomers in a way that encourages further engagement with LessWrong's community and/or content.
What do you think? What would you add, remove, or alter?
Other questions I'd like to see better answered in clear, compact, intuition-pumping articles like the above:
a. Is enough work being done on FAI? Are AI researchers in general too dismissive or blasé about safety concerns?
b. Why should we expect hard takeoff? More generally, why can't we wait until AGI is clearly about to be invented before working on safety?
c. Why is recursive self-improvement such an important threshold? How can we be confident humans have drastically suboptimal intelligence, and that massively superhuman intelligence optimization is possible without a massively superhuman initial hardware investment?
d. What's the most plausible scenario for how an intelligence explosion will initially go? Why?
e. What are the biggest open problems for FAI research?
f. For present purposes, what is a preference? Why do we bracket the possibility of superintelligent irrationality? Why do AGIs tend toward temporally stable preferences?
g. Why is AGI 'hungry', i.e., desirous of unlimited resources? Why do we expect FAI and UFAI to have galaxy- or universe-wide effects?
h. What's a reasonable (as non-squick goes) fun interstellar growth scenario? Why is the future so important?
i. What is the Prisoner's Dilemma, and why is it relevant to FAI? How does it generalize to non-sentient prisoners?
j. What would a meta-ethical discourse between Clippy and a human look like?
k. Why is it hard to give AGI known or predictable goals?
l. What makes value fragile and complex? What are some plausible horror stories if we get things slightly wrong?
I've seen most of these addressed, but awkwardly, abstractly, or in the context of much longer texts. If you know of especially good stand-alone posts covering these issues, or would like to collaborate on making or synthesizing one, let me know!
(Feel free to suggest borderline options. I've probably missed some great candidates, and they can always serve as templates or raw materials for future posts.)
I'd also add the question of "When can we expect AGI?" Some people I've talked to about this issue don't think it's possible to get AGI this century.
I agree that should be on the list. It's a hard question to answer without lots of time and technical detail, though, which is part of why I went with making the problem seem more vivid and immediate by indirect means. Short of internalizing Cognitive Biases Affecting Judgment of Global Risks or a lot of hard sci-fi, I'm not sure there's any good way to short-circuit people's intuition that FAI isn't an imminent concern.
'We really don't know, but it wouldn't be a huge surprise if it happened this century, and it would be surprising if it doesn't happen in the next 300 years' is I think a solid mainstream position. For the purpose of the Core List it might be better handled as a quick 2-3 paragraph overview in a longer (say, 3-page) article answering 'Why is FAI fiercely urgent?'; Luke's When Will AI Be Created? is, I think, a good choice for the Further Reading section.
I'd caution against conflating FOOM with AGI risks, or even with hungry AGI. Hard takeoff is the most obviously risky situation from an AGI safety perspective, since it leaves no chance to react to an unfriendly AGI before possibly world-ending consequences, and it's very compelling in storytelling; but a soft-takeoff situation isn't safe just because someone /could/ turn off its supplies, if no one can tell that doing so is necessary. An AI that's only several dozen times smarter than humans probably won't crack the protein folding problem, but it's probably smart enough to navigate corporate red tape or manipulate a team of researchers.
Comparative and Absolute Advantage in AI demonstrates the simple lies-to-children version rather easily. The strong case would be the Riemann Hypothesis Catastrophe, but I've not seen any particularly good writeups of it, even though the term seems to be local jargon.
Just Another Day In Utopia is a little overly gamified and surreal, but generally pleasant. While intended as subtle horror, Iceman's Friendship Is Optimal hits about the right mix of 'almost right': it almost seems as if the ponies were added just to explain why someone might resist the Singularity, and it's only on deeper analysis that the value-of-your-values metric falls apart. The full text is too long for this sort of work, but excerpts could be informative enough.
Mu? Or moo, depending on severity.
Lost Purposes is a little shallow and not terribly well-researched, but it's a compelling example of subgoal stomp and value drift that's likely to be particularly applicable in an educational environment. And, of course, Magical Categories, which you've already linked, is rather illustrative on its own.
The first few paragraphs of Interpersonal Entanglement are very illuminating.
Which things I said made you worry I was conflating these? If hard and soft takeoff were equally likely I'd still think we should put most of our energy into worrying about hard takeoff; but hard seems significantly likelier to me.
Do you think non-hungry AGIs are likely? And if we build one, do you think it's likely to be dangerous? When I imagine a non-hungry AGI, I imagine a simple behavior-executer like the blue-minimizing robot. It optimizes for something self-involving, like a deontologist rather than a consequentialist. It puts all its resources into, say, increasing the probability that it will pick up a hammer and then place it back down, at this exact moment; but it doesn't try to tile the solar system with copies of itself picking up copies of the hammer, because it doesn't think the copies would have value. It's just stuck in a loop executing an action, and an action too feeble to add up to anything interstellar.
Ever? For how long?
I suspect non-MLP fans will miss most of what makes such a future inspiring or fun. And inspiring and fun is what I'm shooting for.
This is one of the articles I was closest to adding to the list, because I feel more poetry/rhetoric is needed to seal the deal. But it's a bit long relative to how non-obvious its points of relevance are.
But for 'Why is it hard to give AGI known or predictable goals?' what I had in mind is a popularization of the problems with evolutionary algorithms, or the relevance of Löb's Theorem to self-modifying AI, or some combination of these and other concerns.
At least from my readings, points 11, 12, and 13 are the big focus points on AGI risks, and they default to genie-level capabilities: the only earlier machine is the purely instruction-set blue-minimizing robot.
Hard takeoff being significantly more likely means that your concerns are, naturally and reasonably, going to gravitate toward discussing AGI risks and hungry AGI in the context of FOOMing AGI. That makes sense for people who can jump the inferential distance to explosive recursive improvement. If you're writing a work to help /others understand/ the concept of AGI risks, though, discussing how a FOOMing AGI could start taking apart Jupiter in order to make more smiley faces, due next Tuesday, requires that they accept a more complex scenario than that of a general machine intelligence to begin with. This makes sense from a risk-analysis viewpoint, where Bayesian multiplication is vital for comparing relative risks -- very important to the values of SIAI, targeting folk who know what a Singularity is. It's unnecessary for the purpose of risk awareness, where showing the simplest threshold risk gets folk to pay attention -- which is more important to MIRI, targeting folk who want to know what machine intelligence could be (and are narrative thinkers, with the resulting logical biases).
If the probability of strong AGI occurring is P1, the probability of strong AGI going FOOM is P2, and the probability of any strong AGI being destructive is P3, then the understanding necessary to grasp P1xP2xP3 is unavoidably greater than for P1xP3, even if P2 is very close to 1. You can always introduce P2 later, to show why the results would be much worse than everyone's already expecting -- and that has a stronger effect on avoidance-heavy human neurology than letting people think that machine intelligence can be made safe just by preventing the AGI from reaching high levels of self-improvement.
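The conjunction point can be made concrete with a few lines of arithmetic (the numbers below are invented purely for illustration, not actual estimates of anything):

```python
# Illustrative sketch only: made-up probabilities, not actual estimates.
p1 = 0.8  # P1: strong AGI occurs at all
p2 = 0.9  # P2: conditional on AGI, it goes FOOM
p3 = 0.5  # P3: conditional on AGI, it is destructive by default

simple_story = p1 * p3     # risk case without the FOOM premise
full_story = p1 * p2 * p3  # risk case including the FOOM premise

# Adding a conjunct can only shrink (or preserve) the joint probability,
# even when that conjunct is nearly certain.
assert full_story <= simple_story
print(f"without FOOM: {simple_story:.2f}, with FOOM: {full_story:.2f}")
```

The asymmetry is the point: each extra premise a reader must accept lowers the joint probability they'll assign, even when the added premise is itself very likely.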
If there are serious existential risks from soft-takeoff and even no-takeoff AGI, then discussing a general risk first not only appears more serious, but also makes the later discussion of hard takeoff hit even harder.
Hungry AGIs occur when the utility of additional resources exceeds the cost of additional resources, as amortized by whatever time-discounting function you're using. That's very likely when the AGI is calculating a sufficiently long-duration event, even with heavy time discounting, but that's not the full set of possible minds. It's quite easy to imagine a non-hungry AGI that causes civilization-level risks, or even a non-hungry non-FOOM AGI that causes continent-level risks. ((I don't think it's terribly likely, since barring exceptional information control or unlikely design constraints, it'd be bypassed by a copy turned into an intentionally hungry AGI; but as above, this is a risk-awareness matter rather than a risk-analysis one.))
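As a toy sketch of that condition (all function names and numbers here are hypothetical, chosen only for illustration): an agent 'goes hungry' when the discounted stream of utility from extra resources outweighs their acquisition cost, and a long enough task horizon makes that true even under heavy discounting.

```python
# Hypothetical sketch of the 'hungry AGI' condition above: grab extra
# resources when their discounted utility exceeds their cost.
# All names and numbers are made up for illustration.

def discounted_utility(per_step_gain, steps, discount):
    # Exponentially discounted sum of per-step gains over the task horizon.
    return sum(per_step_gain * discount ** t for t in range(steps))

def goes_hungry(per_step_gain, steps, discount, acquisition_cost):
    # True when the discounted benefit of acquiring resources beats the cost.
    return discounted_utility(per_step_gain, steps, discount) > acquisition_cost

# Short task: grabbing resources doesn't pay.
short_task = goes_hungry(1.0, steps=10, discount=0.9, acquisition_cost=100)

# Sufficiently long-duration task: even heavy discounting eventually
# justifies the grab, matching the point about long-duration events.
long_task = goes_hungry(1.0, steps=10_000, discount=0.999, acquisition_cost=100)

print(short_task, long_task)  # False True
```

The discount function and the horizon length do all the work here, which is why time discounting alone isn't a reliable safeguard against resource hunger.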
More importantly, you don't need to FOOM to have a hungry AGI. A 'stupid' tool AI, even one that gets only small benefits from additional resources, could still go hungry given the wrong question or the wrong discount on future time -- or even a bad time estimate on a normal question. It's bad to have a few kilotons of computronium pave over the galaxy with smiley faces; it's /embarrassing/ to have the solar system paved over with inefficient transistors trying to find a short answer to Fermat's Last Theorem. Or if I'm wrong, and a machine intelligence slightly smarter than the chess team at MIT can crack the protein folding problem in a year, a blue-minimizing AGI becomes /very/ frightening even with a small total intelligence.
The strict version of the protein folding prediction problem was defined about half a century ago, and has been well-known and well-studied enough that I'm willing to wager we've had several dozen intelligent people working on it for most of that time (and, recently, several dozen intelligent people working on software implementations alone). An AGI built today has the advantage of their research, along with a different neurological design, but in turn it may have additional limitations. Predictions are hard, especially about the future, but for the purposes of a thought experiment it's not obvious that another fifty years without an AGI would change the matter so dramatically. I suspect /That Alien Message/ discusses a boxed AI with the summed computational power of the entire planet across long periods of time precisely because I'm not the only one to give that estimate.
And, honestly, once you have an AGI in the field, fifty years is a very long event horizon for even the slow takeoff scenarios.
Not as much as you'd expect. It calls more on the sort of things that get folk interested in The Sims or World of Warcraft, and iceman seems to have intentionally written it to be accessible to a general audience in preference to pony fans. The big benefit of the ponies is that they're strange enough that it's someone /else's/ wish fulfillment. ((Conversely, it doesn't really benefit from knowledge of the show, since it doesn't use the main cast or the default setting: CelestAI shares very little overlap with the character Princess Celestia, excepting that both can control a sun.)) The full work is probably not useful for this, but chapter six alone might be a useful contrast to /Just Another Day in Utopia/.
Hm... that would be a tricky requirement to fill: there are very few good layperson's versions of Löb's Theorem as it is, and the question does not easily reduce from the mathematical analysis. (EDIT: Or rather, it goes from formal-logic Deep Magic to obvious truism in attempts to demonstrate it... still, there's space to improve on the matter after that cartoon.)
I don't think anybody knows enough to answer this question with any certainty.
I agree, but there's a certain standard story we tend to tell, not so much because we're certain it's the initial trajectory as because it helps make the risks more concrete and vivid. To cite the most recent instance of this meme:
What I was looking for is just this standard story, or something similarly plausible and specific, fleshed out and briefly defended in its own post, as a way of using narrative to change people's go-to envisioned scenario to something fast and brutal.
It's a bit on the long side, but Why an Intelligence Explosion is Probable might work for this.
Facing the Intelligence Explosion was my own attempt at doing this project. I wrote most of it a long time ago and would write it pretty differently today, but it seems like it could accomplish much of what you're hoping for.
"So You Want to Save the World" is embarrassingly inadequate, but still more helpful than nothing (I hope). I mostly want it to be read by people who already grok the problem (e.g. Benja, who used a reference from my article to do this) and will be able to ignore its embarrassingness, not by newbies who are more likely to be turned away from the issues altogether by the fact that "So You Want to Save the World" is kinda embarrassing and still basically the only available survey of open problems in FAI.
But the best thing available (very soon) is probably actually Our Final Invention, which reads like a detective novel and covers most of the basic points.
Oooh, I'll check out OFI.
What are the main things you find embarrassingly inadequate, and/or newbie-turning-away, about So You Want to Save the World (LW, off-site update)?
Since this rationalist mix tape isn't directed at specialists, my goals for its 'Open problems in FAI' portion are:
1. Impress. Make it clear that FAI is a Real Thing that flesh-and-blood mathematicians are working on. Lots of scary-looking equations probably aren't useful for this, but formal jargon certainly can be, since it makes us look more legit than terms like 'Fun Theory' or 'Friendliness Theory' would initially suggest. Relatedly...
2. Intimidate. Decrease armchair philosophers' confidence that they can shrug off the Five Theses without thinking (and researching) long and hard about them. This is important, because I've set up my Introduction to appeal to people of philosophical, big-picture, let-me-figure-this-out-for-myself mindsets. Impatience, imprecision, and egomania are big failure modes for that demographic, so a lot of good can be done just by showing what patience, precision, and intellectual humility look like in this arena.
3. Ground. Increase people's interest in pragmatic, solutions-oriented responses to the Friendliness problem, by providing a model for how such responses will look. Mitigate the airy and theoretical tendencies that EY's writing can encourage (and that my target audience is already disposed toward), as well as various defeatism/nihilism/relativism attractors.
4. Intrigue. Have a couple of the easier-to-explain open problems tempt the reader into digging deeper.
Given the A, B, C, D goals I outlined above, do you think 1-4 are good objectives for the FAI Open Problems part of the Introduction? And does this goalset change how useful you think "So You Want to Save the World" is?
I like that "Intimidate" is explicitly on your list of goals. In practice this is a key part of introducing people to difficult problems without having them dismiss the problems right away, but I've never used such a stark term for it before.
"So You Want to Save the World"...
OFI looks nice, but is there going to be a Kindle edition?
Amazon doesn't list one, which I suppose could be an artifact of it not being out yet.
I've put up the Introduction on my blog, with a little more explanation and without all the meta stuff. You can find it here: http://nothingismere.com/2013/08/28/a-non-technical-introduction-to-ai-risk/
To be perfectly honest, mentioning Artificial Intelligence at all might be the wrong way to start a discussion about the risks of superintelligence. The vast majority of us (myself included) have only very basic ideas of how even modern computers work, much less the scientific/mathematical background to really engage with AI theory, not to mention that we're so inundated with inaccurate ideas about AI from fiction that it would probably be easier just to dodge the misconceptions entirely. An additional concern is that "serious people" who might otherwise be capable of understanding the issue won't want to be associated with a seemingly fantastical and/or nerdy discussion and will tune you out just on that basis.
Short of just starting with That Alien Message, either because Mr. Yudkowsky's prose style is a little dense at times or because you want to put your own mark on it, I would suggest something along the same lines, replacing AI with a biological superintelligence. Throwing in applause lights, like having the "wise fools" who unleash the smartpocalypse being a or the AI-equivalent being a , would probably make it more palatable to targeted audiences, but might be too far on the dark side for your tastes.
I've actually come up with a cool short story idea based on the concept, essentially genetically modified Humboldt squid with satellite internet connections killing off humanity by being a bit overprotective of depleted fish stocks, which I might post as a reply to this post after some polishing.
I think that's giving up too much ground. Talking about AI risk while trying to avoid mentioning anything synthetic or artificial or robotic is like talking about asteroid risk while trying to avoid mentioning outer space.
But is that necessary for understanding and accepting any of the Five Theses?
Don't a lot of those misconceptions help us? People are primed to be scared that AI is a problem. We then only have to mold that emotion to be less anthropomorphic and reactionary; we don't have to invent emotion out of whole cloth.
It's a deliberate feature of my mix that it's (I hope) optimized for philosophical, narrative, abstractive thinkers -- the sort who usually prefer Eliezer's fleshy narratives over Luke's skeletal arguments. Both groups are important, but I prioritized making one for the Eliezer crowd because: (a) I think I have a better grasp on how to appeal to them; (b) they're the sort of crowd that isn't always drawn to, or in conversation with, programmer culture; and (c) Luke's non-academic writings on this topic are already fairly well consolidated and organized. Eliezer's are all over the place, so starting to gather them here gets returns faster.
I think That Alien Message is one of the more background-demanding equationless articles Yudkowsky's written. It's directed at combating some very specific and sophisticated mistakes about AI, and taking away the moral requires, I think, enough of a background with the AI project to have some quite specific and complicated (false) expectations already in mind.
I'm not sure even someone who's read all 20 of the posts I listed would be ready yet for That Alien Message, unless by chance they spontaneously asserted a highly specific relevant doubt (e.g., 'I don't think anything could get much smarter than a human, therefore I'm not very concerned about AGI').
I think the main problem with this is that we think of biological intelligences as moral patients. We think they have rights, are sentient, etc. That adds a lot more problems and complications than we started with. Also, I'm not sure Friendliness is technologically possible for a biological AGI of the sort we're likely to make (for the same reason it may be technologically impossible for a whole-brain emulation).
I'm less worried about whether it's dark-side than about whether it's ineffective. Exploiting 'evil robot apocalypse' memes dovetails with the general message we're trying to convey. Exploiting 'GMOs and big companies are intrinsically evil' actively contradicts a lot of the message we want to convey. E.g., it capitalizes on 'don't tamper with Nature'. One of the most important take-aways from LessWrong is that if we don't take control over our own destiny -- if we don't play God, in a safe and responsible way -- then the future will be valueless. That's in direct tension with trying to squick people out with a gross bioengineered intelligence that will be seen as Bad because it's a human intervention, and not because it's a mathematically unrigorous or meta-ethically unsophisticated intervention.
Which isn't to say that we shouldn't appeal to that audience. If they're especially likely to misunderstand the problem, that might make it all the more valuable to alter their world-view. But it will only be useful to tap into their 'don't play God' mentality if we, in the same cutting strike, demonstrate the foolishness of that stance.
I think a lot of smart high schoolers could read the sequence I provided above. If they aren't exceptional enough to do so, they're probably better off starting with CFAR-style stuff rather than MIRI-style stuff anyway, since I'd expect reading comprehension skills to correlate with argument evaluation skills.
I would really like to see a "What can you do to help" section. In fact, maybe we should be seriously thinking about concrete ways to allow non-mathematicians to also contribute to solving this problem.
What can you do to help? In order of priority, I think the top choices for non-specialists are:
If this sequence mix does its job, 2 is simple enough — tell people to begin by sharing the list. (Or other targeted articles and lists, depending on target audience.)
3 is a relatively easy sell, and the primary way I expect to contribute.
4 is quite difficult and risky at this stage.
1 is hard to optimize at the meta level because good signaling is hard. Our default method for beginning to combat high-level biases — write a Sequence focused on this specific issue to bring it to people's attention — is unusually tricky to pull off here. Something off-site and independent — a petition, signed by lots of smart people, telling everyone that AI risk is important? an impassioned personal letter by a third party, briefly and shibbolethlessly laying out the Effective Altruism case for MIRI? — might be more effective.
Will you have bridging commentary that lays out the relevance and the intended message of these posts? I worry that even with the current questions, there may not be enough context for people to figure out the intended meaning from the posts alone, e.g. a newcomer might not understand why Adaptation-Executers and The Blue-Minimizing Robot are included. (Especially since they are listed as "Part I" but they really tie into "Part III", giving people the opportunity to forget them in between - might be useful to rearrange the order in any case.)
I'll have to think about how best to do this. Part I is a bit of a grab-bag, trying to rapidly reset our assumptions about intelligence and agenthood before Part II begins supplying the Official Arguments and Definitions. (Replacing the current frame with one that makes more sense is also an option.)
This is some of why I asked about inferential gaps; how would y'all walk through or supplement some of the transitions to a reader? Knowing what you'd do differently, or additionally, will also tell me things about what you see as the function of each component post.