Q&A with Richard Carrier on risks from AI

by XiXiDu · 13th Dec 2011 · 22 comments


[Click here to see a list of all interviews]

I am emailing experts in order to estimate, and to raise, academic awareness and perception of risks from AI.

Richard Carrier is a world-renowned author and speaker. As a professional historian, published philosopher, and prominent defender of the American freethought movement, Dr. Carrier has appeared across the country and on national television defending sound historical methods and the ethical worldview of secular naturalism. His books and articles have also received international attention. He holds a Ph.D. from Columbia University in ancient history, specializing in the intellectual history of Greece and Rome, particularly ancient philosophy, religion, and science, with emphasis on the origins of Christianity and the use and progress of science under the Roman empire. He is best known as the author of Sense and Goodness without God, Not the Impossible Faith, and Why I Am Not a Christian, and a major contributor to The Empty Tomb, The Christian Delusion, The End of Christianity, and Sources of the Jesus Tradition, as well as writer and editor-in-chief (now emeritus) for the Secular Web, and for his copious work in history and philosophy online and in print. He is currently working on his next books, Proving History: Bayes's Theorem and the Quest for the Historical Jesus, On the Historicity of Jesus Christ, The Scientist in the Early Roman Empire, and Science Education in the Early Roman Empire. To learn more about Dr. Carrier and his work follow the links below.

Homepage: richardcarrier.info

Blog: freethoughtblogs.com/carrier/ (old blog: richardcarrier.blogspot.com)

The Interview:

Richard Carrier: Note that I follow and support the work of the Singularity Institute on precisely this issue (the organization you are writing for, if you are a correspondent for Less Wrong). And I believe all AI developers should, too (e.g. CALO). So my answers below won't be too surprising. But also keep in mind what I say (not just on "singularity" claims) at:


Q1: Assuming no global catastrophe halts progress, by what year would you assign a 10%/50%/90% chance of the development of roughly human-level machine intelligence?

Richard Carrier: 2020/2040/2080

Explanatory remark to Q1:

P(human-level AI by (year) | no wars ∧ no disasters ∧ beneficial political and economic development) = 10%/50%/90%

Q2: What probability do you assign to the possibility of human extinction as a result of badly done AI?

Richard Carrier: Here the relative probability is much higher that human extinction will result from benevolent AI, i.e. eventually Homo sapiens will be self-evidently obsolete and we will voluntarily transition to Homo cyberneticus. In other words, we will extinguish the Homo sapiens species ourselves, voluntarily. If you asked for a 10%/50%/90% deadline for this I would say 2500/3000/4000.

However, perhaps you mean to ask regarding the extinction of all Homo, and their replacement with AI that did not originate as a human mind, i.e. the probability that some AI will kill us and just propagate itself.

The answer to that is dependent on what you mean by "badly done" AI: (a) AI that has more power than we think we gave it, causing us problems, or (b) AI that has so much more power than we think we gave it that it can prevent our taking its power away.

(a) is probably inevitable, or at any rate highly probable, and there will likely be deaths or other catastrophes, but as with other tech failures (e.g. the Titanic, Three Mile Island, hijacking jumbo jets and using them as guided missiles) we will prevail, and very quickly from a historical perspective (e.g. there won't be another 9/11 using airplanes as missiles; we only got jacked by that unforeseen failure once). We would do well to prevent as many problems as possible by being as smart as we can be about implementing AI, and not underestimating its ability to outsmart us, or to develop while we aren't looking (e.g. Siri could go sentient on its own, if no one is managing it closely to ensure that doesn't happen).

(b) is very improbable because AI function is too dependent on human cooperation (e.g. power grid; physical servers that can be axed or bombed; an internet that can be shut down manually) and any move by AI to supplant that requirement would be too obvious and thus too easily stopped. In short, AI is infrastructure dependent, but it takes too much time and effort to build an infrastructure, and even more an infrastructure that is invulnerable to demolition. By the time AI has an independent infrastructure (e.g. its own robot population worldwide, its own power supplies, manufacturing plants, etc.) Homo sapiens will probably already be transitioning to Homo cyberneticus and there will be no effective difference between us and AI.

However, given no deadline, it's likely there will be scenarios like this: "god" AIs run sims in which digitized humans live, and any given god AI could decide to delete its sim and stop running it (and likewise all comparable AI shepherding scenarios). So then we'd be asking how likely it is that a god AI would ever do that, and more specifically, that all of them would (since there won't be just one sim run by one AI, but many, so one going rogue would not mean the extinction of humanity).
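That last point can be made quantitative under an independence assumption (mine, not part of the interview): if each shepherd AI deletes its sim with some probability, the chance that every one of them does so falls off exponentially with the number of sims. A minimal sketch in Python, with illustrative numbers of my own choosing:

```python
# Hypothetical illustration: suppose each of n "god" AIs independently
# deletes its sim with probability p. Total human extinction in this
# scenario requires all n to do so, which has probability p**n.
# Both the value of p and the independence assumption are mine.

def p_all_delete(p: float, n: int) -> float:
    """Probability that all n independent shepherd AIs delete their sims."""
    return p ** n

# Even a fairly pessimistic 1% per-AI chance collapses quickly:
print(p_all_delete(0.01, 10))   # ≈ 1e-20
```

The exponent does the work here: each additional independent sim multiplies the safety margin.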

So setting aside AI that merely kills some people, and only focusing on total extinction of Homo sapiens, we have:

P(voluntary human extinction by replacement | any AGI at all) = 90%+

P(involuntary human extinction without replacement | badly done AGI type (a)) < 10^-20

[and that's taking into account an infinite deadline, because the probability steeply declines with every year after first opportunity, e.g. an AI that doesn't do it the first chance it gets is rapidly less likely to as time goes on, so the total probability has a limit even at infinite time, and I would put that limit at roughly the value assigned here.]
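The convergence claim in the bracketed note can be checked directly. Assuming (my numbers, for illustration only) a per-year probability that starts at p0 and decays geometrically by a factor r each year, the total probability over even an infinite horizon is bounded by the geometric series sum p0 / (1 − r), so it converges rather than creeping toward 1:

```python
# Hedged illustration (my parameters, not Carrier's): if the chance an AGI
# attempts extinction in its first year of opportunity is p0, and that
# chance decays by a factor r < 1 each subsequent year, the total
# probability over any horizon is bounded by p0 / (1 - r).

def total_extinction_probability(p0: float, r: float, years: int = 10_000) -> float:
    """Probability the event happens at least once, with per-year
    probability p0 * r**t in year t (independence across years assumed)."""
    p_never = 1.0
    for t in range(years):
        p_never *= 1.0 - p0 * r**t
    return 1.0 - p_never

# Example: a 1-in-a-million first-year chance, halving every year,
# yields a total bounded near 2e-6 no matter how long we wait.
print(total_extinction_probability(1e-6, 0.5))
```

This is why a declining hazard rate yields a finite limiting probability even with no deadline at all.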

P(involuntary human extinction without replacement | badly done AGI type (b)) = 0.33 to 0.67

However, P(badly done AGI type (b)) < 10^-20

Explanatory remark to Q2:

P(human extinction | badly done AI) = ?

(Where 'badly done' = AGI capable of self-modification that is not provably non-dangerous.)

Q3: What probability do you assign to the possibility of a human-level AGI self-modifying its way up to massive superhuman intelligence within a matter of hours/days/< 5 years?

Richard Carrier: Depends on when it starts. For example, if we started a human-level AGI tomorrow, its ability to revise itself would be hugely limited by our slow and expensive infrastructure (e.g. manufacturing the new circuits, building the mainframe extensions, supplying them with power, debugging the system). In that context, "hours" and "days" have P --> 0, but 5 years has P = 33%+ if someone is funding the project, and likewise 10 years has P = 67%+; and 25 years, P = 90%+. However, suppose human-level AGI is first realized in fifty years, when all these things can be done in a single room with relatively inexpensive automation and the power demands of any new system are no greater than are normally supplied to that room. Then P(days) = 90%+. And with massively more advanced tech, say such as we might have in 2500, then P(hours) = 90%+.


Explanatory remark to Q3:

P(superhuman intelligence within hours | human-level AI running at human-level speed equipped with a 100 Gigabit Internet connection) = ?
P(superhuman intelligence within days | human-level AI running at human-level speed equipped with a 100 Gigabit Internet connection) = ?
P(superhuman intelligence within < 5 years | human-level AI running at human-level speed equipped with a 100 Gigabit Internet connection) = ?

Richard Carrier: Perhaps you are confusing intelligence with knowledge. An internet connection can make no difference to the former (since an AGI will have no more control over the internet than human operators do); it can only expand a mind's knowledge. How quickly depends more on the rate of processed seconds in the AGI itself: if it can simulate human thought only at the same pace as a non-AI, then it will not be able to learn any faster than a regular person, no matter what kind of internet connection it has. But if the AGI can process ten seconds of time in one second of non-AI time, then it can learn ten times as fast, up to the limit of data access (and that is where internet connection speed will matter). That is a calculation I can't do. A computer science expert would have to be consulted to calculate reasonable estimates of what connection speed would be needed to learn at ten times normal human pace, assuming the learner can learn that fast (which a ten-to-one time processor could); likewise a hundred times, and so on.

And all that would tell you is how quickly that mind can learn. But learning in and of itself doesn't make you smarter. That would require software or circuit redesign, which would require testing and debugging. Otherwise, once you had all the knowledge available to any human software/circuit design team, you would simply be no smarter than they are, and further learning would not help you (humans already have that knowledge level: that's why we work in teams to begin with), so AI is not likely to much exceed us in that ability. The only edge it can exploit is the speed of a serial design thought process, but even that runs up against the time and resource expense of testing and debugging anything it designed, and that is where physical infrastructure slows the rate of development and massive continuing human funding is needed. Hence my probabilities above.
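The kind of estimate Carrier defers to an expert here can at least be sketched. All the figures below are assumptions of mine (a generous ~1 Mbit/s for the raw audiovisual and text data a human learner consumes), not anything from the interview:

```python
# Back-of-envelope sketch (all figures are illustrative assumptions):
# estimate the link speed needed to feed an AGI "learning" at a
# multiple of human pace, given a generous estimate of the raw data
# a human consumes while learning.

HUMAN_INTAKE_BPS = 1_000_000        # assumed ~1 Mbit/s of raw input per human-second
LINK_BPS = 100_000_000_000          # the 100 Gigabit connection in the question

def required_bandwidth(speedup: float) -> float:
    """Bits/s needed to feed a learner running at `speedup` x human pace."""
    return speedup * HUMAN_INTAKE_BPS

for speedup in (10, 100, 10_000):
    need = required_bandwidth(speedup)
    print(f"{speedup:>6}x human pace: {need / 1e9:.3f} Gbit/s "
          f"({need / LINK_BPS:.2%} of a 100 Gbit link)")
```

On these assumptions even a ten-thousand-fold learner uses a tenth of the 100 Gigabit link, which supports the point that processing speed, not connection speed, is the binding constraint.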

Q4: Is it important to figure out how to make AI provably friendly to us and our values (non-dangerous), before attempting to solve artificial general intelligence?

Richard Carrier: Yes. At the very least it is important to take the risks very seriously, and incorporate them as a concern within every project flow. I believe there should always be someone expert in the matter assigned to any AGI design team, who is monitoring everything being done and assessing its risks and ensuring safeguards are in place before implementation at each step. It already concerns me that this might not be a component of the management of Siri, even though Siri achieving AGI is a low probability (but not vanishingly low; I'd say it could be as high as 1% in 10 years, unless Siri's processing space is being deliberately limited so it cannot achieve a certain level of complexity, or its cognitive abilities are otherwise being actively limited).

Explanatory remark to Q4:

How much money is currently required to mitigate possible risks from AI (to be instrumental in maximizing your personal long-term goals, e.g. surviving this century): less, no more, a little more, much more, or vastly more?

Richard Carrier: Required is not very much. A single expert monitoring Siri who has real power to implement safeguards would be sufficient, so with salary and benefits and collateral overhead, that's no more than $250,000/year, for a company that has billions in liquid capital. (Because safeguards are not expensive, e.g. capping Siri's processing space costs nothing in practical terms; likewise writing her software to limit what she can actually do no matter how sentient she became, e.g. imagine an army of human hackers hacked Siri at the source and could run Siri by a million direct terminals, what could they do? Answering that question will evoke obvious safeguards to put on Siri's physical access and software; the most obvious is making it impossible for Siri to rewrite her own core software.)

But what actually is being spent I don't know. I suspect "a little more" needs to be spent than is, only because I get the impression AI developers aren't taking this seriously, and yet the cost of monitoring is not that high.

And yet you may notice all this is separate from the question of making AGI "provably friendly" which is what you asked about (and even that is not the same as "provably safe" since friendly AGI poses risks as well, as the Singularity Institute has been pointing out).

This is because all we need do now is limit AGI's power at its nascence. Then we can explore how to make AGI friendly, and then provably friendly, and then provably safe. In fact I expect AGI will even help us with that. Once AGI exists, the need to invest heavily in making it safe will be universally obvious. Whereas before AGI exists there is little we can do to ascertain how to make it safe, since we don't have a working model to test. Think of trying to make a ship safe, without ever getting to build and test any vessel, nor having knowledge of any other vessels, and without knowing anything about the laws of buoyancy. There wouldn't be a lot you could do.

Nevertheless it would be worth some investment to explore how much we can now know, particularly as it can be cross-purposed with understanding human moral decision making better, and thus need not be sold as "just AI morality" research. How much more should we spend on this now? Much more than we are. But only because I see that money benefiting us directly, in understanding how to make ordinary people better, and detect bad people, and so on, which is of great value wholly apart from its application to AGI. Having it double as research on how to design moral thought processes unrestrained by human brain structure would then benefit any future AGI development.

Q5: Do possible risks from AI outweigh other possible existential risks, e.g. risks associated with the possibility of advanced nanotechnology?

Explanatory remark to Q5:

What existential risk (human extinction type event) is currently most likely to have the greatest negative impact on your personal long-term goals, under the condition that nothing is done to mitigate the risk?

Richard Carrier: All existential risks are of such vastly low probability it would be beyond human comprehension to rank them, and utterly pointless to do so anyway. And even if I were to rank them, extinction by comet, asteroid, or cosmological gamma ray burst vastly outranks any manmade cause. Even extinction by supervolcano vastly outranks any manmade cause. So I don't concern myself with this (except to call for more investment in earth impactor detection, and the monitoring of supervolcano risks).

We should be concerned not with existential risks, but ordinary risks, e.g. small scale nuclear or biological terrorism, which won't kill the human race, and might not even take civilization into the Dark Ages, but can cause thousands or millions to die and have other bad repercussions. Because ordinary risks are billions upon billions of times more likely than extinction events, and as it happens, mitigating ordinary risks entails mitigating existential risks anyway (e.g. limiting the ability to go nuclear prevents small scale nuclear attacks just as well as nuclear annihilation events, in fact it makes the latter billions of times less likely than it already is).

Thus when it comes to AI, as an existential risk it just isn't one (P --> 0), but as a panoply of ordinary risks, it is (P --> 1). And it doesn't matter how it ranks, it should get full attention anyway, like all definite risks do. It thus doesn't need to be ranked against other risks, as if terrorism were such a great risk we should invest nothing in earthquake safety, or vice versa.

Q6: What is the current level of awareness of possible risks from AI, relative to the ideal level?

Richard Carrier: Very low. Even among AI developers it seems.

Q7: Can you think of any milestone such that if it were ever reached you would expect human-level machine intelligence to be developed within five years thereafter?

Richard Carrier: There will not be "a" milestone like that, unless it is something wholly unexpected (like a massive breakthrough in circuit design that allows virtually infinite processing power on a desktop: which development would make P(AGI within five years) > 33%). But wholly unexpected discoveries have a very low probability. Sticking only with what we already expect to occur, the five-year milestone for AGI will be AHI, artificial higher intelligence, e.g. a robot cat that behaved exactly like a real cat. Or a Watson who can actively learn on its own without being programmed with data (but still can only answer questions, and not plan or reason out problems). The CALO project is likely to develop an increasingly sophisticated Siri-like AI that won't be AGI but will gradually become more and more like AGI, so that there won't be any point where someone can say "it will achieve AGI within 5 years." Rather it will achieve AGI gradually and unexpectedly, and people will even debate when or whether it had.

Basically, I'd say once we have "well-trained dog" level AI, the probability of human-level AI becomes:

P(< 5 years) = 10%
P(< 10 years) = 25%
P(< 20 years) = 50%
P(< 40 years) = 90%