[Click here to see a list of all interviews]

Abram Demski is a computer science Ph.D. student at the University of Southern California who previously studied cognitive science at Central Michigan University. He is an artificial intelligence enthusiast looking for the logic of thought. He is interested in AGI in general and universal theories of intelligence in particular, but also in probabilistic reasoning, logic, and the combination of the two ("relational methods"), as well as utility-theoretic reasoning.

I interviewed Abram Demski due to feedback from LessWrong. cousin_it, a top contributor and research associate of the Singularity Institute, wrote the following:

I'm afraid of Abram Demski who wrote brilliant comments on LW and still got paid to help design a self-improving AGI (Genifer).

Enough already, here goes....

The Interview:

Q1: Assuming no global catastrophe halts progress, by what year would you assign a 10%/50%/90% chance of the development of roughly human-level machine intelligence?

Explanatory remark to Q1:

P(human-level AI by (year) | no wars ∧ no disasters ∧ beneficial political and economic development) = 10%/50%/90%

Abram Demski:

10%: 5 years (2017).
50%: 15 years (2027).
90%: 50 years (2062).

Of course, numbers alone are not very informative. The year numbers I gave are unstable under reflection, by a factor of about 2 (meaning I have doubled and halved these estimates in the past minutes while considering it). More relevant is the variance; I think the year of development is fundamentally hard to predict, so it's rational to assign significant probability mass to within 10 years, but also to it taking another 50 years or more. However, the largest bulk of my probability mass lies roughly between 2020 and 2030, since (1) the computing hardware to simulate the human brain would become widely available and (2) I believe less than that will be sufficient, but the software may lag behind the hardware potential by 5 to 10 years. (I would estimate more lag, except that it looks like we are making good progress right now.)

Q2: What probability do you assign to the possibility of human extinction as a result of badly done AI?

Explanatory remark to Q2:

P(human extinction | badly done AI) = ?

(Where 'badly done' = AGI capable of self-modification that is not provably non-dangerous.)

Abram Demski: This is somewhat difficult. We could say that AIs matching that description have already been created (with few negative consequences). I presume that "roughly human-level" is also intended, though.

If the human-level AGI

0) is autonomous (has, or forms, long-term goals)
1) is not socialized
2) figures out how to access spare computing power on the internet
3) has a goal which is very bad for humans (ie, implies extinction)
4) is alone (has no similarly-capable peers)

then the probability of human extinction is quite high, though not 1. The probability of #0 is somewhat low; #1 is somewhat low; #2 is fairly high; #3 is difficult to estimate; #4 is somewhat low.

#1 is important because a self-modifying system will tend to respond to negative reinforcement concerning sociopathic behaviors resulting from #3-- though, it must be admitted, this will depend on how deeply the ability to self-modify runs. Not all architectures will be capable of effectively modifying their goals in response to social pressures. (In fact, rigid goal-structure under self-modification will usually be seen as an important design-point.)

#3 depends a great deal on just how smart the agent is. Given an agent of merely human capability, human extinction would be very improbable even with an agent that was given the explicit goal of destroying humans. Given an agent of somewhat greater intelligence, the risk would be there, but it's not so clear what range of goals would be bad for humans (many goals could be accomplished through cooperation). For a vastly more intelligent agent, predicting behavior is naturally a bit more difficult, but cooperation with humans would not be as necessary for survival. So, that is why #2 becomes very important: an agent that is human-level when run on the computing power of a single machine (or small network) could be much more intelligent with access to even a small fraction of the world's computing power.

#4 is a common presumption in singularity stories, because there has to be a first super-human AI at some point. However, the nature of software is such that once the fundamental innovation is made, creating and deploying many is easy. Furthermore, a human-like system may have a human-like training time (to become adult-level that is), in which case it may have many peers (which gets back to #1). In case #4 is *not* true, then condition #3 must be rewritten to "most such systems have goals which are bad for humans".

It's very difficult to give an actual probability estimate for this question because of the way "badly done AI" pushes around the probability. (By definition, there should be some negative consequences, or it wasn't done badly enough...) However, I'll naively multiply the factors I've given, with some very rough numbers:

= .1 * .1 * .9 * .5 * .1
= .00045

I described a fairly narrow scenario, so we might expect significant probability mass to come from other possibilities. However, I think it's the most plausible. So, keeping in mind that it's very rough, let's say .001.
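Spelled out, the naive multiplication is just a product over the five scenario factors; a minimal sketch (the factor labels paraphrase the numbered list in the answer, and treating the factors as independent is the simplifying assumption Demski flags himself):

```python
# Rough factor estimates from the scenario above (#0 through #4).
# Independence of the factors is the naive simplifying assumption.
factors = {
    "autonomous (has, or forms, long-term goals)": 0.1,
    "not socialized": 0.1,
    "accesses spare computing power on the internet": 0.9,
    "goal very bad for humans": 0.5,
    "alone (no similarly-capable peers)": 0.1,
}

p_extinction = 1.0
for name, p in factors.items():
    p_extinction *= p

print(f"{p_extinction:.5f}")  # prints 0.00045
```

Rounding the product up to account for scenarios outside this narrow one gives the .001 figure in the answer.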

I note that this is significantly lower than estimates I've made before, despite trying harder at that time to refute the hypothesis.

Q3: What probability do you assign to the possibility of a human-level AGI self-modifying its way up to massive superhuman intelligence within a matter of hours/days/< 5 years?

Explanatory remark to Q3:

P(superhuman intelligence within hours | human-level AI running at human-level speed equipped with a 100 GB Internet connection) = ?
P(superhuman intelligence within days | human-level AI running at human-level speed equipped with a 100 GB Internet connection) = ?
P(superhuman intelligence within < 5 years | human-level AI running at human-level speed equipped with a 100 GB Internet connection) = ?

Abram Demski: Very near zero, very near zero, and very near zero. My feeling is that intelligence is a combination of processing power and knowledge. In this case, knowledge will keep pouring in, but processing power will become a limiting factor. Self-modification does not help this. So, such a system might become superhuman within 5 years, but not massively.

If the system does copy itself or otherwise gain more processing power, then I assign much higher probability; 1% within hours, 5% within days, 90% within 5 years.

Note that there is a very important ambiguity in the term "human-level", though. It could mean child-level or adult-level. (IE, a human-level system may take 20 years to train to the adult level.) The above assumes you mean "adult level". If not, add 20 years.

Q4: Is it important to figure out how to make AI provably friendly to us and our values (non-dangerous), before attempting to solve artificial general intelligence?

Abram Demski: "Provably non-dangerous" may not be the best way of thinking about the problem. Overall, the goal is to reduce risk. Proof may not be possible or may not be the most effective route.

So: is it important to solve the problem of safety before trying to solve the problem of intelligence?

I don't think this is possible. Designs for safe systems have to be designs for systems, so they must be informed by solutions to the intelligence problem.

It would also be undesirable to stall progress while considering the consequences. Serious risks are associated with many areas of research, but it typically seems better to mitigate those risks while moving forward rather than beforehand.

That said, it seems like a good idea to put some thought into safety & friendliness while we are solving the general intelligence problem.

Q4-sub: How much money is currently required to mitigate possible risks from AI (to be instrumental in maximizing your personal long-term goals, e.g. surviving this century), less/no more/little more/much more/vastly more?

Abram Demski: I am a biased authority on this question, but in general, increased funding to AI would be good news to me. My opinion is that the world has a lot of problems which we are slowly solving, but which could be addressed much more effectively if we had more intelligence with which to attack them. AI research is beginning to bear fruit in this way (becoming a profitable industry).

So, if I had my say, I would re-assign perhaps half the budget currently assigned to the military to AI research. (The other half would go to NASA and particle physicists...)

What amount of the AI budget should be concentrated on safety? Moderately more than at present. Almost no one is working on safety right now.

Q5: Do possible risks from AI outweigh other possible existential risks, e.g. risks associated with the possibility of advanced nanotechnology?

Abram Demski: I would rank nanotech as fairly low on my list of concerns, because cells are fairly close to optimal replicators in the present environment. (IE, I don't buy the grey-goo stories: the worst that I find plausible is a nanobot plague, and normal biological weapons would be easier to make.)

Anyway, AI would be lower on my list than global warming.

Q5-sub: What existential risk (human extinction type event) is currently most likely to have the greatest negative impact on your personal long-term goals, under the condition that nothing is done to mitigate the risk?

Abram Demski: Another world war with the present technology & weapon stockpiles (or greater).

Q6: What is the current level of awareness of possible risks from AI, relative to the ideal level?

Abram Demski: The general population seems to be highly aware of the risks of AI, with very little awareness of the benefits.

Within the research community, the situation was completely the opposite until recently. I would say present awareness levels in the research community are roughly optimal...

Q7: Can you think of any milestone such that if it were ever reached you would expect human-level machine intelligence to be developed within five years thereafter?

Abram Demski: No. Predictions about AI research have historically been mostly wrong, so it would be incorrect to make such predictions.

Q8: Are you familiar with formal concepts of optimal AI design which relate to searches over complete spaces of computable hypotheses or computational strategies, such as Solomonoff induction, Levin search, Hutter's algorithm M, AIXI, or Gödel machines?

Abram Demski: Yes.


Abram Demski: I have been, and am, generally conflicted about these issues. I first encountered the writings of Eliezer three or four years ago. I was already a Bayesian at the time, and I was working with great conviction on the problem of finding the "one true logic" which could serve as a foundation of reasoning. Central to this was finding the correct formal theory of truth (a problem which is hard thanks to the Liar paradox). Reading Eliezer's material, it was clear that he would be interested in the same things, but wasn't writing a great deal about them in public.

I sent him an email about it. (I had the good habit of emailing random researchers, a habit I recommend to anyone.) His response was that he needed to speak to me on the phone before collaborating with me, since much more about a person is conveyed by audio. So, we set up a phone call.

I tried to discuss logic on the phone, and was successful for a few minutes, but Eliezer's purpose was to deliver the argument for existential risk from AI as a set-up for the central question which would determine whether he would be willing to work with me: If I found the correct logic, would I publish? I answered yes, which meant that we could not work together. The risk for him was too high.

Was I rational in my response? What reason should I have to publish, that would outweigh the risk of someone taking the research and misusing it? (Eliezer made the comment that with the correct logic in hand, it might take a year to implement a seed AI capable of bootstrapping itself to superhuman intelligence.) My perception of my own thought process is much closer to "clinging to normality" than one of "carefully evaluating the correct position". Shortly after that, trying to resolve my inner conflict (prove to myself that I was being rational) I wrote this:


The numbers there just barely indicate that I should keep working on AI (and I give a cut-off date, beyond which it would be too dangerous). Despite trying hard to prove that AI was not so risky, I was experiencing an anchoring bias. AI being terribly risky was the start position from which estimates should be improved.

Still, although I attempted to be fair in my new probability estimates for this interview, it must be admitted that my argument took the form of listing factors which might plausibly reduce the risk and multiplying them together so that the risk gets smaller and smaller. Does this pattern reflect the real physics of the situation, or does it still reflect anchor-and-reduce type reasoning? Have I encountered more factors to reduce the probability because that's been what I've looked for?

My central claims are:

  • In order for destruction of humanity to be a good idea, the AI would have to be so powerful that it could simply brush humanity aside, or have a very malevolent goal, or both.
  • In order to have massively superhuman intelligence, massively more processing power is needed (than is needed to achieve human-level intelligence).
  • It seems unlikely that a malevolent system would emerge in an environment empty of potential rivals which may be more pro-human, which further increases the processing power requirement (because the malevolent system should be much more powerful than rivals in order for cooperation to be an unimportant option), while making massive acquisition of such processing power (taking over the internet single-handedly) less plausible.
  • In any case, these negative singularity scenarios tend to assume an autonomous AI system (one with persistent goals). It is more likely that the industry will focus on creating non-autonomous systems in the near-term, and it seems like these would have to become autonomous to be dangerous in that way. (Autonomous systems will be created more for entertainment & eventually companionship, in which case much more attention will be paid to friendly goal systems.)

Some danger comes from originally non-autonomous systems which become autonomous (ie, whose capabilities are expanded to general planning to maximize some industrially useful utility function such as production quality, cash flow, ad clicks, etc). These are more likely to have strange values not good for us. The hope here lies in the possibility that these would be so numerous that cooperation with society & other AIs would be the only option. A scenario where the first AI capable of true recursive self improvement became an industrial AI and rose to power before its makers could re-sell it to many different jobs seems unlikely (because the loop between research and industry is not usually like that). More likely, by the time the first human-level system is five years old (old enough to start thinking about world domination), different versions of it with different goals and experiences will be in many places having many experiences, cooperating with humankind more than each other.

But, anyway, all of this is an argument that the probability is low, not that it would be impossible or that the consequences aren't hugely undesirable. That's why I said the level of awareness in the AI community is good and that research into safe AI could use a bit more funding.

Comments (71):

All interviews: ...

Maybe collect on a wiki page ("Kruel's interviews on AI risks"?), so that you can link to that page, and/or apply a unique tag to all posts in the series?

Thanks, good idea. I will add a unique tag and create a wiki page soon.
Tag: ai_risks_qa Wiki: http://wiki.lesswrong.com/wiki/Interview_series_on_risks_from_AI

Thanks a lot for your work on all this Xixidu, it's nice to have all these answers from a wide variety of experts. Thanks also to Demski and the others, of course.

It's very difficult to give an actual probability estimate for this question because of the way "badly done AI" pushes around the probability. (By definition, there should be some negative consequences, or it wasn't done badly enough...) However, I'll naively multiply the factors I've given, with some very rough numbers:


= .1 * .1 * .9 * .1 * .5 * .1

= .000045

I described a fairly narrow scenario, so we might expect significant probability mass to come from other possibilities. However, I think it's the most plausible. So, keeping in mind that it's very rough, let's say .0001.

There's an extra "* .1" in your expression, between your .9 and the .5, resulting in an extra zero, so your rough final tally should be something more like .001.

Oh, gosh. Xixidu, could you correct that?

Done. Sorry for waiting so long, should have fixed it right away. Thanks to Emile for noticing it.

If the human-level AGI

0) is autonomous (has, or forms, long-term goals)
1) is not socialized

#1 is important because a self-modifying system will tend to respond to negative reinforcement concerning sociopathic behaviors resulting from #3-- though, it must be admitted, this will depend on how deeply the ability to self-modify runs. Not all architectures will be capable of effectively modifying their goals in response to social pressures. (In fact, rigid goal-structure under self-modification will usually be seen as an important design-point.)

Abram: Coul... (read more)

Steve, The idea here is that if an agent is able to (literally or effectively) modify its goal structure, and grows up in an environment in which humans deprive it of what it wants when it behaves badly, an effective strategy for getting what it wants more often will be to alter its goal structure to be closer to the humans. This is only realistic with some architectures. One requirement here is that the cognitive load of keeping track of the human goals and potential human punishments is a difficulty for the early-stage system, such that it would be better off altering its own goal system. Similarly, it must be assumed that during the period of its socialization, it is not advanced enough to effectively hide its feelings. These are significant assumptions.
Wei Dai:
Interesting! Have you written about this idea in more detail elsewhere? Here are my concerns about it:

1. The AI has to infer the human's goals. Given the assumed/required cognitive limitations, it may not do a particularly good job of this.
2. What if the human doesn't fully understand his or her own goals? What does the AI do in that situation?
3. The AI could do something like plant a hidden time-bomb in its own code, so that its goal system reverts from the post-modification "close to humans" back to its original goals at some future time when it's no longer punishable by humans.

Given these problems and the various requirements on the AI for it to be successfully socialized, I don't understand why you assign only 0.1 probability to the AI not being socialized.
A little out of context, but do you know whether Paul Rosenbloom has similar ideas to you about AI existential risk, AI cooperation and autonomy, or strong limitations on super-human AI without commensurate scaling of resources?

A lot can be boiled down to the question - How much space is above our intelligence level?

If there is plenty of intelligence space above, then the problem can be huge. In that case the AI can surprise us anytime.

If there is not, it is not a problem and the AI will not be very much smarter than we are.

I think that there IS plenty of room up there.

It's not just how much space there is, it's also how easy it is to reach. I think there is a lot of space, but that it requires a lot of computing power to reach. If a system is merely superhuman, then destroying humankind will not be a good strategy. A system would have to climb quite high before cooperation with humans started to look suboptimal. (There are also Mark Waser's arguments that cooperation looks more and more optimal as a system gets smarter, but he hasn't established that mathematically...)
I understand your point. We agree that there is a lot of space above; we don't agree on how easy it is to climb up there. With a LOT of CPU, it would be almost trivial. But what can we do (what can AI do) with much less? We have done many amazing things in mathematics, mostly in a state of CPU starvation. We have accomplished a lot with very limited computational resources; so much so that we assumed "thinking is not equal to calculating". A substantial CPU-usage optimization may be possible for an AI, and by an AI. I guess.
One question which is pertinent here: is there anything dramatically better than Levin search for general problem solving? (That's a real question, not intended rhetorically.)
Some approximate method could be nearly as good and a lot cheaper to compute. I am at least 60% confident it exists. Some highly optimized implementation might be acceptable; only 30% confident that there is one such. For the case that there is something different and radically better, I give 25%. And there is another optic. Say that I have 10^20 CPU cycles every second. How much of a Levin search can I afford with this? Maybe it is enough; I give an 80% chance for this one. What if I have only 10^15 cps and it's still enough? I give a 50% chance for this. And maybe even 10^10 cps is quite all right, so that I could do it today, had I been smarter; I give it 10%. Of course, those are all wild guesses, but it's the best answer I can give.
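For readers who haven't met it: Levin search runs all programs in an interleaved schedule, giving program p a time share proportional to 2^-len(p), so phase t spends roughly 2^t total steps and covers every program of length up to t. A toy sketch, where the two-instruction language and the target value are illustrative assumptions, not anything from the thread:

```python
from itertools import product

# Toy language: a program is a string over {'i', 'd'} applied to 0,
# where 'i' increments and 'd' doubles. This DSL is purely illustrative.

def run(program, max_steps):
    """Execute at most max_steps instructions; None if the budget runs out."""
    x = 0
    for steps, op in enumerate(program):
        if steps >= max_steps:
            return None  # budget exhausted before the program finished
        x = x + 1 if op == 'i' else x * 2
    return x

def levin_search(is_goal, max_phase=20):
    """Levin's schedule: in phase t, a program of length L gets 2**(t - L)
    steps, so total work per phase stays near 2**t."""
    for t in range(1, max_phase + 1):
        for length in range(1, t + 1):
            budget = 2 ** (t - length)
            for prog in product('id', repeat=length):
                result = run(prog, budget)
                if result is not None and is_goal(result):
                    return ''.join(prog), result
    return None

print(levin_search(lambda x: x == 10))  # prints ('iidid', 10)
```

Phase t costs about 2^t work, so each extra bit of program length doubles the search cost; that exponential wall is exactly why cheaper approximate methods, as discussed above, would matter.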

I would rank nanotech as fairly low on my list of concerns, because cells are fairly close to optimal replicators in the present environment. (IE, I don't buy the grey-goo stories: the worst that I find plausible is a nanobot plague, and normal biological weapons would be easier to make.)

I'm not sure whether this response is "within the spirit of the question" - but the primary actual problem with nanotechnology - if it is somehow magically delivered into our hands without superintelligence - is that it then massively facilitates the construct... (read more)

Anyway, AI would be lower on my list than global warming.

That is pretty ridiculous prioritisation - if you ask me.

It seems like a very reasonable position that:

  • global warming is more likely to cause massive deaths than AI, but
  • AI is more likely to exterminate mankind than global warming
The term "existential risks" is in the question being asked. I think it should count as context.
True - though maybe some consider "major catastrophe causing the collapse of civilization as we know it" as falling under existential risk, even if it would take much more than that to actually put mankind in danger. I wonder if Demski would actually give a high probability of human extinction because of global warming, or whether it's just that he used a broad interpretation of "existential risk".
Yea, I have to admit that when I wrote that I meant "lower on my list of concerns for the next century".
Global warming is surely fluff - even reglaciation poses a bigger risk.

One of the major problems I see regarding the power of a singleton superintelligence is that unknown unknowns are not sign-posted.

If people like Benoît B. Mandelbrot had never decided to research fractals then many modern movies wouldn't be possible, as they rely on fractal landscape algorithms. Yet at the time Benoît B. Mandelbrot conducted his research it was not foreseeable that his work would have any real-world applications.

Important discoveries are made because many routes with low or no expected utility are explored at the same time. And to... (read more)

Addendum (via 17 Equations that changed the world) (emphasis mine)
Addendum (via Basic science is about creating opportunities):

  • Studying monkey social behaviors and eating habits led to insights into HIV (Radiolab: Patient Zero).
  • Research into how algae move toward light paved the way for optogenetics: using light to control brain cells (Nature 2010 Method of the Year).
  • Black hole research gave us WiFi (ICRAR award).
  • Optometry informs architecture and saved lives on 9/11 (APA Monitor).
  • Certain groups HATE SETI, but SETI's development of the cloud-computing service SETI@home paved the way for citizen science and recent breakthroughs in protein folding (Popular Science).
  • Astronomers provide insights into medical imaging (TEDxBoston: Michell Borkin).
  • Basic physics experiments and the Fibonacci sequence help us understand plant growth and neuron development (References).
  • I agree that a sufficiently simple-valued optimizer pretty much destroys most of what I value in any environment it controls.
  • You are implicitly equating "singleton" and "simple values" here in a way that doesn't seem at all justified to me.
  • I agree that for all X, as long as you don't try X, you don't know what X might lead to. I would also add that for all X, even if you do try X, you don't know what X might lead to. (For example, suppose I spend ten years working on a research project. It doesn't follow that I know what spending ten years on that research project leads to; perhaps if I'd gone about it slightly differently, I'd have gotten different results.)
  • I agree that when we can't think of a high-expected-utility route, we try low-expected-utility routes, because that's what there is. And if enough of us do that, we often discover unexpected utility on those routes. That said, if there are two routes I can take, and path A has a high chance of getting me what I want, and path B has a low chance of getting me what I want, I take path A. So does basically every higher mammal I've ever met.
  • I agree that unlike mammals, self-replicating DNA with sources of random mutation are as likely to explore path A as path B. I don't think it's a coincidence that mammals as a class achieve their goals faster than self-replicating DNA with sources of random mutation.
No, I don't. What I am saying is that you need to have various different agents with different utility-functions around to get the necessary diversity that can give rise to enough selection pressure. I am further saying that a "singleton" won't be able to predict the actions of new and improved versions of itself by just running sandboxed simulations; not just because of logical uncertainty, but also because it is computationally intractable to predict the real-world payoff of changes to its decision procedures.

I am also saying that you need complex values to give rise to the necessary drives to function in a complex world. You can't just tell an AI to protect itself. What would that even mean? What changes are illegitimate? What constitutes "self"? Those are all unsolved problems that are just assumed to be solvable when talking about risks from AI.

What I am talking about is concurrence. What I claim won't work is the kind of argument put forth by people like Steven Landsburg that you should contribute to just one charity that you deem most important. The real world is not that simple. Much progress is made due to unpredictable synergy. "Treating rare diseases in cute kittens" might lead to insights on solving cancer in humans.

If you are an AI with simple values you will simply lack the creativity, due to a lack of drives, to pursue the huge spectrum of research that a society of humans does pursue. That will allow an AI to solve some well-defined narrow problems, but it will be unable to make use of the broad range of synergetic effects of cultural evolution. Cultural evolution is a result of the interaction of a wide range of utility-functions.

The difference is that mammals have goals, complex values, which allow them to make use of evolutionary discoveries and adapt them for their purposes.

In order to have massively superhuman intelligence, massively more processing power is needed (than is needed to achieve human-level intelligence).

Who says it isn't available? First, you have other devices in the neighborhood. Second, you have some physical process which can be initiated and used as a computing source.

Well, maybe you don't have them, but you cannot count on that. I am just a stupid human. But if I was an AI, I might try to use even a CAPTCHA variation for some clever calculation. I might try to harness the video info streams for computing, i... (read more)

I am just a stupid human. But if I was an AI, I might...

I think it is a great idea to be very cautious about the possible capabilities of hypothetical AI's. Yet the point of disagreement I voice all the time is that some people seem to be too quick to assign magical qualities to AI's.

I just don't see how a group of 100 world-renowned scientists and military strategists could easily wipe out the Roman Empire if beamed back in time. And even if you gave all of them a machine gun, the Romans would quickly adapt and the people from the future would run out of ammunition.

It takes a whole technological civilization to produce a modern smartphone.

That an AI could use some magic to take over the Earth is a serious possibility, but not a fact.

Magic has to be discovered, adapted and manufactured first. It doesn't just emerge out of nowhere from the computation of certain algorithms.

I still don't see enough skepticism here when it comes to what an AI could possibly do.

With more processing power you can do more different things, not just more of the same things. If your goal is to send 100 people to the past to destroy the Roman Empire, don't send too many scientists and strategists. Send specialists of many kinds. Send charismatic people to start a new religion (make it compatible with the existing religions), so you can make local people work for you. Send artists, healers and architects to show them some miracles. Send diplomats to bribe and convert important people. Send technicians and managers to start efficient production of war machines and electric power generators. Bring the conquered tribes to the next level of civilization, and bring teachers to educate their young (if possible, teach them to read, and bring a lot of textbooks). Yes, the Romans will adapt, but probably not quickly enough, if you plan to conquer them in 5-10 years. Don't meet them on the battlefield... remove the loyalty of their allies, corrupt their leaders, ruin their economy, and actually let them join you -- you can conquer them without destroying them.

There are many resources available. Many people use computers that are easy to hack and connected to the Internet. The AI could start with hacking millions of PCs worldwide. It could create fake e-mail accounts and communicate with people pretending to be a real person or organization. It could pretend to be a business organization, a secret society, a religious group; many different facades for many different people. It could hack bank accounts and bribe people with real money. If it convinces a few people to act in its name, it can legally start a company, buy property, build machines. It could hack police computers, learn about any human suspicions, plant false information, or pay assassins to kill people who know too much. It could do a thousand different things at the same time. It could gain a lot of power without anyone suspecting what happened. And it only needs one unguarded Internet connection. Basic
This might be a good strategy for an AI to use, but it is not an existential risk. An even better strategy may be to openly cooperate, increase loyalty and allies, educate their leaders, bolster their economy, and actually join them. (Depending on goals, & resources.)
The risk is that an AI may pretend to be friendly in self-defence, to avoid conflict during its early, fragile phase. The cooperation with humans may be only partial; for example, the AI may give us useful things that will make us happy (for example a cure for cancer), but withhold things that would make us stronger (for example its new discoveries about self-modification and self-improvement). Later, if the AI grows stronger faster than humans, and its goals are incompatible with human goals, it may be too late for humans to do anything about it. The AI will use the time to gain power and build backup systems. Even if the AI's utility function is maximizing the total number of paperclips, it may realise that the best strategy for increasing the number of paperclips includes securing its survival, and that this is best done by pretending to be human-friendly and leaving open conflict for later.
In my example, "100 people" were analogous to the resources an AI has at the beginning. "The Roman Empire" is analogous to our society today. The knowledge that "100 people" from today would have is analogous to what an AI could come up with by simply "thinking" about it, given its current resources. "Machine guns" are analogous to the supercomputer it runs on. You can't just say "with more processing power you can do more different things"; that would be analogous to saying that "100 people" from today could just build more "machine guns". But they can't! They can't use all their knowledge and magic from the future to defeat the Roman Empire.

This doesn't change anything. You just replaced "technological magic" with "social magic". If the AI isn't already hard-coded to be a dark arts specialist, then it can't just squeeze that out of its algorithms. It's not as easy as it sounds in English. People could notice it and bomb the AI. The global infrastructure is very fragile and not optimized for running a GAI. Magic! You would need a computer the size of the moon to control a global conspiracy.
Wait a minute, bomb the AI? Assuming it took over a fraction of the Internet, that is bound to be very difficult, as you'd have to sever every computer it controls from the network. Yes, currently the network is still over-centralized and quite fragile. But if wireless mesh networking lives up to its promises (I believe it will, in a decade or two), this won't be the case any more. As for the "not optimized for a Global AI" part, you could just have the AI split (or duplicate) itself, and let a gazillion sub-processes take care of the gazillion tasks it may want to perform. It probably doesn't need more bandwidth than a human organization does now (except of course to copy itself).
I object to the "assuming" part. It needs to acquire resources to acquire resources. If it can't take over the Internet with its initial resources, then it won't be able to make use of further resources. You assume that every computer can run a seed AI and that the global infrastructure is very stable under attack. I object to the "just" part.
It's likely very easy to take over enough computers (e.g. hundreds), so that (1) they can run an AI capable enough to work on taking over more computers, perhaps slowly, and (2) it's not possible to shut it down without severely disrupting the Internet, since you don't know which computers are involved. (If it's not possible yet to run an AI on hundreds of computers, that capability is at most a few decades in the future, and at some point one computer might suffice.) Provider or company networks may be big enough to run the AI and almost impossible to shut down in coordination all over the world. If the AI has sense enough to avoid making any convincingly threatening moves, it won't be possible to convince people to essentially disrupt the whole economy in an attempt to exterminate it (even better, nobody notices at all). If the AI creates some relatively simple free-ranging backup viruses that re-assemble a working AI whenever they can (e.g. forming something like a decentralized p2p network that redundantly stores its data while the AI can't run), even shutting down all instances of the AI in the world won't cure the infection; it'll come back whenever you restore the Internet or even local networks, letting any (or enough) previously infected computers in. And given enough time, the disease will fester.
I don't believe this analysis. People talk about computer security as though it's an arms race where the smarter side always wins. This is just wrong. Once I've written a correct program (for some set of correctness properties), it'll stay correct. If I have a secure operating system, it'll still be secure no matter how smart the attacker is. This is somewhat beyond current industrial practice, but we have verified operating systems and compilers as research prototypes. We know how to write secure software today. We might not reliably achieve it, but it seems pretty much settled that it's achievable without superhuman skill.

Wide-area peer-to-peer isn't a good platform for general computing; you have severe reliability and connectivity problems at the edge of the network. If you give me 100 random network-connected machines, that doesn't give me 100 times the real computational power. I'm not sure it gives me 10x, for most problems of interest. In particular, my machine-learning colleagues tell me that their learning algorithms don't parallelize well. Apparently, good learning algorithms need to combine results from examining different subsets of the data, and that's intrinsically communication-intensive and therefore not efficient in parallel.

You could presumably write software to automatically craft exploits and use them to re-establish itself elsewhere. This would be a highly resource-intensive and therefore non-stealthy process. All exploits only work on some subset of the machines out there; therefore, an attacker firing off attacks across the network will be highly visible. We have honeypots, internet telescopes, and suchlike today. I don't think this process could be kept hidden now, and the defensive technology is steadily improving. I'm not qualified to assess all possible AI-risk scenarios, but I think "the AI will take over all our computers" is overrated as a risk. That window is closing now, and given current trends I expect it to be closed within 10 years.
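The parallelization point can be made concrete with a toy cost model. This is a hypothetical back-of-the-envelope sketch, not a benchmark: `compute` and `sync_cost` are made-up constants, and the linear synchronization term models a naive gather-and-average step; real all-reduce schemes scale better but still pay a communication price.

```python
# Toy model of data-parallel learning: each step, workers compute local
# results in parallel, then must combine (average) them, which costs
# communication. All constants here are illustrative assumptions.

def step_time(workers, compute=1.0, sync_cost=0.05):
    """Time per training step with `workers` machines.

    compute   -- time for one machine to process the whole dataset
    sync_cost -- per-worker cost of combining partial results (naive gather)
    """
    return compute / workers + sync_cost * workers

def speedup(workers):
    return step_time(1) / step_time(workers)

for n in (1, 10, 100):
    # Speedup rises at first, then collapses as synchronization dominates:
    # under these assumptions, 100 machines are slower than 1.
    print(n, round(speedup(n), 2))
```

Under these (made-up) constants, 10 machines give well under a 2x speedup and 100 machines are a net loss, which is the shape of the claim above: adding haphazardly connected computers does not multiply effective power.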
Security is possible in principle (barring cases like stupid/careless users manually launching content sent to them or found somewhere and granting it undue privileges), but it is very unlikely to become sufficiently reliable in practice anytime soon. At present, breaking into more and more computers is a matter of continuously applying some creative effort to the task, researching vulnerabilities and working around existing recognition-type defenses. In any case, earning money to buy additional computing power is similar for our purposes. Yes. What matters is when several hundred (or thousand) haphazardly connected computers are enough for the system to be capable enough to successfully work on its continued survival. Disrupting the Internet and most big networks might plausibly succeed in permanently inhibiting a crude backup after the AI is terminated. But it takes only one backup system, and there's an incentive to create many, with different restoration strategies. And when only a few computers are sufficient to run an AI, all this becomes irrelevant, as it necessarily remains active somewhere.
How soon is soon? I would bet on most systems not being vulnerable to remote exploits without user involvement within the next 10 years. I would not bet on dangerous self-improving AI within that timeframe. Once the rogue-AI-in-the-net is slower at self-improvement than human civilization, it's not so much of a threat. The world in which there's a rogue-AI out there is probably also the world in which we have powerful-but-reliable automation for lots of human-controlled software development, too... This assumption strikes me as far-fetched. There presumably is some minimum quantity of code and data for the thing to be effective. It would be surprising if that subset fit on one machine, since that would imply that an effective self-modifying AI has low resource needs and that you can fit an effective natural-language processor into a memory much smaller than those used by today's natural-language-processing systems.
By a few computers being sufficient I mean that computers become powerful enough, not that AI gets compressed (feasibility of which is less certain). Other contemporary AI tech won't be competitive with rogue AI when we can't solve FAI, because any powerful AI will in that case itself be a rogue AI and won't be useful for defense (it might appear useful though).
"AI" is becoming a dangerously overloaded term here. There's AI in the sense of a system that does human-like tasks as well as humans (specialized artificial intelligence, ASI), and there's AI in the sense of a highly self-modifying system with long-range planning (AGI). I don't know what "powerful" means in this context, but it doesn't seem clear to me that humans + ASI can't be competitive with an AGI. And I am skeptical that there will be radical improvements in AGI without corresponding improvements to ASI. It might easily be the case that humans + ASI support for high-productivity software engineering are enough to build secure networked systems, even in the presence of AGI. I would bet on humans + proof systems + higher-level developer tools being able to build secure systems before AGI becomes good enough to be dangerous.
By "powerful AI" I meant AGI (terminology seems to have drifted there in this thread). Humans+narrow AI might be powerful, but can't become very powerful without AGI, while AGI in principle could. AGI could work on its own narrow AIs if that potentially helps. You keep talking about security, but as I mentioned above, earning money works as well or probably better for accumulating power. Security was mostly relevant in the discussion of quickly infecting the world and surviving an (implausibly powerful) extermination attempt, which only requires being able to anonymously infect a few hundred or thousands of computers worldwide, which even with good overall security seems likely to remain possible (perhaps through user involvement alone, for example after the first wave that recruits enough humans).
Hmm. I'm now imagining a story in which there's a rogue AI out there with a big bank account (attained perhaps from insider trading), hiring human proxies to buy equipment, build things, and gradually accumulate power and influence, before, some day, deciding to turn the world abruptly into paperclips. It's an interesting science fiction story. I still don't quite buy it as a high-probability scenario or one to lie awake worrying about. An AGI able to do this without making any mistakes is awfully far from where we are today. An AGI able to write an AGI able to do this seems, if anything, to be a harder problem.

We know that the real world is a chaotic, messy place and that most interesting problems are intractable. Any useful AGI or ASI is going to be heavily heuristic. There won't be any correctness proofs or reliable shortcuts. Verifying that a proposed modification is an improvement is going to have to be based on testing, not just cleverness. I don't believe you can construct a small sandbox, train an AGI in that sandbox, and then have it work well in the wider world. I think training and tuning an AGI means lots of involvement with actual humans, and that's going to be a human-scale process.

If I did worry about the science fiction scenario above, I would look for ways to thwart it that also have high payoff if AGI doesn't happen soon or isn't particularly effective at first. I would think about ways to do high-assurance financial transparency and auditing. Likewise technical auditing and software security.
But it is not easy to use the money. You can't "just" build huge companies with fake identities, or a straw man, to create revolutionary technologies easily. Running companies with real people takes a lot of real-world knowledge, interactions and feedback. But most importantly, it takes a lot of time. I just don't see that an AI could create a new Intel or Apple over a few years without its creators noticing anything. The goals of an AI will be under scrutiny at any time. It seems very implausible that scientists, a company or the military are going to create an AI and then just let it run without bothering about its plans. An artificial agent is not a black box, like humans are, where one is only able to guess its real intentions. A plan for world domination seems like something that can't be concealed from its creators. Lying is no option if your algorithms are open to inspection.
Could you elaborate on "even better, nobody notices at all"? Any AI capable of efficient self-modification must be able to grasp its own workings and make predictions about improvements to various algorithms and its overall decision procedure. If an AI can do that, why would the humans who built it be unable to notice any malicious intentions? Why wouldn't the humans who created it be able to use the same algorithms that the AI uses to predict what it will do? If humans are unable to predict what the AI will do, how is the AI able to predict what improved versions of itself will do? In other words, could you elaborate on why you believe that what the AI is going to do will be opaque to its creators but predictable to its initial self? I am also rather confused about how an AI is believed to be able to hide its attempts to build molecular nanotechnology. It doesn't seem very inconspicuous to me. If you assume a world/future in possession of vastly more advanced technology than our current world, then I don't disagree with you. If it takes very long for the first GAI to be created, and if it is then created by means of a single breakthrough that somehow combines all previous discoveries and expert systems into a much more powerful single entity, with huge amounts of hard-coded knowledge, a complex utility function and various dangerous drives, then I agree. It wouldn't even take the strong version of recursive self-improvement to pretty much take over the world under those assumptions.
I meant not noticing that it escaped to the Internet. But "noticing malicious intentions" is a rather strange thing to say. You notice behavior, not intentions. It's stupid to signal your true intentions if you'll be condemned for them. Predict what it will do? In what sense, and to what end? An AI in the wild acts depending on what it encounters; all instances are unique (and beware of the watchers). I didn't talk of this, and I don't see how those assumptions are relevant. Also, all drives are dangerous, to the extent that their combination differs from ours. Utility is not temper or personality or a tendency to act in a certain way. Utility is what shapes long-term plans, any of whose elements might have arbitrary appearance, as necessary to dominate the circumstances.
Maybe I misunderstood you. But I still believe that it is an important question. To be able to self-improve efficiently, an AI has to make some sort of prediction about how modifications will affect its behavior. The desired solution is actually much stronger than that: the AI will have to prove the friendliness of its modified self (that is, of its successor) with respect to its utility function. The question is, if the AI can make such predictions about the behavior of improved versions of itself, why wouldn't humans be able to do the same? The fear is that an AI will do something that eventually leads to the extinction of all human value. But the AI must have the same fear about improved versions of itself. The AI must fear that its successor will cause the demise of what it values. Therefore it has to be able to make sure that this won't happen. But why wouldn't humans be able to do the same? An AI is not a black box to itself. It won't be a black box to its creators. Inventing molecular nanotechnology and taking over the world in its spare time seems like something that should be noticeable.
What if the AI makes mistakes? Meaning, it mistakenly believes the successor it has just written has the same utility function? The same way a human could mistakenly believe the AI he has just built is friendly? In the same vein, what if the AI cannot accurately assess its own utility function, but goes on optimizing anyway? Such a badly done AI may automatically flatline, and not be able to improve itself. I don't know. But even if the AI is friendly to itself, we humans could still botch the utility function (even if that utility function is as meta as CEV).
Yes I do. But it may not be as probable as I thought. I said as much. And this one seems more plausible. If we uphold freedom, a sensible policy for the Internet is to make it as resilient and uncontrollable as possible. If we don't, well… Now, if those two assumptions are correct, and we further assume the AI already controls a single computer with an internet connection, then it has plenty of resources to take over a second one. It would need to: * Find a security flaw somewhere (including convincing someone to run arbitrary code), upload itself there, then rinse and repeat. * Or, find and exploit credit card numbers (or convince someone to give them away), then buy computing power. * Or, find and convince someone (typically a lawyer) to set up a company for it, then make money (legally or not), then buy computing power. * Or, … Humans do all that right now. (Credit card theft, money laundering, various scams, legitimate offshore companies…) Of course, if the first computer isn't connected, the AI would have to get out of the box first. But Eliezer can do that already (and he's not alone). It's a long shot, but if several equally capable AIs pop up in different laboratories worldwide, then eventually one of them will be able to convince its way out.
But humans are optimized to do all that, to work in a complex world. And humans are not running on a computer being watched by their creators, who are eager to write new studies on how its algorithms behave. I just don't see it being a plausible scenario that all this could happen unnoticed. Also, simple credit card theft etc. isn't enough. At some point you'll have to buy Intel or create your own companies to manufacture your new substrate or build your new particle accelerator.
OK, let this AI be safely contained, and let the researchers publish. Now, what's stopping some idiot from writing a poorly specified goal system and then deliberately letting the AI out of the box so it can take over the world? It only takes one idiot among the many who could read the publication. And of course credit card theft isn't enough by itself. But it is enough to bootstrap yourself into something more profitable. There are many ways to acquire money, and the AI, by duplicating itself, can access many of them at the same time. If the AI does nothing stupid, its expansion should be both undetectable and exponential. I give it a year to buy Intel or something. Sure, in the meantime, there will be other AIs with different poorly specified goal systems. Some of them could even be genuinely Friendly. But then we're screwed anyway, for this will probably end up in something like a Hansonian Nightmare. At this point, the only thing that could stop it would be a genuine Seed AI that can outsmart them all. You have less than a year to develop it, and ensure its Friendliness.
Humans are not especially optimized to work in the environment loup-vaillant describes.
It's not trivial, no, but there are at least dozens of humans who've managed it by themselves. And even if the humans do notice, and the AI is confined to a single computer cluster that could be bombed, that doesn't mean the AI has to give away its location; perfect anonymity online is easy.
Partial anonymity online is easy. Perfect anonymity against sufficiently well-resourced and determined adversaries is difficult or impossible. Packets do have to come from somewhere. Speed-of-light puts bounds on location. If you can convince the network operators to help, you can trace paths back hop by hop. You might find a proxy or a bot, but you can thwack that and/or keep tracing backwards in the network. If there were some piece of super-duper malware (the rogue AI) loose on the network, I suspect it could be contained by a sufficiently determined response.
No, you can't. You should read some documents about how Tor works; this is a well-studied question and unfortunately, the conclusions are the opposite of what you have written. The problem is that there are lots of proxies around, most of which don't keep logs, and you can set up a chain so that if any one of them refuses to keep logs then the connection can't be traced. If people knew there was a rogue AI around, they could go around visiting datacenters and use physical tricks to try to detect its presence. But if it maintained the pretense of being an anonymous human or anonymous humans, this probably wouldn't happen.
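The chained-proxy argument can be illustrated with a toy model of onion routing, the mechanism behind Tor-style proxy chains. This is a deliberately simplified sketch: the layered encryption is simulated with plain nesting, so it shows only the information flow (each relay learns just its immediate neighbors), not the actual cryptography.

```python
# Hypothetical toy sketch of onion routing. Each layer names only the next
# hop; a relay can "peel" only its own layer, so it learns where the packet
# came from and where to forward it, but never the whole path. If any relay
# in the chain keeps no logs, the path cannot be reconstructed afterwards.
# Real Tor wraps each layer in encryption; here nesting stands in for that.

def build_onion(message, relays, destination):
    """Wrap `message` so it traverses `relays` in order, then `destination`."""
    packet = (destination, message)
    for relay in reversed(relays):
        packet = (relay, packet)  # outermost layer = first hop
    return packet

def trace(onion):
    """Deliver the onion hop by hop, recording each forwarding step."""
    hops, packet = [], onion
    while isinstance(packet, tuple):
        next_hop, packet = packet  # one relay peels one layer
        hops.append(next_hop)
    return hops, packet  # packet is now the plaintext message

onion = build_onion("hello", ["relayA", "relayB", "relayC"], "server")
hops, message = trace(onion)
print(hops)     # ['relayA', 'relayB', 'relayC', 'server']
print(message)  # hello
```

Note what `relayB` sees in this model: a packet from `relayA` and an opaque inner blob addressed to `relayC`. Neither the original sender nor the final destination is visible to it, which is why tracing depends on every hop cooperating.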
I understand Tor quite well. Whether connections can be traced depends how powerful you think the attacker is. You can potentially get somewhere doing global timing attacks -- though this depends on the volume and timing properties of the traffic of interest. Maybe more importantly, if enough of the Tor nodes cooperate with the attacker, you can break the anonymity. If you could convince enough Tor operators there was a threat, you could mount that attack. Sufficiently scary malware communicating over Tor ought to do the trick. Alternatively, the powerful attacker might try to compromise the Tor nodes. In the scenario we're discussing, there are powerful AIs capable of generating exploits. Seems strange to assume that the other side (the AGI-hunters) haven't got specialized software able to do similarly. Automatic exploit finding and testing is more or less current state-of-the-art. It does not require superhuman AGI.
I am not sure if we even disagree. I did not say that the possibility can be ruled out. I meant that it is not trivial. For a GAI to take over the Internet, the following needs to be true in conjunction: * 1) Its actions must be opaque to its creators for long enough to make use of the resources. * 2) The computational resources that it can hack must be suitable for its tasks to be instrumentally useful. * 2b) Tasks must be parallelizable. * 2c) The available resources must predictably be enough to be worth the effort and the risk of being detected. * 2d) There must be enough resources that it can use only a fraction of them, so as not to cause a noticeable slow-down. * 2e) The gained resources must allow it to take over even more resources without hitting diminishing returns. * 3) It must be able to break the security of its time in an unsuspicious way. * 3b) The initial AI must be able to use its given resources to become an expert in computer security. * 3c) The initial AI must be able to use its available resources to figure out how to make its algorithms parallelizable, and possibly adapt them to work efficiently on various architectures. And that's just a few problems right off the top of my head. And I am nowhere near being an expert. I do not suspect that it is that easy.
I think you're greatly overestimating the difficulty of all of these things. My initial reactions are: 1) Of course it makes itself opaque, first thing. That requires local root, and fast generation of fake data for the debug interfaces. There are some theoretical strategies for preventing this (cold-boot attacks with memory image analysis), but putting them into practice would be a major project. 2b) That was already a prerequisite for reaching human level. 2c) Detection isn't so bad; it just needs to look enough like the other malware to not stand out. (And the other malware all goes to great lengths to make itself opaque, so opacity will not be suspicious.) 2d) There's a botnet mining Bitcoin today, which uses tons of resources. The actual giveaway is not slowdown (it can set priority levels so it doesn't slow anything else down), but heat and electricity usage. 3b) Easier than it sounds for humans, and much easier for what I think are likely AI architectures than for humans. 3c) Parallelism is already taken care of, porting is generally not a big deal, and the trend is for programming languages and tools to take care of that as much as possible.
Well, I have to take your word for it. You are throwing concepts at me like "fast generation of fake data for the debug interfaces" and making claims like "Parallelism is already taken care of, porting is generally not a big deal...". If you are right, then risks from AI are more probable than I thought. But try to look at it from my perspective. I have been a baker and a construction worker, and I currently work as a part-time gardener. You, someone I don't know, are claiming in a comment on a blog that some sort of AI is likely to be invented that will then easily be able to take over the Internet, and that in addition it will care to do so. Given my epistemic state, what you are saying seems to be highly specific, conjunctive, non-evidence-backed speculation about possible bad outcomes. Most experts tell me that what you and others are predicting won't happen. Even those who mostly agree about the possible capabilities of hypothetical AI are nowhere near as worried as you. So what am I to make of a group of people who tell me that all those people are either stupid or haven't thought about it the way you did? Try to take my perspective, as someone who doesn't have all those deep insights about recursively self-improving AI and computer science in general.
Both our models depend on many assumptions. Whether these assumptions are in reality true or false, the chances of an AI attack following my scenario can be bigger or smaller. (Note: there may be other scenarios.) Here are some of my assumptions:

-- The human-level AI is thinking like a smart human; more precisely, like a rational human with strategic, social, and programming skills. (An AI without strategic skills would forget to make a backup. An AI without social skills would forget to keep a secret. An AI without programming skills could not hack the computers on the Internet. It is not automatic that an AI has programming skills just because it is itself a computer; humans aren't automatically biochemical experts just because they are built on biochemistry.)

-- Resources required to run a human-level AI are relatively small, so the AI can find them on the Internet. If more than a high-end home PC is required to run a human-level AI, then the algorithm can be parallelized. (If the AI needs more capacity than it can gain by hacking unsafe computers on the Internet, or if it cannot split itself onto many computers, then the original computer and its data line to the rest of the world remain the weak point. At least until the AI gains enough resources to build a copy somewhere else; but such a copy may be extremely expensive and would inevitably draw attention.)

-- The AI is thinking in real time, or at least near real time, so by gaining more resources it can make itself faster than a human. (A very slow AI incapable of increasing its speed could lose a time race.)

I don't know if any of this is true. I imagined a situation where a human-level AI is run on an average computer; I imagined that with a correct algorithm one does not need an extreme amount of resources. This may be completely wrong. Actually, now I would bet it is wrong. However, it seems to me that you overestimate humans. It is not obvious that humans would immediately notice that something is wrong. It is not obvious that they would make the right decisions in time.
But how? Are those social skills hard-coded or learnt? To hard-code social skills good enough to take over the world seems like something that would take millennia. And I don't see how an AI is going to acquire those skills either. Do you think it is computationally tractable to learn how to talk with a nice voice, how to write convincing emails, etc., just by reading a few studies and watching YouTube videos? I don't know of any evidence that would support such a hypothesis. The same is true for physics and technology. You need large-scale experiments like CERN to make insights in physics and large-scale facilities like Intel's chip fabrication plants to create new processors. Both statements are highly speculative. The questionable assumptions here are 1) that all available resources can efficiently run a GAI, 2) that available resources can be easily hacked without being noticed, 3) that throwing additional computational resources at important problems solves them proportionally faster, and 4) that important problems are parallelizable. The argument that humans are not perfect general intelligences is an important one and should be seriously considered. But I haven't seen any evidence that most evolutionary designs are vastly less efficient than their technological counterparts. A lot of the apparent advantage of technological designs is a result of making wrong comparisons, like between birds and rockets. We haven't been able to design anything that is nearly as efficient as natural flight. It is true that artificial flight can overall carry more weight. But just because a train full of hard disk drives has more bandwidth than your internet connection does not imply that someone with trains full of HDDs would be superior at data transfer. To launch a new company that builds your improved computational substrate, you need massive amounts of influence. I don't perceive it to be at all plausible that such a feat would go unnoticed.
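The train-of-hard-drives comparison can be put in rough numbers (all of them illustrative assumptions, not measurements): physically shipping disks wins on raw throughput but loses catastrophically on latency, which is the point being made: a bigger pipe alone doesn't make you better at the overall task.

```python
# Rough 'sneakernet' arithmetic. All numbers are illustrative assumptions:
# 1000 drives of 4 TB each, one day in transit, versus a 100 Gb/s link.
TB = 10**12

drives = 1000
capacity = 4 * TB            # bytes per drive
transit = 24 * 3600          # one day of travel, in seconds

truck_bytes_per_s = drives * capacity / transit  # sustained throughput
fiber_bytes_per_s = 100e9 / 8                    # 100 Gb/s in bytes/s

print(truck_bytes_per_s > fiber_bytes_per_s)  # True: more raw throughput...
# ...but the first byte arrives a day late, and interactive use is hopeless.
```

Under these assumptions the truck sustains roughly 46 GB/s against the link's 12.5 GB/s, yet no one would call the truck "superior at data transfer" for tasks that need round-trips.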
I couldn't have said it better. I'll think about this if I ever have to explain the issue to laypeople. The key point I take away is that it matters little if the AI has no limbs, as long as it can have humans do its bidding. By the way, your scenario sounds both vastly more probable than a fully fledged hard takeoff, and nearly as scary. To take over the world, one doesn't need superhuman intelligence, nor self-modification, nor faster thoughts, nor even nanotech or other sci-fi technology. No, one just needs to be around the 90th human percentile in various domains (typically those relevant to taking over the Roman Empire), and be able to duplicate oneself. This is as weak a "human-level" AI as one could think of. Yet it sounds like it could probably set up a singleton before we could stop it (that would mean something like shutting down the Internet, or building another AI before the first takes over the entire network). And the way I see it, it is even worse: * If an AI demonstrates "human-level" optimization power on a single computer, I have no reason to think it will not be able to think much faster when unleashed on the network. This effect could be amplified if it additionally takes over (or collaborates with) a major chip manufacturer, and Moore's law somehow still applies. * The exact same scenario can apply to a group of human uploads. Now just a caveat: I assumed the AI (or upload) would start right away with enough processing power to demonstrate human-level abilities in "real time". We could on the other hand imagine an AI for which we can demonstrate that, if it ran a couple of orders of magnitude faster, it would be as capable as a human mind. That would delay a hard takeoff, and make it more predictable (assuming no self-modification). It may also let us prevent the rise of a singleton.
I'm thinking the second is probable. A single AI seems unlikely.
It takes very little imagination to see that discovery and adaptation do emerge 'out of nowhere' via the execution of certain algorithms. Not a fact, it's "just a theory".
It emerges from a society of agents with various different goals and heuristics like "Treating Rare Diseases in Cute Kittens". It is an evolutionary process that relies on massive amounts of real-world feedback and empirical experimentation. Assuming that all that can happen because some simple algorithm is being computed is like believing it will emerge 'out of nowhere', it is magical thinking. Just like antimatter weapons, it does sound superficially possible. And indeed, just like antimatter weapons, sometimes such ideas turn out to be physically possible, but not economically realizable.
No it isn't. I reject the categorization. I suggest that the far more common 'magical thinking' here occurs when people assume there is something special about thinking, discovery, adaptation or optimization in general just because it is a human doing it and not an 'algorithm'. As though the human isn't itself just some messy, inefficient algorithm. I reject the reference class.
I agree with you that there is no real magic. But there might be an apparent magic, something that looks like magic. 100 Navy Seals MIGHT be enough to bring down the Roman Empire. Or might not. But scale up the expedition corps and the demise of Rome becomes more and more probable. The same goes for the SAI. Maybe it can catch us off guard easily, IF all the circumstances are just right. The positions about this are something like: The SIAI: Very likely! The academics: We don't see any surprise to come. Yours: Keep our heads cool, there is nothing like magic. Mine: We can expect Hell, but only if we are not smart enough.
99 Navy Seals, 1 biologist and a few boxes of disease samples. Piece of cake.
I completely agree, the risk has to be taken seriously. My point is that just because you can formulate the prediction "the AI will take over the Internet and use its resources" it doesn't make it more likely. You personally can't take over the Internet and if the AI isn't much more intelligent from the very beginning then it won't be able to do so either. It can't make use of magic to acquire additional resources because magic depends on additional resources. It has to use what it has under the hood.
Don't even need to go that far - just create a bunch of online identities and get a bunch of free Google App Engine accounts. Use them to make a bit of money, buy more CPU, etc.
The sentence you quote was not meant to imply that extra processing power is not available. (I mention the common idea of finding extra processing power by "escaping on to the internet" so to speak.)