Q&A with Abram Demski on risks from AI

by XiXiDu, 17th Jan 2012



[Click here to see a list of all interviews]

Abram Demski is a computer science Ph.D student at the University of Southern California who has previously studied cognitive science at Central Michigan University. He is an artificial intelligence enthusiast looking for the logic of thought. He is interested in AGI in general and universal theories of intelligence in particular, but also probabilistic reasoning, logic, and the combination of the two ("relational methods"). Also, utility-theoretic reasoning.

I interviewed Abram Demski due to feedback from LessWrong. cousin_it, a top contributor and research associate of the Singularity Institute, wrote the following:

I'm afraid of Abram Demski who wrote brilliant comments on LW and still got paid to help design a self-improving AGI (Genifer).

Enough already, here goes....

The Interview:

Q1: Assuming no global catastrophe halts progress, by what year would you assign a 10%/50%/90% chance of the development of roughly human-level machine intelligence?

Explanatory remark to Q1:

P(human-level AI by (year) | no wars ∧ no disasters ∧ beneficial political and economic development) = 10%/50%/90%

Abram Demski:

10%: 5 years (2017).
50%: 15 years (2027).
90%: 50 years (2062).

Of course, numbers alone are not very informative. The years I gave are unstable under reflection by a factor of about 2 (meaning I have doubled and halved these estimates in the past few minutes while considering them). More relevant is the variance: I think the year of development is fundamentally hard to predict, so it's rational to assign significant probability mass to it happening within 10 years, but also to it taking another 50 years or more. However, the largest bulk of my probability mass lies roughly between 2020 and 2030, since (1) the computing hardware needed to simulate the human brain will become widely available by then, and (2) I believe less than that will be sufficient, though the software may lag behind the hardware potential by 5 to 10 years. (I would estimate more lag, except that it looks like we are making good progress right now.)

Q2: What probability do you assign to the possibility of human extinction as a result of badly done AI?

Explanatory remark to Q2:

P(human extinction | badly done AI) = ?

(Where 'badly done' = AGI capable of self-modification that is not provably non-dangerous.)

Abram Demski: This is somewhat difficult. We could say that AIs matching that description have already been created (with few negative consequences). I presume that "roughly human-level" is also intended, though.

If the human-level AGI

0) is autonomous (has, or forms, long-term goals)
1) is not socialized
2) figures out how to access spare computing power on the internet
3) has a goal which is very bad for humans (ie, implies extinction)
4) is alone (has no similarly-capable peers)

then the probability of human extinction is quite high, though not 1. The probability of #0 is somewhat low; #1 is somewhat low; #2 is fairly high; #3 is difficult to estimate; #4 is somewhat low.

#1 is important because a self-modifying system will tend to respond to negative reinforcement concerning sociopathic behaviors resulting from #3-- though, it must be admitted, this will depend on how deeply the ability to self-modify runs. Not all architectures will be capable of effectively modifying their goals in response to social pressures. (In fact, rigid goal-structure under self-modification will usually be seen as an important design-point.)

#3 depends a great deal on just how smart the agent is. Given an agent of merely human capability, human extinction would be very improbable even with an agent that was given the explicit goal of destroying humans. Given an agent of somewhat greater intelligence, the risk would be there, but it's not so clear what range of goals would be bad for humans (many goals could be accomplished through cooperation). For a vastly more intelligent agent, predicting behavior is naturally a bit more difficult, but cooperation with humans would not be as necessary for survival. So, that is why #2 becomes very important: an agent that is human-level when run on the computing power of a single machine (or small network) could be much more intelligent with access to even a small fraction of the world's computing power.

#4 is a common presumption in singularity stories, because there has to be a first super-human AI at some point. However, the nature of software is such that once the fundamental innovation is made, creating and deploying many is easy. Furthermore, a human-like system may have a human-like training time (to become adult-level that is), in which case it may have many peers (which gets back to #1). In case #4 is *not* true, then condition #3 must be rewritten to "most such systems have goals which are bad for humans".

It's very difficult to give an actual probability estimate for this question because of the way "badly done AI" pushes around the probability. (By definition, there should be some negative consequences, or it wasn't done badly enough...) However, I'll naively multiply the factors I've given, with some very rough numbers:

P(#0)P(#1)P(#2)P(#3)P(#4)
= .1 * .1 * .9 * .5 * .1
= .00045

I described a fairly narrow scenario, so we might expect significant probability mass to come from other possibilities. However, I think it's the most plausible. So, keeping in mind that it's very rough, let's say .001.
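The naive multiplication above can be reproduced in a few lines of Python; the condition labels are illustrative names for #0 through #4, not part of the original interview:

```python
import math

# Rough, assumed-independent probability estimates for conditions #0-#4:
# autonomous, not socialized, gains spare compute, bad goal, no peers.
factors = {
    "autonomous": 0.1,
    "not_socialized": 0.1,
    "gains_compute": 0.9,
    "bad_goal": 0.5,
    "alone": 0.1,
}

# Multiply the factors together, as in the interview's naive estimate.
p_extinction = math.prod(factors.values())

# Prints 0.00045 (rounded); the interview rounds up to 0.001
# to account for scenarios outside this narrow one.
print(round(p_extinction, 5))
```

Note that multiplying the factors treats the five conditions as independent, which is itself a rough assumption the surrounding text flags.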

I note that this is significantly lower than estimates I've made before, despite trying harder at that time to refute the hypothesis.

Q3: What probability do you assign to the possibility of a human-level AGI self-modifying its way up to massive superhuman intelligence within a matter of hours/days/< 5 years?

Explanatory remark to Q3:

P(superhuman intelligence within hours | human-level AI running at human-level speed equipped with a 100 GB Internet connection) = ?
P(superhuman intelligence within days | human-level AI running at human-level speed equipped with a 100 GB Internet connection) = ?
P(superhuman intelligence within < 5 years | human-level AI running at human-level speed equipped with a 100 GB Internet connection) = ?

Abram Demski: Very near zero, very near zero, and very near zero. My feeling is that intelligence is a combination of processing power and knowledge. In this case, knowledge will keep pouring in, but processing power will become a limiting factor. Self-modification does not help this. So, such a system might become superhuman within 5 years, but not massively.

If the system does copy itself or otherwise gain more processing power, then I assign much higher probability; 1% within hours, 5% within days, 90% within 5 years.

Note that there is a very important ambiguity in the term "human-level", though. It could mean child-level or adult-level. (IE, a human-level system may take 20 years to train to the adult level.) The above assumes you mean "adult level". If not, add 20 years.

Q4: Is it important to figure out how to make AI provably friendly to us and our values (non-dangerous), before attempting to solve artificial general intelligence?

Abram Demski: "Provably non-dangerous" may not be the best way of thinking about the problem. Overall, the goal is to reduce risk. Proof may not be possible or may not be the most effective route.

So: is it important to solve the problem of safety before trying to solve the problem of intelligence?

I don't think this is possible. Designs for safe systems have to be designs for systems, so they must be informed by solutions to the intelligence problem.

It would also be undesirable to stall progress while considering the consequences. Serious risks are associated with many areas of research, but it typically seems better to mitigate those risks while moving forward rather than beforehand.

That said, it seems like a good idea to put some thought into safety & friendliness while we are solving the general intelligence problem.

Q4-sub: How much money is currently required to mitigate possible risks from AI (to be instrumental in maximizing your personal long-term goals, e.g. surviving this century), less/no more/little more/much more/vastly more?

Abram Demski: I am a biased authority on this question, but in general, increased funding for AI would be good news to me. My opinion is that the world has a lot of problems which we are slowly solving, but which could be addressed much more effectively if we had more intelligence with which to attack them. AI research is beginning to bear fruit in this way (becoming a profitable industry).

So, if I had my say, I would re-assign perhaps half the budget currently assigned to the military to AI research. (The other half would go to NASA and particle physicists...)

What amount of the AI budget should be concentrated on safety? Moderately more than at present. Almost no one is working on safety right now.

Q5: Do possible risks from AI outweigh other possible existential risks, e.g. risks associated with the possibility of advanced nanotechnology?

Abram Demski: I would rank nanotech as fairly low on my list of concerns, because cells are fairly close to optimal replicators in the present environment. (IE, I don't buy the grey-goo stories: the worst that I find plausible is a nanobot plague, and normal biological weapons would be easier to make.)

Anyway, AI would be lower on my list than global warming.

Q5-sub: What existential risk (human extinction type event) is currently most likely to have the greatest negative impact on your personal long-term goals, under the condition that nothing is done to mitigate the risk?

Abram Demski: Another world war with the present technology & weapon stockpiles (or greater).

Q6: What is the current level of awareness of possible risks from AI, relative to the ideal level?

Abram Demski: The general population seems to be highly aware of the risks of AI, with very little awareness of the benefits.

Within the research community, the situation was completely opposite until recently. I would say the present awareness level in the research community is roughly optimal...

Q7: Can you think of any milestone such that if it were ever reached you would expect human-level machine intelligence to be developed within five years thereafter?

Abram Demski: No. Predictions about AI research have historically been mostly wrong, so it would be incorrect to make such predictions.

Q8: Are you familiar with formal concepts of optimal AI design which relate to searches over complete spaces of computable hypotheses or computational strategies, such as Solomonoff induction, Levin search, Hutter's algorithm M, AIXI, or Gödel machines?

Abram Demski: Yes.

Addendum

Abram Demski: I have been, and am, generally conflicted about these issues. I first encountered the writings of Eliezer three or four years ago. I was already a Bayesian at the time, and I was working with great conviction on the problem of finding the "one true logic" which could serve as a foundation of reasoning. Central to this was finding the correct formal theory of truth (a problem which is hard thanks to the Liar paradox). Reading Eliezer's material, it was clear that he would be interested in the same things, but wasn't writing a great deal about them in public.

I sent him an email about it. (I had the good habit of emailing random researchers, a habit I recommend to anyone.) His response was that he needed to speak to me on the phone before collaborating with me, since much more about a person is conveyed by audio. So, we set up a phone call.

I tried to discuss logic on the phone, and was successful for a few minutes, but Eliezer's purpose was to deliver the argument for existential risk from AI as a set-up for the central question which would determine whether he would be willing to work with me: If I found the correct logic, would I publish? I answered yes, which meant that we could not work together. The risk for him was too high.

Was I rational in my response? What reason should I have to publish, that would outweigh the risk of someone taking the research and misusing it? (Eliezer made the comment that with the correct logic in hand, it might take a year to implement a seed AI capable of bootstrapping itself to superhuman intelligence.) My perception of my own thought process is much closer to "clinging to normality" than one of "carefully evaluating the correct position". Shortly after that, trying to resolve my inner conflict (prove to myself that I was being rational) I wrote this:

http://dragonlogic-ai.blogspot.com/2009/04/some-numbers-continues-risk-estimate.html

The numbers there just barely indicate that I should keep working on AI (and I give a cut-off date, beyond which it would be too dangerous). Despite trying hard to prove that AI was not so risky, I was experiencing an anchoring bias. AI being terribly risky was the start position from which estimates should be improved.

Still, although I attempted to be fair in my new probability estimates for this interview, it must be admitted that my argument took the form of listing factors which might plausibly reduce the risk and multiplying them together so that the risk gets smaller and smaller. Does this pattern reflect the real physics of the situation, or does it still reflect anchor-and-reduce type reasoning? Have I encountered more factors to reduce the probability because that's been what I've looked for?

My central claims are:

  • In order for destruction of humanity to be a good idea, the AI would have to be so powerful that it could simply brush humanity aside, or have a very malevolent goal, or both.
  • In order to have massively superhuman intelligence, massively more processing power is needed (than is needed to achieve human-level intelligence).
  • It seems unlikely that a malevolent system would emerge in an environment empty of potential rivals which may be more pro-human, which further increases the processing power requirement (because the malevolent system should be much more powerful than rivals in order for cooperation to be an unimportant option), while making massive acquisition of such processing power (taking over the internet single-handedly) less plausible.
  • In any case, these negative singularity scenarios tend to assume an autonomous AI system (one with persistent goals). It is more likely that the industry will focus on creating non-autonomous systems in the near-term, and it seems like these would have to become autonomous to be dangerous in that way. (Autonomous systems will be created more for entertainment & eventually companionship, in which case much more attention will be paid to friendly goal systems.)

Some danger comes from originally non-autonomous systems which become autonomous (i.e., whose capabilities are expanded to general planning to maximize some industrially useful utility function such as production quality, cash flow, ad clicks, etc.). These are more likely to have strange values that are not good for us. The hope here lies in the possibility that such systems would be so numerous that cooperation with society and other AIs would be the only option. A scenario where the first AI capable of true recursive self-improvement became an industrial AI and rose to power before its makers could re-sell it for many different jobs seems unlikely (because the loop between research and industry does not usually work like that). More likely, by the time the first human-level system is five years old (old enough to start thinking about world domination), different versions of it with different goals and experiences will be in many places, cooperating with humankind more than with each other.

But, anyway, all of this is an argument that the probability is low, not that it would be impossible or that the consequences aren't hugely undesirable. That's why I said the level of awareness in the AI community is good and that research into safe AI could use a bit more funding.