[ Question ]

[timeboxed exercise] write me your model of AI human-existential safety and the alignment problems in 15 minutes

by Quinn, 1 min read, 4th May 2021, 2 comments


Motivation

I had a 15-minute interview last night in which I was asked "why do you believe in xrisk, and what does AI have to do with it?" I thought it was too big a question for a 15-minute interview, but nevertheless dove into my inside view from first principles. Since diving into one's inside view from first principles is really hard even outside of a fifteen-minute interview, I did a bad job and mostly rambled and babbled.

A broader motivation is that I'm interested in studying people's inside views / gears-level models so as to hone my own.

Rules

In this exercise, you're allowed premises; just try to point at them. It is not a "from first principles" sort of exercise. You're also allowed jargon without being too worried about how well the audience knows it (for example, in mine, which I'll paste below, I assume familiarity with the single & multi quadrants from ARCHES).

The only real rule is to limit yourself to 15 minutes. That's fifteen minutes wall time, with a literal clock.

Suggestion: don't read until you write!

2 Answers

(This was about 14:30 of writing time. I think it would probably fit into a 15-minute chunk of interview time. I deliberately avoided deleting or changing much as I went. I don't think anything in it is in any way original.)

So, first of all, why believe in existential risk? We know that sometimes species, empires, and the like come to an end. The human race is ingenious and adaptable, but there's no reason to think we're magically immune to every catastrophe that could wipe us out. (Which wouldn't necessarily mean killing every single human being; I would consider something an existential risk if it wiped out our civilization in a way that made it difficult to start over successfully.)

Since humanity has been around for a long time, it's reasonable to suppose that existential risks are rare. So why care about them? Because the possible impact is very large. For instance, suppose you believe the following things (I am not claiming they're right, only that a reasonable person could believe them): 1. We are probably alone in the universe, or at least in this galaxy. (Because otherwise we'd expect that some other intelligent species would have expanded to take up a non-negligible fraction of the galaxy, and we would expect to see signs of that.) 2. If all goes well for us, there's a reasonable chance that we will travel to, and colonize, other star systems, and eventually maybe most of this galaxy. If these things are true then it is possible that the number of future humans will be vastly greater than the present population, and that the future of the human race is the future of sentient life in our galaxy. In that case, if something wipes us out early on, the potential loss is staggeringly great.

One might also simply care a lot about the fate of the human race, and feel that its extinction would be a terrible thing whether or not the alternative involves billions of billions of lives.

Now, what about AI? The following is highly speculative, and I don't think anyone should be very confident about it. But: first of all, if we are able to make human-level intelligences then it is likely not very difficult to make substantially-smarter-than-human intelligences. After all, humans have varied a lot in intellectual capabilities, and I know of no reason to think that, although dogs and ants are much less smart than humans, nothing can be much smarter than humans. And at least some forms of smarterness seem easy; e.g., it's often possible to take a computer program and run it much faster, by means of more expensive hardware or more attention to optimizing the software. So, if we make human-level AI then soon afterwards we will have better-than-human-level AI. Now the smarter-than-human AI can work on making better AI, and will do it better and faster than we do, and naive mathematical models of the resulting progress suggest that after a short while progress will be infinitely fast. Obviously that is unlikely to happen, but it does suggest that we shouldn't be too surprised if soon after the first human-level AI there are artificial intelligences that are as much cleverer than Einstein as Einstein was cleverer than a dog.
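(As one illustration of the kind of naive model meant here: the answer does not say which model it has in mind, so the following is an assumption, not the author's own. Suppose capability I(t) improves at a rate proportional to its own square, on the hypothetical reasoning that a smarter system both has more to build on and is better at building. Here I_0 is the starting capability and k > 0 is an arbitrary constant; both symbols are introduced only for this sketch.)

```latex
\frac{dI}{dt} = k\,I^{2}, \qquad I(0) = I_0 > 0
\quad\Longrightarrow\quad
I(t) = \frac{I_0}{1 - k\,I_0\,t},
\qquad
I(t) \to \infty \ \text{ as } \ t \to t^{*} = \frac{1}{k\,I_0}.
```

(Under these assumptions capability blows up at the finite time t* rather than merely growing exponentially. The blow-up depends entirely on the feedback continuing without limit, which is why it should be read as a hint that progress could be fast, not as a prediction.)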

And, as the history of humanity on earth shows, intelligence and power are closely related. A vastly superhuman intelligence might be astonishingly effective at persuading people of things. It might be astonishingly effective at predicting markets. It might be astonishingly effective at inventing new technology. One way or another, it would not be surprising if soon after the first human-level AI we had AI that vastly outstrips us in power.

So we had better, somehow, do a really good job of making sure that any AI we build, and any AI that it builds, behaves in ways that don't wipe us out. Unfortunately, that seems to be a hard problem.

Given that systems of software which learn can eventually bring about 'transformative' impact (defined as 'impact comparable to the industrial revolution'), the most important thing to work on is AI. Given that the open problems in learning software between now and its transformativity can be solved in a multitude of ways, some of those solutions will be more or less beneficial, less or more dangerous, meaning there's a lever that altruistic researchers can use to steer outcomes in these open problems. Given the difficulty of social dilemmas and coordination, we need research that is aimed at improving single-multi, multi-single, and multi-multi capabilities until those capabilities outpace single-single capabilities. Given the increase in economic and military power implied by transformative systems, civilization could be irrevocably damaged by simple coordination failures.