The problem/solution matrix: Calculating the probability of AI safety "on the back of an envelope"

Assuming that we have no fewer than 20 problems, and that for each problem we have an 80 per cent chance of success (if we knew more, it would not be a problem), the total probability of success is only about 1 per cent.
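As a quick check of that arithmetic, here is a minimal sketch. It assumes the 20 problems succeed or fail independently; the function name joint_success is purely illustrative.

```python
def joint_success(p_each: float, n_problems: int) -> float:
    """Probability that all n_problems are solved, assuming independence
    and the same per-problem success probability p_each."""
    return p_each ** n_problems


# 20 problems at 80 per cent each: roughly 0.0115, i.e. about 1 per cent.
print(round(joint_success(0.8, 20), 4))
```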

So this method produces very pessimistic expectations even if the problems themselves seem solvable. EY wrote somewhere that multiplying probabilities is a bad way to estimate the chances of success of cryonics, as this method underestimates the growth in experience of the problem solver.

Another takeaway could be that we should search for total AI safety solutions that involve fewer unknowns.

Suppose we had a matrix where each column corresponds to an AI safety problem and each row corresponds to a proposed safety measure.

Anyone have a sense of what distinct set of AI safety problems we're faced with? I've tried my hand at this, but the way I usually think about AI safety is as a mapping between one very big problem (getting AI not to do things we don't want it to do) and ten or so low-probability-of-success safety measures trying to address it (in the vein of defining the things we don't want AI to do and limiting the number of things that AI can do).

Should I be doing this differently? Maybe we could try to explicitly define the kinds of superintelligent AI we expect humanity to build and then design specific safety measures for each use-case. This would let us build a matrix but I'm also worried that superintelligent AI applications are too general to actually carve out a finite set of use-cases from the spectrum of possible AIs we could develop.

Anyone have a sense of what distinct set of AI safety problems we're faced with?

See section 4, "Problems with AGI", in this review for a list of lists.

However, I suspect the thing would work best in conjunction with a particular proposed FAI design where each column corresponds to a potential safety problem people are worried about with it.

-- Scott Alexander

My earlier post, Three Stories for How AGI Comes Before FAI, was an "AI research strategy" post. "AI research strategy" is not the same as "AI strategy research". "AI research strategy" relates to the Hamming question for technical AI alignment researchers: What are the most important questions in technical AI alignment and why aren't you trying to answer them?

Here's a different way of looking at these questions (which hopefully has a different set of flaws).

Suppose we had a matrix where each column corresponds to an AI safety problem and each row corresponds to a proposed safety measure. Each cell contains an estimate of the probability of that safety measure successfully addressing that problem. If we assume measure successes are independent, we could estimate the probability that any given problem gets solved if we build an AGI which applies all known safety measures. If we assume problem successes are independent, we could estimate the probability that all known problems will be solved.
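The calculation described above can be sketched in a few lines. This is a minimal illustration under the two independence assumptions stated in the paragraph; the function names and the example numbers are hypothetical, not estimates anyone has actually made.

```python
from math import prod


def problem_solved_prob(column):
    """P(at least one measure addresses the problem), assuming the
    measures succeed or fail independently of each other."""
    return 1.0 - prod(1.0 - p for p in column)


def all_solved_prob(matrix):
    """matrix[i][j] = estimated P(measure i addresses problem j).
    Rows are safety measures, columns are problems.
    Returns (per-problem probabilities, probability all are solved),
    assuming problems are solved independently of each other."""
    n_problems = len(matrix[0])
    per_problem = [
        problem_solved_prob([row[j] for row in matrix])
        for j in range(n_problems)
    ]
    return per_problem, prod(per_problem)


# Hypothetical matrix: 2 measures (rows) x 3 problems (columns).
example = [
    [0.5, 0.2, 0.0],
    [0.5, 0.0, 0.4],
]
per_problem, overall = all_solved_prob(example)
print(per_problem, overall)
```

Note how quickly the overall figure drops: even with per-problem probabilities of roughly 0.75, 0.2 and 0.4, the chance of solving all three is only about 0.06.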

What about unknown problems? Suppose that God has a list of all AI safety problems. Suppose every time an AI alignment researcher goes out to lunch, there's some probability that they'll think of an AI safety problem chosen uniformly at random from God's list. In that case, if we know the number of AI alignment researchers who go out to lunch on any given day, and we also know the date on which any given AI safety problem was first discovered, then this is essentially a card collection process which lets us estimate the length of God's list as an unknown parameter (say, using maximum likelihood on the intervals between discovery times, or you could construct a prior based on the number of distinct safety problems in other engineering fields). In lay terms, if no one has proposed any new AI safety problems recently, it's plausible there aren't that many problems left.
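One way to make this concrete is the sketch below. It is a deliberately simplified variant of the idea: instead of using the intervals between discovery times, it assumes we can count every independent (re)discovery of a problem as one uniform draw from the list, and then picks the list length that maximizes the likelihood of seeing that many distinct problems. The function names and the max_n search cap are assumptions made for illustration.

```python
from math import lgamma, log


def log_likelihood(list_length, n_draws, n_distinct):
    """Log-likelihood (up to a constant not depending on list_length) of
    observing n_distinct different problems in n_draws uniform draws
    from a list of list_length problems:
    log[ N! / (N - k)! ] - n * log(N)."""
    return (
        lgamma(list_length + 1)
        - lgamma(list_length - n_distinct + 1)
        - n_draws * log(list_length)
    )


def mle_list_length(n_draws, n_distinct, max_n=10_000):
    """Maximum-likelihood estimate of the total number of problems.
    Assumes every draw (including rediscoveries) is observed."""
    if n_distinct >= n_draws:
        # With no rediscoveries the likelihood keeps growing with N.
        raise ValueError("need at least one rediscovery for a finite estimate")
    return max(
        range(n_distinct, max_n + 1),
        key=lambda n: log_likelihood(n, n_draws, n_distinct),
    )


# Hypothetical counts: many rediscoveries suggest a short remaining list,
# few rediscoveries suggest many problems are still undiscovered.
print(mle_list_length(n_draws=30, n_distinct=10))
print(mle_list_length(n_draws=12, n_distinct=10))
```

The qualitative behaviour matches the lay summary: the more often researchers keep landing on already-known problems, the closer the estimated list length gets to the number of problems already found.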

How to estimate the probability that unknown problems will be successfully dealt with? Suppose known problems are representative of the entire distribution of problems. Then we can estimate the probability that an unknown problem will be successfully dealt with as follows: For each known problem, cover up the rows corresponding to the safety measures inspired by that problem, then compute its new probability of success. That gives you a distribution of success probabilities for "unknown" problems. Assume success probabilities for actually-unknown problems are sampled from that distribution. Then you could compute the probability of any given actually-unknown problem being solved by computing the expected value of that distribution. That also lets you compute the probability that your AGI will solve all the unknown problems too (assuming problem probabilities are independent).
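The "cover up the rows" procedure can be sketched as follows. This is only an illustration under the assumptions in the paragraph above, plus one extra assumption: that we have a hypothetical list, inspired_by, recording which known problem (if any) motivated each safety measure. All names and numbers are made up for the example.

```python
from math import prod


def solved_prob(matrix, problem, skip_rows=frozenset()):
    """P(problem is solved) using every measure except those in skip_rows,
    assuming independent measures. matrix[i][j] = P(measure i solves j)."""
    return 1.0 - prod(
        1.0 - row[problem]
        for i, row in enumerate(matrix)
        if i not in skip_rows
    )


def unknown_problem_estimate(matrix, inspired_by):
    """inspired_by[i] is the index of the problem that motivated measure i,
    or None if it was not aimed at any particular known problem.
    Returns (held-out success probability per known problem, their mean)."""
    held_out = []
    for j in range(len(matrix[0])):
        covered = {i for i, src in enumerate(inspired_by) if src == j}
        held_out.append(solved_prob(matrix, j, covered))
    return held_out, sum(held_out) / len(held_out)


def all_unknown_solved(mean_p, n_unknown):
    """P(all unknown problems are solved), assuming independent problems
    whose success probabilities are drawn from the held-out distribution."""
    return mean_p ** n_unknown


# Hypothetical data: 3 measures (rows), 2 known problems (columns).
matrix = [
    [0.5, 0.2],  # measure 0, inspired by problem 0
    [0.0, 0.5],  # measure 1, inspired by problem 1
    [0.2, 0.0],  # measure 2, not inspired by a specific problem
]
held_out, mean_p = unknown_problem_estimate(matrix, [0, 1, None])
print(held_out, mean_p, all_unknown_solved(mean_p, n_unknown=2))
```

The number of unknown problems passed to all_unknown_solved would come from a separate estimate of how many problems remain undiscovered.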

What could this be useful for? Doing this kind of analysis could help us know whether it's more valuable to discover more problems or discover more safety measures on the current margin. It could tell us which problems are undersolved and would benefit from more people attacking them. Even without doing the analysis, you can see that trying to solve multiple problems with the same measure could be a good idea, since those measures are more likely to generalize to unknown problems. If overall success odds aren't looking good, we could make our first AI some kind of heavily restricted tool AI which tries to augment the matrix with additional rows and columns. If success odds are looking good, we could compare success odds with background odds of x-risk and try to figure out whether to actually turn this thing on.

Obviously there are many simplifying assumptions being made with this kind of "napkin math". For example, it could be that the implementation of safety measure A interacts negatively with the implementation of safety measure B and reverses its effect. It could be that we aren't sampling from God's list uniformly at random and some problems are harder to think of than others. Whether this project is actually worth doing is an "AI research strategy strategy" question, and thus above my pay grade. If it's possible to generate the matrix automatically using natural language processing on a corpus including e.g. the AI Alignment Forum, I guess that makes the project look more attractive.