The simple picture on AI safety

Alex Flint

At every company I've ever worked at, I've had to distill whatever problem I'm working on to something incredibly simple. Only then have I been able to make real progress. In my current role, at an autonomous driving company, I'm working in the context of a rapidly growing group that is attempting to solve at least 15 different large engineering problems, each of which is divided into many different teams with rapidly shifting priorities and understandings of the problem. The mission of my team is to make end-to-end regression testing rock solid so that the whole company can deploy code updates to cars without killing anyone. But that wasn't the mission from the start: at the start it was a collection of people with a mandate to work on a bunch of different infrastructural and productivity issues. As we delved into mishmash after mishmash of complicated existing technical pieces, the problem of fixing it all became ever more abstract. We built long lists of ideas for fixing pieces, wrote extravagant proposals, and drew up vast and complicated architecture diagrams. It all made sense, but none of it moved us closer to solving anything.

At some point, we distilled the problem down to a core that actually resonated with us and others in the company. It was not some pithy marketing-language mission statement; it was not a sentence at all -- we expressed it a little differently every time. It represented an actual comprehension of the core of the problem. We got two kinds of reactions: to folks who thought the problem was supposed to be complicated, our distillation sounded childishly naive. They told us the problem was much more complicated. We told them it was not. To folks who really wanted to solve the problem with their own hands, the distillation was energizing. They said that yes, this is the kind of problem that can be solved.

I have never encountered a real problem without a simple distillation at its core. Some problems have complex solutions. Some problems are difficult or impossible to solve. But I have never encountered a problem that did not have a deeply simple core to it, and I have always found that enunciating this core is helpful.

So here is my distillation of the AI safety problem.

First, there is the technical engineering problem of building a safe superintelligence. This is a technical engineering problem, and nothing more. Yes, it is very difficult. Yes, it has a deep and unusual philosophical component. But it is an engineering problem not fundamentally different from building an electric car or deploying a cryptocurrency. Do not put this part of the problem into some special category in your mind. Stare at it until it looks like a very very difficult version of building a house from scratch.

Second, there is the coordination problem of making sure that an unsafe superintelligence is not built first. This is a coordination problem. It is also very difficult. It is in the same category as getting the whole world to agree to stop using CFCs in air conditioners, or to reduce carbon emissions.

The problem really is that simple: build a safe superintelligence; coordinate to not build an unsafe one first. It is not at all easy, but it really is simple. Solving the problem will require research, engineering, funding, management, and other more and less familiar ingredients you may or may not recognize from elsewhere in the world.

If the problem seems scary to you, that is a fact about your mind, not about the problem. The problem itself is merely difficult.

It is a problem that you can work on.

Couldn't you say the same thing about basically any problem? "Problem X is really quite simple. It can be distilled down to these steps: 1. Solve problem X. There, wasn't that simple?"

I think the distillation needs to (1) be correct and (2) resonate with people. It's really hard to find a distillation that meets these two criteria. Finding such distillations is a good part of what a tech sector product manager spends their time doing.

I'm not at all sure that my distillation of AI safety meets those two.

The "correct" part is what everyone is concerned about, though.

So are you saying that my distillation didn't unpack the problem sufficiently to be helpful (in which case I agree but that wasn't my goal), or are you saying that I missed something important / included something unimportant?

I agree that a distillation of a complex problem statement to a simple technical problem represents real understanding and progress, and is valuable thereby. But I don't think your summary of the first half of the AI safety problem is one of these.

The central difficulty that stops this from being a "mere" engineering problem is that we don't know what "safe" is to mean in practice; that is, we don't understand in detail *what properties we would desire a solution to satisfy*. From an engineering perspective, that marks the difference between a hard problem, and a confused (and, usually, confusing) problem.

When people were first trying to build an airplane, they could write down a simple property that would characterize a solution to the problem they were solving: (thing) is heavier than air yet manages to stay out of contact with the ground for, let's say, at least minutes at a time. Of course this was never the be-all end-all of what they were trying to accomplish, but this was the central hard problem, a solution to which they expected to be able to build on incrementally into the unknown direction of Progress.

I can say the same for, for example, the "intelligence" part of the AI safety problem. Using Eliezer Yudkowsky's optimization framework, I think I have a decent idea of what properties I would want a system to have when I say I want to build an "intelligence". That understanding may or may not be the final word on the topic for all time, but at least it is a distillation that can function as a "mere" engineering problem, a solution for which I can recognize as such and which we can then improve on.

But for the "safe" part of the problem, I don't have a good idea about what properties I want the system to achieve at all. I have a lot of complex intuitions on the problem, including simple-ish ideas that seem to be an important part of it and some insight into what is definitely *not* what I want, but I can't distill this down to a technical requirement that I could push towards. If you were to just hand me a candidate safe AI on a platter, I don't think I could recognize it for what it is; I could definitely reject *some* failed attempts, but I could not tell whether your candidate solution is actually correct or whether it has a flaw I have not seen yet. Unless your solution comes with a mighty lecture series explaining exactly why your solution is what I actually want, it will not count as a "solution". Which makes the "safe" part of your summary, in my mind, neither really substantive understanding, nor a technical engineering problem yet.

I parse you as pointing to the clarification of a vague problem like "flight" or "safety" or "heat" into an incrementally more precise concept or problem statement. I agree this type of clarification is ultra important and represents real progress in solving a problem, and I agree that my post absolutely did not do this. But I was actually shooting for something quite different.

I was shooting for a problem statement that (1) causes people to work on the problem, and (2) causes them to work on the right part of the problem. I claim it is possible to formulate such a problem statement without doing any clarification in the sense that you pointed at, and additionally that it is useful to do so because (1) distilled problem statements can cause additional progress to be made on a problem, and (2) clarification is super hard, so we definitely shouldn't block causing additional work to happen until clarification happens, since additional work could be a key ingredient in getting to key clarifications.

To many newcomers to the AI safety space, the problem feels vast and amorphous, and it seems to take a long time before newcomers have confidence that they know what exactly other people in the space are actually trying to accomplish. During this phase, I've noticed that people are mostly not willing to work directly on the problem, because of the suspicion that they have completely misunderstood where the core of the problem actually is. This is why distillation is valuable even absent clarification.

What is the core problem of your autonomous driving group?!

It doesn't matter! :P

"Safe" is a value-loaded word. To make a "safe" car, all you need is the rough approximation to human values that says humans value not being injured. To make a "safe" superintelligence, you need a more detailed idea of human values. This is where the philosophy comes in, to specify exactly what we want.