If we knew how to build a machine that chooses its outputs so as to maximize some property of the surrounding universe, such a machine would be very dangerous, because maximizing almost any easily defined property leads to a worthless universe (without humans, or with humans living pointless lives, etc.) I believe the preceding statement is uncontroversial, and most arguments around the necessity of Friendly AI are really about how likely we are to build such a machine, whether something else will happen first, and so on.

Instead of adding to the existing arguments, I want to reframe the question thus: what course of action would you recommend to a small group of smart people, assuming for the moment that the danger is real? In other words, what should SingInst do on an alternate Earth where normal human science will eventually build unfriendly AI? In particular:

- How do you craft your message to the public?

- What's your hiring policy?

- Do you keep your research secret?

- Do you pursue alternate avenues like uploads, or focus only on FAI?

For the sake of inconvenience, assume that many (though not all) of the insights required for developing FAI can also be easily repurposed to hasten the arrival of UFAI.

Thanks to Wei Dai for the conversation that sparked this post.

[anonymous] · 13y · 150

If the danger is close (<50 years) you probably can't wait for biology to catch up enough to work on uploads.

Your goal is to build a friendly AI quickly. (There's no way to enforce a requirement that everybody else's AI be friendly.)

You probably should ignore the general public and focus on those who can do the most to help you: rich futurist donors, the government, and the scientific community. The donors are the easiest to reach, but you can't stop there -- you don't just need money, you need good researchers, and almost all the right kinds of brains are in science. I think if you can't connect with scientists, you'll wind up hiring mostly people who aren't quite good enough to be scientists. Getting mainstream scientific credibility is a drag on speed, but I really don't see how you can succeed without it. Convince a PhD or two to join your institute, get some papers published, try to get into conferences. In the process, you'll not only pick up top scientists to hire, you'll get valuable criticism.

Some research probably must be secret, but there's "basic research" that doesn't have to be. A policy of total secrecy makes you very, very likely to be wrong, and in this scenario that means we all die.

A policy of total secrecy makes you very, very likely to be wrong

Thanks, that's a new point to me. But it's not always true; remember the Manhattan Project.

asr · 13y · 220

The Manhattan Project is a very misleading example. Yes, it was "secret", in that nothing was published for outside review. But the project had a sizeable fraction of all the physics talent in the western world associated with it. Within the project, there was a great deal of information sharing and discussion; the scientific leadership was strongly against "need-to-know" policies.

At that scale, having outside review is a lot less necessary. Nobody in AI research is contemplating an effort of that scale, so the objection to secrecy is valid.

Also, the Manhattan project did a poor job of maintaining total secrecy. Reliably secret projects are possible only on a much smaller scale, and the likelihood of information leaking out grows very rapidly as soon as more than a handful of people are involved.

The Manhattan project was facing a huge coordinated enemy who could pay spies, etc. SingInst isn't facing such an enemy yet, so secrecy should be easier for them.

Actually, most of the WW2-era Soviet spies were communists who spied out of genuine conviction, not as paid traitors. This makes the parallel even more interesting, considering that people engaged in a secret AI project might develop all sorts of qualms.

Some research probably must be secret, but there's "basic research" that doesn't have to be. A policy of total secrecy makes you very, very likely to be wrong, and in this scenario that means we all die.

I don't know about being wrong, but secrecy makes it less likely that people will trust you.

To add another question to the list: Would you keep internal discussions on "what should we do" secret?

(Right now it appears as if there is little disagreement within the SIAI "inner circle" regarding how best to achieve a positive Singularity, but is it because there is actually little disagreement, or because they think such disagreements should be kept from public view for PR reasons?)

Also, would any SIAI people like to chime in and say whether they see outside discussions like this one as being productive or counter-productive?

[anonymous] · 13y · 10

I'd make anything except the actual technical details of the AI work public. I would also make any technical details that could not be used by others to create an unfriendly AI public. Information hoarding is always counterproductive in the medium to long term.

What would you do if AI were dangerous?

Apparently, I'd be drinking absinthe and playing Eastside Hockey Manager 2007 at 2am while browsing the LW discussion section and obsessively glancing at my inbox to see if the girl I met while walking my dog today has emailed me back and wondering how the hell I'm going to wake up in time for work later today.

what course of action would you recommend to a small group of smart people, assuming for the moment that the danger is real?

An entirely different question! I will attempt to answer below.

What's your hiring policy?

Having a hiring policy under which only people who wouldn't betray the project get hired is a hack, not dissimilar from securing the project's prospective FAI in a sandbox from which it "can't escape" by virtue of being unconnected from the internet, etc. People are not secure systems, and no one should have enough knowledge to be able to betray the project by building the technically easier unfriendly general AI.

To the extent this is not feasible, researchers who know the whole picture should monitor each other, the way Mormon missionaries do. It is the buddy system that allows missionaries to discuss their religion with thousands of people while few enough of them lose their faith.

How do you craft your message to the public?

You start from scratch with basic rationality. Otherwise you end up pulling the levers of irrationality hard, using the dark arts, getting called out on it, and losing prestige with the important people.

What would you want from the public, besides money, anyway? Prestige, so people go into FAI instead of, say, string theory? There are more effective ways to achieve that than public belief that this research is important; notice that the public doesn't think impractical research is important in other areas either.

Do you keep your research secret?

Would it be advantageous to announce research projects before committing to them, so that you are obliged to share the results? Science should work on this model, with experiments precommitted to publication before they are conducted rather than completed experiments competing for journal space, but it doesn't. If not, then ask: if you judged each individual piece of your knowledge and reviewed it for publication, would you be too secretive? If so, fix that incongruity of yours, and then judge each piece of information individually, after you acquire it.

Do you pursue alternate avenues like uploads, or focus only on FAI?

The most important thing is to figure out exactly how much easier GAI is than FAI. The less they differ in difficulty, the less important unrelated approaches are.

If we knew how to build a machine that chooses its outputs so as to maximize some property of the surrounding universe, such a machine would be very dangerous, because maximizing almost any easily defined property leads to a worthless universe (without humans, or with humans living pointless lives, etc.) I believe the preceding statement is uncontroversial [...]

You also need a machine to be powerful for it to be dangerous. A weak maximiser is not likely to be dangerous.

Do you keep your research secret?

No. Even in an alternate universe where anachronistic AI is easy, fallible humans trying to do research in secret without feedback from a scientific community just isn't going to work. There might possibly be a case for keeping the last stages of development secret; if you're convinced you're on an end run, that you're writing the actual code for an FAI, it would not necessarily be irrational to keep the code under wraps. But secret research is a nonstarter.

Note that the Manhattan Project is not a counterexample. By the time it began, the research had already been done; all that remained was development, which was why it could go from start to success in only a handful of years. (One physicist, asked in the 1930s about the possibility of an atomic bomb, replied that it was physically possible but hopelessly impractical because you would need to turn a whole country into a uranium refinery. Nor was he wrong: the Manhattan Project threw resources exceeding those of many countries at the problem, something that would indeed have been impractical for anyone except the US.)

How do you craft your message to the public?

What exactly are you trying to achieve by sending a message to the public? Bear in mind that if they believe you, you've just committed suicide; it'll get turned into a political issue, any hope of successfully building FAI will disappear, and all the other forms of death that have nothing to do with AI will have an uncontested field. I wouldn't try to keep it secret, but I wouldn't go out of my way to play public-relations Russian roulette with the future of the universe either.

As for what I would do, I'll follow up with that in another comment.

In the scenario described, what you want most of all is a way to turn UFAI projects into FAI projects. Fortunately, this should be possible.

Think of it this way: you could postulate a universe where the dangerous stuff is pure magic, where there is a particular incantation that will summon Azathoth while every similar sequence of symbols does nothing more than you would expect, so the particular incantation can be known only by divine revelation. But that's not what you're postulating, right? We are talking about a universe where this stuff is science, not magic? In that case it should be scientifically knowable.

So what I would try to do is figure out:

  1. What should we expect to see if the "superintelligent AI is easier than most experts think, so FAI is important" theory is true? What predictions does the theory make about what we should observe that we would not observe if mainstream science is correct?

  2. What should we do in response? How exactly do you aim a project at FAI instead of UFAI? (For example, should researchers move away from brute force techniques like genetic programming, in favor of greater emphasis on techniques like logical reasoning?)

And I'd write these up in an ongoing dialogue aimed at making surprising and successful predictions and thereby convincing relevant parties to take the correct actions for building FAI.

A lot of it would depend how hard it was to make safe AI.

I think the first thing I would do would be to try to look at intelligence, to get a better idea of the problems involved and the likely time scales. Get the brightest, best, and most trustworthy people to discuss it in secret, to discover whether intelligence is neat or messy and to find a way of pursuing the research scientifically*. Once a path to creating intelligence has been found, you can make judgements on things like the size of the team and whether uploads are a good bet compared to it or not.

In terms of hiring, I would look for people who are not overly ambitious or patriotic, as well as having the requisite intelligence. You'd need a pretty good screening process as well.

For presenting it to the public, I would present it as an exploratory process to lay the groundwork on the most important question in the history of humanity: how to deal with AI. I'd stress the dangers, that we are improving knowledge through neuroscience, and that even the least dangerous possible AIs will turn human society on its head, so we need to answer the question in an expedient fashion.

*Part of the problem with AI as a science is that you can't tell whether you have created something useful or not. There are lots of ways to solve problems in machine learning that don't seem to be that useful for full scale AGI. And chatbots can give the illusion of intelligence with not much behind it.

Aside: You managed to make me realise one of the things that differs between my world view and the majority's here: I wouldn't build an AI to model and maximise something external; instead, parts of the programming inside the AI would tend to persist if they were better than other bits of programming at getting a signal, making the AI a satisficer of that signal. Anyone interested in a satisficer vs. maximizer discussion post?

I think any sufficiently rigorous insight that allows you to build a satisficer of some property will also allow you (or someone else who knows your insight) to build a maximizer of the same property, so research into satisficers doesn't seem to be safe from a FAI point of view.

asr · 13y · 50

any sufficiently rigorous insight that allows you to build a satisficer of some property will also allow you (or someone else who knows your insight) to build a maximizer of the same property

I'm sure this is wrong. There are cases where provably good heuristics are known, and precise solutions are believed intractable. Traveling salesman comes to mind. It follows necessarily that there are tasks that can be done "well enough", but not perfectly.
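
For concreteness, here is a minimal sketch of the gap being pointed at (my own hypothetical illustration, not anything from the thread): a cheap greedy heuristic for the traveling salesman problem returns a tour that is "good enough" in polynomial time, while recovering the exact optimum by brute force blows up factorially and is believed intractable in general.

```python
# Hypothetical illustration: "good enough" heuristic vs. exact optimization for TSP.
import itertools
import math
import random

def tour_length(points, order):
    """Total length of the closed tour visiting points in the given order."""
    return sum(math.dist(points[order[i]], points[order[(i + 1) % len(order)]])
               for i in range(len(order)))

def nearest_neighbour_tour(points):
    """Greedy satisficer: always hop to the closest unvisited city. Fast, not optimal."""
    unvisited = set(range(1, len(points)))
    tour = [0]
    while unvisited:
        last = tour[-1]
        nxt = min(unvisited, key=lambda j: math.dist(points[last], points[j]))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

def exact_tour(points):
    """Exact optimizer: try every ordering. O(n!), hopeless beyond a dozen cities."""
    best = min(itertools.permutations(range(1, len(points))),
               key=lambda perm: tour_length(points, (0,) + perm))
    return [0] + list(best)

if __name__ == "__main__":
    random.seed(0)
    cities = [(random.random(), random.random()) for _ in range(9)]
    print("greedy :", round(tour_length(cities, nearest_neighbour_tour(cities)), 3))
    print("optimal:", round(tour_length(cities, exact_tour(cities)), 3))
```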

You're right, of course. My comment was wrong and I should've used some other word (perhaps "optimizer") in place of "maximizer", because I actually wanted to make a slightly different point...

Imagine someone comes up with a rigorous way to write a program that, when run on any computer anywhere, inspects the surrounding universe and then manipulates it to somehow cause the production of 100 paperclips. This requires insight that we don't have yet, but it seems to me that any such insight should be easy to weaponize (e.g. ask it to produce a trillion paperclips instead of 100) once it's discovered. It seems weird to hope that 100 paperclips will be a tractable problem but a trillion would be intractable. That would require an amazing accidental correspondence between "tractable" and "safe".

Ah, you meant satisficer in this sense of the word. I meant to use it in the sense of what type of system humans are. From the variety of goals we pursue, we are clearly not maximizers or satisficers of any external property of the universe. People regularly avoid reproducing, for example, and don't choose actions that might maximise reproduction even when they do reproduce (e.g. not donating to sperm banks).

"The type of system humans are" has a big disadvantage compared to mathematically simpler systems like maximizers: it seems harder to reason about such "fuzzy" systems, e.g. prove their safety. How do you convince yourself that a "fuzzy" superintelligence is safe to run?

We have an existence proof of intelligences based upon "the type of system humans are"; we don't for pure maximizers. It is no good trying to develop friendliness theory based upon a pure, easily-reasoned-about system if you can't make an intelligence out of it.

So while it is harder, this may be the sort of system we have to deal with. It is these sorts of questions I wanted to try to answer with the group in my original post.

I'll try to explain why I am sceptical of maximizer based intelligences in a discussion post. It is not because they are inhuman.

Well, it's also hard to prove the safety of maximizers. Proving the danger, on the other hand...

In practice, maximizers are not things that actually find the maximum value. They are typically hill-climbers of some kind or another: they try to find better values, i.e. local maxima. Maximizers do not have to be perfect to warrant the name.
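
A minimal sketch of what "maximizer as hill-climber" looks like in practice (illustrative only; the objective and names are made up): the loop only ever keeps local improvements, so it settles on whichever local maximum is nearest its starting point rather than the true global maximum.

```python
# Hypothetical sketch of a maximizer as a hill-climber: it finds a local
# maximum of the objective, with no guarantee of reaching the global one.
import math
import random

def hill_climb(objective, x, step=0.1, iterations=1000):
    """Repeatedly accept small random perturbations that improve the objective."""
    best_value = objective(x)
    for _ in range(iterations):
        candidate = x + random.uniform(-step, step)
        value = objective(candidate)
        if value > best_value:  # keep only strict improvements
            x, best_value = candidate, value
    return x, best_value

if __name__ == "__main__":
    # A bumpy objective with many local maxima; the climber settles on whichever
    # peak happens to be nearest its random starting point.
    f = lambda x: math.sin(5 * x) - 0.1 * x * x
    print(hill_climb(f, random.uniform(-5, 5)))
```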

I'd build satisficers for theoretical reasons, not safety ones. Maximizers, to me, have problems with modifying/improving the model of the world that they are trying to maximize*. Satisficers don't tend to use models of the world at the lowest level; instead they get proximate signals for the things they are supposed to be maximizing (e.g. dopamine for evolutionary fitness in animal biology) and have to build models of the world that are good at getting those signals. But they really don't want to maximise those signals, because the signals are not what they are actually supposed to maximise.

Every time I try to say more than this I lapse into a big long post. I'll see if I can marshal my thoughts somewhat.

*Things like AIXI don't have this problem because they don't have to decide how best to modify their model; they keep all possible models in mind at once. Which is one reason I don't think it is a good guide for AI.

so research into satisficers doesn't seem to be safe from a FAI point of view.

Failing to do research on safety grounds isn't going to help very much either - that just means that another team will make the progress instead.

If we knew how to build a machine that chooses its outputs so as to maximize some property of the surrounding universe, such a machine would be very dangerous, because maximizing almost any easily defined property leads to a worthless universe (without humans, or with humans living pointless lives, etc.)

A sufficiently anachronistic such machine, yes, i.e. a maximizer much more powerful than what the tech level of the world in which it exists should be able to support. It's like asking whether a microwave oven would be dangerous if its power output were, say, 10^50 watts instead of the more typical 800 watts. The answer is that such a device would probably be safe for a civilization capable of building it, but if somehow created in today's world it would indeed be dangerous.

That leads to the answer to one of your questions:

Do you pursue alternate avenues like uploads, or focus only on FAI?

Are you postulating a world where anachronistic technology in general is easy, or where AI is easy but other things like uploads remain realistically difficult? If the former, uploads might well be worth going for; if the latter, you're probably better off to forget about everything else and focus only on FAI.
