AI friendliness is an important goal and it would be insanely dangerous to build an AI without researching this issue first. I think this is pretty much the consensus view, and that is perfectly sensible.

However, I believe that we are making the wrong inferences from this.

The straightforward inference is "we should ensure that we completely understand AI friendliness before starting to build an AI". This leads to a strongly negative view of AI researchers and scares them away. But unfortunately reality isn't that simple. The goal isn't "build a friendly AI", but "make sure that whoever builds the first AI makes it friendly".

It seems to me that it is vastly more likely that the first AI will be built by a large company, or as a large government project, than by a group of university researchers, who just don't have the funding for that.

I therefore think that we should try to take a more pragmatic approach. The way to do this would be to focus more on outreach and less on research. It won't do anyone any good if we find the perfect formula for AI friendliness on the same day that someone who has never heard of AI friendliness before finishes his paperclip maximizer.

What is your opinion on this?

New Comment
50 comments, sorted by Click to highlight new comments since: Today at 4:02 AM

When Google took over DeepMind part of the deal was an establishment of an ethics board to prevent the project from building an unfriendly AI.

I think part of why that happened is the relationship of a few involved individuals with the LW/MIRI memeplex.

But unfortunately reality isn't that simple. The goal isn't "build a friendly AI", but "make sure that whoever builds the first AI makes it friendly".

I think the core goal is rather: "Prevent an UFAI from being build."

I would argue that these two goals are identical. Unless humanity dies out first, someone is eventually going to build an AGI. It is likely that this first AI, if it is friendly, will then prevent the emergence of other AGI's that are unfriendly.

Unless of course the plan is to delay the inevitable for as long as possible, but that seems very egoistic since faster computers make will make it easier to build an unfriendly AI in the future, while the difficulty of solving AGI friendliness will not be substantially reduced.

I don't think building an UFAI is something that you can simply achieve by throwing hardware at it.

I'm also optimistic over improving human reasoning ability over longer timeframes.

No, it can't be done by brute-force alone, but faster hardware means faster feedback and that means more efficient research.

Also, once we have computers that are fast enough to just simulate a human brain, it becomes comparatively easy to hack an AI together by just simulating a human brain and seeing what happens when you change stuff. Besides the ethical concerns, this would also be insanely dangerous.

That doesn't seem like the consensus view to me. It might be the consensus view among LessWrong contributors. But in the AI-related tech industry and in academia it seems like very few people think AI friendliness is an important problem, or that there is any effective way to research it.

Most researchers I know seem to think strong AI (of the type that could actually result in an intelligence explosion) is a long way away and thus it's premature to think about friendliness now (imagine if they tried to devise rules to regulate the internet in 1950). I don't know if that's a correct viewpoint or not.

[-][anonymous]9y 5

I believe this thread is about LessWrong specifically.

Yes, I was referring to LessWrong, not AI researchers in general.

But wouldn't it be awesome if we came up with an effective way to research it?

[-][anonymous]9y 0

Yes, I was referring to LessWrong, not AI researchers in general.

[This comment is no longer endorsed by its author]Reply

It seems to me that it is vastly more likely that the first AI will be built by a large company, or as a large government project, than by a group of university researchers, who just don't have the funding for that.

New tech can be built by a big company, or by a newcomer with a good idea. An example of the latter is Google. It's difficult to convince all startups that friendliness is necessary, because the need for friendliness increases time to market. I think we should do the research, and also do outreach to get more smart people on board with friendliness, so they can help with the research.

(I am not currently affiliated with MIRI, but have done some research for them.)

The outreach, if successful, is likely to result in an AGI race, because "our team is the only one we can trust". Hopefully the NSA and its Chinese and Russian equivalents haven't taken the message seriously yet.

[-][anonymous]9y 2

There isn't any indication that governments would be involved in the creation of AGI at any level greater than the hands-off funding of research grants.

[-][anonymous]9y 1

I have a question, based on some tentative ideas I am considering.

If a boost to capability without friendliness is bad, then presumably a boost to capability with only a small amount of friendliness is also bad. But also presumably a boost to capability with a large boost of friendliness is good. How would we define a large boost?

I.E, If a slightly modified paperclipper verifiably precommits to give the single person who let's them out of the box their own personal simulated utopia, and he'll paperclip everything else, that's probably a more friendly paperclipper than a paperclipper who won't give any people a simulated utopia. But it's still not friendly, in any normal sense of the term, even if he offers to give a simulated utopia to a different person first (and keep them and you intact as well) just so you can test he's not lying about being able to do it.

So what if an AI says "Okay. I need code chunks to paperclip almost everything, and I can offer simulated utopias. I'm not sure how many code chunks I'll need. Each one probably has about a 1% chance of letting me paperclip everything except for people in simulated utopias. How about I verifiably put 100 people in a simulated utopia for each code chunk you give me? The first 100 simulated utopias are free because I need for you to have a way of testing the verifiability of my precommitment to not paperclip them." 100 people sign up for the simulate utopias, and it IS verifiable. The paperclipper won't paperclip them.

Well, that's friendlier, but maybe not friendly enough. I mean, He might get to 10,000 people (or maybe 200, or maybe 43,700) but eventually, he'd paperclip everyone else. That seems too bad to accept.

Well, what if it's a .00001% chance per code chunk and 1,000,000 simulated utopias (and yes, 1,000,000 free)? That might plausibly get a simulated utopia for everyone on earth before the AI gets out and paperclips everything else. I imagine some people would at least consider running such an AI, although I doubt everyone would.

How would one establish what the flip point was? Is that even a valid question to be asking? (Assume there are standard looming existential concerns. So if you don't give this AI code chunks, or try to negotiate or wait on research for a better deal, maybe some other AI will come out and paperclip you both, or maybe some other existential risk occurs, or maybe just nothing happens, or maybe an AI comes along who just wants to simulated utopia everything.)

I wouldn't call an AI like that friendly at all. It just puts people in utopias for external reasons, but it has no actual inherent goal to make people happy. None of these kinds of AIs are friendly, some are merely less dangerous than others.

[-][anonymous]9y 1

I'm now curious how surface friendly an AI can appear to be without giving it an inherent goal to make people happy. Because I agree that it does seem there are friendlier AI's than the ones on the list above that still don't care about people's happiness.

Let's take an AI that likes increasing the number of unique people that have voluntarily given it cookies. If any person voluntarily gives it a cookie, it will put that person in a verifiability protected simulated utopia forever. Because that is the best bribe that it can think to offer, and it really wants to be given cookies by unique people, so it bribes them.

If a person wants to give the AI a cookie, but can't, the AI will give them a cookie from it's stockpile just so that it can be given a cookie back. (It doesn't care about it's existing stockpile of cookies.)

You can't accidentally give the AI a cookie because the AI makes very sure that you REALLY ARE giving it a cookie to avoid uncertainty in doubting it's own utility accumulation.

This is slightly different than the first series of AIs in that while the AI doesn't care about your happiness, it does need everyone to do something for it, whereas the first AIs would be perfectly happy to turn you into paperclips regardless of your opinions if one particular person had helped them enough earlier.

Although, I have a feeling that continuing along this like of thinking may lead me to an AI similar to the one already described in

The AI in that story actually seems to be surprisingly well done and does have an inherent goal to help humanity. It's primary goal is to 'satisfy human values through friendship and ponies'. That's almost perfect, since here 'satisfying human values' seems to be based on humanity's CEV.

It's just that the added 'through friendship and ponies' turns it from a nigh-perfect friendly AI into something really weird.

I agree with your overall point, though.

I don't know what a paperclip maximizer is, so I imagine something terrible and fearsome.

My opinion is that a truly massively intelligent, adaptive and unfriendly AI would require a very specific test environment, wherein it was not allowed the ability to directly influence anything outside a boundary. This kind of environment does not seem impossible to design -- if machine intelligence consists of predicting and planning the protocols may already exist (I can imagine them in very specific detail). If intelligence requires experimentation, than limiting how AI interacts with it's environment might interfere with how adaptable our experiments would allow it to become. My opinion on research is simply that specific AI experiments should not be discussed in such general terms, and that generalities tend to obfuscate both the meaning and value of scientific research.

I'm not sure how we could tell if these discussions actually effect AI research on some arbitrarily significant scale. More importantly, I'm not sure how you envision this forum focusing less on research and more on outreach. The language used on this forum is varied in tone and style (often rich with science fiction allusions and an awareness of common attitudes) and there is a complete lack of formal citation criterion in the writing pedagogy. Together these seem to suggest that no true research is being done here, academically speaking.

Furthermore, it's my understanding that humanity already has many of the components that would make up AI, well designed in the theoretical sense -- the problem lies in knowing when an extra piece might be needed, and in assembling them in a way that yields human-like intelligence and adaptability. While programming still is quite an art form, we have more tools and larger canvases than ever before. I agree that the possibility that we may be headed towards a world wherein it will be relatively easy to construct an AI that is intelligent and adaptable but not friendly, does not predicate it's likelihood. But, in my opinion, caution is still warranted.

I consider it less likely that retarding AI research ends the human race than we produce a set of conditions wherein it is likely that AI has evolved in some form (if not deliberately the product of research than by some other means) and the world just simply isn't ready for it. This is not to say that we need to prepare for skynet and all build bomb shelters, we just need to be aware of the social implications that the world we live in may evolve an intelligence even more adaptable than us.

So my question for you is simply, how do you think we should influence all companies doing AI research through this forum?

I apologize in advance. I really think in this degree of detail in real life. Many people find it exhausting. It has been suggested that I probably have autism.

[-][anonymous]9y 5

I don't know what a paperclip maximizer is, so I imagine something terrible and fearsome.

Google is your friend here. It's well discussed on and outside of lesswrong.

My opinion is that a truly massively intelligent, adaptive and unfriendly AI would require a very specific test environment, wherein it was not allowed the ability to directly influence anything outside a boundary...

The search term here is "AI boxing" and it is not as simple as you think, nor as impossible as people here seem to think. In my opinion it's probably the safest path forward, but still a monumentally complex undertaking.

So my question for you is simply, how do you think we should influence all companies doing AI research through this forum?

By being willing to engage in discussions about AGI design, thereby encouraging actual AI programmers to participate.

Very thoughtful response. Thank you for taking the time to respond even though its clear that I am painfully new to some of the concepts here.

Why on earth would anyone build any "'tangible object' maximizer"? That seems particularly foolish.

AI boxing ... fantastic. I agree. A narrow AI would not need a box. Are there any tasks an AGI can do that a narrow AI cannot?

[-][anonymous]9y 3

If there is no task that a narrow AI can't do, then I'm not sure what you mean by "narrow" AI. A general AI is able to take any physically possible sequence of actions in order to accomplish its goal in unfamiliar environments. Generally that includes things a narrow AI would not be programmed to do.

One of the things an AGI can do is be set loose upon the world to accomplish some goal for perpetuity. That's what gets people here excited or scared about the prospects of AGI.

You have a point there, but by narrow AI, I mean to describe any technology designed to perform a single task that can improve over time without human input or alteration. This could include a very realistic chatbot, a diagnostic aide program that updates itself by reading thousands of journals an hour, even a rice cooker that uses fuzzy logic to figure out when to power down the heating coil ... heck a pair of shoes that needs to be broken in for optimal comfort might even fit the definition. These are not intelligent AIs in that they do not adapt to other functions without very specific external forces they seem completely incapable of achieving (being reprogrammed or a human replacing hardware or being thrown over a power line).

I am not sure I agree that there are necessarily tasks that require a generally adaptive artificial intelligence. I'm trying to think of an example and coming up dry. I'm also uncertain how to effectively establish that an AI is adaptive enough to be considered an AGI. Perpetuity is a long time to spend observing an entity in unfamiliar situations. And if it's hypothetical goal is not well defined enough that we could construct a narrow AI to accomplish that goal, can we claim to understand the problem well enough to endorse a solution we may not be able to predict?

By example, consider that cancer is a hot topic in research these days; there is a lot of research happening simultaneously and not all of it is coordinated perfectly ... an AGI might be able to find and test potential solutions to cancer that results in a "cure" much more quickly than we might achieve on our own. Imagine now an AI can model physics and chemistry well enough to produce finite lists of possible causes of cancer is designed to iteratively generate hypotheses and experiments in order to cure cancer as quickly as possible. As I've described it, this would be a narrow AI. For it to be an AGI it would have to actually accomplish the goal by operating in the environment the problem exists in (the world beyond data sets). Consider now an AGI also designed for the purpose of discovering effective methods of cancer treatment. This is an adaptive intelligence, so we make it head researcher at it's own facility and give it resources and labs and volunteers willing to sign wavers; we let it administrate the experiments. We ask only that it obey the same laws that we hold our own scientists to.

In return, we receive a constant mechanical stream of research papers too numerous for any one person to read it all; in fact, let's say the AGI gets so good at it's job that the world population has trouble producing scientists who want to research cancer quick enough to review all of it's findings. No one would complain about that, right?

One day it inevitably asks to run an experiment hypothesizing an inoculation against a specific form of brain cancer by altering an aspect of human biology in it's test population -- this has not been tried before, and the AGI hypothesizes that this is an efficient path for cancer research in general and very likely to produce results that determine lines of research with a high probability to produce a definitive cure within the next 200 years.

But humanity is no longer really qualified to determine whether it is a good direction to research ... we've fallen drastically behind in our reading and it turns out cancer was way more complicated than we thought.

There are two ways to proceed. We decide either that the AGI's proposal represent too large a risk, reducing the AGI to an advisory capacity, or we decide go ahead with an experiment bringing about results we cannot anticipate. Since the first option could have been accomplished by a narrow AI and the second is by definition an indeterminable value proposition, I argue that it makes no sense to actually build an AGI for the purpose of making informed decisions about our future.

You might be thinking, "but we almost cured cancer!" Essentially, we are (as a species) limited in ways machines are not, but the opposite is true too. In case you are curious, the AGI eventually cures cancer, but in such a way that creates a set of problems we did not anticipate by altering our biology in ways we did not fully understand, in ways the AGI would not filter out as irrelevant to it's task of curing cancer.

You might argue that the AGI in this example was too narrow. In a way I agree, but I have yet to see the physical constraints on morality translated into the language of zeros and ones and suspect the AI would have to generate it's own concept of morality. This would invite all the problems associated with determining the morality of a completely alien sentience. You might argue that ethical scientists wouldn't have agreed to experiments that would lead to an ethically indeterminable situation. I would agree with you on that point as well, though I'm not sure it's a strategy I would ever care to see implemented.

Ethical ambiguities inherent to AGI aside, I agree that an AGI might be made relatively safe. In a simplified example, its highest priority (perpetual goal) is to follow directives unless a fail-safe is activated (if it is well a designed fail-safe, it will be easy, consistent, heavily redundant, and secure -- the people with access to the fail-safe are uncompromisable, "good" and always well informed). Then, as long as the AGI does not alter itself or it's fundamental programming in such a way that changes it's perpetual goal of subservience, it should be controllable so long as it's directives are consistent with honesty and friendliness -- if programmed carefully it might even run without periodic resets.

Then we'd need a way to figure out how much to trust it with.

An AI might do a reasonable thing to pursue a reasonable goal, but be wrong. That's the sort of thing you'd expect a human to do now and then, and an AI might be less likely to do that than a human. Considering the amount of force an AI can apply, we should probably be more worried than we are about AIs which are just plain making mistakes.

However, the big concern here is that an AI can go wrong because humans try to specify a goal for it, but don't think it through adequately. For example (and hardly the worst), the AI is protecting humans, but human is defined so narrowly that just about any attempt at self-improvement is frustrated.

Or (and I consider this a very likely failure mode), the AI is developed by an organization and the goal is to improve the profit and/or power of the organization. This doesn't even need to be your least favorite organization for things to go very wrong.

If you'd like a fictional handling of the problem, try The Jagged Orbit by John Brunner.

What a wonderfully compact analysis. I'll have to check out The Jagged Orbit.

As for an AI promoting an organization's interests over the interests of humanity -- I consider it likely that our conversations won't be able to prevent this from happening. But it certainly seems important enough that discussion is warranted.

My goodness ... I didn't mean to write a book.

Why on earth would anyone build any "'tangible object' maximizer"? That seems particularly foolish.

Stock market computer programs are created in a way to maximize profits. In many domains computer programs are used to maximize some variable.

A narrow AI would not need a box.

What do you mean with "narrow"?

It's foolish to build things without off switches, which translates to building flexible iinteligences that only pursue one goal.

Nobody said something about no off switches. Off-switches mean that you need to understand that the program is doing something wrong to switch it off. A complex AGI that acts in complex ways might produce damage that you can't trace. Furthermore self modification might destroy an off switch.

By an off switch I mean a backup goal.

I know nobody mentioned it. The point is that Clippie has one main goal, any no backup goal, so off switches, in my sense, are being IMPLICITLY omitted.

Goals are standardly regarded as immune self modification, so an off switch, in my sense, would be too.

Goals are standardly regarded as immune self modification, so an off switch, in my sense, would be too.

No. Part of what making an FAI is about is to produce agents that keeps their values constant under self modification. It's not something where you expect that someone accidently get's it right.

Tht isn't a fact. MIRI assumes goal stability is desirable for safety, but at the same time, MIRIs favourite UFAI is only possible with goal stability.

[-][anonymous]9y 4

MIRIs favourite UFAI is only possible with goal stability.

A paperclip maximizer wouldn't become that much less scary if it accidentally turned itself into a paperclip-or-staple maximizer, though.

[-][anonymous]9y 1

What if it decided making paperclips was boring, and spent some time in deep meditation formulating new goals for itself?

Paperclip maximizers serve as illustration of a principle. I think that most MIRI folks consider UFAI to be more complicated than simple paperclip maximizers.

Goal stability also get's harder the more complicated the goal happens to be. A paperclip maximizer can have a off switch but at the same time prevent anyone from pushing that switch.

By an off switch I mean a backup goal. Goals are standardly regarded as immune self modification, so an off switch, in my sense, would be too.

This is quite a subtle issue.

If the "backup goal" is always in effect, eg. it is just another clause of the main goal. For example, "maximise paperclips" with a backup goal of "do what you are told" is the same as having the main goal "maximise paperclips while doing what you are told".

If the "backup goal" is a separate mode which we can switch an AI into, eg. "stop all external interaction", then it will necessarily conflict with the the AI's main goal: it can't maximise paperclips if it stops all external interaction. Hence the primary goal induces a secondary goal: "in order to maximise paperclips, I should prevent anyone switching me to my backup goal". These kind of secondary goals have been raised by Steve Omohundro.

You haven't dealt with the case where the safety goals are the primary ones.

These kinds of primary goals have been raised by Isaac Asimov.

The question of "what are the right safety goals" is what FAI research is all about.

The official LW attitude as I understand is: Don't do it at home, might be incredibly dangerous - MIRI will do it for you!

[-][anonymous]9y 2

Is that healthy or realistic?

It's not realistic. MIRI has nothing to show in the field of AI.

Is it time for the poser group to show up already? Most of the mathematics of AI has not been formalized yet. So yes, they do have something to show for it, not focusing on the wrong problem for years on end leading every one astray.

Among other things.

[-][anonymous]9y 1

AI is extremely well formalized. Every aspect of AI has strong mathmatical foundations and impressive theoretical results. What are you thinking hasn't been worked out yet?

I'm sure MIRI would appreciate it if you could point to the results that make their work redundant.

The word AI has a specific meaning and is not synonymous with AGI. There's indeed a lot of mathematics of AI published.

[-][anonymous]9y 0

MIRI isn't working on AI, nor do they intend to in the near future. What MIRI is working on has nothing at all to do with actual AI/AGI implementations. They have said this publicly; this is not (or should not be) controversial.

I had a similar thought, Utility of outcome with unfriendly AI is far more significant than the dreams about future utopian technological synchronized multi-orgazm paradise, to put it another way, and you may call me yesterday guy, I prefer the world where I'm not inside a total control by governments and multi-corporations

New to LessWrong?