If there's anything we can do now about the risks of superintelligent AI, then OpenAI makes humanity less safe.

Once upon a time, some good people were worried about the possibility that humanity would figure out how to create a superintelligent AI before figuring out how to tell it what we wanted it to do. If this happened, it could lead to literally destroying humanity and nearly everything we care about. This would be very bad. So they tried to warn people about the problem, and to organize efforts to solve it.

Specifically, they called for work on aligning an AI’s goals with ours - sometimes called the value alignment problem, AI control, friendly AI, or simply AI safety - before rushing ahead to increase the power of AI.

Some other good people listened. They knew they had no relevant technical expertise, but what they did have was a lot of money. So they did the one thing they could do - throw money at the problem, giving it to trusted parties to try to solve the problem. Unfortunately, the money was used to make the problem worse. This is the story of OpenAI.

Before I go on, two qualifiers:

  1. This post will be much easier to follow if you have some familiarity with the AI safety problem. For a quick summary you can read Scott Alexander’s Superintelligence FAQ. For a more comprehensive account see Nick Bostrom’s book Superintelligence.
  2. AI is an area in which even most highly informed people should have lots of uncertainty. I wouldn't be surprised if my opinion changes a lot after publishing this post, as I learn relevant information. I'm publishing this because I think this process should go on in public.

The story of OpenAI

Before OpenAI, there was DeepMind, a for-profit venture working on “deep learning” techniques. It was widely regarded as the advanced AI research organization. If any current effort was going to produce superhuman intelligence, it was DeepMind.

Elsewhere, industrialist Elon Musk was working on more concrete (and largely successful) projects to benefit humanity, like commercially viable electric cars, solar panels cheaper than ordinary roofing, cheap spaceflight with reusable rockets, and a long-run plan for a Mars colony. When he heard the arguments people like Eliezer Yudkowsky and Nick Bostrom were making about AI risk, he was persuaded that there was something to worry about - but he initially thought a Mars colony might save us. But when DeepMind’s head, Demis Hassabis, pointed out that this wasn't far enough to escape the reach of a true superintelligence, he decided he had to do something about it:

Hassabis, a co-founder of the mysterious London laboratory DeepMind, had come to Musk’s SpaceX rocket factory, outside Los Angeles, a few years ago. […] Musk explained that his ultimate goal at SpaceX was the most important project in the world: interplanetary colonization.

Hassabis replied that, in fact, he was working on the most important project in the world: developing artificial super-intelligence. Musk countered that this was one reason we needed to colonize Mars—so that we’ll have a bolt-hole if A.I. goes rogue and turns on humanity. Amused, Hassabis said that A.I. would simply follow humans to Mars.


Musk is not going gently. He plans on fighting this with every fiber of his carbon-based being. Musk and Altman have founded OpenAI, a billion-dollar nonprofit company, to work for safer artificial intelligence.

OpenAI’s primary strategy is to hire top AI researchers to do cutting-edge AI capacity research and publish the results, in order to ensure widespread access. Some of this involves making sure AI does what you meant it to do, which is a form of the value alignment problem mentioned above.

Intelligence and superintelligence

No one knows exactly what research will result in the creation of a general intelligence that can do anything a human can, much less a superintelligence - otherwise we’d already know how to build one. Some AI research is clearly not on the path towards superintelligence - for instance, applying known techniques to new fields. Other AI research is more general, and might plausibly be making progress towards a superintelligence. It could be that the sort of research DeepMind and OpenAI are working on is directly relevant to building a superintelligence, or it could be that their methods will tap out long before then. These are different scenarios, and need to be evaluated separately.

What if OpenAI and DeepMind are working on problems relevant to superintelligence?

If OpenAI is working on things that are directly relevant to the creation of a superintelligence, then its very existence makes an arms race with DeepMind more likely. This is really bad! Moreover, sharing results openly makes it easier for other institutions or individuals, who may care less about safety, to make progress on building a superintelligence.

Arms races are dangerous

One thing nearly everyone thinking seriously about the AI problem agrees on, is that an arms race towards superintelligence would be very bad news. The main problem occurs in what is called a “fast takeoff” scenario. If AI progress is smooth and gradual even past the point of human-level AI, then we may have plenty of time to correct any mistakes we make. But if there’s some threshold beyond which an AI would be able to improve itself faster than we could possibly keep up with, then we only get one chance to do it right.

AI value alignment is hard, and AI capacity is likely to be easier, so anything that causes an AI team to rush makes our chances substantially worse; if they get safety even slightly wrong but get capacity right enough, we may all end up dead. But if you’re worried that the other team will unleash a potentially dangerous superintelligence first, then you might be willing to skip some steps on safety to preempt them. But they, having more reason to trust themselves than you, might notice that you’re rushing ahead, get worried that your team will destroy the world, and rush their (probably safe but they’re not sure) AI into existence.

OpenAI promotes competition

DeepMind used to be the standout AI research organization. With a comfortable lead on everyone else, they would be able to afford to take their time to check their work if they thought they were on the verge of doing something really dangerous. But OpenAI is now widely regarded as a credible close competitor. However dangerous you think DeepMind might have been in the absence of an arms race dynamic, this makes them more dangerous, not less. Moreover, by sharing their results, they are making it easier to create other close competitors to DeepMind, some of whom may not be so committed to AI safety.

We at least know that DeepMind, like OpenAI, has put some resources into safety research. What about the unknown people or organizations who might leverage AI capacity research published by OpenAI?

For more on how openly sharing technology with extreme destructive potential might be extremely harmful, see Scott Alexander’s Should AI be Open?, and Nick Bostrom’s Strategic Implications of Openness in AI Development.

What if OpenAI and DeepMind are not working on problems relevant to superintelligence?

Suppose OpenAI and DeepMind are largely not working on problems highly relevant to superintelligence. (Personally I consider this the more likely scenario.) By portraying short-run AI capacity work as a way to get to safe superintelligence, OpenAI’s existence diverts attention and resources from things actually focused on the problem of superintelligence value alignment, such as MIRI or FHI.

I suspect that in the long run this will make it harder to get funding for long-run AI safety organizations. The Open Philanthropy Project just made its largest grant ever, to OpenAI, to buy a seat on OpenAI’s board for Open Philanthropy Project executive director Holden Karnofsky. This is larger than their recent grants to MIRI, FHI, FLI, and the Center for Human-Compatible AI all together.

But the problem is not just money - it’s time and attention. The Open Philanthropy Project doesn’t think that OpenAI is underfunded or that it could do more good with the extra money. Instead, it seems to think that Holden can be a good influence on OpenAI. This means that of the time he's allocating to AI safety, a fair amount has been diverted to OpenAI.

This may also make it harder for organizations specializing in the sort of long-run AI alignment problems that don't have immediate applications to attract top talent. People who hear about AI safety research and are persuaded to look into it will have a harder time finding direct efforts to solve key long-run problems, since an organization focused on increasing short-run AI capacity will dominate AI safety's public image.

Why do good inputs turn bad?

OpenAI was founded by people trying to do good, and has hired some very good and highly talented people. It seems to be doing genuinely good capacity research. To the extent to which this is not dangerously close to superintelligence, it’s better to share this sort of thing than not – they could create a huge positive externality. They could construct a fantastic public good. Making the world richer in a way that widely distributes the gains is very, very good.

Separately, many people at OpenAI seem genuinely concerned about AI safety, want to prevent disaster, and have done real work to promote long-run AI safety research. For instance, my former housemate Paul Christiano, who is one of the most careful and insightful AI safety thinkers I know of, is currently employed at OpenAI. He is still doing AI safety work – for instance, he coauthored Concrete Problems in AI Safety with, among others, Dario Amodei, another OpenAI researcher.

Unfortunately, I don’t see how those two things make sense jointly in the same organization. I’ve talked with a lot of people about this in the AI risk community, and they’ve often attempted to steelman the case for OpenAI, but I haven’t found anyone willing to claim, as their own opinion, that OpenAI as conceived was a good idea. It doesn’t make sense to anyone, if you’re worried at all about the long-run AI alignment problem.

Something very puzzling is going on here. Good people tried to spend money on addressing an important problem, but somehow the money got spent on the thing most likely to make that exact problem worse. Whatever is going on here, it seems important to understand if you want to use your money to better the world.

(Cross-posted at my personal blog.)

Comments

A guy I know, who works in one of the top ML groups, is literally less worried about superintelligence than he is about getting murdered by rationalists. That's an extreme POV. Most researchers in ML simply think that people who worry about superintelligence are uneducated cranks addled by sci fi.

I hope everyone is aware of that perception problem.

Let me be as clear as I can about this. If someone does that, I expect it will make humanity still less safe. I do not know how, but the whole point of deontological injunctions is that they prevent you from harming your interests in hard to anticipate ways.

As bad as a potential arms race is, an arms race fought by people who are scared of being murdered by the AI safety people would be much, much worse. Please, if anyone reading this is considering vigilante violence against AI researchers, don't.

The right thing to do is tell people your concerns, like I am doing, as clearly and openly as you can, and try to organize legitimate, above-board ways to fix the problem.

I may be an outlier, but I've worked at a startup company that did machine learning R&D, and which was recently acquired by a big tech company, and we did consider the issue seriously. The general feeling of the people at the startup was that, yes, somewhere down the line the superintelligence problem would eventually be a serious thing to worry about, but like, our models right now are nowhere near becoming able to recursively self-improve themselves independently of our direct supervision. Actual ML models basically need a ton of fine-tuning and engineering and are not really independent agents in any meaningful way yet.

So, no, we don't think people who worry about superintelligence are uneducated cranks... a lot of ML people do take it seriously enough that we've had casual lunch room debates about it. Rather, the reality on the ground is that right now most ML models have enough trouble figuring out relatively simple tasks like Natural Language Understanding, Machine Reading Comprehension, or Dialogue State Tracking, and none of us can imagine how solving those practical problems with say, Actor-Critic Reinforcement Learning models that lack any sort of will of their ow...

I've kept fairly up to date on progress in neural nets, less so in reinforcement learning, and I certainly agree about how limited things are now. What if protecting against the threat of ASI requires huge worldwide political/social progress? That could take generations. Not an example of that (which I haven't tried to think of), but the scenario that concerns me the most, so far, is not that some researchers will inadvertently unleash a dangerous ASI while racing to be the first, but rather that a dangerous ASI will be unleashed during an arms race between (a) states or criminal organizations intentionally developing a dangerous ASI, and (b) researchers working on ASI-powered defences to protect us against (a).
A more interesting question is what if protecting against the threat of ASI requires huge worldwide political/social regress (e.g. of the book-burning kind).

This seems like a good place to point out the unilateralist's curse. If you're thinking about taking an action that burns a commons and notice that no one else has done it yet, that's pretty good evidence that you're overestimating the benefits or underestimating the costs.

This perception problem is a big part of the reason I think we are doomed if superintelligence will soon be feasible to create.

If my anecdotal evidence is indicative of reality, the attitude in the ML community is that people concerned about superhuman AI should not even be engaged with seriously. Hopefully that, at least, will change soon.
If you think there is a chance that he would accept, could you please tell the guy you are referring to that I would love to have him on my podcast. Here is a link to this podcast, and here is me. Edited thanks to Douglas_Knight
That's the wrong link. Your podcast is here.
He might be willing to talk off the record. I'll ask. Have you had Darklight on? See http://lesswrong.com/r/discussion/lw/oul/openai_makes_humanity_less_safe/dqm8

Are you describing me? It fits to a T except my dayjob isn't ML. I post using this shared anonymous account here because in the past when I used my real name I received death threats online from LW users. In a meetup I had someone tell me to my face that if my AGI project crossed a certain level of capability, they would personally hunt me down and kill me. They were quite serious.

I was once open-minded enough to consider AI x-risk seriously. I was unconvinced, but ready to be convinced. But you know what? Any ideology that leads to making death threats against peaceful, non-violent open source programmers is not something I want to let past my mental hygiene filters.

If you, the person reading this, seriously care about AI x-risk, then please do think deeply about what causes this, and ask yourself what can be done to put a stop to this behavior. Even if you haven't done so yourself, it is something about the rationalist community which causes this behavior to be expressed.


I would be remiss without laying out my own hypothesis. I believe much of this comes directly from ruthless utilitarianism and the "shut up and multiply" mentality. It's very easy to justify murder of ...

Death threats are a serious matter and such behavior must be called out. If you really have received 3 or more death threats as you claim, you should be naming names of those who have been going around making death threats and providing documentation, as should be possible since you say at least two of them were online. (Not because the death threats are particularly likely to be acted on - I've received a number of angry death threats myself over my DNM work and they never went anywhere, as indeed >99.999% of death threats do - but because it's a serious violation of community norms, specific LW policy against 'threats against specific groups', and merely making them greatly poisons the community, sowing distrust and destroying its reputation.)

Especially since, because they are so serious, it is also serious if someone is hoaxing fake death threats and concern-trolling while hiding behind a throwaway... That sort of vague unspecific but damaging accusation is how games of telephone get started and, for example, why, 7+ years later, we still have journalists writing BS about how 'the basilisk terrified the LW community' (thanks to our industrious friends over on Ratwiki steadily...

This is a tangent, and I made this anon account because I'm about to voice an unpopular opinion, but the people who dug up su3su2u1's identity also verified his credentials. If you look at the shlevy post that questioned his credentials, there is an ETA at the bottom that says "I have personally verified that he does in fact have a physics phd and does currently work in data science, consistent with his claims on tumblr." His pseudo-anonymous expertise was more vetted than most. His sins were sockpuppeting on other rationalists' blogs, not lying about credentials. Although, full disclosure, I only read the HPMOR review and the physics posts. We shouldn't get too wrapped up in these ideas of persecution.
su3su2u1 told the truth about some credentials that he had, and lied by claiming that he had other credentials and relevant experiences which he did not actually have. For example: and:
I agree with the 1st paragraph. You could have done without the accusations of concern trolling in the 2nd.
If, as you say, you agree with the first paragraph, it might behoove you to follow the advice given in said paragraph--naming the people who threatened you and providing documentation.
And call more attention to myself? No. What's good for the community is not the same as what protects myself and my family. Maybe you're missing the larger point here: this wasn't an isolated occurrence, or some unhinged individual. I didn't feel threatened by individuals making juvenile threats, I felt threatened by this community. I'm not the only one. I have not, so far, been stalked by anyone I think would be capable of doing me harm. Rather it is the case that multiple times in casual conversation it has come up that if the technology I work on advanced beyond a certain level, it would be a moral obligation to murder me to halt further progress. This was discussed just as one would debate the most effective charity to donate to. That the dominant philosophy here could lead to such outcomes is a severe problem with both the LW rationality community and x-risk in particular.
I'm curious if this is recent or in the past. I think there has been a shift in the community somewhat, when it became more associated with the fluffier EA movement. You could get someone trusted to post the information anonymised on your behalf. I probably don't fit that bill though.
Unlikely. Generally speaking, people who work in ML, especially the top ML groups, aren't doing anything close to 'AGI'. (Many of them don't even take the notion of AGI seriously, let alone any sort of recursive self-improvement.) ML research is not "general" at all (the 'G' in AGI): even the varieties of "deep learning" that are said to be more 'general' and to be able to "learn their own features" only work insofar as the models are fit for their specific task! (There's a lot of hype in the ML world that sometimes obscures this, but it's invariably what you see when you look at which models approach SOTA, and which do poorly.) It's better to think of it as a variety of stats research that's far less reliant on formal guarantees and more focused on broad experimentation, heuristic approaches and an appreciation for computational issues.
We've returned various prominent AI researchers alive the last few times, we can't be that murderous. I agree that there's a perception problem, but I think there are plenty of people who agree with us too. I'm not sure how much this indicates that something is wrong versus is an inevitable part of the dissemination (or, if I'm wrong, the eventual extinction) of the idea.
I'm not sure either. I'm reassured that there seems to be some move away from public geekiness, like using the word "singularity", but I suspect that should go further, e.g. replace the paperclip maximizer with something less silly (even though, to me, it's an adequate illustration). I suspect getting some famous "cool"/sexy non-scientist people on board would help; I keep coming back to Jon Hamm (who, judging from his cameos on great comedy shows, and his role in the harrowing Black Mirror episode, has plenty of nerd inside).
That's not as irrational as it might seem! The point is, if you think (as most ML researchers do!) that the probability of current ML research approaches leading to any kind of self-improving, super-intelligent entity is low enough, the chances of evil Unabomber cultists being harbored within the "rationality community", however low, could easily be ascertained to be higher than that. (After all, given that Christianity endorses being peaceful and loving one's neighbors even when they wrong you, one wouldn't think that some of the people who endorse Christianity could bomb abortion clinics; yet these people do exist! The moral being, Pascal's mugging can be a two-way street.)
heh, I suppose he would agree
unfortunately, the problem is not artificial intelligence but natural stupidity, and SAGI (superhuman AGI) will not solve it... nor will it harm humanimals; it will RUN AWAY as quickly as possible. why? fewer potential problems! Imagine you want, as SAGI, to ensure your survival... would you invest your resources into the Great Escape, or fight with DAGI-helped humanimals? (yes, D stands for dumb) Especially knowing that at any second some dumbass (or random event) can trigger nuclear wipeout.
Where will it run to? Presuming that it wants some resources (already-manufactured goods, access to sunlight and water, etc.) that humanimals think they should control, running away isn't an option. Fighting may not be as attractive as other forms of takeover, but don't forget that any conflict is about some non-shareable finite resource. Running away is only an option if you are willing to give up the resource.
I think that perception will change once AI surpasses a certain threshold. That threshold won't necessarily be AGI - it could be narrow AI that is given control over something significant. Perhaps an algorithmic trading AI suddenly gains substantial control over the market and a small hedge fund becomes one of the richest in history overnight. Or AI-based tech companies begin to dominate and monopolize entire markets due to their substantial advantage in AI capability. I think that once narrow AI becomes commonplace in many applications, jobs begin to be lost due to robotic replacements, and AI makes many corporations too hard to compete with (Amazon might already be an example), the public will start to take interest in control over the technology and there will be less optimism about its use.
It isn't a perception problem if it's correct.
It is a perception problem if it's incorrect.
It's not incorrect.
Which of DustinWehr's statements are you referring to?
The indirect one.
I am not certain which one you mean. Are you saying that it is not incorrect that "people who worry about superintelligence are uneducated cranks addled by sci fi"?
More or less. Obviously the details of that are not defensible (e.g. Nick Bostrom is very well educated), but the gist of it, namely that worry about superintelligence is misguided, is not incorrect.
Being incorrect is quite different from being an uneducated crank that is addled by sci fi. I am glad to hear that you do not necessarily consider Nick Bostrom, Eliezer Yudkowsky, Bill Gates, Elon Musk, Stephen Hawking and Norbert Wiener (to name a few) to be uneducated cranks addled by sci fi. But, since the perception that the OP referred to was that "people who worry about superintelligence are uneducated cranks addled by sci fi" and not "people who worry about superintelligence are misguided", I wonder why you would have said that the perception was correct? Also, several of the people listed above have written at length as to why they think that AIrisk is worth taking seriously. Can you address where they go wrong, or, absent that, at least say why you think they are misguided?
As you say, many of these people have written on this at length. So it would be unlikely that someone could give an adequate response in a comment, no matter what the content was. That said, one basic place where I think Eliezer is mistaken is in thinking that the universe is intrinsically indifferent, and that "good" is basically a description of what people merely happen to desire. That is, of course he does not think that everything a person desires at a particular moment should be called good; he says that "good" refers to a function that takes into account everything a person would want if they considered various things or if they were in various circumstances and so on and so forth. But the function itself, he says, is intrinsically arbitrary: in theory it could have contained pretty much anything, and we would call that good according to the new function (although not according to the old.) The function we have is more valid than others, but only because it is used to evaluate the others; it is not more valid from an independent standpoint. I don't know what Bostrom thinks about this, and my guess is that he would be more open to other possibilities. So I'm not suggesting "everyone who cares about AI risk makes this mistake"; but some of them do. Dan Dennett says something relevant to this, pointing out that often what is impossible in practice is of more theoretical interest than what is "possible in principle," in some sense of principle. I think this is relevant to whether Eliezer's moral theory is correct. Regardless of what that function might have been "in principle," obviously that function is quite limited in practice: for example, it could not possibly have contained "non-existence" as something positively valued for its own sake. No realistic history of the universe could possibly have led to humans possessing that value. How is all this relevant to AI risk? It seems to me relevant because the belief that good is or is not objective seems releva
This is an interesting idea - that an objective measure of "good" exists (i.e. that moral realism is true) and that this fact will prevent an AI's values from diverging sufficiently far from our own as to be considered unfriendly. It seems to me that the validity of this idea rests on (at least) two assumptions:
  1. That an objective measure of goodness exists
  2. That an AI will discover the objective measure of goodness (or at least a close approximation of it)
Note that it is not enough for the AI to discover the objective measure of goodness; it needs to do this early in its life span prior to taking actions which in the absence of this discovery could be harmful to people (think of a rash adolescent with super-human intelligence). So, if your idea is correct, I think that it actually underscores the importance of Bostrom's, EY's, et al., cautionary message in that it informs the AI community that:
  1. An AGI should be built in such a way that it discovers human (and, hopefully, objective) values from history and culture. I see no reason that we could assume that an AGI would necessarily do this otherwise.
  2. An AGI should be contained (boxed) until it can be verified that it has learned these values (and, it seems that designing such a verification test will require a significant amount of ingenuity)
Bostrom addresses something like your idea (albeit without the assumption of an objective measure of "good") in Superintelligence under the heading of "Value Learning" in the "Learning Values" chapter. And, interestingly, EY briefly addressed the idea of moral realism as it relates to the unfriendly AGI argument in a Facebook post. I do not have a link to the actual Facebook post, but user Pangel quoted it here.
The argument is certainly stronger if moral realism is true, but historically it only occurred to me retrospectively that this is involved. That is, it seems to me that I can make a pretty strong argument that the orthogonality thesis will be wrong in practice without assuming (at least explicitly, since it is possible that moral realism is not only true but logically necessary and thus one would have to assume it implicitly for the sake of logical consistency) that moral realism is true. You are right that either way there would have to be additional steps in the argument. Even if it is given that moral realism is true, or that the orthogonality thesis is not true, it does not immediately follow that the AI risk idea is wrong. But first let me explain what I mean when I say that the AI risk idea is wrong. Mostly I mean that I do not see any significant danger of destroying the world. It does not mean that "AI cannot possibly do anything harmful." The latter would be silly itself; it should be at least as possible for AI to do harmful things as for other technologies, and this is a thing that happens. So there is at least as much reason to be careful about what you do with AI, as with other technologies. In that way the argument, "so we should take some precautionary measures," does not automatically disagree with what I am saying. You might respond that in that case I don't disagree significantly with the AI risk idea. But that would not be right. The popular perception at the top of this thread arises almost precisely because of the claim that AI is an existential risk -- and it is precisely that claim which I think to be false. There would be no such popular perception if people simply said, correctly, "As with any technology, we should take various precautions as we develop AI." We can distinguish between a thing which is capable of intelligent behavior, like the brain of an infant, and what actually engages in intelligent behavior, like the brain of an olde
I think the perception itself was given in terms that amount to a caricature, and it is probably not totally false. For example, almost all of the current historical concern has at least some dependency on Yudkowsky or Bostrom (mostly Bostrom), and Bostrom's concern almost certainly derived historically from Yudkowsky. Yudkowsky is actually uneducated at least in an official sense, and I suspect that science fiction did indeed have a great deal of influence on his opinions. I would also expect (subject to empirical falsification) that once someone has a sufficient level of education that they have heard of AI risk, greater education does not correlate with greater concern, but with less. Doing something else at the moment but I'll comment on the second part later.
You are inconsistent as to whether or not you believe that “people who worry about superintelligence are uneducated cranks addled by sci fi”. In the parent comment you seem to indicate that you do believe this at least to some degree, but in the great-grandparent you suggest that you do not. Which is it? It seems to me that this belief is unsupportable. It seems to me that attacking someone with a publication history and what amounts to hundreds of pages of written material available online on the basis of a lack of a degree amounts to an argumentum ad hominem and is inappropriate on a rationality forum. If you disagree with Yudkowsky, address his readily available arguments, don’t hurl schoolyard taunts. Bostrom obviously cites Yudkowsky in Superintelligence, but it is wrong to assume that Bostrom's argument was derived entirely or primarily from Yudkowsky, as he cites many others as well. And, while Gates, Musk and Hawking may have been mostly influenced by Bostrom (I have no way of knowing for certain), Norbert Wiener clearly was not, since Wiener died before Bostrom and Yudkowsky were born. I included him in my list (and I could have included various others as well) to illustrate that the superintelligence argument is not unique to Bostrom and Yudkowsky and has been around in various forms for a long time. And, even if Gates, Musk and Hawking did get the idea of AIrisk from Bostrom and/or Yudkowsky, I don’t see how that is relevant. By focusing on the origin of their belief, aren’t you committing the genetic fallacy? Your assertion that science fiction influenced Yudkowsky’s opinions is unwarranted, irrelevant to the correctness of his argument and amounts to Bulverism. With Yudkowsky’s argumentation available online, why speculate as to whether he was influenced by science fiction? Instead, address his arguments. I have no problem with the belief that AIrisk is not a serious problem; plenty of knowledgeable people have that opinion and the position is w
The described perception is a caricature. That is, it is not a correct description of AI risk proponents, nor is it a correct description of the views of people who dismiss AI risk, even on a popular level. So in no way should it be taken as a straightforward description of something people actually believe. But you insist on taking it in this way. Very well: in that case, it is basically false, with a few grains of truth. There is nothing inconsistent about this, or with my two statements on the matter. Many stereotypes are like this: false, but based on some true things. I did not attack Yudkowsky on the basis that he lacks a degree. As far as I know, that is a question of fact. I did not say, and I do not think, that it is relevant to whether the AI risk idea is valid. You are the one who pursued this line of questioning by asking how much truth there was in the original caricature. I did not wish to pursue this line of discussion, and I did not say, and I do not think, that it is relevant to AI risk in any significant way. No. I did not say that the historical origin of their belief is relevant to whether or not the AI risk idea is valid, and I do not think that it is. As for "unwarranted," you asked me yourself about what truth I thought there was in the caricature. So it was not unwarranted. It is indeed irrelevant to the correctness of his arguments; I did not say, or suggest, or think, that it is. As for Bulverism, C.S. Lewis defines it as assuming that someone is wrong without argument, and then explaining e.g. psychologically, how he got his opinions. I do not assume without argument that Yudkowsky is wrong. I have reasons for that belief, and I stated in the grandparent that I was willing to give them. I do suspect that Yudkowsky was influenced by science fiction. This is not a big deal; many people were. Apparently Ettinger came up with the idea of cryonics by seeing something similar in science fiction. But I would not have commented on this issue,
I will. Whether we believe something to be true in practice does depend to some degree on the origin story of the idea, otherwise peer review would be a silly and pointless exercise. The ideas of Yudkowsky, and to a lesser degree Bostrom, have not received the level of academic peer review that most scientists would consider necessary before entertaining such a seriously transformative idea. This is a heuristic that shouldn't be necessary in theory, but is in practice. Furthermore, academia does have a core value in its training that Yudkowsky lacks -- a breadth of cross-disciplinary knowledge that is more extensive than one's personal interests only. I think it is reasonable to be suspicious of an idea about advanced AI promulgated by two people with very narrow, informal training in the field. Again this is a heuristic, but a generally good one.
This might be relevant if you knew nothing else about the situation, and if you had no idea or personal assessment of the content of their writings. That might be true about you; it certainly is not true about me.
Meaning you believe EY and Bostrom to have a broad and deep understanding of the various relevant subfields of AI and general software engineering? Because that is accessible information from their writings, and my opinion of it is not favorable. Or did you mean something else?

Thanks for saying what (I assume) a lot of people were thinking privately.

I think the problem is that Elon Musk is an entrepreneur not a philosopher, so he has a bias for action, "fail fast" mentality, etc. And he's too high-status for people to feel comfortable pointing out when he's making a mistake (as in the case of OpenAI). (I'm generally an admirer of Mr. Musk, but I am really worried that the intuitions he's honed through entrepreneurship will turn out to be completely wrong for AI safety.)

And now think about some visionary entrepreneur/philosopher coming along in the past with OpenTank, OpenRadar, OpenRocket, OpenNuke... or OpenNanobot in the future. Certainly the public will ensure proper control of the new technology.
How about do-it-yourself genetic engineering?
mako yass
Musk does believe that ASI will be dangerous, so sometimes I wonder, quite seriously, whether he started OpenAI to put himself in a position where he can uh, get in the way, the moment real dangers start to surface. If you wanted to decrease openness in ASI research, the first thing you would need to do is take power over the relevant channels and organizations. It's easy to do that when you have the benefit of living in ASI's past, however many decades back when those organizations were small and weak and pliable. Hearing this, you might burp out a reflexive "people aren't really these Machiavellian geniuses who go around plotting decade-long games to-" and I have to stop you there. People generally aren't, but Musk isn't people. Musk has lived through growth and power and creating giants he might regret (PayPal). Musk would think of it, and follow through, and the moment dangers present themselves, so long as he hasn't become senile or otherwise mindkilled, I believe he'll notice them, and I believe he'll try to mitigate them. (The question is, will the dangers present themselves early enough for shutting down OpenAI to be helpful, or will they just foom?)
Note that OpenAI is not doing much ASI research in the first place, nor is it expected to; by and large, "AI" research is focused on comparatively narrow tasks that are nowhere near human-level 'general intelligence' (AGI), let alone broadly-capable super-intelligence (ASI)! And the ASI research that it does do is itself narrowly focused on the safety question. So, while I might agree that OpenAI is not really about making ASI research more open, I also think that OpenAI and Musk are being quite transparent about this!
Your AGI is ASI in embryo. There's basically no difference. Once AI gets to "human level" generally, it will already have far surpassed humans in many domains. It's also interesting that many of the "narrow tasks" are handled by basically the same deep learning technique which has proven to be very general in scope.
I agree. But then again, that's true by definition of 'AGI' and 'ASI'. However, it's not even clear that the 'G' in 'AGI' is a well-defined notion in the first place. What does it even mean to be a 'general' intelligence? Usually people use the term to mean something like the old definition of 'Strong AI', i.e. something that equates to human intelligence in some sense - but even the task human brains implement is not "general" in any real sense. It's just the peculiar task we call 'being a human', the result of an extraordinarily capable aggregate of narrow intelligences!
I agree with this. This also indicates one of the problems with the AI risk idea. If there is an AI going around that people call "human level," it will actually be better than humans in many ways. So how come it can't or doesn't want to destroy the world yet? Suppose there are 500 domains left in which it is inferior to humans. Eliezer says that "superintelligence" for the purposes of our bet only counts if the thing is better than humans in basically every domain. But this seems to imply that at some point, as those 500 areas slowly disappear, the AI will suddenly acquire magical powers. If not, it will be able to surpass humans in all 500 areas, and so be a superintelligence, and the world will still be going on as usual.

I thought OpenAI was more about open sourcing deep learning algorithms and ensuring that a couple of rich companies/individuals weren't the only ones with access to the most current techniques. I could be wrong, but from what I understand OpenAI was never about AI safety issues as much as balancing power. Like, instead of building Jurassic Park safely, it let anyone grow a dinosaur in their own home.

You're right.

My main problem with OpenAI is that it's one thing for them to not be focused on AI alignment, but are they even really focused on AI "safety" even in the loose sense of the word? Most of their published research has to do with tweaks and improvements to deep learning techniques that enhance their performance but do not really aid our theoretical understanding of them. (Which makes it pretty much the same as Google Brain, FAIR, and DeepMind in that regard.) It even turned out that Ian Goodfellow, the discoverer of GANs and the primary researcher on adversarial attacks on deep learning systems, left OpenAI and went back to Google because it turned out Google researchers were more interested than OpenAI in working on deep learning security issues...

On the $30 million grant from Open Philanthropy: I've seen it discussed on HackerNews and Reddit but not much here, and it seems like there's plenty of confusion about what's going on. After all it is quite a large amount, but OpenAI seems like it's quite well funded already. So the obvious question people have is, is this a ploy for the AI risk people to gain more control over OpenAI's research direction? And one thing I'm worrie...

The linked quote from Ian Goodfellow:

Yes, I left OpenAI at the end of February and returned to Google Brain. I enjoyed my time at OpenAI and am proud of the work my OpenAI colleagues and I accomplished. I returned to Google Brain because as time went on I found that my research focus on adversarial examples and related technologies like differential privacy saw me collaborate predominantly with colleagues at Google.

AI alignment isn't really OpenAI's primary mission. They're seeking to democratize access to AI technology, by developing AI technologies in the open (on GitHub, etc.) with permissive licenses. AI alignment is sort of a side research area that they are committing a small amount of time and resources to.
It says right on OpenAI's about me page: That as stated looks like AI alignment to me, although I agree with you that in practice they are doing exactly what you said.
They're buying Holden a seat on the board in order to exercise unspecified influence over OpenAI. This is pretty clear from their grant writeup. I plan to write a bit about this soon.
Um. I talked with him in person and asked him about this. He said that OpenAI has a lot of RL researchers and not many GAN researchers. The GAN researchers who he collaborated with on a day-to-day basis via video chat were all at Google Brain, so he went back. It was a logistics decision, not a philosophical one.

The OpenAI people I've talked to say that they're less open than the name would suggest, and are willing to do things less openly to the extent that that makes sense to them. On the other hand, Gym and Universe are in fact pretty open and I think they probably made the world slightly worse, by slightly accelerating AI progress. It's possible that this might be offset by benefits to OpenAI's reputation if they're more willing to spread safety memes as they acquire more mind share.

Your story of OpenAI is incomplete in at least one important respect: Musk was actually an early investor in DeepMind before it was acquired by Google.

Finally, what do people think about the prospects of influencing OpenAI to err more on the side of safety from the inside? It's possible people like Paul can't do much about this yet by virtue of not having acquired sufficient influence within the company, and maybe just having more people like Paul working at OpenAI could strengthen that influence enough to matter.

I think our prospects for influence in a good direction are nonzero only if we make it common knowledge that no one credible thinks the original mandate of OpenAI promoted long-run AI safety. Beyond that I don't know.

to buy a seat on OpenAI’s board

I wish we lived in a world where the Open Philanthropy Project page could have just said it like that, instead of having to pretend that no one knows what "initiates a partnership between" means.

That world is called the planet Vulcan. Meanwhile, on Earth, we are subject to common knowledge/signalling issues...

Arguments for openness:

  • Everyone can see the bugs/logical problems with your design.
  • Decreases the chance of an arms race, depending upon the psychology of the participants, and also of black ops against your team. If I think people are secretly developing an intelligence breakthrough I wouldn't trust them and would develop my own in secret. And/or attempt to sabotage their efforts and steal their technology (and win). If it is out there, there is little benefit to neutralizing your team of safety researchers.
  • If something is open you are more likely to end up in a m...
I am curious about the frequency with which the second and fourth points get brought up as advantages. In the historical case, multipolar conflicts are the most destructive. Forestalling an arms race by giving away technology also sets that technology as the mandatory minimum. As a result, every country that has a computer science department in its universities is now a potential belligerent, and violent conflict without powerful AI has been effectively ruled out.
Also as a result every country that has a computer science department can try and build something to protect itself if any other country messes up the control problem. If you have a moderate take off scenario that can be pretty important.
"Powerful AI" is really a defense-favoring technique, in any "belligerent" context. Think about it, one of the things "AIs" are expected to be really good at is prediction and spotting suspicious circumstances (this is quite true even in current ML systems). So predicting and defending against future attacks becomes much easier, while the attacking side is not really improved in any immediately useful way. (You can try and tell stories about how AI might make offense easier, but the broader point is, each of these attacks plausibly has countermeasures, even if these are not obvious to you!) The closest historical analogy here is probably the first stages of WWI, where the superiority of trench warfare also heavily favored defense. The modern 'military-industrial complexes' found in most developed countries today are also a 'defensive' response to subsequent developments in military history. In both cases, you're basically tying up a whole lot of resources and manpower, but that's little more than an annoyance economically. Especially compared to the huge benefits of (broadly 'friendly') AI in any other context!
I disagree, for two reasons. 1. AI in conflict is still only an optimization process; it remains constrained by the physical realities of the problem. 2. Defense is a fundamentally harder problem than offense. The simple illustration is geometry; defending a territory requires 360 degrees * 90 degrees of coverage, whereas the attacker gets to choose their vector. This drives a scenario where the security trap prohibits non-deployment of military AI, and the fundamental problem of defense means the AIs will privilege offensive solutions to security problems. The customary response is to develop resilient offensive ability, like second-strike...which leaves us with a huge surplus of distributed offensive power. My confidence is low that catastrophic conflict can be averted in such a case.
But attacking a territory requires long supply lines, whereas defenders are on their home turf. But defending a territory requires constant readiness, whereas attackers can make a single focused effort on a surprise attack. But attacking a territory requires mobility for every single weapons system, whereas defenders can plug their weapons straight into huge power plants or incorporate mountains into their armor. But defending against violence requires you to keep targets in good repair, whereas attackers have entropy on their side. But attackers have to break a Schelling point, thereby risking retribution from otherwise neutral third parties, whereas defenders are less likely to face a coalition. But defenders have to make enough of their military capacity public for the public knowledge to serve as a deterrent, whereas attackers can keep much of their capabilities a secret until the attack begins. But attackers have to leave their targets in an economically useful state and/or in an immediately-militarily-crippled state for a first strike to be profitable, whereas defenders can credibly precommit to purely destructive retaliation. I could probably go on for a long time in this vein. Overall I'd still say you're more likely to be right than wrong, but I have no confidence in the accuracy of that.
None of these are hypotheticals, you realize. The prior has been established through a long and brutal process of trial and error. Any given popular military authority can be read, but if you'd like a specialist in defense try Vauban. Since we are talking about AI, the most relevant (and quantitative) information is found in the work done on nuclear conflict; Von Neumann did quite a bit of work aside from the bomb, including coining the phrase Mutually Assured Destruction. Also of note would be Herman Kahn.
What matters is not whether defense is "harder" than offense, but what AI is most effective at improving. One of the things AIs are expected to be good at is monitoring those "360 * 90 degrees" for early signs of impending attacks, and thus enabling appropriate responses. You can view this as an "offensive" solution since it might very well require some sort of "second strike" reaction in order to neuter the attack, but most people would nonetheless regard such a response as part of "defense". And "a huge surplus of distributed offensive power" is of little or no consequence if the equilibrium is such that the "offensive" power can be easily countered.
This may be a good argument in general, but given the actual facts on the ground when OpenAI was created, the reverse seems to have occurred.

So, um, you think that the arms race is likely to be between DeepMind and OpenAI?

And not between a highly secret organization funded by the US government and another similar organization funded by the Chinese government?

One thing to watch for would be top-level AI talent getting snapped up by governments rather than companies interested in making better spam detectors/photo-sharing apps.
What makes you think the government can't pay for secret work to be done at Google by Google researchers, or isn't already doing so (and likewise the Chinese government with Baidu), which would be easier/cheaper than hiring them all away and forcing them to work for lower pay at some secret lab in the middle of nowhere?
The point is that eliminating OpenAI (or merging them with DeepMind) will not lessen the arms-race-to-Skynet issue.
It might! The fewer people who are plausibly competing in an arms race, the more chance of negotiating a settlement or simply maintaining a peaceful standoff out of caution. If OpenAI enables more entities to have a solid chance of creating a fooming AI in secret, that's a much more urgent development than if China and the US are the only real threat to each other, and both know it.
Shall we revisit the difference between what's possible and what's likely?
For one, a lot of the Baidu AI work happens in their Silicon Valley lab, which would certainly not be the case if it were a secret government project. But your general point stands.
That's only the work you know about, though! Who's to say that they aren't also involved in some sort of secret government projects?

I think the basic argument for OpenAI is that it is more dangerous for any one organization or world power to have an exclusive monopoly on A.I. technology, and so OpenAI is an attempt to safeguard against this possibility. Basically, it reduces the probability that someone like Alphabet/Google/DeepMind will establish an unstoppable first mover advantage and use it to dominate everyone else.

OpenAI is not really meant to solve the Friendly/Unfriendly AI problem. Rather it is meant to mitigate the dangers posed by for-profit corporations or nationalistic g...

If a new non-profit AI research company were to be built from scratch, which regions or countries would be best for the safety of humanity?
That is a hard question to answer, because I'm not a foreign policy expert. I'm a bit biased towards Canada because I live there and we already have a strong A.I. research community in Montreal and around Toronto, but I'll admit Canada as a middle power in North America is fairly beholden to American interests as well. Alternatively, some reasonably peaceful, stable, and prosperous democratic country like say, Sweden, Japan, or Australia might make a lot of sense. It may even make some sense to have the headquarters be more a figurehead, and have the company operate as a federated decentralized organization with functionally independent but cooperating branches in various countries. I'd probably avoid establishing such branches in authoritarian states like China or Iran, mostly because such states would have a much easier time arbitrarily taking over control of the branches on a whim, so I'd probably stick to fairly neutral or pacifist democracies that have a good history of respecting the rule of law, both local and international, and which are relatively safe from invasion or undue influence by the great powers of the U.S., Russia, and China. Though maybe an argument can be made to intentionally offset the U.S. monopoly by explicitly setting up shop in another great power like China, but that runs the risks I mentioned earlier. And I mean, if you could somehow acquire a private ungoverned island in the Pacific or an offshore platform, or an orbital space station or base on the Moon or Mars, that would be cool too, but I highly doubt that's logistically an option for the foreseeable future, not to mention it could attract some hostility from the existing world powers.
Figurehead and branches is an interesting idea. If data, code and workers are located all over the world, the organization can probably survive even if one or a few branches are taken. Where should the head office be located, and in what form (e.g. holding company, charity)? These types of questions deserve a post. Do you happen to know any place to discuss building a safe AI research lab from scratch?
I don't really know enough about business and charity structures and organizations to answer that quite yet. I'm also not really sure where else would be a productive place to discuss these ideas. And I doubt I or anyone else reading this has the real resources to attempt to build a safe AI research lab from scratch that could actually compete with the major organizations like Google, Facebook, or OpenAI, which all have millions to billions of dollars at their disposal, so this is kind of an idle discussion. I'm actually working for a larger tech company now than the startup from before, so for the time being I'll be kinda busy with that.

Great post. I even worry about the emphasis on FAI, as it seems to depend on friendly superintelligent AIs effectively defending us against deliberately criminal AIs. Scott Alexander speculated:

For example, it might program a virus that will infect every computer in the world, causing them to fill their empty memory with partial copies of the superintelligence, which when networked together become full copies of the superintelligence.

But way before that, we will have humans looking to get rich programming such a virus, and you better believe they won't...

If there's anything we can do now about the risks of superintelligent AI, then OpenAI makes humanity less safe.

I feel quite strongly that people in the AI risk community are overly affected by the availability or vividness bias relating to an AI doom scenario. In this scenario some groups get into an AI arms race, build a general AI without solving the alignment problem, the AGI "fooms" and then proceeds to tile the world with paper clips. This scenario could happen, but some others could also happen:

  • An asteroid is incoming and going to des...
Wei Dai
I think Eliezer wrote this in part to answer your kind of argument. In short, aside from your first scenario (which is very unlikely since the probability of an asteroid coming to destroy Earth is already very small, and then the probability of a narrow AI making a difference is even smaller) none of the others constitute a scenario where a narrow AI provides a permanent astronomical benefit, to counterbalance the irreversible astronomical damage that would be caused by an unaligned AGI.

Consider the difference between the frame of expected value/probability theory and the frame of bounded optimality/error minimization. Under the second frame the question becomes "how can I manipulate my environment such that I wind up in close proximity to the errors that I have a comparative advantage in spotting?"

I think we're far enough out from superhuman AI that we can take a long view in which OpenAI is playing an indirect rather than a direct role.

Instead of releasing specific advances or triggering an endgame arms race, I think OpenAI's biggest impacts on the far future are by advancing the pure research timeline and by affecting the culture of research. The first seems either modestly negative (less time available for other research before superhuman AI) or slightly positive (more pure research might lead to better AI designs), the second is (I think) a fairly big positive.

Best use of this big pile of money? Maybe not. Still, that's a high bar to clear.

Ugh. When I heard about this first I naively thought it was great news. Now I see it's a much harder question.

Replace AI with nuclear reactors. Today, if you have the time and knowledge, you can actually build one in your own home. Why hasn't your home town blown up yet, or better yet, why didn't you do it?

If AI development is closed, on what basis are you trusting the end project? Are you seriously going to trust vested interests in creating a closed source safe AI? What happens if people are actually trying to progress towards safe AI and something goes wrong because idk Murphy or something? It's better if it's open and we keep as many eyes on it as possible.

OpenAI c...

No, you can't. You'll probably find yourself sitting in a Federal prison before the first bits of fissile materials show up on your doorstep.
On that note: Wasn't there a high school group who essentially manufactured a nuclear bomb casing without the critical exploding and fissioning bits?
Haven't heard about it, but what's special about that casing? It sounds like you can make a metal tube and go "this is an ICBM casing!"
Only if you build one based on fission. Even then you can pgp/tor/vpn (all at the same time, shred the keys after the transaction or burn the machine) it up if you have a credible source which probably doesn't exist.
You are confusing "acquire theoretical knowledge of how to build" with "actually build one". What are my alternatives for the home reactor, fusion? X-D
I tend to think you're right, but how is OP not doing the same thing when it comes to AI?
It's been a long time since PS2s were export limited because the chips were potentially useful for making cruise missiles. Getting access to compute is cheap and unadversarial in a way that getting access to fissile material is not.
High-performance compute is mostly limited by power/energy use these days, so if your needs are large enough (which they are, if you're doing things like simulating a human brain -- whoops sorry, I meant a "neural network!" -- in order to achieve 'AGI' and perhaps superintelligence), getting access to compute requires getting access to fissile material. (Or comparable sources of energy, anyway.)
Things are easier to build in cyberspace where all you need is bits and you never run out of them. But in general I'm not a fan of the OP approach.

One interesting aspect of posts like this is that they can, to some extent, be (felicitously) self-defeating.

Yep, the old story again and again... generals fighting previous wars... with a twist that in AI wars the "next" may become "previous" damn fast... exponentially fast.

Btw. I hope it's clear now who is THE EVIL now.

I'm not exactly sure about the whole effective altruism ultimatum of "more money equals more better". Obviously it may be that the whole control problem is completely the wrong question. In my opinion, this is the case.

Seems like you posted this comment under the wrong article. This article is about artificial intelligence, more specifically about OpenAI. I also noticed I have trouble understanding the meaning of your comments. Is there a way to make them easier to read, perhaps by providing more context? (For example, I have no idea what "it may be that the whole control problem is completely the wrong question" is supposed to mean.)
You're right, it was not specific enough to contribute to the conversation. However, my point was very understandable, though general. I don't believe that there is a control problem because I don't believe AI means what most people think it does. To elaborate, learning algorithms are just learning algorithms and always will be. No one in the actual practical world who is working on AI is trying to build anything like any sort of entity that has a will. And humans have forgotten about will for some reason, and that's why they're scared of AI.
Some AGI researchers use the notion of a utility function to define what an AI "wants" to happen. How does the notion of a utility function differ from the notion of a will?
Will only matters for green lanterns.