Reply to Holden on The Singularity Institute

Holden Karnofsky of GiveWell has objected to the Singularity Institute (SI) as a target for optimal philanthropy. As someone who thinks that existential risk reduction is really important and also that the Singularity Institute is an important target of optimal philanthropy, I would like to explain why I disagree with Holden on these subjects. (I am also SI's Executive Director.)

Mostly, I'd like to explain my views to a broad audience. But I'd also like to explain my views to Holden himself. I value Holden's work, I enjoy interacting with him, and I think he is both intelligent and capable of changing his mind about Big Things like this. Hopefully Holden and I can continue to work through the arguments together, though of course we are both busy with many other things.

I appreciate the clarity and substance of Holden's objections, and I hope to reply in kind. I begin with an overview of some basic points that may be familiar to most Less Wrong veterans, and then I reply point-by-point to Holden's post. In the final section, I summarize my reply to Holden.

Holden raised many different issues, so unfortunately this post needed to be long. My apologies to Holden if I have misinterpreted him at any point.


  • Existential risk reduction is a critical concern for many people, given their values and given many plausible models of the future. Details here.
  • Among existential risks, AI risk is probably the most important. Details here.
  • SI can purchase many kinds of AI risk reduction more efficiently than other groups can. Details here.
  • These points and many others weigh against many of Holden's claims and conclusions. Details here.
  • Summary of my reply to Holden


I must be brief, so I am sure many objections will leap to your mind as you read this post. To encourage constructive discussion on this post, each question (posted as a comment on this page) that follows the template described below will receive a reply from me or another SI representative.

Please word your question as clearly and succinctly as possible, and don't assume your readers will have read this post before reading your question, because the conversations here may be used as source material for a comprehensive FAQ.

Here's an example of how you could word the first paragraph of your question: "You claimed that [insert direct quote here], and also that [insert another direct quote here]. That seems to imply that [something something]. But that doesn't seem to take into account that [blah blah blah]. What do you think of that?"

If your question needs more explaining, leave the details to subsequent paragraphs in your comment. Please post multiple questions as multiple comments, so they can be voted upon and replied to individually. If you don't follow these rules, I can't guarantee SI will have time to give you a reply. (We probably won't.)

Why many people care greatly about existential risk reduction

Why do many people consider existential risk reduction to be humanity's most important task? I can't say it much better than Nick Bostrom does, so I'll just quote him:

An existential risk is one that threatens the premature extinction of Earth-originating intelligent life or the permanent and drastic destruction of its potential for desirable future development. Although it is often difficult to assess the probability of existential risks, there are many reasons to suppose that the total such risk confronting humanity over the next few centuries is significant...

Humanity has survived what we might call natural existential risks [asteroid impacts, gamma ray bursts, etc.] for hundreds of thousands of years; thus it is prima facie unlikely that any of them will do us in within the next hundred...

In contrast, our species is introducing entirely new kinds of existential risk—threats we have no track record of surviving... In particular, most of the biggest existential risks seem to be linked to potential future technological breakthroughs that may radically expand our ability to manipulate the external world or our own biology. As our powers expand, so will the scale of their potential consequences—intended and unintended, positive and negative. For example, there appear to be significant existential risks in some of the advanced forms of biotechnology, molecular nanotechnology, and machine intelligence that might be developed in the decades ahead.

What makes existential catastrophes especially bad is not that they would [cause] a precipitous drop in world population or average quality of life. Instead, their significance lies primarily in the fact that they would destroy the future... To calculate the loss associated with an existential catastrophe, we must consider how much value would come to exist in its absence. It turns out that the ultimate potential for Earth-originating intelligent life is literally astronomical.

One gets a large number even if one confines one’s consideration to the potential for biological human beings living on Earth. If we suppose... that our planet will remain habitable for at least another billion years, and we assume that at least one billion people could live on it sustainably, then the potential exist for at least 10^18 human lives. [The numbers get way bigger if you consider the expansion of posthuman civilization to the rest of the galaxy or the prospect of mind uploading.]

Even if we use the most conservative of these estimates, which entirely ignores the possibility of space colonization and software minds, we find that the expected loss of an existential catastrophe is greater than the value of 10^16 human lives...

These considerations suggest that the loss in expected value resulting from an existential catastrophe is so enormous that the objective of reducing existential risks should be a dominant consideration whenever we act out of an impersonal concern for humankind as a whole.
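
The orders of magnitude in the quoted passage can be checked with a rough calculation. The sketch below assumes an average lifespan of about 100 years, which is an assumption added here for illustration and is not stated in the excerpt; the one-billion-year and one-billion-person figures come from the quote.

```latex
% Rough order-of-magnitude check (requires amsmath for \text).
% The 10^2-year lifespan is an illustrative assumption, not a quoted figure.
\[
\underbrace{10^{9}\ \text{years}}_{\text{habitable period}}
\times
\underbrace{10^{9}\ \text{people}}_{\text{sustainable population}}
= 10^{18}\ \text{person-years}
\]
\[
\frac{10^{18}\ \text{person-years}}{10^{2}\ \text{years per life (assumed)}}
= 10^{16}\ \text{lives}
\]
```

Under that lifespan assumption, the two quoted figures multiply to about 10^18 person-years, which corresponds to about 10^16 lives, the number used in the conservative estimate.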

I refer the reader to Bostrom's paper for further details and additional arguments, but neither his paper nor this post can answer every objection one might think of.

Nor can I summarize all the arguments and evidence related to estimating the severity and time horizon of every proposed existential risk. Even the 500+ pages of Oxford University Press' Global Catastrophic Risks can barely scratch the surface of this enormous topic. As explained in Intelligence Explosion: Evidence and Import, predicting long-term technological progress is hard. Thus, we must

examine convergent outcomes that—like the evolution of eyes or the emergence of markets—can come about through any of several different paths and can gather momentum once they begin.

I'll say more about convergent outcomes later, but for now I'd just like to suggest that:

  1. Many humans living today value both current and future people enough that if existential catastrophe is plausible this century, then upon reflection (e.g. after counteracting their unconscious, default scope insensitivity) they would conclude that reducing the risk of existential catastrophe is the most valuable thing they can do — whether through direct work or by donating to support direct work. It is to these people I appeal. (I also have much to say to people who e.g. don't care about future people, but it is too much to say here and now.)

  2. As it turns out, we do have good reason to believe that existential catastrophe is plausible this century.

I don't have the space here to discuss the likelihood of different kinds of existential catastrophe that could plausibly occur this century (see GCR for more details), so instead I'll talk about just one of them: an AI catastrophe.

AI risk: the most important existential risk

There are two primary reasons I think AI is the most important existential risk:

Reason 1: Mitigating AI risk could mitigate all other existential risks, but not vice-versa. There is an asymmetry between AI risk and other existential risks. If we mitigate the risks from (say) synthetic biology and nanotechnology (without building Friendly AI), this only means we have bought a few years or decades for ourselves before we must face yet another existential risk from powerful new technologies. But if we manage AI risk well enough (i.e. if we build a Friendly AI or "FAI"), we may be able to "permanently" (for several billion years) secure a desirable future. Machine superintelligence working in the service of humane goals could use its intelligence and resources to prevent all other existential catastrophes. (Eliezer: "I distinguish 'human', that which we are, from 'humane'—that which, being human, we wish we were.")

Reason 2: AI is probably the first existential risk we must face (given my evidence, only the tiniest fraction of which I can share in a blog post).

One reason AI may be the most urgent existential risk is that AI is more likely than other sources of catastrophic risk to produce a full-blown existential catastrophe (as opposed to a "merely" billions-dead catastrophe). Humans are smart and adaptable; we are already set up for a species-preserving number of humans to survive (e.g. in underground bunkers with stockpiled food, water, and medicine) major catastrophes from nuclear war, superviruses, supervolcano eruptions, and many cases of asteroid impact or nanotechnological ecophagy.

Machine superintelligences, however, could intelligently seek out and neutralize humans which they (correctly) recognize as threats to the maximal realization of their goals. Humans are surprisingly easy to kill if an intelligent process is trying to do so. Cut off John's access to air for a few minutes, or cut off his water supply for a few days, or poke him with a sharp stick, and he dies. Forever. (Post-humans might shudder at this absurdity like we shudder at the idea that people used to die from their teeth.)

Why think AI is coming anytime soon? This is too complicated a topic to broach here. See Intelligence Explosion: Evidence and Import for a brief analysis of AI timelines. Or try The Uncertain Future, which outputs an estimated timeline for human-level AI based on your predictions of various technological developments. (SI is currently collaborating with the Future of Humanity Institute to write another paper on this subject.)

It's also important to mention that the case for caring about AI risk is less conjunctive than many seem to think, which I discuss in more detail here.

SI can purchase several kinds of AI risk reduction more efficiently than others can

The two organizations working most directly to reduce AI risk are the Singularity Institute and the Future of Humanity Institute (FHI). Luckily, these organizations complement each other well, as I pointed out back before I was running SI:

  • FHI is part of Oxford, and thus can bring credibility to existential risk reduction. Resulting output: lots of peer-reviewed papers, books from OUP like Global Catastrophic Risks, conferences, media appearances, etc.

  • SI is independent and is less constrained by conservatism or the university system. Resulting output: Very novel (and, to the mainstream, "weird") research on Friendly AI, and the ability to do unusual things that are nevertheless quite effective at finding/creating lots of new people interested in rationality and existential risk reduction: (1) The Sequences, the best tool I know for creating aspiring rationalists, (2) Harry Potter and the Methods of Rationality, a surprisingly successful tool for grabbing the attention of mathematicians and computer scientists around the world, and (3) the Singularity Summit, a mainstream-aimed conference that brings in people who end up making significant contributions to the movement — e.g. Tomer Kagan (an SI donor and board member) and David Chalmers (author of The Singularity: A Philosophical Analysis and The Singularity: A Reply).

A few weeks later, Nick Bostrom (Director of FHI) said the same things (as far as I know, without having read my comment):

I think there is a sense that both organizations are synergistic. If one were about to go under... that would probably be the one [to donate to]. If both were doing well... different people will have different opinions. We work quite closely with the folks from [the Singularity Institute]...

There is an advantage to having one academic platform and one outside academia. There are different things these types of organizations give us. If you wanna get academics to pay more attention to this, to get postdocs to work on this, that's much easier to do within academia; also to get the ear of policy-makers and media... On the other hand, for [SI] there might be things that are easier for them to do. More flexibility, they're not embedded in a big bureaucracy. So they can more easily hire people with non-standard backgrounds... and also more grass-roots stuff like Less Wrong...

FHI is, despite its small size, a highly productive philosophy department. More importantly, FHI has focused its research work on AI risk issues for the past 9 months, and plans to continue on that path for at least another 12 months. This is important work that should be supported. (Note that FHI recently hired SI research associate Daniel Dewey.)

SI lacks FHI's publishing productivity and its university credibility, but as an organization SI is improving quickly, and it can seize many opportunities for AI risk reduction that FHI is not well-positioned to seize. (New organizations will also tend to be less capable of seizing these opportunities than SI, due to the financial and human capital already concentrated at SI and FHI.)

Here are some examples of projects that SI is probably better able to carry out than FHI, given its greater flexibility (and assuming sufficient funding):

My replies to Holden, point by point

Holden's post makes so many claims that I'll just have to work through his post from beginning to end, and then summarize where I think we stand at the end.

GiveWell Labs

Holden opened "Thoughts on the Singularity Institute" by noting that SI was previously outside GiveWell's scope, since GiveWell was focused on specific domains like poverty reduction. With the launch of GiveWell Labs, GiveWell is now open to evaluating any giving opportunity, including SI.

I admire this move. I'm sure people have been bugging GiveWell to do this for a long time, but almost none of those people appreciate how hard it is to launch broad new initiatives like this with the limited budget of an organization like GiveWell or the Singularity Institute. Most of them also do not understand how much work is required to write something like "Thoughts on the Singularity Institute", "Reply to Holden on Tool AI", or this post.

Three possible outcomes

Next, Holden wrote:

[I hope] that one of these three things (or some combination) will happen:

  1. New arguments are raised that cause me to change my mind and recognize SI as an outstanding giving opportunity. If this happens I will likely attempt to raise more money for SI (most likely by discussing it with other GiveWell staff and collectively considering a GiveWell Labs recommendation).

  2. SI concedes that my objections are valid and increases its determination to address them. A few years from now, SI is a better organization and more effective in its mission.

  3. SI can't or won't make changes, and SI's supporters feel my objections are valid, so SI loses some support, freeing up resources for other approaches to doing good.

As explained at the top of Holden's post, I had already conceded that many of Holden's objections (especially concerning past organizational competence) are valid, and had been working to address them, even before Holden's post was published. So outcome #2 is already true in part.

I hope for outcome #1, too, but I don't expect Holden to change his opinion overnight. There are too many possible objections to which Holden has not yet heard a good response. But hopefully this post and its comment threads will successfully address some of Holden's (and others') objections.

Outcome #3 is unlikely since SI is already making changes, though of course it's possible we will be unable to raise sufficient funding for SI despite making these changes, or even because of our efforts to make these changes. (Improving general organizational effectiveness is important but it costs money and is not exciting to donors.)

SI's mission is more important than SI as an organization

Holden said:

whatever happens as a result of my post will be positive for SI's mission, whether or not it is positive for SI as an organization. I believe that most of SI's supporters and advocates care more about the former than about the latter, and that this attitude is far too rare in the nonprofit world.

Clearly, SI's mission is more important than SI as an organization. If somebody launches an organization more effective (at AI risk reduction) than SI but just as flexible, then SI should probably fold itself and try to move its donor base, support community, and the best of its human capital to that new organization.

That said, it's probably easier to reform SI into a more effective organization than it is to launch a new one, since SI has successfully concentrated lots of attention, donor support, and human capital. Also, SI has learned many lessons about how to run a very tricky kind of organization. AI risk reduction is a mission that (1) is beyond most people's time horizons for caring, (2) is hard to understand and visualize, (3) pattern-matches to science fiction and apocalyptic religion, (4) suffers under complicated and necessarily uncertain strategic considerations (compare to the simplicity of bed nets), (5) has a very small pool of people from which to recruit researchers, etc. SI has lots of experience with these issues, experience that probably takes a long time and lots of money to acquire.

(On the other hand, SI has also concentrated some bad reputation which a new organization could launch without. But I still think the weight of the arguments is in favor of reforming SI.)

SI's arguments need to be clearer

Holden wrote:


I do not believe that [my objections to SI's apparent views] constitute a sharp/tight case for the idea that SI's work has low/negative value; I believe, instead, that SI's own arguments are too vague for such a rebuttal to be possible. There are many possible responses to my objections, but SI's public arguments (and the private arguments) do not make clear which possible response (if any) SI would choose to take up and defend. Hopefully the dialogue following this post will clarify what SI believes and why.

I agree that SI's arguments are often vague. For example, Chris Hallquist reported:

I've been trying to write something about Eliezer's debate with Robin Hanson, but the problem I keep running up against is that Eliezer's points are not clearly articulated at all. Even making my best educated guesses about what's supposed to go in the gaps in his arguments, I still ended up with very little.

I know the feeling! That's why I've tried to write as many clarifying documents as I can, including the Singularity FAQ, Intelligence Explosion: Evidence and Import, The Singularity and Machine Ethics, Facing the Singularity, So You Want to Save the World, and How to Purchase AI Risk Reduction.

Unfortunately, it takes lots of resources to write up hundreds of arguments and responses to objections in clear and precise language, and we're working on it. (For comparison, Nick Bostrom's forthcoming book on machine superintelligence will barely scratch the surface of the things SI and FHI researchers have worked out in conversation, and it will probably take him 2+ years to write in total, and Bostrom is already an unusually prolific writer.) Hopefully SI's responses to Holden's post have helped to clarify our positions already.

Holden's objection #1 punts to objection #2

The first objection on Holden's numbered list was:

it seems to me that any AGI that was set to maximize a "Friendly" utility function would be extraordinarily dangerous.

I'm glad Holden agrees with us that successful Friendly AI is very hard. SI has spent much of its effort trying to show people that the first 20 solutions they come up with all fail. See: AI as a Positive and Negative Factor in Global Risk, The Singularity and Machine Ethics, Complex Value Systems are Required to Realize Valuable Futures, etc. Holden mentions the standard SI worry about the hidden complexity of wishes, and the one about a friendly utility function still causing havoc because the AI's priors are wrong (problem 3.6 from my list of open problems in AI risk research).

There are reasons to think FAI is harder still. What if we get the utility function right and we get the priors right but the AI's values change for the worse when it updates its ontology? What if the smartest, most careful, most insanely safety-conscious AI researchers humanity can produce just aren't smart enough to solve the problem? What if no humans are altruistic enough to choose to build FAI over an AI that will make them king of the universe? What if the idea of FAI is incoherent? (The human brain is an existence proof for the possibility of general intelligence, but we have no existence proof for the possibility of a decision theoretic agent which stably optimizes the world according to a set of preferences over states of affairs.)

So, yeah. Friendly AI is hard. But as I said elsewhere:

The point is that not trying as hard as you can to build Friendly AI is even worse, because then you almost certainly get uFAI. At least by trying to build FAI, we've got some chance of winning.

So Holden's objection #1 really just punts to objection #2, about tool-AGI, as the last paragraph in this section of Holden's post seems to indicate:

So far, all I have argued is that the development of "Friendliness" theory can achieve at best only a limited reduction in the probability of an unfavorable outcome. However, as I argue in the next section, I believe there is at least one concept - the "tool-agent" distinction - that has more potential to reduce risks, and that SI appears to ignore this concept entirely.

So if Holden's objection #2 doesn't work, then objection #1 ends up reducing to "the development of Friendliness theory can achieve at best a reduction in AI risk," which is what SI has been saying all along.

Tool AI

Holden's second numbered objection was:

SI appears to neglect the potentially important distinction between "tool" and "agent" AI.

Eliezer wrote a whole post about this here. To sum up:

(1) Whether you're working with Tool AI or Agent AI, you need the "Friendly AI" domain experts that SI is trying to recruit:

A "Friendly AI programmer" is somebody who specializes in seeing the correspondence of mathematical structures to What Happens in the Real World. It's somebody who looks at Hutter's specification of AIXI and reads the actual equations - actually stares at the Greek symbols and not just the accompanying English text - and sees, "Oh, this AI will try to gain control of its reward channel," as well as numerous subtler issues like, "This AI presumes a Cartesian boundary separating itself from the environment; it may drop an anvil on its own head." Similarly, working on TDT means e.g. looking at a mathematical specification of decision theory, and seeing "Oh, this is vulnerable to blackmail" and coming up with a mathematical counter-specification of an AI that isn't so vulnerable to blackmail.

Holden's post seems to imply that if you're building a non-self-modifying planning Oracle (aka 'tool AI') rather than an acting-in-the-world agent, you don't need a Friendly AI programmer because FAI programmers only work on agents. But this isn't how the engineering skills are split up. Inside the AI, whether an agent AI or a planning Oracle, there would be similar AGI-challenges like "build a predictive model of the world", and similar FAI-conjugates of those challenges like finding the 'user' inside an AI-created model of the universe. The insides would look a lot more similar than the outsides. An analogy would be supposing that a machine learning professional who does sales optimization for an orange company couldn't possibly do sales optimization for a banana company, because their skills must be about oranges rather than bananas.

(2) Tool AI isn't that much safer than Agent AI, because Tool AIs have lots of hidden "gotchas" that cause havoc, too. (See Eliezer's post for examples.)

These points illustrate something else Eliezer wrote:

What the human species needs from an x-risk perspective is experts on This Whole Damn Problem [of AI risk], who will acquire whatever skills are needed to that end. The Singularity Institute exists to host such people and enable their research—once we have enough funding to find and recruit them.

Indeed. We need places for experts who specialize in seeing the consequences of mathematical objects for things humans value (e.g. the Singularity Institute) just like we need places for experts on efficient charity (e.g. GiveWell).

Anyway, it's worth pointing out that Holden did not make the common (and mistaken) argument that "We should just build Tool AIs instead of Agent AIs and then we'll be fine." This is wrong for many reasons, but one obvious point is that there are incentives to build Agent AIs (because they're powerful), so even if the first 6 teams are careful enough to build only Tool AIs, the 7th team could still build Agent AI and destroy the world.

Instead, Holden pointed out that you could use Tool AI to increase your chances of successfully building agenty FAI:

if developing "Friendly AI" is what we seek, a tool-AGI could likely be helpful enough in thinking through this problem as to render any previous work on "Friendliness theory" moot. Among other things, a tool-AGI would allow transparent views into the AGI's reasoning and predictions without any reason to fear being purposefully misled, and would facilitate safe experimental testing of any utility function that one wished to eventually plug into an "agent."

After reading Eliezer's reply, however, you can probably guess my replies to this paragraph:

  1. Tool AI isn't as safe as Holden thinks.
  2. But yeah, a Friendly AI team may very well use "Tool AI" to aid Friendliness research if it can figure out a safe way to do that. This doesn't obviate the need for Friendly AI researchers; it's part of their research toolbox.

So Holden's Objection #2 doesn't work, which (as explained earlier) means that his Objection #1 (as stated) doesn't work either.

SI's mission assumes a scenario that is far less conjunctive than it initially appears

Holden's objection #3 is:

SI's envisioned scenario is far more specific and conjunctive than it appears at first glance, and I believe this scenario to be highly unlikely.

His main concern here seemed to be that technological developments and other factors would render earlier FAI work irrelevant. But Eliezer's clarifications about what we mean by "FAI team" render this objection moot, at least as it is currently stated. The purpose of an FAI team is not to blindly develop one particular approach to Friendly AI without checking to see whether this work will be obsoleted by future developments. Instead, the purpose of an FAI team is to develop highly specialized expertise on, among other things, which kinds of research are more and less likely to be relevant given future developments.

Holden's confusion about what SI means by "FAI team" is common and understandable, and it is one reason that SI's mission assumes a scenario that is far less conjunctive than it appears to many. We aren't saying we need an FAI team because we know lots of specific things about how AGI will be built 30 years from now. We're saying you need experts on "the consequences of mathematical objects for things humans value" (an FAI team) because AGIs are mathematical objects and will have big consequences. That's pretty disjunctive.

Similarly, many people think SI's mission is predicated on hard takeoff. After all, we call ourselves the "Singularity Institute," Eliezer has spent a lot of time arguing for hard takeoff, and our current research summary frames AI risk in terms of recursive self-improvement.

But the case for AI as a global risk, and thus the need for dedicated experts on AI risk and "the consequences of mathematical objects for things humans value", isn't predicated on hard takeoff. Instead, it looks something like this:

(1) Eventually, most tasks are performed by machine intelligences.

The improved flexibility, copyability, and modifiability of machine intelligences make them economically dominant even without other advantages (Brynjolfsson & McAfee 2011; Hanson 2008). In addition, there is plenty of room "above" the human brain in terms of hardware and software for general intelligence (Muehlhauser & Salamon 2012; Sotala 2012; Kurzweil 2005).

(2) Machine intelligences don't necessarily do things we like.

We don't necessarily control AIs, since advanced intelligences may be inherently goal-oriented (Omohundro 2007), and even if we build advanced "Tool AIs," these aren't necessarily safe either (Yudkowsky 2012) and there will be significant economic incentives to transform them into autonomous agents (Brynjolfsson & McAfee 2011). We don't value most possible futures, but it's very hard to get an autonomous AI to do exactly what you want (Yudkowsky 2008, 2011; Muehlhauser & Helm 2012; Arkin 2009).

(3) There are things we can do to increase the probability that machine intelligences do things we like.

Further research can clarify (1) the nature and severity of the risk, (2) how to engineer goal-oriented systems safely, (3) how to increase safety with differential technological development, (4) how to limit and control machine intelligences (Armstrong et al. 2012; Yampolskiy 2012), (5) solutions to AI development coordination problems, and more.

(4) We should do those things now.

People aren't doing much about these issues now. We could wait until we understand better (e.g.) what kind of AI is likely, but: (1) it might take a long time to resolve the core issues, including difficult technical subproblems that require time-consuming mathematical breakthroughs, (2) incentives may be badly aligned (e.g. there seem to be strong economic incentives to build AI, but not to take into account social and global risks for AI), (3) AI may not be that far away (Muehlhauser & Salamon 2012), and (4) the transition to machine dominance may be surprisingly rapid due to (e.g.) intelligence explosion (Chalmers 2010, 2012; Muehlhauser & Salamon 2012) or computing overhang.

What do I mean by "computing overhang"? We may get the hardware needed for AI long before we get the software, such that once software for general intelligence is figured out, there is tons of computing hardware sitting around for running AIs (a "computing overhang"). Thus we could switch from a world with one autonomous AI to a world with 10 billion autonomous AIs at the speed of copying software, and thereby transition rapidly from human dominance to AI dominance even without an intelligence explosion. (This is one of the many, many things we haven't yet written up in detail due to lack of resources.)
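
To make the "speed of copying software" point concrete, here is a minimal sketch of how few copying rounds separate one AI from 10 billion. It assumes, purely for illustration, that hardware is not a constraint and that every running copy makes exactly one new copy per round. The function name `copy_rounds_needed` and the doubling model are hypothetical and are not taken from any SI write-up.

```python
def copy_rounds_needed(target_copies: int, start_copies: int = 1) -> int:
    """Count doubling rounds until the population reaches target_copies.

    Assumption (illustrative only): each existing copy makes one new copy
    per round, so the population doubles every round.
    """
    if target_copies < 1 or start_copies < 1:
        raise ValueError("copy counts must be positive")
    rounds = 0
    copies = start_copies
    # Integer arithmetic avoids floating-point rounding in a log2 shortcut.
    while copies < target_copies:
        copies *= 2
        rounds += 1
    return rounds


if __name__ == "__main__":
    # Rounds needed to go from 1 copy to at least 10 billion copies.
    print(copy_rounds_needed(10_000_000_000))  # 34 under the doubling assumption
```

Under this toy model the total transition time is roughly 34 times however long a single copy takes, so the limiting factor is available hardware and copy time, not any further improvement in the software itself.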

(This broad argument is greatly compressed from a paper outline developed by Paul Christiano, Carl Shulman, Nick Beckstead, and myself. We'd love to write the paper at some point, but haven't had the resources to do so. The fuller version of this argument is of course more detailed.)

SI's public argumentation

Next, Holden turned to the topic of SI's organizational effectiveness:

when evaluating a group such as SI, I can't avoid placing a heavy weight on (my read on) the general competence, capability and "intangibles" of the people and organization, because SI's mission is not about repeating activities that have worked in the past...

There are several reasons that I currently have a negative impression of SI's general competence, capability and "intangibles."

The first reason Holden gave for his negative impression of SI is:

SI has produced enormous quantities of public argumentation... Yet I have never seen a clear response to any of the three basic objections I listed in the previous section. One of SI's major goals is to raise awareness of AI-related risks; given this, the fact that it has not advanced clear/concise/compelling arguments speaks, in my view, to its general competence.

I agree in part. Here's what I think:

  • SI hasn't made its arguments as clear, concise, and compelling as I would like. We're working on that. It takes time, money, and people who are (1) smart and capable enough to do AI risk research work and yet somehow (2) willing to work for non-profit salaries and (3) willing to not advance their careers like they would if they chose instead to work at a university.
  • There are a huge number of possible objections to SI's arguments, and we haven't had the resources to write up clear and compelling replies to all of them. (See Chalmers 2012 for quick rebuttals to many objections to intelligence explosion, but what he covers in that paper barely scratches the surface.) As Eliezer wrote, Holden's complaint that SI hasn't addressed his particular objections "seems to lack perspective on how many different things various people see as the one obvious solution to Friendly AI. Tool AI wasn't the obvious solution to John McCarthy, I.J. Good, or Marvin Minsky. Today's leading AI textbook, Artificial Intelligence: A Modern Approach... discusses Friendly AI and AI risk for 3.5 pages but doesn't mention tool AI as an obvious solution. For Ray Kurzweil, the obvious solution is merging humans and AIs. For Jurgen Schmidhuber, the obvious solution is AIs that value a certain complicated definition of complexity in their sensory inputs. Ben Goertzel, J. Storrs Hall, and Bill Hibbard, among others, have all written about how silly Singinst is to pursue Friendly AI when the solution is obviously X, for various different X. Among current leading people working on serious AGI programs labeled as such, neither Demis Hassabis (VC-funded to the tune of several million dollars) nor Moshe Looks (head of AGI research at Google) nor Henry Markram (Blue Brain at IBM) think that the obvious answer is Tool AI. Vernor Vinge, Isaac Asimov, and any number of other SF writers with technical backgrounds who spent serious time thinking about these issues didn't converge on that solution."
  • SI has done a decent job of raising awareness of AI risk, I think. Writing The Sequences and HPMoR has (indirectly) raised more awareness of AI risk than one can normally expect from, say, writing a bunch of clear and precise academic papers about a subject. (At least, it seems that way to me.)

SI's endorsements

The second reason Holden gave for his negative impression of SI is "a lack of impressive endorsements." This one is generally true, despite the three "celebrity endorsements" on our new donate page. More impressive than these is the fact that, as Eliezer mentioned, the latest edition of the leading AI textbook spends several pages talking about AI risk and Friendly AI, and discusses the work of SI-associated researchers like Eliezer Yudkowsky and Steve Omohundro while completely ignoring the existence of the older, more prestigious, and vastly larger mainstream academic field of "machine ethics."

Why don't we have impressive endorsements? To my knowledge, SI hasn't tried very hard to get them. That's another thing we're in the process of changing.

SI and feedback loops

The third reason Holden gave for his negative impression of SI is:

SI seems to have passed up opportunities to test itself and its own rationality by e.g. aiming for objectively impressive accomplishments... Pursuing more impressive endorsements and developing benign but objectively recognizable innovations (particularly commercially viable ones) are two possible ways to impose more demanding feedback loops.

We have thought many times about commercially viable innovations we could develop, but these would generally be large distractions from the work of our core mission. (The Center for Applied Rationality, in contrast, has many opportunities to develop commercially viable innovations in line with its core mission.)

Still, I do think it's important for the Singularity Institute to test itself with tight feedback loops wherever feasible. This is particularly difficult to do for a research organization doing philosophy of long-term forecasting (30 years is not a "tight" feedback loop in the slightest), but that's what FHI does, and they have more "objectively impressive" (that is, "externally proclaimed") accomplishments: lots of peer-reviewed publications, some major awards for its top researcher Nick Bostrom, etc.

SI and rationality

Holden's fourth concern about SI is that it is overconfident about the level of its own rationality, and that this seems to show itself in (e.g.) "insufficient self-skepticism" and "being too selective (in terms of looking for people who share its preconceptions) when determining whom to hire and whose feedback to take seriously."

What would provide good evidence of rationality? Holden explains:

I endorse Eliezer Yudkowsky's statement, "Be careful … any time you find yourself defining the [rationalist] as someone other than the agent who is currently smiling from on top of a giant heap of utility." To me, the best evidence of superior general rationality (or of insight into it) would be objectively impressive achievements (successful commercial ventures, highly prestigious awards, clear innovations, etc.) and/or accumulation of wealth and power. As mentioned above, SI staff/supporters/advocates do not seem particularly impressive on these fronts...

Unfortunately, this seems to misunderstand the term "rationality" as it is meant in cognitive science. As I explained elsewhere:

Like intelligence and money, rationality is only a ceteris paribus predictor of success.

So while it's empirically true (Stanovich 2010) that rationality is a predictor of life success, it's a weak one. (At least, it's a weak predictor of success at the levels of human rationality we are capable of training today.) If you want to more reliably achieve life success, I recommend inheriting a billion dollars or, failing that, being born+raised to have an excellent work ethic and low akrasia.

The reason you should "be careful… any time you find yourself defining the [rationalist] as someone other than the agent who is currently smiling from on top of a giant heap of utility" is that you should "never end up envying someone else's mere choices." You are still allowed to envy their resources, intelligence, work ethic, mastery over akrasia, and other predictors of success.

But I don't mean to dodge the key issue. I think SIers are generally more rational than most people (and so are LWers, it seems), but I think SIers have often overestimated their own rationality, myself included. Certainly, I think SI's leaders have been pretty irrational about organizational development at many times in the past. In internal communications about why SI should help launch CFAR, one reason on my list has been: "We need to improve our own rationality, and figure out how to create better rationalists than exist today."

SI's goals and activities

Holden's fifth concern about SI is the apparent disconnect between SI's goals and its activities:

SI seeks to build FAI and/or to develop and promote "Friendliness theory" that can be useful to others in building FAI. Yet it seems that most of its time goes to activities other than developing AI or theory.

This one is pretty easy to answer. We've focused mostly on movement-building rather than direct research because, until very recently, there wasn't enough community interest or funding to seriously begin to form an FAI team. To do that you need (1) at least a few million dollars a year, and (2) enough smart, altruistic people to care about AI risk that there exist some potential superhero mathematicians for the FAI team. And to get those two things, you've got to do mostly movement-building, e.g. Less Wrong, HPMoR, the Singularity Summit, etc.


And of course, Holden is (rightly) concerned about the 2009 theft of $118,000 from SI, and the lack of public statements from SI on the matter.


  • Two former employees stole $118,000 from SI. Earlier this year we finally won stipulated judgments against both individuals, forcing them to pay back the full amounts they stole. We have already recovered several thousand dollars of this.
  • We do have much better financial controls now. We consolidated our accounts so there are fewer accounts to watch, and at least three staff members check them regularly, as does our treasurer, who is not an SI staff member or board member.

Pascal's Mugging

In another section, Holden wrote:

A common argument that SI supporters raise with me is along the lines of, "Even if SI's arguments are weak and its staff isn't as capable as one would like to see, their goal is so important that they would be a good investment even at a tiny probability of success."

I believe this argument to be a form of Pascal's Mugging and I have outlined the reasons I believe it to be invalid...

Some problems with Holden's two posts on this subject will be explained in a forthcoming post by Steven Kaas. But as Holden notes, some SI principals like Eliezer don't use "small probability of large impact" arguments, anyway. We in fact argue that the probability of a large impact is not tiny.

Summary of my reply to Holden

Now that I have addressed so many details, let us return to the big picture. My summarized reply to Holden goes like this:

Holden's first two objections can be summarized as arguing that developing the Friendly AI approach is more dangerous than developing non-agent "Tool" AI. Eliezer's post points out that "Friendly AI" domain experts are what you need whether you're working with Tool AI or Agent AI, because (1) both of these approaches require FAI experts (experts in seeing the consequences of mathematical objects for what humans value), and because (2) Tool AI isn't necessarily much safer than Agent AI, because Tool AIs have lots of hidden gotchas, too. Thus, "What the human species needs from an x-risk perspective is experts on This Whole Damn Problem [of AI risk], who will acquire whatever skills are needed to that end. The Singularity Institute exists to host such people and enable their research — once we have enough funding to find and recruit them."

Holden's third objection was that the argument behind SI's mission is more conjunctive than it seems. I replied that the argument behind SI's mission is actually less conjunctive than it often seems, because an "FAI team" works on a broader set of problems than Holden had realized, and because the case for AI risk is more disjunctive than many people realize. These confusions are understandable, however, and they probably are a result of insufficiently clear argumentative writing from SI on these matters, a problem we are trying to fix with several recent and forthcoming papers and other communications (like this one).

Holden's next objection concerned SI as an organization: "SI has, or has had, multiple properties that I associate with ineffective organizations." I acknowledged these problems before Holden published his post, and have since outlined the many improvements we've made to organizational effectiveness since I was made Executive Director. I addressed several of Holden's specific worries here.

Finally, Holden recommended giving to a donor-advised fund rather than to SI:

I don't think that "Cause X is the one I care about and Organization Y is the only one working on it" to be a good reason to support Organization Y. For donors determined to donate within this cause, I encourage you to consider donating to a donor-advised fund while making it clear that you intend to grant out the funds to existential-risk-reduction-related organizations in the future....

For one who accepts my arguments about SI, I believe withholding funds in this way is likely to be better for SI's mission than donating to SI

By now I've called into question most of Holden's arguments about SI, but I will still address the issue of donating to SI vs. donating to a donor-advised fund.

First: Which public charity would administer the donor-advised fund? Remember also that in the U.S., the administering charity need not spend from the donor-advised fund as the donor wishes, though they often do.

Second: As I said earlier,

it's probably easier to reform SI into a more effective organization than it is to launch a new one, since SI has successfully concentrated lots of attention, donor support, and human capital. Also, SI has learned many lessons about how to run a very tricky kind of organization. AI risk reduction is a mission that (1) is beyond most people's time horizons for caring, (2) is hard to understand and visualize, (3) pattern-matches to science fiction and apocalyptic religion, (4) suffers under complicated and necessarily uncertain strategic considerations (compare to the simplicity of bed nets), (5) has a very small pool of people from which to recruit researchers, etc. SI has lots of experience with these issues; experience that probably takes a long time and lots of money to acquire.

The case for funding improvements and growth at SI (as opposed to starving SI as Holden suggests) is bolstered by the fact that SI's productivity and effectiveness have been improving rapidly of late, and many other improvements (and exciting projects) are on our "to-do" list if we can raise sufficient funding to implement them.

Holden even seems to share some of this optimism:

Luke's... recognition of the problems I raise... increases my estimate of the likelihood that SI will work to address them...

I'm aware that SI has relatively new leadership that is attempting to address the issues behind some of my complaints. I have a generally positive impression of the new leadership; I believe the Executive Director and Development Director, in particular, to represent a step forward in terms of being interested in transparency and in testing their own general rationality. So I will not be surprised if there is some improvement in the coming years...


For brevity's sake I have skipped many important details. I may also have misinterpreted Holden somewhere. And surely, Holden and other readers have follow-up questions and objections. This is not the end of the conversation; it is closer to the beginning. I invite you to leave your comments, preferably in accordance with these guidelines (for improved discussion clarity).

213 comments

This post and the reactions to it will be an interesting test for my competing models about the value of giving detailed explanations to supporters. Here are just two of them:

One model says that detailed communication with supporters is good because it allows you to make your case for why your charity matters, and thus increase the donors' expectation that your charity can turn money into goods that they value, like poverty reduction or AI risk reduction.

Another model says that detailed communication with supporters is bad because (1) supporters are generally giving out of positive affect toward the organization, and (2) that positive affect can't be increased much once they grok the mission enough to start donating, but (3) the positive affect they feel toward the charity can be overwhelmed by the absolute number of the organization's statements with which they disagree, and (4) more detailed communication with supporters increases this absolute number more quickly than limited communication that repeats the same points again and again (e.g. in a newsletter).

I worry that model #2 may be closer to the truth, in part because of things like (Dilbert-creator) Scott Adams' account of why he decided to blog less:

I hoped that people who loved the blog would spill over to people who read Dilbert, and make my flagship product stronger. Instead, I found that if I wrote nine highly popular posts, and one that a reader disagreed with, the reaction was inevitably “I can never read Dilbert again because of what you wrote in that one post.” Every blog post reduced my income, even if 90% of the readers loved it.

An issue that SI must inevitably confront is how much rationality it will assume of its target population of donors. If it simply wanted to raise as much money as possible, there are, I expect, all kinds of Dark techniques it could use (of which decreasing communication is only the tip of the iceberg). The problem is that SI also wants to raise the sanity waterline, since that is integral to its larger mission -- and it's hard (not to mention hypocritical) to do that while simultaneously using fundraising methods that depend on the waterline being below a certain level among its supporters.

How do you expect to determine the effects of this information on donations from the comments made by supporters? In my case, for instance, I've been fairly encouraged by the explanations like this that have been coming out of SI (and had been somewhat annoyed by the lack of them previously), but my comments tend to sound negative because I tend to focus on things that I'm still not completely satisfied with.

Another model says that detailed communication with supporters is bad because (1) supporters are generally giving out of positive affect toward the organization, and (2) that positive affect can't be increased much once they grok the mission enough to start donating, but (3) the positive affect they feel toward the charity can be overwhelmed by the absolute number of the organization's statements with which they disagree, and (4) more detailed communication with supporters increases this absolute number more quickly than limited communication that repeats the same points again and again (e.g. in a newsletter).

As an example datapoint, Eliezer's reply to Holden caused a net decrease (not necessarily an enormous one) in both my positive affect for and abstract evaluation of the merit of the organisation, based on one particularly bad argument that shocked me. It prompted some degree (again not necessarily a large degree) of updating towards the possibility that SingInst could suffer the same kind of mind-killed thinking and behavior I expect from other organisations in the class of pet-cause idealistic charities. (And that matters more for FAI oriented charities than save-the-puppies charities, with the whole think-right or destroy the world thing.)

When allowing for the possibility that I am wrong and Eliezer is right, you have to expect most other supporters to be wrong a non-trivial proportion of the time too, so too much talking is going to have negative side effects.

Which issue are you talking about? Is there already a comments thread about it on Eliezer's post?

Found it. It was nested too deep in a comment tree.

The particular line was:

I would ask him what he knows now, in advance, that all those sane intelligent people will miss. I don't see how you could (well-justifiedly) access that epistemic state.

The position is something I think it is best I don't mention again until (unless) I get around to writing the post "Predicting Failure Without Details" to express the position clearly with references and what limits apply to that kind of reasoning.

Isn't it just straight-up outside view prediction?

Is it possible that supporters might update on communicativeness, separately from updating on what you actually have to say? Generally when I see the SI talking to people, I feel the warm fuzziness before I actually read what you're saying. It just seems like people might associate "detailed engagement with supporters and critics" with the reference class of "good organizations".