Reply to Holden on The Singularity Institute

lukeprog

Holden Karnofsky of GiveWell has objected to the Singularity Institute (SI) as a target for optimal philanthropy. As someone who thinks that existential risk reduction is really important and also that the Singularity Institute is an important target of optimal philanthropy, I would like to explain why I disagree with Holden on these subjects. (I am also SI's Executive Director.)

Mostly, I'd like to explain my views to a broad audience. But I'd also like to explain my views to Holden himself. I value Holden's work, I enjoy interacting with him, and I think he is both intelligent and capable of changing his mind about Big Things like this. Hopefully Holden and I can continue to work through the arguments together, though of course we are both busy with many other things.

I appreciate the clarity and substance of Holden's objections, and I hope to reply in kind. I begin with an overview of some basic points that may be familiar to most Less Wrong veterans, and then I reply point-by-point to Holden's post. In the final section, I summarize my reply to Holden.

Holden raised many different issues, so unfortunately this post needed to be long. My apologies to Holden if I have misinterpreted him at any point.

Existential risk reduction is a critical concern for many people, given their values and given many plausible models of the future. Details here.
Among existential risks, AI risk is probably the most important. Details here.
SI can purchase many kinds of AI risk reduction more efficiently than other groups can. Details here.
These points and many others weigh against many of Holden's claims and conclusions. Details here.
Summary of my reply to Holden

Comments

I must be brief, so I am sure many objections will leap to your mind as you read this post. To encourage constructive discussion on this post, each question (posted as a comment on this page) that follows the template described below will receive a reply from me or another SI representative.

Please word your question as clearly and succinctly as possible, and don't assume your readers will have read this post before reading your question (because: the conversations here may be used as source material for a comprehensive FAQ).

Here's an example of how you could word the first paragraph of your question: "You claimed that [insert direct quote here], and also that [insert another direct quote here]. That seems to imply that [something something]. But that doesn't seem to take into account that [blah blah blah]. What do you think of that?"

If your question needs more explaining, leave the details to subsequent paragraphs in your comment. Please post multiple questions as multiple comments, so they can be voted upon and replied to individually. If you don't follow these rules, I can't guarantee SI will have time to give you a reply. (We probably won't.)

Why many people care greatly about existential risk reduction

Why do many people consider existential risk reduction to be humanity's most important task? I can't say it much better than Nick Bostrom does, so I'll just quote him:

An existential risk is one that threatens the premature extinction of Earth-originating intelligent life or the permanent and drastic destruction of its potential for desirable future development. Although it is often difficult to assess the probability of existential risks, there are many reasons to suppose that the total such risk confronting humanity over the next few centuries is significant...

Humanity has survived what we might call natural existential risks [asteroid impacts, gamma ray bursts, etc.] for hundreds of thousands of years; thus it is prima facie unlikely that any of them will do us in within the next hundred...

In contrast, our species is introducing entirely new kinds of existential risk—threats we have no track record of surviving... In particular, most of the biggest existential risks seem to be linked to potential future technological breakthroughs that may radically expand our ability to manipulate the external world or our own biology. As our powers expand, so will the scale of their potential consequences—intended and unintended, positive and negative. For example, there appear to be significant existential risks in some of the advanced forms of biotechnology, molecular nanotechnology, and machine intelligence that might be developed in the decades ahead.

What makes existential catastrophes especially bad is not that they would [cause] a precipitous drop in world population or average quality of life. Instead, their significance lies primarily in the fact that they would destroy the future... To calculate the loss associated with an existential catastrophe, we must consider how much value would come to exist in its absence. It turns out that the ultimate potential for Earth-originating intelligent life is literally astronomical.

One gets a large number even if one confines one’s consideration to the potential for biological human beings living on Earth. If we suppose... that our planet will remain habitable for at least another billion years, and we assume that at least one billion people could live on it sustainably, then the potential exists for at least 10¹⁸ human lives. [The numbers get way bigger if you consider the expansion of posthuman civilization to the rest of the galaxy or the prospect of mind uploading.]

Even if we use the most conservative of these estimates, which entirely ignores the possibility of space colonization and software minds, we find that the expected loss of an existential catastrophe is greater than the value of 10¹⁶ human lives...

These considerations suggest that the loss in expected value resulting from an existential catastrophe is so enormous that the objective of reducing existential risks should be a dominant consideration whenever we act out of an impersonal concern for humankind as a whole.

I refer the reader to Bostrom's paper for further details and additional arguments, but neither his paper nor this post can answer every objection one might think of.

Nor can I summarize all the arguments and evidence related to estimating the severity and time horizon of every proposed existential risk. Even the 500+ pages of Oxford University Press' Global Catastrophic Risks can barely scratch the surface of this enormous topic. As explained in Intelligence Explosion: Evidence and Import, predicting long-term technological progress is hard. Thus, we must

examine convergent outcomes that—like the evolution of eyes or the emergence of markets—can come about through any of several different paths and can gather momentum once they begin.

I'll say more about convergent outcomes later, but for now I'd just like to suggest that:

Many humans living today value both current and future people enough that if existential catastrophe is plausible this century, then upon reflection (e.g. after counteracting their unconscious, default scope insensitivity) they would conclude that reducing the risk of existential catastrophe is the most valuable thing they can do — whether through direct work or by donating to support direct work. It is to these people I appeal. (I also have much to say to people who e.g. don't care about future people, but it is too much to say here and now.)
As it turns out, we do have good reason to believe that existential catastrophe is plausible this century.

I don't have the space here to discuss the likelihood of different kinds of existential catastrophe that could plausibly occur this century (see GCR for more details), so instead I'll talk about just one of them: an AI catastrophe.

AI risk: the most important existential risk

There are two primary reasons I think AI is the most important existential risk:

Reason 1: Mitigating AI risk could mitigate all other existential risks, but not vice-versa. There is an asymmetry between AI risk and other existential risks. If we mitigate the risks from (say) synthetic biology and nanotechnology (without building Friendly AI), this only means we have bought a few years or decades for ourselves before we must face yet another existential risk from powerful new technologies. But if we manage AI risk well enough (i.e. if we build a Friendly AI or "FAI"), we may be able to "permanently" (for several billion years) secure a desirable future. Machine superintelligence working in the service of humane goals could use its intelligence and resources to prevent all other existential catastrophes. (Eliezer: "I distinguish 'human', that which we are, from 'humane'—that which, being human, we wish we were.")

Reason 2: AI is probably the first existential risk we must face (given my evidence, only the tiniest fraction of which I can share in a blog post).

One reason AI may be the most urgent existential risk is that it's more likely for AI (compared to other sources of catastrophic risk) to be a full-blown existential catastrophe (as opposed to a merely billions-dead catastrophe). Humans are smart and adaptable; we are already set up for a species-preserving number of humans to survive (e.g. in underground bunkers with stockpiled food, water, and medicine) major catastrophes from nuclear war, superviruses, supervolcano eruption, and many cases of asteroid impact or nanotechnological ecophagy.

Machine superintelligences, however, could intelligently seek out and neutralize humans whom they (correctly) recognize as threats to the maximal realization of their goals. Humans are surprisingly easy to kill if an intelligent process is trying to do so. Cut off John's access to air for a few minutes, or cut off his water supply for a few days, or poke him with a sharp stick, and he dies. Forever. (Post-humans might shudder at this absurdity like we shudder at the idea that people used to die from their teeth.)

Why think AI is coming anytime soon? This is too complicated a topic to broach here. See Intelligence Explosion: Evidence and Import for a brief analysis of AI timelines. Or try The Uncertain Future, which outputs an estimated timeline for human-level AI based on your predictions of various technological developments. (SI is currently collaborating with the Future of Humanity Institute to write another paper on this subject.)

It's also important to mention that the case for caring about AI risk is less conjunctive than many seem to think, which I discuss in more detail here.

SI can purchase several kinds of AI risk reduction more efficiently than others can

The two organizations working most directly to reduce AI risk are the Singularity Institute and the Future of Humanity Institute (FHI). Luckily, these organizations complement each other well, as I pointed out back before I was running SI:

FHI is part of Oxford, and thus can bring credibility to existential risk reduction. Resulting output: lots of peer-reviewed papers, books from OUP like Global Catastrophic Risks, conferences, media appearances, etc.
SI is independent and is less constrained by conservatism or the university system. Resulting output: Very novel (and, to the mainstream, "weird") research on Friendly AI, and the ability to do unusual things that are nevertheless quite effective at finding/creating lots of new people interested in rationality and existential risk reduction: (1) The Sequences, the best tool I know for creating aspiring rationalists, (2) Harry Potter and the Methods of Rationality, a surprisingly successful tool for grabbing the attention of mathematicians and computer scientists around the world, and (3) the Singularity Summit, a mainstream-aimed conference that brings in people who end up making significant contributions to the movement — e.g. Tomer Kagan (an SI donor and board member) and David Chalmers (author of The Singularity: A Philosophical Analysis and The Singularity: A Reply).

A few weeks later, Nick Bostrom (Director of FHI) said the same things (as far as I know, without having read my comment):

I think there is a sense that both organizations are synergistic. If one were about to go under... that would probably be the one [to donate to]. If both were doing well... different people will have different opinions. We work quite closely with the folks from [the Singularity Institute]...

There is an advantage to having one academic platform and one outside academia. There are different things these types of organizations give us. If you wanna get academics to pay more attention to this, to get postdocs to work on this, that's much easier to do within academia; also to get the ear of policy-makers and media... On the other hand, for [SI] there might be things that are easier for them to do. More flexibility, they're not embedded in a big bureaucracy. So they can more easily hire people with non-standard backgrounds... and also more grass-roots stuff like Less Wrong...

FHI is, despite its small size, a highly productive philosophy department. More importantly, FHI has focused its research work on AI risk issues for the past 9 months, and plans to continue on that path for at least another 12 months. This is important work that should be supported. (Note that FHI recently hired SI research associate Daniel Dewey.)

SI lacks FHI's publishing productivity and its university credibility, but as an organization SI is improving quickly, and it can seize many opportunities for AI risk reduction that FHI is not well-positioned to seize. (New organizations will also tend to be less capable of seizing these opportunities than SI, due to the financial and human capital already concentrated at SI and FHI.)

Here are some examples of projects that SI is probably better able to carry out than FHI, given its greater flexibility (and assuming sufficient funding):

A scholarly AI risk wiki written and maintained by dozens of part-time researchers from around the world.
Reaching young math/compsci talent in unusual ways, e.g. HPMoR.
Writing Open Problems in Friendly AI (Eliezer has spent far more time working on the mathy sub-problems of FAI than anyone else).

My replies to Holden, point by point

Holden's post makes so many claims that I'll just have to work through his post from beginning to end, and then summarize where I think we stand at the end.

GiveWell Labs

Holden opened "Thoughts on the Singularity Institute" by noting that SI was previously outside GiveWell's scope, since GiveWell was focused on specific domains like poverty reduction. With the launch of GiveWell Labs, GiveWell is now open to evaluating any giving opportunity, including SI.

I admire this move. I'm sure people have been bugging GiveWell to do this for a long time, but almost none of those people appreciate how hard it is to launch broad new initiatives like this with the limited budget of an organization like GiveWell or the Singularity Institute. Most of them also do not understand how much work is required to write something like "Thoughts on the Singularity Institute", "Reply to Holden on Tool AI", or this post.

Three possible outcomes

Next, Holden wrote:

[I hope] that one of these three things (or some combination) will happen:

New arguments are raised that cause me to change my mind and recognize SI as an outstanding giving opportunity. If this happens I will likely attempt to raise more money for SI (most likely by discussing it with other GiveWell staff and collectively considering a GiveWell Labs recommendation).

SI concedes that my objections are valid and increases its determination to address them. A few years from now, SI is a better organization and more effective in its mission.

SI can't or won't make changes, and SI's supporters feel my objections are valid, so SI loses some support, freeing up resources for other approaches to doing good.

As explained at the top of Holden's post, I had already conceded that many of Holden's objections (especially concerning past organizational competence) are valid, and had been working to address them, even before Holden's post was published. So outcome #2 is already true in part.

I hope for outcome #1, too, but I don't expect Holden to change his opinion overnight. There are too many possible objections to which Holden has not yet heard a good response. But hopefully this post and its comment threads will successfully address some of Holden's (and others') objections.

Outcome #3 is unlikely since SI is already making changes, though of course it's possible we will be unable to raise sufficient funding for SI despite making these changes, or even because of our efforts to make these changes. (Improving general organizational effectiveness is important but it costs money and is not exciting to donors.)

SI's mission is more important than SI as an organization

Holden said:

whatever happens as a result of my post will be positive for SI's mission, whether or not it is positive for SI as an organization. I believe that most of SI's supporters and advocates care more about the former than about the latter, and that this attitude is far too rare in the nonprofit world.

Clearly, SI's mission is more important than SI as an organization. If somebody launches an organization more effective (at AI risk reduction) than SI but just as flexible, then SI should probably fold itself and try to move its donor base, support community, and the best of its human capital to that new organization.

That said, it's probably easier to reform SI into a more effective organization than it is to launch a new one, since SI has successfully concentrated lots of attention, donor support, and human capital. Also, SI has learned many lessons about how to run a very tricky kind of organization. AI risk reduction is a mission that (1) is beyond most people's time horizons for caring, (2) is hard to understand and visualize, (3) pattern-matches to science fiction and apocalyptic religion, (4) suffers under complicated and necessarily uncertain strategic considerations (compare to the simplicity of bed nets), (5) has a very small pool of people from which to recruit researchers, etc. SI has lots of experience with these issues, experience that probably takes a long time and lots of money to acquire.

(On the other hand, SI has also concentrated some bad reputation which a new organization could launch without. But I still think the weight of the arguments is in favor of reforming SI.)

SI's arguments need to be clearer

Holden:

I do not believe that [my objections to SI's apparent views] constitute a sharp/tight case for the idea that SI's work has low/negative value; I believe, instead, that SI's own arguments are too vague for such a rebuttal to be possible. There are many possible responses to my objections, but SI's public arguments (and the private arguments) do not make clear which possible response (if any) SI would choose to take up and defend. Hopefully the dialogue following this post will clarify what SI believes and why.

I agree that SI's arguments are often vague. For example, Chris Hallquist reported:

I've been trying to write something about Eliezer's debate with Robin Hanson, but the problem I keep running up against is that Eliezer's points are not clearly articulated at all. Even making my best educated guesses about what's supposed to go in the gaps in his arguments, I still ended up with very little.

I know the feeling! That's why I've tried to write as many clarifying documents as I can, including the Singularity FAQ, Intelligence Explosion: Evidence and Import, The Singularity and Machine Ethics, Facing the Singularity, So You Want to Save the World, and How to Purchase AI Risk Reduction.

Unfortunately, it takes lots of resources to write up hundreds of arguments and responses to objections in clear and precise language, but we're working on it. (For comparison, Nick Bostrom's forthcoming book on machine superintelligence will barely scratch the surface of the things SI and FHI researchers have worked out in conversation, yet it will probably take him 2+ years to write in total, even though Bostrom is already an unusually prolific writer.) Hopefully SI's responses to Holden's post have helped to clarify our positions already.

Holden's objection #1 punts to objection #2

The first objection on Holden's numbered list was:

it seems to me that any AGI that was set to maximize a "Friendly" utility function would be extraordinarily dangerous.

I'm glad Holden agrees with us that successful Friendly AI is very hard. SI has spent much of its effort trying to show people that the first 20 solutions they come up with all fail. See: AI as a Positive and Negative Factor in Global Risk, The Singularity and Machine Ethics, Complex Value Systems are Required to Realize Valuable Futures, etc. Holden mentions the standard SI worry about the hidden complexity of wishes, and the one about a friendly utility function still causing havoc because the AI's priors are wrong (problem 3.6 from my list of open problems in AI risk research).

There are reasons to think FAI is harder still. What if we get the utility function right and we get the priors right but the AI's values change for the worse when it updates its ontology? What if the smartest, most careful, most insanely safety-conscious AI researchers humanity can produce just aren't smart enough to solve the problem? What if no humans are altruistic enough to choose to build FAI over an AI that will make them king of the universe? What if the idea of FAI is incoherent? (The human brain is an existence proof for the possibility of general intelligence, but we have no existence proof for the possibility of a decision theoretic agent which stably optimizes the world according to a set of preferences over states of affairs.)

So, yeah. Friendly AI is hard. But as I said elsewhere:

The point is that not trying as hard as you can to build Friendly AI is even worse, because then you almost certainly get uFAI. At least by trying to build FAI, we've got some chance of winning.

So Holden's objection #1 really just punts to objection #2, about tool-AGI, as the last paragraph in this section of Holden's post seems to indicate:

So far, all I have argued is that the development of "Friendliness" theory can achieve at best only a limited reduction in the probability of an unfavorable outcome. However, as I argue in the next section, I believe there is at least one concept - the "tool-agent" distinction - that has more potential to reduce risks, and that SI appears to ignore this concept entirely.

So if Holden's objection #2 doesn't work, then objection #1 ends up reducing to "the development of Friendliness theory can achieve at best a reduction in AI risk," which is what SI has been saying all along.

Tool AI

Holden's second numbered objection was:

SI appears to neglect the potentially important distinction between "tool" and "agent" AI.

Eliezer wrote a whole post about this here. To sum up:

(1) Whether you're working with Tool AI or Agent AI, you need the "Friendly AI" domain experts that SI is trying to recruit:

A "Friendly AI programmer" is somebody who specializes in seeing the correspondence of mathematical structures to What Happens in the Real World. It's somebody who looks at Hutter's specification of AIXI and reads the actual equations - actually stares at the Greek symbols and not just the accompanying English text - and sees, "Oh, this AI will try to gain control of its reward channel," as well as numerous subtler issues like, "This AI presumes a Cartesian boundary separating itself from the environment; it may drop an anvil on its own head." Similarly, working on TDT means e.g. looking at a mathematical specification of decision theory, and seeing "Oh, this is vulnerable to blackmail" and coming up with a mathematical counter-specification of an AI that isn't so vulnerable to blackmail.

Holden's post seems to imply that if you're building a non-self-modifying planning Oracle (aka 'tool AI') rather than an acting-in-the-world agent, you don't need a Friendly AI programmer because FAI programmers only work on agents. But this isn't how the engineering skills are split up. Inside the AI, whether an agent AI or a planning Oracle, there would be similar AGI-challenges like "build a predictive model of the world", and similar FAI-conjugates of those challenges like finding the 'user' inside an AI-created model of the universe. The insides would look a lot more similar than the outsides. An analogy would be supposing that a machine learning professional who does sales optimization for an orange company couldn't possibly do sales optimization for a banana company, because their skills must be about oranges rather than bananas.

(2) Tool AI isn't that much safer than Agent AI, because Tool AIs have lots of hidden "gotchas" that cause havoc, too. (See Eliezer's post for examples.)

These points illustrate something else Eliezer wrote:

What the human species needs from an x-risk perspective is experts on This Whole Damn Problem [of AI risk], who will acquire whatever skills are needed to that end. The Singularity Institute exists to host such people and enable their research—once we have enough funding to find and recruit them.

Indeed. We need places for experts who specialize in seeing the consequences of mathematical objects for things humans value (e.g. the Singularity Institute) just like we need places for experts on efficient charity (e.g. GiveWell).

Anyway, it's worth pointing out that Holden did not make the common (and mistaken) argument that "We should just build Tool AIs instead of Agent AIs and then we'll be fine." This is wrong for many reasons, but one obvious point is that there are incentives to build Agent AIs (because they're powerful), so even if the first 6 teams are careful enough to build only Tool AIs, the 7th team could still build Agent AI and destroy the world.

Instead, Holden pointed out that you could use Tool AI to increase your chances of successfully building agenty FAI:

if developing "Friendly AI" is what we seek, a tool-AGI could likely be helpful enough in thinking through this problem as to render any previous work on "Friendliness theory" moot. Among other things, a tool-AGI would allow transparent views into the AGI's reasoning and predictions without any reason to fear being purposefully misled, and would facilitate safe experimental testing of any utility function that one wished to eventually plug into an "agent."

After reading Eliezer's reply, however, you can probably guess my replies to this paragraph:

Tool AI isn't as safe as Holden thinks.
But yeah, a Friendly AI team may very well use "Tool AI" to aid Friendliness research if it can figure out a safe way to do that. This doesn't obviate the need for Friendly AI researchers; it's part of their research toolbox.

So Holden's Objection #2 doesn't work, which (as explained earlier) means that his Objection #1 (as stated) doesn't work either.

SI's mission assumes a scenario that is far less conjunctive than it initially appears

Holden's objection #3 is:

SI's envisioned scenario is far more specific and conjunctive than it appears at first glance, and I believe this scenario to be highly unlikely.

His main concern here seemed to be that technological developments and other factors would render earlier FAI work irrelevant. But Eliezer's clarifications about what we mean by "FAI team" render this objection moot, at least as it is currently stated. The purpose of an FAI team is not to blindly develop one particular approach to Friendly AI without checking to see whether this work will be obsoleted by future developments. Instead, the purpose of an FAI team is to develop highly specialized expertise on, among other things, which kinds of research are more and less likely to be relevant given future developments.

Holden's confusion about what SI means by "FAI team" is common and understandable, and it is one reason that SI's mission assumes a scenario that is far less conjunctive than it appears to many. We aren't saying we need an FAI team because we know lots of specific things about how AGI will be built 30 years from now. We're saying you need experts on "the consequences of mathematical objects for things humans value" (an FAI team) because AGIs are mathematical objects and will have big consequences. That's pretty disjunctive.

Similarly, many people think SI's mission is predicated on hard takeoff. After all, we call ourselves the "Singularity Institute," Eliezer has spent a lot of time arguing for hard takeoff, and our current research summary frames AI risk in terms of recursive self-improvement.

But the case for AI as a global risk, and thus the need for dedicated experts on AI risk and "the consequences of mathematical objects for things humans value", isn't predicated on hard takeoff. Instead, it looks something like this:

(1) Eventually, most tasks are performed by machine intelligences.

The improved flexibility, copyability, and modifiability of machine intelligences make them economically dominant even without other advantages (Brynjolfsson & McAfee 2011; Hanson 2008). In addition, there is plenty of room "above" the human brain in terms of hardware and software for general intelligence (Muehlhauser & Salamon 2012; Sotala 2012; Kurzweil 2005).

(2) Machine intelligences don't necessarily do things we like.

We don't necessarily control AIs, since advanced intelligences may be inherently goal-oriented (Omohundro 2007), and even if we build advanced "Tool AIs," these aren't necessarily safe either (Yudkowsky 2012) and there will be significant economic incentives to transform them into autonomous agents (Brynjolfsson & McAfee 2011). We don't value most possible futures, but it's very hard to get an autonomous AI to do exactly what you want (Yudkowsky 2008, 2011; Muehlhauser & Helm 2012; Arkin 2009).

(3) There are things we can do to increase the probability that machine intelligences do things we like.

Further research can clarify (1) the nature and severity of the risk, (2) how to engineer goal-oriented systems safely, (3) how to increase safety with differential technological development, (4) how to limit and control machine intelligences (Armstrong et al. 2012; Yampolskiy 2012), (5) solutions to AI development coordination problems, and more.

(4) We should do those things now.

People aren't doing much about these issues now. We could wait until we understand better (e.g.) what kind of AI is likely, but: (1) it might take a long time to resolve the core issues, including difficult technical subproblems that require time-consuming mathematical breakthroughs, (2) incentives may be badly aligned (e.g. there seem to be strong economic incentives to build AI, but not to take into account social and global risks for AI), (3) AI may not be that far away (Muehlhauser & Salamon 2012), and (4) the transition to machine dominance may be surprisingly rapid due to (e.g.) intelligence explosion (Chalmers 2010, 2012; Muehlhauser & Salamon 2012) or computing overhang.

What do I mean by "computing overhang"? We may get the hardware needed for AI long before we get the software, such that once software for general intelligence is figured out, there is tons of computing hardware sitting around for running AIs (a "computing overhang"). Thus we could switch from a world with one autonomous AI to a world with 10 billion autonomous AIs at the speed of copying software, and thereby transition rapidly from human dominance to AI dominance even without an intelligence explosion. (This is one of the many, many things we haven't yet written up in detail due to lack of resources.)

(This broad argument is greatly compressed from a paper outline developed by Paul Christiano, Carl Shulman, Nick Beckstead, and myself. We'd love to write the paper at some point, but haven't had the resources to do so. The fuller version of this argument is of course more detailed.)

SI's public argumentation

Next, Holden turned to the topic of SI's organizational effectiveness:

when evaluating a group such as SI, I can't avoid placing a heavy weight on (my read on) the general competence, capability and "intangibles" of the people and organization, because SI's mission is not about repeating activities that have worked in the past...

There are several reasons that I currently have a negative impression of SI's general competence, capability and "intangibles."

The first reason Holden gave for his negative impression of SI is:

SI has produced enormous quantities of public argumentation... Yet I have never seen a clear response to any of the three basic objections I listed in the previous section. One of SI's major goals is to raise awareness of AI-related risks; given this, the fact that it has not advanced clear/concise/compelling arguments speaks, in my view, to its general competence.

I agree in part. Here's what I think:

SI hasn't made its arguments as clear, concise, and compelling as I would like. We're working on that. It takes time, money, and people who are (1) smart and capable enough to do AI risk research work and yet somehow (2) willing to work for non-profit salaries and (3) willing to not advance their careers like they would if they chose instead to work at a university.
There are a huge number of possible objections to SI's arguments, and we haven't had the resources to write up clear and compelling replies to all of them. (See Chalmers 2012 for quick rebuttals to many objections to intelligence explosion, but what he covers in that paper barely scratches the surface.) As Eliezer wrote, Holden's complaint that SI hasn't addressed his particular objections "seems to lack perspective on how many different things various people see as the one obvious solution to Friendly AI. Tool AI wasn't the obvious solution to John McCarthy, I.J. Good, or Marvin Minsky. Today's leading AI textbook, Artificial Intelligence: A Modern Approach... discusses Friendly AI and AI risk for 3.5 pages but doesn't mention tool AI as an obvious solution. For Ray Kurzweil, the obvious solution is merging humans and AIs. For Jurgen Schmidhuber, the obvious solution is AIs that value a certain complicated definition of complexity in their sensory inputs. Ben Goertzel, J. Storrs Hall, and Bill Hibbard, among others, have all written about how silly Singinst is to pursue Friendly AI when the solution is obviously X, for various different X. Among current leading people working on serious AGI programs labeled as such, neither Demis Hassabis (VC-funded to the tune of several million dollars) nor Moshe Looks (head of AGI research at Google) nor Henry Markram (Blue Brain at IBM) think that the obvious answer is Tool AI. Vernor Vinge, Isaac Asimov, and any number of other SF writers with technical backgrounds who spent serious time thinking about these issues didn't converge on that solution."
SI has done a decent job of raising awareness of AI risk, I think. Writing The Sequences and HPMoR has (indirectly) raised more awareness of AI risk than one can normally expect from, say, writing a bunch of clear and precise academic papers about a subject. (At least, it seems that way to me.)

SI's endorsements

The second reason Holden gave for his negative impression of SI is "a lack of impressive endorsements." This one is generally true, despite the three "celebrity endorsements" on our new donate page. More impressive than these is the fact that, as Eliezer mentioned, the latest edition of the leading AI textbook spends several pages talking about AI risk and Friendly AI, and discusses the work of SI-associated researchers like Eliezer Yudkowsky and Steve Omohundro while completely ignoring the existence of the older, more prestigious, and vastly larger mainstream academic field of "machine ethics."

Why don't we have impressive endorsements? To my knowledge, SI hasn't tried very hard to get them. That's another thing we're in the process of changing.

SI and feedback loops

The third reason Holden gave for his negative impression of SI is:

SI seems to have passed up opportunities to test itself and its own rationality by e.g. aiming for objectively impressive accomplishments... Pursuing more impressive endorsements and developing benign but objectively recognizable innovations (particularly commercially viable ones) are two possible ways to impose more demanding feedback loops.

We have thought many times about commercially viable innovations we could develop, but these would generally be large distractions from the work of our core mission. (The Center for Applied Rationality, in contrast, has many opportunities to develop commercially viable innovations in line with its core mission.)

Still, I do think it's important for the Singularity Institute to test itself with tight feedback loops wherever feasible. This is particularly difficult to do for a research organization doing philosophy of long-term forecasting (30 years is not a "tight" feedback loop in the slightest), but that's what FHI does and they have more "objectively impressive" (that is, "externally proclaimed") accomplishments: lots of peer-reviewed publications, some major awards for its top researcher Nick Bostrom, etc.

SI and rationality

Holden's fourth concern about SI is that it is overconfident about the level of its own rationality, and that this seems to show itself in (e.g.) "insufficient self-skepticism" and "being too selective (in terms of looking for people who share its preconceptions) when determining whom to hire and whose feedback to take seriously."

What would provide good evidence of rationality? Holden explains:

I endorse Eliezer Yudkowsky's statement, "Be careful … any time you find yourself defining the [rationalist] as someone other than the agent who is currently smiling from on top of a giant heap of utility." To me, the best evidence of superior general rationality (or of insight into it) would be objectively impressive achievements (successful commercial ventures, highly prestigious awards, clear innovations, etc.) and/or accumulation of wealth and power. As mentioned above, SI staff/supporters/advocates do not seem particularly impressive on these fronts...

Unfortunately, this seems to misunderstand the term "rationality" as it is meant in cognitive science. As I explained elsewhere:

Like intelligence and money, rationality is only a ceteris paribus predictor of success.

So while it's empirically true (Stanovich 2010) that rationality is a predictor of life success, it's a weak one. (At least, it's a weak predictor of success at the levels of human rationality we are capable of training today.) If you want to more reliably achieve life success, I recommend inheriting a billion dollars or, failing that, being born+raised to have an excellent work ethic and low akrasia.

The reason you should "be careful… any time you find yourself defining the [rationalist] as someone other than the agent who is currently smiling from on top of a giant heap of utility" is because you should "never end up envying someone else's mere choices." You are still allowed to envy their resources, intelligence, work ethic, mastery over akrasia, and other predictors of success.

But I don't mean to dodge the key issue. I think SIers are generally more rational than most people (and so are LWers, it seems), but I think SIers have often overestimated their own rationality, myself included. Certainly, I think SI's leaders have been pretty irrational about organizational development at many times in the past. In internal communications about why SI should help launch CFAR, one reason on my list has been: "We need to improve our own rationality, and figure out how to create better rationalists than exist today."

SI's goals and activities

Holden's fifth concern about SI is the apparent disconnect between SI's goals and its activities:

SI seeks to build FAI and/or to develop and promote "Friendliness theory" that can be useful to others in building FAI. Yet it seems that most of its time goes to activities other than developing AI or theory.

This one is pretty easy to answer. We've focused mostly on movement-building rather than direct research because, until very recently, there wasn't enough community interest or funding to seriously begin to form an FAI team. To do that you need (1) at least a few million dollars a year, and (2) enough smart, altruistic people to care about AI risk that there exist some potential superhero mathematicians for the FAI team. And to get those two things, you've got to do mostly movement-building, e.g. Less Wrong, HPMoR, the Singularity Summit, etc.

Theft

And of course, Holden is (rightly) concerned about the 2009 theft of $118,000 from SI, and the lack of public statements from SI on the matter.

Briefly:

Two former employees stole $118,000 from SI. Earlier this year we finally won stipulated judgments against both individuals, forcing them to pay back the full amounts they stole. We have already recovered several thousand dollars of this.
We do have much better financial controls now. We consolidated our accounts so there are fewer accounts to watch, and at least three staff members check them regularly, as does our treasurer, who is not an SI staff member or board member.

Pascal's Mugging

In another section, Holden wrote:

A common argument that SI supporters raise with me is along the lines of, "Even if SI's arguments are weak and its staff isn't as capable as one would like to see, their goal is so important that they would be a good investment even at a tiny probability of success."

I believe this argument to be a form of Pascal's Mugging and I have outlined the reasons I believe it to be invalid...

Some problems with Holden's two posts on this subject will be explained in a forthcoming post by Steven Kaas. But as Holden notes, some SI principals like Eliezer don't use "small probability of large impact" arguments, anyway. We in fact argue that the probability of a large impact is not tiny.

Summary of my reply to Holden

Now that I have addressed so many details, let us return to the big picture. My summarized reply to Holden goes like this:

Holden's first two objections can be summarized as arguing that developing the Friendly AI approach is more dangerous than developing non-agent "Tool" AI. Eliezer's post points out that "Friendly AI" domain experts are what you need whether you're working with Tool AI or Agent AI, because (1) both of these approaches require FAI experts (experts in seeing the consequences of mathematical objects for what humans value), and because (2) Tool AI isn't necessarily much safer than Agent AI, because Tool AIs have lots of hidden gotchas, too. Thus, "What the human species needs from an x-risk perspective is experts on This Whole Damn Problem [of AI risk], who will acquire whatever skills are needed to that end. The Singularity Institute exists to host such people and enable their research — once we have enough funding to find and recruit them."

Holden's third objection was that the argument behind SI's mission is more conjunctive than it seems. I replied that the argument behind SI's mission is actually less conjunctive than it often seems, because an "FAI team" works on a broader set of problems than Holden had realized, and because the case for AI risk is more disjunctive than many people realize. These confusions are understandable, however, and they probably are a result of insufficient clear argumentative writing from SI on these matters, a problem we are trying to fix with several recent and forthcoming papers and other communications (like this one).

Holden's next objection concerned SI as an organization: "SI has, or has had, multiple properties that I associate with ineffective organizations." I acknowledged these problems before Holden published his post, and have since outlined the many improvements we've made to organizational effectiveness since I was made Executive Director. I addressed several of Holden's specific worries here.

Finally, Holden recommended giving to a donor-advised fund rather than to SI:

I don't think that "Cause X is the one I care about and Organization Y is the only one working on it" to be a good reason to support Organization Y. For donors determined to donate within this cause, I encourage you to consider donating to a donor-advised fund while making it clear that you intend to grant out the funds to existential-risk-reduction-related organizations in the future....

For one who accepts my arguments about SI, I believe withholding funds in this way is likely to be better for SI's mission than donating to SI

By now I've called into question most of Holden's arguments about SI, but I will still address the issue of donating to SI vs. donating to a donor-advised fund.

First: Which public charity would administer the donor-advised fund? Remember also that in the U.S., the administering charity need not spend from the donor-advised fund as the donor wishes, though they often do.

Second: As I said earlier,

it's probably easier to reform SI into a more effective organization than it is to launch a new one, since SI has successfully concentrated lots of attention, donor support, and human capital. Also, SI has learned many lessons about how to run a very tricky kind of organization. AI risk reduction is a mission that (1) is beyond most people's time horizons for caring, (2) is hard to understand and visualize, (3) pattern-matches to science fiction and apocalyptic religion, (4) suffers under complicated and necessarily uncertain strategic considerations (compare to the simplicity of bed nets), (5) has a very small pool of people from which to recruit researchers, etc. SI has lots of experience with these issues; experience that probably takes a long time and lots of money to acquire.

The case for funding improvements and growth at SI (as opposed to starving SI as Holden suggests) is bolstered by the fact that SI's productivity and effectiveness have been improving rapidly of late, and many other improvements (and exciting projects) are on our "to-do" list if we can raise sufficient funding to implement them.

Holden even seems to share some of this optimism:

Luke's... recognition of the problems I raise... increases my estimate of the likelihood that SI will work to address them...

I'm aware that SI has relatively new leadership that is attempting to address the issues behind some of my complaints. I have a generally positive impression of the new leadership; I believe the Executive Director and Development Director, in particular, to represent a step forward in terms of being interested in transparency and in testing their own general rationality. So I will not be surprised if there is some improvement in the coming years...

Conclusion

For brevity's sake I have skipped many important details. I may also have misinterpreted Holden somewhere. And surely, Holden and other readers have follow-up questions and objections. This is not the end of the conversation; it is closer to the beginning. I invite you to leave your comments, preferably in accordance with these guidelines (for improved discussion clarity).

I think this post makes a strong case for needing further donations. Have $3,000.

I agree. Have another $1,100. Also, for those who are interested, a link to a blog post I wrote explaining why I donated.

5lukeprog14y

Thanks!!!

7MichaelAnissimov14y

Thank you Rain.

This post and the reactions to it will be an interesting test for my competing models about the value of giving detailed explanations to supporters. Here are just two of them:

One model says that detailed communication with supporters is good because it allows you to make your case for why your charity matters, and thus increase the donors' expectation that your charity can turn money into goods that they value, like poverty reduction or AI risk reduction.

Another model says that detailed communication with supporters is bad because (1) supporters are generally giving out of positive affect toward the organization, and (2) that positive affect can't be increased much once they grok the mission enough to start donating, but (3) the positive affect they feel toward the charity can be overwhelmed by the absolute number of the organization's statements with which they disagree, and (4) more detailed communication with supporters increases this absolute number more quickly than limited communication that repeats the same points again and again (e.g. in a newsletter).

I worry that model #2 may be closer to the truth, in part because of things like (Dilbert-creator) Scott Adams' account of w...

An issue that SI must inevitably confront is how much rationality it will assume of its target population of donors. If it simply wanted to raise as much money as possible, there are, I expect, all kinds of Dark techniques it could use (of which decreasing communication is only the tip of the iceberg). The problem is that SI also wants to raise the sanity waterline, since that is integral to its larger mission -- and it's hard (not to mention hypocritical) to do that while simultaneously using fundraising methods that depend on the waterline being below a certain level among its supporters.

8AlexMennen14y

How do you expect to determine the effects of this information on donations from the comments made by supporters? In my case, for instance, I've been fairly encouraged by the explanations like this that have been coming out of SI (and had been somewhat annoyed by the lack of them previously), but my comments tend to sound negative because I tend to focus on things that I'm still not completely satisfied with.

5lukeprog14y

It's very hard. Comments like this help a little.

6wedrifid14y

As an example datapoint, Eliezer's reply to Holden caused a net decrease (not necessarily an enormous one) in both my positive affect for and abstract evaluation of the merit of the organisation, based off one particularly bad argument that shocked me. It prompted some degree (again not necessarily a large degree) of updating towards the possibility that SingInst could suffer the same kind of mind-killed thinking and behavior I expect from other organisations in the class of pet-cause idealistic charities. (And that matters more for FAI oriented charities than save-the-puppies charities, with the whole think-right or destroy the world thing.) When allowing for the possibility that I am wrong and Eliezer is right, you have to expect most other supporters to be wrong a non-trivial proportion of the time too, so too much talking is going to have negative side effects.

1lukeprog14y

Which issue are you talking about? Is there already a comments thread about it on Eliezer's post?

4wedrifid14y

Found it. It was nested too deep in a comment tree. The particular line was: [...] The position is something I think it is best I don't mention again until (unless) I get around to writing the post "Predicting Failure Without Details" to express the position clearly with references and what limits apply to that kind of reasoning.

Isn't it just straight-up outside view prediction?

0[anonymous]14y

I thought so.

1ChrisHallquist14y

I can think of a really big example favoring model #2 within the atheist community. On the other hand, you and Eliezer have written so much about your views on these matters that the "detailed communication" toothpaste may not be going back in the tube. And this piece made me much more inclined to support SI, particularly the disjunctive vs. conjunctive section which did a lot for worries raised by things Eliezer has said in the past.

0Giles14y

Is it possible that supporters might update on communicativeness, separately from updating on what you actually have to say? Generally when I see the SI talking to people, I feel the warm fuzziness before I actually read what you're saying. It just seems like people might associate "detailed engagement with supporters and critics" with the reference class of "good organizations".

0lukeprog14y

Yup, that might be true. I hope so.

0TheOtherDave14y

Presumably, even under model #1, the extent to which detailed communication increases donor expectations of my charity's ability to turn money into valuable goods depends a lot on their pre-existing expectations, the level of expectations justified by the reality, and how effective the communication is at conveying the reality.

Regarding the theft:

I was telling my friend (who recently got into HPMOR and lurks a little on LW) about Holden's critique, specifically with regard to the theft. He's an accounting and finance major, and was a bit taken aback. His immediate response was to ask if SI had an outside accountant audit their statements. We searched around and it doesn't look to us like you do. He immediately said that he would never donate to an organization that did not have an accountant audit their statements, and knowing how much I follow LW, immediately advised me not to either. This seems like a really good step for addressing the transparency issues here, and now that he mentions it, seems a very prudent and obvious thing for any nonprofit to do.

Edit 2: Luke asked me to clarify, I am not necessarily endorsing not donating to SI because of this, unless this problem is a concern of yours. My intent was only to suggest ways SI can improve and may be turning away potential donors.

Edit: He just mentioned to me that the big four accounting firms often do pro bono work because it can be a tax write-off. This may be worth investigating.

Also note that thefts of this size are not as rare as they appear, because many non-profits simply don't report them. I have inside knowledge about very few charities, but even I know one charity that suffered a larger theft than SI did, and they simply didn't tell anybody. They knew that donors would punish them for the theft and not at all reward them for reporting it. Unfortunately, this is probably true for SI, too, which did report the theft.

Yep. We knew that would happen at the time - it was explicitly discussed in the Board meeting - and we went ahead and reported it anyway, partly because we didn't want to have exposable secrets, partly because we felt honesty was due our donors, and partially because I'd looked up embezzlement-related stuff online and had found that a typical nonprofit-targeting embezzler goes through many nonprofits before being reported and prosecuted by a nonprofit "heroic" enough, if you'll pardon the expression, to take the embarrassment-hit in order to stop the embezzler.

2Shmi14y

I suspect that some of the hit was due to partial disclosure. Outsiders were left guessing what exactly had transpired and why, and what specific steps were taken to address the issue. Maybe you had to do it this way for legal reasons, but this was never spelled out explicitly.

3Eliezer Yudkowsky14y

Pretty sure it was spelled out explicitly.

Yes, we're currently in the process of hiring a bookkeeper (interviewed one, scheduling interviews with 2 others), which will allow us to get our books in enough order that an accountant will audit our statements. We do have an outside accountant prepare our 990s already. Anyway, this all requires donations. We can't get our books cleaned up and audited unless we have the money to do so.

Also, it's my impression that many or most charities our size and smaller don't have their books audited by an accountant because it's expensive to do so. It's largely the kind of thing a charity does when they have a bigger budget than we currently do. But I'd be curious to see if there are statistics on this somewhere; I could be wrong.

And yes, we are investigating the possibility of getting pro bono work from an accounting firm; it's somewhere around #27 on my "urgent to-do list." :)

Edit: BTW, anyone seriously concerned about this matter is welcome to earmark their donations for "CPA audit" so that those donations are only used for (1) paying a bookkeeper to clean up our processes enough so that an accountant will sign off on them, and (2) paying for a CPA audit of our books. I will personally make sure those earmarks are honored.

-1private_messaging14y

How many possible universes could there be (what % of the universes), where not donating to a charity that does not do accounting right when pulling in 500 grand a year, would result in destruction of mankind? 500 grand a year is not so little when you can get away with it. My GF's family owns a company smaller than that (in the US) and it has books in order.

0homunq14y

Yeah, that would be really unfair, wouldn't it? And so it's hard to believe it could be true. And so it must not be. (I actually don't believe it is likely to be true. But the fact it sounds silly and unfairly out-of-proportion is one of the worst possible arguments against it.)

You can't deduct the value of services donated to nonprofits. Not sure your friend is as knowledgeable as stated. Outside accounting is expensive and the IRS standard is to start doing it once your donations hit $2,000,000/year, which we haven't hit yet. Also, SIAI recently passed an IRS audit.

Fifteen seconds of Googling resulted in Deloitte's pro-bono service, which is done for CSR and employee morale rather than tax avoidance. Requests need to originate with Deloitte personnel. I know a friend who works there who might be interested in LW, but it'd be a while before I'd be comfortable asking him to recommend SI. It's a big enough company that it's likely that there are some HPMOR or LW fans that work there.

Interesting!

"Applications for a contribution of pro bono professional services must be made by Deloitte personnel. To be considered for a pro bono engagement, a nonprofit organization (NPO) with a 501c3 tax status must have an existing relationship with Deloitte through financial support, volunteerism, Deloitte personnel serving on its Board of Directors or Trustees, or a partner, principal or director (PPD) sponsor (advocate for the duration of the engagement). External applications for this program are not accepted. Organizations that do not currently have a relationship with Deloitte are welcome to introduce themselves to the Deloitte Community Involvement Leader in their region, in the long term interest of developing one."

Deloitte is requiring a very significant investment from its employees before offering pro bono services. Nonetheless, I have significant connections there and would be willing to explore this option with them.

7RobertLumley14y

You might want to pm this directly to lukeprog to make sure that he sees this comment. Since you replied to Vaniver, he may have not seen it, and this seems important enough to merit the effort.

0Cosmos14y

Thanks for the excellent idea! I did in fact email Lukeprog personally to let him know. :)

Thanks. As I said, this is something on our to-do list, but I didn't know about Deloitte in particular.

Clarifications:

In California, a non-profit is required to have a CPA audit once donations hit $2m/yr, which SI hasn't hit yet. That's the way in which outside accounting is "IRS standard" after $2m/yr.
SI is in the process of passing an IRS audit for the year 2010.

Eliezer is right: RobertLumley's friend is mistaken:

can the value of your time and services while providing pro bono legal services qualify as a charitable contribution that is deductible from gross income on your federal tax return? Unfortunately, in a word, nope.

According to IRS Publication 526, “you cannot deduct the value of your time or services, including blood donations to the Red Cross or to blood banks, and the value of income lost while you work as an unpaid volunteer for a qualified organization.”

1somervta14y

He may be referring to the practice of being paid for work, then giving it back as a tax-deductible charitable donation. My understanding is that you can also deduct expenses you incur while working for a non-profit - admittedly not something I can see applying to accounting. There's also cause marketing, but that's getting a bit further afield.

4SarahNibs14y

In the one instance of a non-profit getting accounting work done that I know of, the non-profit paid and then received an equal donation. Magic.

5Eliezer Yudkowsky14y

This is exactly equivalent to not paying, which is precisely the IRS rationale for why donated services aren't directly deductible.

0lukeprog14y

"the big four accounting firms often do pro bono work because it can be a tax write-off" doesn't sound much like "being paid for work, then giving it back as a tax-deductible charitable donation".

2RobertLumley14y

In talking to him, I think he may have just known they do pro bono work and assumed it was because of taxes. Given Vaniver's comment, this seems pretty likely to me. He did say that the request usually has to originate from inside the company, which is consistent with that comment.

0somervta14y

Ah. That would make more sense.

Certainly the fact that some really awful charities are untruthful doesn't mean SI shouldn't be held accountable merely because it managed to tell the truth.

I think you're missing Luke's implied argument that it is not just 'some' charities that are untruthful, but quite a lot of them. The situation is the same as with, say, corporations getting hacked: they have no incentive to report it because only bad things will happen, and this leads to systematic underreporting, which reinforces the equilibrium as anyone reporting honestly will be seen as an outlier (as indeed they are) and punished. A vicious circle.

(Given the frequency of corporations having problems, and the lack of market discipline for nonprofits and how they depend on patrons, I could well believe that nonprofits routinely have problems with corruption, embezzlement, self-dealing, etc.)

2David_Gerard14y

Charities tend to be a trusting lot and not think of this sort of thing until it happens to them. Because they don't hear about it, for the reasons Luke sets out above. I just found out about another charity that got done in a similar manner to SIAI, though for not nearly as much money, and is presently going through the pains of disclosure.

How do I know that supporting SI doesn't end up merely funding a bunch of movement-building leading to no real progress?

It seems to me that the premise of funding SI is that people smarter (or more appropriately specialized) than you will then be able to make discoveries that otherwise would be underfunded or wrongly-purposed.

I think the (friendly or not) AI problem is hard. So it seems natural for people to settle for movement-building or other support when they get stuck.

That said, some of the collateral output to date has been enjoyable.

Behold, I come bearing real progress! :)

3Jonathan_Graehl13y

The best possible response. I haven't read any of them yet, but the topics seem relevant to the long range goal of becoming convinced of the Friendliness of complicated programs.

For SI, movement building is more directly progress than it is for, say, Oxfam, because a big part of their mission is to try and persuade people not to do the very dangerous thing.

7Jonathan_Graehl14y

Good point. But I don't see any evidence that anyone who was likely to create an AI soon, now won't. Those whose profession and status is in approximating AI largely won't change course for what must seem to them like sci-fi tropes. [1] Or, put another way, there are working computer scientists who are religious - you can't expect reason everywhere in someone's life. [1] but in the long run, perhaps SI and others can offer a smooth transition for dangerously smart researchers into high-status alternatives such as FAI or other AI risk mitigation.

But I don't see any evidence that anyone who was likely to create an AI soon, now won't.

According to Luke, Moshe Looks (head of Google's AGI team) is now quite safety conscious, and a Singularity Institute supporter.

4lukeprog14y

Update: It's not really correct to say that Google has "an AGI team." Moshe Looks has been working on program induction, and this guy said that some people are working on AI "on a large scale," but I'm not aware of any publicly-visible Google project which has the ambitions of, say, Novamente.

4lincolnquirk14y

The plausible story in movement-building is not convincing existing AGI PIs to stop a long program of research, but instead convincing younger people who would otherwise eventually become AGI researchers to do something safer. The evidence to look for would be people who said "well, I was going to do AI research but instead I decided to get involved with SingInst type goals" -- and I suspect someone who knows the community better might be able to cite quite a few people for whom this is true, though I don't have any names myself.

3Jonathan_Graehl14y

I didn't think of that. I expect current researchers to be dead or nearly senile by the time we have plentiful human substitutes/emulations, so I shouldn't care that incumbents are unlikely to change careers (except for the left tail - I'm very vague in my expectation).

8lukeprog14y

Movement-building is progress, but... I hear ya. If I'm your audience, you're preaching to the choir. Open Problems in Friendly AI — more in line with what you'd probably call "real progress" — is something I've been lobbying for since I was hired as a researcher in September 2011, and I'm excited that Eliezer plans to begin writing it in mid-August, after SPARC. [...] Such as?

3Jonathan_Graehl14y

The philosophy and fiction have been fun (though they hardly pay my bills). I've profited from reading well-researched posts on the state of evidence-based (social-) psychology / nutrition / motivation / drugs, mostly from you, Yvain, Anna, gwern, and EY (and probably a dozen others whose names aren't available). The bias/rationality stuff was fun to think about, but "ugh fields", for me at least, turned out to be the only thing that mattered. I imagine that's different for other types of people, though. Additionally, the whole project seems to have connected people who didn't belong to any meaningful communities (thinking of various regional meetup clusters).

5JaneQ14y

But then SI has to have a dramatically better idea of what research has to be funded to protect mankind than every other group of people capable of either performing such research or employing people to perform such research. Muehlhauser has stated that SI should be compared to alternatives in the form of organizations working on AI risk mitigation, but that seems like an overly narrow choice, reliant on the presumption that not working on AI risk mitigation now is not an alternative. For example, 100 years ago it would seem to have been too early to fund work on AI risk mitigation; that may still be the case. As time goes on, one could naturally expect that opinions will form a distribution, and that the first organizations offering AI risk mitigation will pop up earlier than the time at which such work is effective. When we look into the past through the goggles of notoriety, we don't see all the failed early starts.

9Vladimir_Nesov14y

Disagree. There are many remaining theoretical (philosophical and mathematical) difficulties whose investigation doesn't depend on the current level of technology. It would've been better to start working on the problem 300 years ago, when AI risk was still far away. Value of information on this problem is high, and we don't (didn't) know that there is nothing to be discovered, it wouldn't be surprising if some kind of progress is made.

I do think OP is right that in practice, 100 years ago, it would have been really hard to figure out what an AI issue looked like. This was pre-Gödel, pre-decision-theory, pre-Bayesian-revolution, and pre-computer. Yes, a sufficiently competent Earth would be doing AI math before it had the technology for computers, in full awareness of what it meant - but that's a pretty darned competent Earth we're talking about.

1JaneQ14y

I think it is fair to say Earth was doing the "AI math" before computers. Extending this to today: there is a lot of mathematics to be done for a good, safe AI, but how are we to know that SI has the actionable effort-planning skills required to correctly identify and fund research in such mathematics? I know that you believe that you have the required skills; but note that in my model such a belief results both from the presence of extraordinary effort-planning skill and from the absence of effort-planning skill. The prior probability of extraordinary effort-planning skill is very low. Furthermore, as effort planning is, to some extent, a cross-domain skill, the prior inefficacy (which was criticized by Holden) seems to be fairly strong evidence against extraordinary skills in this area.

If my writings (on FAI, on decision theory, and on the form of applied-math-of-optimization called human rationality) so far haven't convinced you that I stand a sufficient chance of identifying good math problems to solve to maintain the strength of an input into existential risk, you should probably fund CFAR instead. This is not, in any way shape or form, the same skill as the ability to manage a nonprofit. I have not ever, ever claimed to be good at managing people, which is why I kept trying to have other people doing it.

7JaneQ14y

I'm not sure why you think that such writings should convince a rational person that you have the relevant skill. If you were an art critic, even a very good one, that would not convince people you are a good artist. [...] Indeed, but you are asking me to assume that the skills you display writing your articles are the same skill as the skills relevant to directing the AI effort. edit: Furthermore, when it comes to works on rationality as 'applied math of optimization', the most obvious way to classify those writings is to look for some great success attributable to your writings - some highly successful businessmen saying how much the article on such and such fallacy helped them succeed, that sort of thing.

4AlexanderD14y

It seems to me that the most obvious way to demonstrate the brilliance and excellent outcomes of the applied math of optimization would be to generate large sums of money, rather than seeking endorsements. The Singularity Institute could begin this at no cost (beyond opportunity cost of staff time) by employing the techniques of rationality in a fake market, for example, if stock opportunities were the chosen venue. After a few months of fake profits, SI could set them up with $1,000. If that kept growing, then a larger investment could be considered. This has been done, very recently. Someone on Overcoming Bias recently wrote of how they and some friends made about $500 each with a small investment by identifying an opportunity for arbitrage between the markets on InTrade and another prediction market, without any loss. Money can be made, according to proverb, by being faster, luckier, or smarter. It's impossible to create luck in the market, and in the era of microsecond purchases by Goldman Sachs it's very nearly impossible to be faster, but an organization (or perhaps associated organizations?) devoted to defeating internal biases and mathematically assessing the best choices in the world should be striving to be smarter. While it seems very interesting and worthwhile to work on existential risk from UFAI directly, it seems like the smarter thing to do might be to devote a decade to making an immense pile of money for the institute and developing the associated infrastructure (hiring money managers, socking a bunch away into Berkshire Hathaway for safety, etc.) Then hire a thousand engineers and mathematicians. And what's more, you'll raise awareness of UFAI an incredibly greater amount than you would have otherwise, plugging along as another $1-2m charity. I'm sure this must have been addressed somewhere, of course - there is simply way too much written in too many places by too many smart people. But it is odd to me that SI's page on Strategic Insight doe

1Kawoomba14y

The official introductory SI pages may have to sugarcoat such issues due to PR considerations ("everyone get rich, then donate your riches" sends off a bad vibe). As you surmised, your idea has been brought up quite often in various contexts, especially in optimal charity discussions. For many/most endeavors, the globally optimal starting steps are "acquire more capabilities / become more powerful" (players of strategy games may be more explicitly cognizant of that stratagem). I also do remember speculation that friendly AI and unfriendly AI may act very similarly at first - both choosing the optimal path to powering up, so that they can pursue the differing goals of their respective utility functions more efficiently at a future point in time. So your thoughts on the matter seem compatible with the local belief cluster. Your money proverb seems to still hold true, anecdotally I'm acquainted with some CS people making copious amounts of money on NASDAQ doing simple ANOVA analyses, while barely being able to spell the companies' names. So why aren't we doing that? Maybe a combination of mental inertia and being locked into a research/get endorsements modus operandi, which may be hard to shift out of into a more active "let's create start-ups"/"let's do day-trading" mode. A goal-function of "seek influential person X's approval" will lead to a different mind set from "let quantifiable results speak for themselves", the latter will allow you not to optimize every step of the way for signalling purposes.

6[anonymous]14y

How would you even pose the question of AI risk to someone in the eighteenth century? I'm trying to imagine what comes out the other end of Newton's chronophone, but it sounds very much like "You should think really hard about how to prevent the creation of man-made gods."

5Vladimir_Nesov14y

I don't think it's plausible that people could stumble on the problem statement 300 years ago, but within that hypothetical, it wouldn't have been too early.

2JaneQ14y

It seems to me that 100 years ago (or more) you would have to consider pretty much any philosophy and mathematics to be relevant to AI risk reduction, as well as to the reduction of other potential risks, and attempts to select the work particularly conducive to AI risk reduction would not be able to succeed. Effort planning is the key to success. On a somewhat unrelated note: reading the publications and this thread, there is a point of definition that I do not understand: what exactly does S.I. mean when it speaks of a "utility function" in the context of an AI? Is it a computable mathematical function over a model, such that the 'intelligence' component computes the action that results in the maximum of that function taken over the world state resulting from the action?

0johnlawrenceaspden14y

Surely "Effort planning is a key to success"? Also, and not just wanting to flash academic applause lights but also genuinely curious, which mathematical successes have been due to effort planning? Even in my own mundane commercial programming experiences, the company which won the biggest was more "This is what we'd like, go away and do it and get back to us when it's done..." than "We have this Gantt chart...".

2summerstay14y

There are very few people who would have understood in the 18th century, but Leibniz would have understood in the 17th. He underestimated the difficulty in creating an AI, like everyone did before the 1970s, but he was explicitly trying to do it.

0[anonymous]14y

Your definition of "explicit" must be different from mine. Working on prototype arithmetic units and toying with the universal characteristic is AI research? He subscribed wholeheartedly to the ideographic myth; the most he would have been capable of is a machine that passes around LISP tokens. In any case, based on the Monadology, I don't believe Leibniz would consider the creation of a godlike entity to be theologically possible.

0johnlawrenceaspden14y

How about: "Eventually your machines will be so powerful they can grant wishes. But remember that they are not benevolent. What will you wish for when you can make a wish-machine?"

0A1987dM14y

Oh, wait... The tale of the Tower of Babel was told via chronophone by people from the future right before succumbing to uFAI!

0A1987dM14y

That's hindsight. Nobody could have reasonably foreseen the rise of very powerful computing machines that long ago.

1Jonathan_Graehl14y

Hilarious, and an unfairly effective argument. I'd like to know such people, who can entertain an idea that will still be tantalizing yet unresolved a century out. [...] Yes. I agree with everything else, too, with the caveat that SI is not the first organization to draw attention to AI risk - not that you said so.

I greatly appreciate the response to my post, particularly the highly thoughtful responses of Luke (original post), Eliezer, and many commenters.

Broad response to Luke's and Eliezer's points:

As I see it, there are a few possible visions of SI's mission:

M1. SI is attempting to create a team to build a "Friendly" AGI.
M2. SI is developing "Friendliness theory," which addresses how to develop a provably safe/useful/benign utility function without needing iterative/experimental development; this theory could be integrated into an AGI developed by another team, in order to ensure that its actions are beneficial.
M3. SI is broadly committed to reducing AGI-related risks, and works on whatever will work toward that goal, including potentially M1 and M2.

My view is that the broader SI's mission, the higher the bar should be for the overall impressiveness of the organization and team. An organization with a very narrow, specific mission - such as "analyzing how to develop a provably safe/useful/benign utility function without needing iterative/experimental development" - can, relatively easily, establish which other organizations (if any) are trying to provid...

My view is that the broader SI's mission, the higher the bar should be for the overall impressiveness of the organization and team.

Can you describe a hypothetical organization and some examples of the impressive achievements it might have, which would pass the bar for handling mission M3? What is your estimate of the probability of such an organization coming into existence in the next five or ten years, if a large fraction of current SI donors were to put their money into donor-advised funds instead?

I'm very much an outsider to this discussion, and by no means a "professional researcher", but I believe those to be the primary reasons why I'm actually qualified to make the following point. I'm sure it's been made before, but a rapid scan revealed no specific statement of this argument quite as directly and explicitly.

HoldenKarnofsky: (...) my view is that SI's suggested approach to AGI development is more dangerous than the "traditional" approach to software development, and thus that SI is advocating for an approach that would worsen risks from AGI.

I've always understood SI's position on this matter not as one of "We should not focus on building Tool AI! Fully reflectively self-modifying AGIs are the only way to go!", but rather that it is extremely unlikely that we can prevent everyone else from building one.

To my understanding, the logic goes: If any programmer with relevant skills is sufficiently convinced, by whatever means and for whatever causes, that building a full traditional AGI is more efficient and will more "lazily" achieve his goals with fewer resources or achieve them faster, the programmer will build it whether you think...

7John_Maxwell14y

This looks similar to this point Kaj Sotala made. My own restatement: As the body of narrow AI research devoted to making tools grows larger and larger, building agent AGI gets easier and easier, and there will always be a few Shane Legg types who are crazy enough to try it. I sometimes suspect that Holden's true rejection to endorsing SI is that the optimal philanthropy movement is fringe enough already, and he doesn't want to associate it with nutty-seeming beliefs related to near-inevitable doom from superintelligence. Sometimes I wish SI would market themselves as being similar to nuclear risk organizations like the Bulletin of the Atomic Scientists. After all, EY was an AI researcher who quit and started working on Friendliness when he saw the risks, right? I think you could make a pretty good case for SI's usefulness just working based on analogies from nuclear risk, without any mention of FOOM or astronomical waste or paperclip maximizers. Ideally we'd have wanted to know about nuclear weapon risks before having built them, not afterwards, right?

1DaFranker14y

Personally, I highly doubt that to be Holden's true rejection, though it is most likely one of the emotional considerations that cannot be ignored in a strategic perspective. Holden claims to have gone through most of the relevant LessWrong sequence and SIAI public presentation material, which makes the likelihood of a deceptive (or self-deceptive) argumentation lower, I believe. No, what I believe to be the real issue is that Holden and (most of) SIAI have disagreements over many specific claims used to justify broader claims - if the specific claims are granted in principle, both seem to generally agree in good Bayesian fashion on the broader or more general claim. Much of the disagreement on those specifics also appears to stem from different priors in ethical and moral values, as well as differences in their evaluations and models of human population behaviors and specific (but often unspecified) "best guess" probabilities. For a generalized example, one strong claim for existential risk being optimal effort is that even a minimal decrease in risk provides immense expected value simply from the sheer magnitude of what could most likely be achieved by humanity throughout the rest of its course of existence. Many experts and scientists outright reject this on the grounds that "future, intangible, merely hypothetical other humans" should not be assigned value on the same order-of-magnitude as current humans, or even one order of magnitude lower.

1[anonymous]14y

Well, SI's mission makes sense on the premise that the best way to prevent a badly built AGI from being developed or deployed is to build a friendly AGI which has that as one of its goals. 'Best way' here is a compromise between, on the one hand, the effectiveness of the FAI relative to other approaches, and on the other, the danger presented by the FAI itself as opposed to other approaches. So I think Holden's position is that the ratio of danger vs. effectiveness does not weigh favorably for FAI as opposed to tool AI. So to argue against Holden, we would have to argue either that FAI will be less dangerous than he thinks, or that tool AI will be less effective than he thinks. I take it the latter is the more plausible.

2DaFranker14y

Indeed, we would have to argue that to argue against Holden. My initial reaction was to counter this with a claim that we should not be arguing against anyone in the first place, but rather looking for probable truth (concentrate anticipations). And then I realized how stupid that was: Arguments Are Soldiers. If SI (and by the Blue vs Green principle, any SI-supporter) can't even defend a few claims and defeat its opponents, it is obviously stupid and not worth paying attention to. SI needs some amount of support, yet support-maximization strategies carry a very high risk of introducing highly dangerous intellectual contamination through various forms (including self-reinforcing biases in the minds of researchers and future supporters) that could turn out to cause even more existential risk. Yet, at the same time, not gathering enough support quickly enough dramatically augments the risk that someone, somewhere, is going to trip on a power cable and poof, all humans are just gone. I am definitely not masterful enough in mathematics and bayescraft to calculate the optimal route through this differential probabilistic maze, but I suspect others could provide a very good estimate. Also, it's very much worth noting that these very considerations, on a meta level, are an integral part of SI's mission, so figuring out whether that premise you stated is true or not, and whether there are better solutions or not actually is SI's objective. Basically, while I might understand some of the cognitive causes for it, I am still very much rationally confused when someone questions SI's usefulness by questioning the efficiency of subgoal X, while SI's original and (to my understanding) primary mission is precisely to calculate the efficiency of subgoal X.

9lukeprog14y

Just a few thoughts for now:

* I agree that some of our disagreements "come down to relatively deep worldview differences (related to the debate over 'Pascal's Mugging')." The forthcoming post on this subject by Steven Kaas may be a good place to engage further on this matter.
* I retain the claim that Holden's "objection #1 punts to objection #2." For the moment, we seem to be talking past each other on this point. The reply Eliezer and I gave on Tool AI was not just that Tool AI has its own safety concerns, but also that understanding the tool AI approach and other possible approaches to the AGI safety problem are part of what an "FAI Programmer" does. We understand why people have gotten the impression that SI's FAI team is specifically about building a "self-improving CEV-maximizing agent", but that's just one approach under consideration, and figuring out which approach is best requires the kind of expertise that SI aims to host.
* The evidence suggesting that rationality is a weak predictor of success comes from studies on privileged Westerners. Perhaps Holden has a different notion of what counts as a measure of rationality than the ones currently used by psychologists?
* I've looked further into donor advised funds and now agree that the institutions named by Holden are unlikely to overrule their client's wishes.
* I, too, would be curious to hear Holden's response to Wei Dai's question.

On the question of the impact of rationality, my guess is that:

Luke, Holden, and most psychologists agree that rationality means something roughly like the ability to make optimal decisions given evidence and goals.
The main strand of rationality research followed by both psychologists and LWers has been focused on fairly obvious cognitive biases. (For short, let's call these "cognitive biases".)
Cognitive biases cause people to make choices that are most obviously irrational, but not most importantly irrational. For example, it's very clear that spinning a wheel should not affect people's estimates of how many African countries are in the UN. But do you know anyone for whom this sort of thing is really their biggest problem?
Since cognitive biases are the primary focus of research into rationality, rationality tests mostly measure how good you are at avoiding them. These are the tests used in the studies psychologists have done on whether rationality predicts success.
LW readers tend to be fairly good at avoiding cognitive biases (and will be even better if CFAR takes off).
But there are a whole series of much more important irrationalities that LWers suffer from.

...

5lukeprog13y

For the record, I basically agree with all this.

6John_Maxwell14y

How does GiveWell plan to deal with the possibility that people who come to GiveWell looking for charity advice may have a variety of worldviews that impact their thinking on this?

Reason 1: Mitigating AI risk could mitigate all other existential risks, but not vice-versa. There is an asymmetry between AI risk and other existential risks. If we mitigate the risks from (say) synthetic biology and nanotechnology (without building Friendly AI), this only means we have bought a few years or decades for ourselves before we must face yet another existential risk from powerful new technologies. But if we manage AI risk well enough (i.e. if we build a Friendly AI or "FAI"), we may be able to "permanently" (for several billion years) secure a desirable future.

This equates "managing AI risk" and "building FAI" without actually making the case that these are equivalent. Many people believe that dangerous research can be banned by governments, for instance; it would be useful to actually make the case (or link to another place where it has been made) that managing AI risk is intractable without FAI.

This is one of the 10,000 things I didn't have the space to discuss in the original post, but I'm happy to briefly address it here!

It's much harder to successfully ban AI research than to successfully ban, say, nuclear weapons. Nuclear weapons require rare and expensive fissile material that requires rare heavy equipment to manufacture. Such things can be tracked to some degree. In contrast, AI research requires... um... a few computers.

Moreover, it's really hard to tell whether the code somebody is running on a computer is potentially dangerous AI stuff or something else. Even if you magically had a monitor installed on every computer to look for dangerous AI stuff, it would have to know what "dangerous AI stuff" looks like, which is hard to do before the dangerous AI stuff is built in the first place.

The monetary, military, and political incentives to build AGI are huge, and would be extremely difficult to counteract through a worldwide ban. You couldn't enforce the ban, anyway, for the reasons given above. That's why Ben Goertzel advocates "Nanny AI," though Nanny AI may be FAI-complete, as mentioned here.

I hope that helps?

8fubarobfusco14y

Yes.

0[anonymous]14y

It does help...I had the same reaction as fubarob. However, your argument assumes that our general IT capabilities have already matured to the point where AGI is possible. I agree that restricting AGI research then is likely a lost cause. Much less clear to me is whether it is equally futile to try restricting IT and computing research or even general technological progress before such a point. Could we expect to bring about global technological stasis? One may be tempted to say that such an effort is doomed to a fate like global warming accords except ten times deader. I disagree entirely. Both Europe and the United States, in fact, have in recent years been implementing a quite effective policy of zero economic growth! It is true that progress in computing has continued despite the general slowdown but this seems hardly written in stone for the future. In fact we need only consider this paper on existential risks and the "Crunches" section for several examples of how stasis might be brought about. Can anyone recommend detailed discussions of broad relinquishment, from any point of view? The closest writings I know are Bill McKibben's book Enough and Bill Joy's essay, but anything else would be great.

2lukeprog14y

I'm pretty sure we have the computing power to run at least one AGI, but I can't prove that. Still, restricting general computing progress should delay the arrival of AGI because the more hardware you have, the "dumber" you can be with solving the AGI software problem. (You can run less efficient algorithms.) Global technological stasis seems just as hard as, or maybe harder than, restricting AGI research. The incentives for AGI are huge, but there might be some points at which you have to spend a lot of money before you get much additional economic/political/military advantage. But when it comes to general computing progress, it looks as though a bit more investment can always yield better returns, e.g. by squeezing more processor cores into a box a little more tightly. Other difficulties of global technological stasis are discussed in the final chapter of GCR, called "The Totalitarian Threat." Basically, you'd need some kind of world government, because any country that decides to slow down its computing progress rapidly falls behind other nations. But political progress is so slow that it seems unlikely we'll get a world government in the next century unless somebody gets a decisive technological advantage via AGI, in which case we're talking about the AGI problem anyway. (The world government scenario looks like the most plausible of Bostrom's "crunches", which is why I chose to discuss it.) Relinquishment is also discussed (in very different ways) by Berglas (2009), Kaczynski (1995), and De Garis (2005).

1[anonymous]14y

I don't want to link directly to what I believe is a solution to this problem, as it was so widely misinterpreted, but I thought there was already a solution to this problem that hinged on how expensive it is to harden fabrication plants. In that case, you wouldn't even need one world government; a sufficiently large collaboration (e.g., NATO) would do the trick.

0[anonymous]14y

Quoting the B-man: A world government may not be the only form of stable social equilibrium that could permanently thwart progress. Many regions of the world today have great difficulty building institutions that can support high growth. And historically, there are many places where progress stood still or retreated for significant periods of time. Economic and technological progress may not be as inevitable as it appears to us. It seems to me like saying world government is necessary underestimates the potential impact of growth-undermining ideas. If all large governments for example buy into the idea that cutting government infrastructure spending in a recession boosts employment, then we can assume that global growth will slow as a result of the acceptance of this false assertion. To me the key seems less to have some world nabob pronouncing edicts on technology and more shifting the global economy so as to make edicts merely the gift wrap. I will definitely take a look at that chapter of GCR. Thanks also for the other links. The little paper by Berglas I found interesting. Mr. K needs no comment, naturally. Might be good reading for similar reasons that Mein Kampf is. With De Garis I have long felt like the kook-factor was too high to warrant messing with. Anyone read him much? Maybe it would be good just for some ideas.

1lukeprog14y

It certainly could be true that economic growth and technological progress can slow down. In fact, I suspect the former will slow down, and perhaps the latter, too. That's very different from stopping technological progress that will lead to AGI, though.

1elharo13y

Not only can economic growth and technological progress slow down. They can stop and reverse. Just because we're now further out in front than humanity has ever been before in history does not mean that we can't go backwards. Economic growth is probably more likely to reverse than technological progress. That's what a depression is, after all. But a sufficiently bad global catastrophe, perhaps one that destroyed the electrical grid and other key infrastructure, could reverse a lot of technological progress too and perhaps knock us way back without necessarily causing complete extinction.

0[anonymous]14y

I think technological stasis could really use more discussion. For example I was able to find this paper by James Hughes discussing relinquishment. He however treats the issue as one of regulation and verification, similar to nuclear weapons, noting that: [...] Regulation and verification may indeed be a kind of Gordian knot. The more specific the technologies you want to stop, the harder it becomes to do that and still advance generally. Berglas recognizes that problem in his paper and so proposes restricting computing as a whole. Even this, however, may be too specific to cut the knot. The feasibility of stopping economic growth entirely so we never reach the point where regulation is necessary seems to me an unexplored question. If we look at global GDP growth over the past 50 years it's been uniformly positive except for the most recent recession. It's also been quite variable over short periods. Clearly stopping it for longer than a few years would require some new phenomenon driving a qualitative break with the past. That does not mean however that a stop is impossible. There does exist a small minority camp within the economics profession advocating no-growth policies for environmental or similar reasons. I wonder if anyone has created a roadmap for bringing such policies about on a global level.

-1gwern14y

Out of curiosity, have you read my little essay "Slowing Moore's Law"? It seems relevant.

Certainly the fact that some really awful charities are untruthful doesn't mean SI shouldn't be held accountable merely because it managed to tell the truth.

I didn't mean that SI shouldn't be held accountable for the theft. I was merely lamenting my expectation that it will probably be punished for reporting it.

2Vaniver14y

Conservation of expected evidence often has unpleasant implications.

A clarification. In Thoughts on the Singularity Institute, Holden wrote:

What I will commit to is reading and carefully considering up to 50,000 words of content that are (a) specifically marked as SI-authorized responses to the points I have raised; (b) explicitly cleared for release to the general public as SI-authorized communications. In order to consider a response "SI-authorized and cleared for release," I will accept explicit communication from SI's Executive Director or from a majority of its Board of Directors endorsing the content in question.

As SI's Executive Director I am hereby marking three different things as "SI-authorized responses" to the points Holden raised: my "Reply to Holden on the Singularity Institute" (the post above), my long comment on recent organizational improvements at SI, and Eliezer's Reply to Holden on Tool AI.

According to Word Count Tool, these three things add up to a mere 13,940 words.

As SI's Executive Director I am hereby marking three different things as "SI-authorized responses" to the points Holden raised: my "Reply to Holden on the Singularity Institute" (the post above), my long comment on recent organizational improvements at SI, and Eliezer's Reply to Holden on Tool AI.

Consider removing the first sentence of the final link:

This comment is not intended to be part of the 50,000-word response which Holden invited.

8lukeprog14y

Good point. :) Fixed.

I expected more disagreement than this. Was my post really that persuasive?

I linked this to an IRC channel full of people skeptical of SI. One person commented that

the reply doesn't seem to be saying much

and another that

I think most arguments are 'yes we are bad but we will improve'
and some opinion based statement about how FAI is the most important thing in the world.

Which was somewhat my reaction as well - I can't put a finger on it and say exactly what it is that's wrong, but somehow it feels like this post isn't "meaty" enough to elicit much of a reaction, positive or negative. Which on the other hand feels odd, since e.g. the "SI's mission assumes a scenario that is far less conjunctive than it initially appears" heading makes an important point that SI hasn't really communicated well in the past. Maybe it just got buried under the other stuff, or something.

5ChrisHallquist14y

I found the "less conjunctive" section very persuasive, and suspect Kaj may be right about it getting buried.

4lukeprog14y

That's an unfortunate response, given that I offered a detailed DH6-level disagreement (quote the original article directly, and refute the central points), and also offered important novel argumentation not previously published by SI. I'm not sure what else people could have wanted. If somebody figures out why Kaj and some others had the reaction they did, I'm all ears.

I can't speak for anyone else, and had been intending to sit this one out, since my reactions to this post were not really the kind of reaction you'd asked for.

But, OK, my $0.02.

The claim that an organization is exceptionally well-suited to convert money into existential risk mitigation is an extraordinary one, and extraordinary claims require extraordinary evidence. This puts a huge burden on you, as the person attempting to provide that evidence.

So, I'll ask you: do you think your response provides such evidence?

If you do, then your problem seems to be (as others have suggested) one of document organization. Perhaps starting out with an elevator-pitch answer to the question "Why should I believe that SI is capable of this extraordinary feat?" might be a good idea.

Because my take-away from reading this post was "Well, nobody else is better suited to do it, and SI does some cool movement-building stuff (the Sequences, the Rationality Camps, and HPMoR) that attracts smart people and encourages them to embrace a more rational approach to their lives, and SI is fixing some of its organizational and communication problems but we need more money to really make progress...

The claim that an organization is exceptionally well-suited to convert money into existential risk mitigation is an extraordinary one, and extraordinary claims require extraordinary evidence.

Reminder: I don't know if you were committing this particular error internally, but, at the least, the sentence is liable to cause the error externally, so: Large consequences != prior improbability. E.g. although global warming has very large consequences, and even implies that we should take large actions, it isn't improbable a priori that carbon dioxide should trap heat in the atmosphere - it's supposed to happen, according to standard physics. And so demanding strong evidence that global warming is anthropogenic is bad probability theory and decision theory. Expensive actions imply a high value of information, meaning that if we happen to have access to cheap, powerfully distinguishing evidence about global warming we should look at it; but if that evidence is not available, then we go from the default extrapolation from standard physics and make policy on that basis - not demand more powerful evidence on pain of doing nothing.

The claim that SIAI is currently best-suited to convert ...

I am coming to the conclusion that "extraordinary claims require extraordinary evidence" is just bad advice, precisely because it causes people to conflate large consequences and prior improbability. People are fond of saying it about cryonics, for example.

At least sometimes, people may say "extraordinary claims require extraordinary evidence" when they mean "your large novel claim has set off my fraud risk detector; please show me how you're not a scam."

In other words, the caution being expressed is not about prior probabilities in the natural world, but rather the intentions and morals of the claimant.

We need two new versions of the advice, to satisfy everyone.

Version for scientists: "improbable claims require extraordinary evidence".

Version for politicians: "inconvenient claims require extraordinary evidence".

-4private_messaging14y

Well, consider the strategic point of view. Suppose that a system (humans) is known for its poor performance at evaluating claims without performing direct experimentation, with a long, long history of such failures. Consider also that a false high-impact claim can ruin the ability of this system to perform its survival function, again with a long history of such events; the damage is proportional to the claimed impact. (The Mayans are a good example, killing people so that the sun will rise tomorrow; great utilitarian rationalists they were, believing that their reasoning was perfect enough to warrant such action. Note that donating to a wrong charity instead of a right one kills people.) When we anticipate that a huge percentage of the claims will be false, we can build the system to require evidence such that, if the claim were false, the system would be in a low-probability world (i.e. require that evidence be collected for a claim so that p(evidence | ~claim)/p(evidence | claim) is low), to make the system, once deployed, fall off cliffs less often. The required strength of the evidence then increases with the impact of the claim. It is not an ideal strategy, but it is the one that works given the limitations. There are other strategies, and it is not straightforward to improve performance (and it is easy to degrade performance by making idealized implicit assumptions).
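The likelihood-ratio requirement described above can be made concrete with a small sketch. This is only an illustration under stated assumptions: the function names, the `damage_budget` rule tying the required posterior to the claim's impact, and the example numbers are all hypothetical and do not come from the comment itself.

```python
def posterior(prior: float, p_e_given_claim: float, p_e_given_not_claim: float) -> float:
    """Bayes' rule for a binary claim: p(claim | evidence)."""
    numerator = prior * p_e_given_claim
    return numerator / (numerator + (1.0 - prior) * p_e_given_not_claim)


def max_likelihood_ratio(prior: float, target_posterior: float) -> float:
    """Largest p(evidence | ~claim) / p(evidence | claim) that still moves
    the prior up to at least target_posterior.

    Derived from: posterior odds = prior odds * p(e | claim) / p(e | ~claim).
    """
    prior_odds = prior / (1.0 - prior)
    target_odds = target_posterior / (1.0 - target_posterior)
    return prior_odds / target_odds


def required_posterior(impact: float, damage_budget: float) -> float:
    """Hypothetical policy: accept a claim only if the expected damage from
    acting on a false claim, (1 - posterior) * impact, stays within budget."""
    if impact <= damage_budget:
        raise ValueError("this sketch assumes impact > damage_budget")
    return 1.0 - damage_budget / impact


if __name__ == "__main__":
    prior = 0.1  # assumed: most high-impact claims turn out to be false
    for impact in (10.0, 100.0, 1000.0):
        target = required_posterior(impact, damage_budget=1.0)
        ratio = max_likelihood_ratio(prior, target)
        print(f"impact={impact:>6}: need posterior >= {target:.3f}, "
              f"so p(e|~claim)/p(e|claim) <= {ratio:.6f}")
```

Under this toy policy, the tolerated ratio p(evidence | ~claim)/p(evidence | claim) shrinks as the claimed impact grows, which is one way to read "the required strength of the evidence increases with the impact of the claim."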

4TheOtherDave14y

What I meant when I described the claim (hereafter "C") that SI is better suited to convert dollars to existential risk mitigation than any other charitable organization as "extraordinary" was that priors for C are low (C is false for most organizations, and therefore likely to be false for SI absent additional evidence about SI), not that C has large consequences (although that is true as well). Yes, this might be a failing of using the wrong reference class (charitable organizations in general) to establish one's priors, as you suggest. The fact remains that when trying to solicit broad public support, or support from an organization like GiveWell, it's likely that SI will be evaluated within the reference class of other charities. If using that reference class leads to improperly low priors for C, it seems SI has a few strategic choices:

1) Convince GiveWell, and donors in general, that SI is importantly unlike other charities, and should not be evaluated as though it were like them -- in other words, win at reference class tennis.

2) Ignore donors in general and concentrate its attention primarily on potential donors who already use the correct reference class.

3) Provide enough evidence to convince even someone who starts out with improperly low priors drawn from the incorrect reference class of "SI is a charity" to update to a sufficiently high estimate of C that donating money to SI seems reasonable (in practice, I think this is what has happened and is happening with anthropogenic climate change).

4) Look for alternate sources of funding besides charitable donations.

One way to approach strategy #1 is the one you use here -- shift the conversation from whether or not SI can actually spend money effectively to mitigate existential risk to whether or not uFAI/FAI by 2025 (or some other near-mode threshold) is plausible. That's not a bad tactic; it works pretty well in general.

8Eliezer Yudkowsky14y

Your statement was that it was an extraordinary claim that SIAI provided x-risk reduction - why then would SIAI be compared to most other charities, which don't provide x-risk reduction, and don't claim to provide x-risk reduction? The AI-risk item was there for comparison of standards, as was global warming; i.e., if you claim that you doubt X because of Y, but Y implies doubting Z, but you don't doubt Z, you should question whether you're really doubting X because of Y.

5TheOtherDave14y

Are you trying to argue that it isn't in fact being compared to other charities? (Specifically, by GiveWell?) Or merely that if it is, those doing such comparison are mistaken? If you're arguing the former... huh. I will admit, in that case, that almost everything I've said in this thread is irrelevant to your point, and I've completely failed to follow your argument. If that's the case, let me know and I'll back up and re-read your argument in that context. If you're arguing the latter, well, I'm happy to grant that, but I'm not sure how relevant it is to Luke's goal (which I take to be encouraging Holden to endorse SI as a charitable donation). If SI wants to argue that GiveWell's expertise with evaluating other charities isn't relevant to evaluating SI because SI ought not be compared to other charities in the first place, that's a coherent argument (though it raises the question of why GiveWell ever got involved in evaluating SI to begin with... wasn't that at SI's request? Maybe not. Or maybe it was, but SI now realizes that was a mistake. I don't know.) But as far as I can tell that's not the argument SI is making in Luke's reply to Holden. (Perhaps it ought to be? I don't know.)

I worry that this conversation is starting to turn around points of phrasing, but... I think it's worth separating the ideas that you ought to be doing x-risk reduction and that SIAI is the most efficient way to do it, which is why I myself agreed strongly with your own, original phrasing, that the key claim is providing the most efficient x-risk reduction. If someone's comparing SIAI to Rare Diseases in Cute Puppies or anything else that isn't about x-risk, I'll leave that debate to someone else - I don't think I have much comparative advantage in talking about it.

1TheOtherDave14y

I agree with you on all of those points. Further, it seems to me that Holden is implicitly comparing SI to other charitable-giving opportunities when he provides GW's evaluation of SI, rather than comparing SI to other x-risk-reduction opportunities. I tentatively infer, from the fact that you consider responding to such a comparison something you should leave to others but you're participating in a discussion of how SI ought to respond to Holden, that you don't agree that Holden is engaging in such a comparison. If you're right, then I don't know what Holden is doing, and I probably don't have a clue how Luke ought to reply to Holden.

Holden is comparing SI to other giving opportunities, not just to giving opportunities that may reduce x-risk. That's not a part of the discussion Eliezer feels he should contribute to, though. I tried to address it in the first two sections of my post above, and then in part 3 I talked about why both FHI and SI contribute unique and important value to the x-risk reduction front.

In other words: I tried to explain that for many people, x-risk is Super Duper Important, and so for those people, what matters is which charities among those reducing x-risk they should support. And then I went on to talk about SI's value for x-risk reduction in particular.

Much of the debate over x-risk as a giving opportunity in general has to do with Holden's earlier posts about expected value estimates, and SI's post on that subject (written by Steven Kaas) is still under development.

-4private_messaging14y

If by "utility function" you mean "a computable function, expressible using lambda calculus" (or a Turing machine tape or Python code, which are equivalent), then arguing that the majority of such functions lead to a model-based, utility-based agent killing you is a huge stretch, as such functions are not grounded, and the correspondence of the model with the real world is not a sub-goal of finding the maximum of such a function.

7lukeprog14y

SI is not exceptionally well-suited for x-risk mitigation relative to some ideal organization, but relative to the alternatives (as you said). But the reason I gave for this was not "unlike them, we're focused on the right problem", though I think that's true. Instead, the reasons I gave (twice!) were: [...] As for getting back to the original problem rather than just doing movement-building, well... that's what I've been fighting for since I first showed up at SI, via Open Problems in Friendly AI. And now it's finally happening, after SPARC. [...] Yes, this is a promising idea. It's also probably 40-100 hours of work, and there are many other urgent things for us to do as well. That's not meant as a dismissal, just as a report from the ground of "Okay, yes, everyone's got a bunch of great ideas, but where are the resources I'm supposed to use to do all those cool things? I've been working my ass off but I can't do even more stuff that people want without more resources."

2TheOtherDave14y

Absolutely. As I said in the first place, I hadn't initially intended to reply to this, as I didn't think my reactions were likely to be helpful given the situation you're in. But your followup comment seemed more broadly interested in what people might have found compelling, and less in specific actionable suggestions, than your original post. So I decided to share my thoughts on the former question. I totally agree that you might not have the wherewithal to do the things that people might find compelling, and I understand how frustrating that is. It might help emotionally to explicitly not-expect that convincing people to donate large sums of money to your organization is necessarily something that you, or anyone, are able to do with a human amount of effort. Not that this makes the problem any easier, but it might help you cope better with the frustration of being expected to put forth an amount of effort that feels unreasonably superhuman. Or it might not. [...] I'll observe that the bulk of the text you quote here is not reasons to believe SI is capable of it, but reasons to believe the task is difficult. What's potentially relevant to the former question is: [...] If that is your primary answer to "Why should I believe SI is capable of mitigating x-risk given $?", then you might want to show why the primary obstacles to mitigating x-risk are psychological/organizational issues rather than philosophical/technical ones, such that SI's competence at addressing the former set is particularly relevant. (And again, I'm not asserting that showing this is something you are able to do, or ought to be able to do. It might not be. Heck, the assertion might even be false, in which case you actively ought not be able to show it.) You might also want to make more explicit the path from "we have experience addressing these psychological/organizational issues" to "we are good at addressing these psychological/organizational issues (compared to relevant others)". Bette

1lukeprog14y

Thank you for understanding. :) My statement "SI has successfully concentrated lots of attention, donor support, and human capital [and also] has learned many lessons [and] has lots of experience with [these unusual, complicated] issues" was in support of "better to help SI grow and improve rather than start a new, similar AI risk reduction organization", not in support of "SI is capable of mitigating x-risk given money." However, if I didn't also think SI was capable of reducing x-risk given money, then I would leave SI and go do something else, and indeed will do so in the future if I come to believe that SI is no longer capable of reducing x-risk given money. How to Purchase AI Risk Reduction is a list of things that (1) SI is currently doing to reduce AI risk, or that (2) SI could do almost immediately (to reduce AI risk) if it had sufficient funding.

1TheOtherDave14y

Ah, OK. I misunderstood that; thanks for the clarification. For what it's worth, I think the case for "support SI >> start a new organization on a similar model" is pretty compelling. And, yes, the "How to Purchase AI Risk Reduction" series is an excellent step in the direction of making SI's current and planned activities, and how they relate to your mission, more concrete and transparent. Yay you!

2Mass_Driver14y

I strongly agree with this comment, and also have a response to Eliezer's response to it. While I share TheOtherDave's views, as TheOtherDave noted, he doesn't necessarily share mine! It's not the large consequences that make it a priori unlikely that an organization is really good at mitigating existential risks -- it's the objectively small probabilities and lack of opportunity to learn by trial and error. If your goal is to prevent heart attacks in chronically obese, elderly people, then you're dealing with reasonably large probabilities. For example, the AHA estimates that a 60-year-old, 5'8" man weighing 220 pounds has a 10% chance of having a heart attack in the next 10 years. You can fiddle with their calculator here. This is convenient, because you can learn by trial and error whether your strategies are succeeding. If only 5% of a group of the elderly obese under your treatment have heart attacks over the next 10 years, then you're probably doing a good job. If 12% have heart attacks, you should probably try another tactic. These are realistic swings to expect from an effective treatment -- it might really be possible to cut the rate of heart attacks in half among a particular population. This study, for example, reports a 25% relative risk reduction. If an organization claims to be doing really well at preventing heart attacks, it's a credible signal -- if they weren't doing well, someone could check their results and prove it, which would be embarrassing for the organization. So, that kind of claim only needs a little bit of evidence to support it. On the other hand, any given existential risk has a small chance of happening, a smaller chance of being mitigated, and, by definition, little or no opportunity to learn by trial and error. For example, the odds of an artificial intelligence explosion in the next 10 years might be 1%. A team of genius mathematicians funded with $5 million over the next 10 years might be able to reduce that risk to 0.8%. Howeve
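The arithmetic in the comment above can be laid out side by side. This is a rough sketch using only the figures quoted there (10% versus 5% for heart attacks, 1% versus 0.8% for the AI example); the cohort size of 1,000 patients and the function names are assumptions added for illustration.

```python
def absolute_risk_reduction(baseline: float, treated: float) -> float:
    """Drop in probability, in absolute terms."""
    return baseline - treated


def relative_risk_reduction(baseline: float, treated: float) -> float:
    """Drop in probability as a fraction of the baseline risk."""
    return (baseline - treated) / baseline


def expected_events(trials: int, probability: float) -> float:
    """Expected number of occurrences across independent trials."""
    return trials * probability


if __name__ == "__main__":
    # Heart-attack example: many patients, so outcomes can be compared.
    cohort = 1000  # assumed cohort size, not from the comment
    print("heart attacks expected, untreated:", expected_events(cohort, 0.10))
    print("heart attacks expected, treated:  ", expected_events(cohort, 0.05))
    print("relative risk reduction:", relative_risk_reduction(0.10, 0.05))

    # Existential-risk example: there is only one world to observe.
    print("AI risk, absolute reduction:", absolute_risk_reduction(0.01, 0.008))
    print("AI risk, relative reduction:", relative_risk_reduction(0.01, 0.008))
    print("expected catastrophes, one world, unfunded:", expected_events(1, 0.01))
    print("expected catastrophes, one world, funded:  ", expected_events(1, 0.008))
```

With a cohort, the gap between roughly 100 and 50 expected events is something an outsider could check. With a single trial, the gap between 0.01 and 0.008 expected events is not observable, which is the "no opportunity to learn by trial and error" point.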

2TheOtherDave14y

I should say, incidentally (since this was framed as agreement to my comment) that Mass_Driver's point is rather different from mine.

One sad answer is that your post is boring, which is another way of saying it doesn't have enough Dark Arts to be sufficiently persuasive.

There are many ways to infect a population with a belief; presenting evidence for its accuracy is among the least effective

-Sister Y

It didn't have the same cohesiveness as Holden's original post; there were many more dangling threads, to borrow the same metaphor I used to say why his post was so interesting. You wrote it as a technical, thoroughly cited response and literature review instead of a heartfelt, wholly self-contained Mission Statement, and you made that very clear by stating at least 10 times that there was much more info 'somewhere else' (in conversations, in people's heads, yet to be written, etc.).

He wrote an intriguing short story, you wrote a dry paper.

Edit: Also, the answer to every question seems to be, "That will be in Eliezer's next Sequence," which postpones further debate.

3Jack14y

I doubt random skeptics on the internet followed links to papers. Their thoughts are unlikely to be diagnostic. The group of people who disagree with you and will earnestly go through all the arguments is small. Also, explanations of the form "Yes this was a problem but we're going to fix it." are usually just read as rationalizations. It sounds a bit like "Please, sir, give me another chance. I know I can do better" or "I'm sorry I cheated on you. It will never happen again". The problems actually have to be fixed before the argument is rebutted. It will go better when you can say things like "We haven't had any problems of this kind in 5 years".

-8private_messaging14y

The purpose of an FAI team is not to blindly develop one particular approach to Friendly AI without checking to see whether this work will be obsoleted by future developments. Instead, the purpose of an FAI team is to develop highly specialized expertise on, among other things, which kinds of research are more and less likely to be relevant given future developments.

This is unsettling. It sounds a lot like trying to avoid saying anything specific.

Eliezer will have lots of specific things to say in his forthcoming "Open Problems in Friendly AI" sequence (I know; I've seen the outline). In any case, wouldn't it be a lot more unsettling if, at this early stage, we pretended we knew enough to commit entirely to one very particular approach?

It's unsettling that this is still an early stage. SI has been around for over a decade. I'm looking forward to the open problems sequence; perhaps I should shut up about the lack of explanation of SI's research for now, considering that the sequence seems like a credible promise to remedy this.

When making the case for SI's comparative advantage, you point to these things:

... [A]nd the ability to do unusual things that are nevertheless quite effective at finding/creating lots of new people interested in rationality and existential risk reduction: (1) The Sequences, the best tool I know for creating aspiring rationalists, (2) Harry Potter and the Methods of Rationality, a surprisingly successful tool for grabbing the attention of mathematicians and computer scientists around the world...

What evidence supports these claims?

1Thrasymachus14y

I appreciate you folks are busy, but I'm going to bump as it has been more than a week. Besides, it strikes me as an important question given the prominence of these things to the claim that SI can buy x-risk reduction more effectively than other orgs.

1endoself14y

You can PM Luke if you want. It's the "Send message" button next to the username on the user page.

0Thrasymachus14y

I'm bumping this again because there's been no response to this question (three weeks since asking), and I poked Luke via PM a week ago. Given this is the main plank supporting SI's claim that it is a good way of spending money, I think this question should be answered. (especially compare to Holden's post)

I'm really glad you pointed out that SI's strategy is not predicated on hard take-off. I don't recall if this has been discussed elsewhere, but that's something that always bothered me since I think hard take-off is relatively unlikely. (Admittedly, soft take-off still considerably diminishes my expected impact for SI and donating to it.)

0Bruno_Coelho14y

For some time I thought EY supported hard takeoff (the "bunch of guys in a garage" argument), but if Luke now says it's not so, then OK.

If I earmark my donations for "HPMOR Finale or CPA Audit whichever comes first" would that act as positive or negative pressure towards Eliezer's fiction creation complex? (I only ask because bugging him for an update has been previously suggested to reduce update speed)

Furthermore, Oracle AI/Nanny AI both seem to fail the heuristic of "other country is about to beat us in a war, should we remove the safety programming" that I use quite often with nearly everyone I debate AI about from outside the LW community. Thank you both for writing such concise yet detailed responses that helped me understand the problem areas of Tool AI better.

If I earmark my donations for "HPMOR Finale or CPA Audit whichever comes first" would that act as positive or negative pressure towards Eliezer's fiction creation complex?

I think the issue is that we need a successful SPARC and an "Open Problems in Friendly AI" sequence more urgently than we need an HPMOR finale.

"Open Problems in Friendly AI" sequence

an HPMOR finale

A sudden, confusing vision just occurred, of the two being somehow combined. Aaagh.

2Shmi14y

Spoiler: Voldemort is a uFAI.

8arundelo14y

For the record: [...] (And later in the thread, when asked about "so far": "And I have no intention at this time to do it later, but don't want to make it a blanket prohibition.")

2NancyLebovitz14y

In the earlier chapters, it seemed to me that the Hogwarts faculty dealing with Harry was something like being faced with an AI of uncertain Friendliness. Correction: It was more like the faculty dealing with an AI that's trying to get itself out of its box.

0MatthewBaker14y

I think our values are positively maximized by delaying the HPMOR finale as long as possible; my post was more out of curiosity to see what would be most helpful to Eliezer.

In general - never earmark donations. It's a stupendous pain in the arse to deal with. If you trust an organisation enough to donate to them, trust them enough to use the money for whatever they see a need for. Contrapositive: If you don't trust them enough to use the money for whatever they see a need for, don't donate to them.

4MatthewBaker14y

I never have before but this CPA Audit seemed like a logical thing that would encourage my wealthy parents to donate :)

The discussion of how conjunctive SIAI's vision is seems unclear to me. Luke appears to have responded to only part of what I think Holden is likely to have meant.

Some assumptions whose conjunctions seem important to me (in order of decreasing importance):

1) The extent to which AGI will consist of one entity taking over the world versus many diverse entities with limited ability to dominate the others.

2) The size of the team required to build the first AGI (if it requires thousands of people, a nonprofit is unlikely to acquire the necessary resources; if i...

After being initially impressed by this, I found one thing to pick at:

Reason 1: Mitigating AI risk could mitigate all other existential risks, but not vice-versa.

"Could" here tells you very little. The question isn't whether "build FAI" could work as a strategy for mitigating all other existential risks, it's whether that strategy has a good enough chance of working to be superior to other strategies for mitigating the other risks. What's missing is an argument for saying "yes" to that second question.

our new donate page

This is off-topic, but I'm curious: What were you and Louie working on in that photo on the donate page?

Why, we were busy working on a photo for the donate page! :)

Hopefully that photo is a more helpful illustration of the problems we work on than a photo of our normal work, which looks like a bunch of humans hunched over laptops, reading and typing.

7komponisto14y

Definite articles missing in a number of places on that page (and others at the site).

3Spurlock14y

Just for the sake of feedback, that photo immediately made me laugh. It just seemed so obviously staged. I agree that it's better than "hunched over laptops" though.

0Paul Crowley14y

I have posed for a similar photo myself. Happily a colleague had had genuine cause to draw a large, confusing looking diagram not long beforehand, so we could all stand around it pointing at bits and looking thoughtful...

0Benquo14y

Same here.

2beoShaffer14y

It could just be me, but it somehow seems wrong that Peter Thiel is paired with the Google option rather than PayPal.

You mention "computing overhang" as a threat essentially akin to hard takeoff. But regarding the value of FAI knowledge, it does not seem similar to me at all. A hard-takeoff AI can, at least in principle, be free from Darwinian pressure. A "computing overhang" explosion of many small AIs will tend to be diverse and thus subject to strong evolutionary pressures of all kinds[1]. Presuming that FAI-ness is more-or-less delicate[1.5], those pressures are likely to destroy it as AIs multiply across available computing power (or, if we're ex...

1nshepperd14y

One way for the world to quickly go from one single AI to millions of AIs is for the first AGI to deliberately copy itself, or arrange for itself to be copied many times, in order to take advantage of the world's computing power. In this scenario, assuming the AI takes the first halfway-intelligent security measure of checksumming all its copies to prevent corruption, the vast majority of the copies will have exactly the same code. Hence, to begin with, there's no real variation for natural selection to work on. Secondly, unless the AI was programmed to have some kind of "selfish" goal system, the resulting copies will all also have the same utility function, so they'll want to cooperate, not compete (which is, after all, the reason an AI would want to copy itself. No point doing it if your copies are going to be your enemies). Of course, a more intelligent first AGI would—rather than creating copies—modify itself to run on a distributed architecture allowing the one AI to take advantage of all the available computing power without all the inefficiency of message passing between independent copies. In this situation there would still seem to be huge advantages to making the first AGI Friendly, since if it's at all competent, almost all its children ought to be Friendly too, and they can consequently use their combined computing power to weed out the defective copies. In some respects it's rather like an intelligence explosion, but using extra computing power rather than code modification to increase its speed and intelligence. I suppose one possible alternative is if the AGI isn't smart enough to figure all this out by itself, and so the main method of copying is, to begin with, random humans downloading the FAI source code from, say, wikileaks. If humans are foolish, which they are, some of them will alter the code and run the modified programs, introducing the variation needed for evolution into the system.
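The "checksumming all its copies" measure mentioned above can be sketched in a few lines. This is a minimal illustration, assuming SHA-256 as the checksum and treating each copy as a plain byte string; the function names and host labels are hypothetical.

```python
import hashlib


def checksum(code: bytes) -> str:
    """Hex-encoded SHA-256 digest of a program's bytes."""
    return hashlib.sha256(code).hexdigest()


def find_altered_copies(reference: bytes, copies: dict[str, bytes]) -> list[str]:
    """Return the names of copies whose bytes differ from the reference."""
    expected = checksum(reference)
    return sorted(name for name, blob in copies.items() if checksum(blob) != expected)


if __name__ == "__main__":
    original = b"def goal(): return 'same utility function'"
    copies = {
        "host-a": original,
        "host-b": original,
        "host-c": original.replace(b"same", b"some"),  # a one-word mutation
    }
    print(find_altered_copies(original, copies))
```

A check like this only establishes that a copy's code is byte-for-byte identical to the reference. It says nothing about how identical code behaves after different inputs or experiences.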

0homunq14y

The whole assumption that prompted this scenario is that there's no hard takeoff, so the first AGI is probably around human-level in insight and ingenuity, though plausibly much faster. It seems likely that in these circumstances, human actions would still be significant. If it starts aggressively taking over computing resources, humanity will react, and unless the original programmers were unable to prevent v1.0 from being Skynet-level unfriendly, at least some humans will escalate as far as necessary to get "their" computers under their control. At that point, it would be trivially easy to start up a mutated version; perhaps even one designed for better friendliness. But once mutations happen, evolution takes over. Oh, and by the way, checksums may not work to safeguard friendliness for v1.0. For instance, most humans seem pretty friendly, but the wrong upbringing could turn them bad. TL;DR: no-mutations is an inherently more-conjunctive scenario than mutations.

Thanks for posting this!

I am also grateful to Holden for provoking this - as far as I can tell, the only substantial public speech from SIAI on LessWrong. SIAI often seems to be far more concerned with internal projects than communicating with its supporters, such as most of us on LessWrong.

2lukeprog14y

Also see How to Purchase AI Risk Reduction, So You Want to Save the World, AI Risk & Opportunity: A Strategic Analysis...

2Johnicholas14y

Those are interesting reviews but I didn't know they were speeches in SIAI's voice.

What if the smartest, most careful, most insanely safety-conscious AI researchers humanity can produce just aren't smart enough to solve the problem?

This is very worrying, especially in light of the lack of a public research agenda. SI's inability to describe its research agenda suggests the possibility that they cannot describe their research agenda because they do not know what they are doing because FAI is such a ridiculously hard problem that they have no idea where to begin. I'm hoping that SI will soon be able to make it clear that this is not the...

9lukeprog14y

Yeah, this is the point of Eliezer's forthcoming 'Open Problems in Friendly AI' sequence, which I personally wish he had written in 2009 after his original set of sequences.

0ChrisHallquist14y

I find your points about altruism unpersuasive, because humans are very good at convincing themselves that whatever's best for them, individually, is right or at least permissible. Even if they don't explicitly program it to care about only their CEV, they might work out the part of the program that's supposed to handle friendliness in a way subtly biased towards themselves.

Lately I've been wondering whether it would make more sense to simply try to prevent the development of AGI rather than work to make it "friendly," at least for the foreseeable future. My thought is that AGI carries substantial existential risks, that developing other innovations first might reduce those risks, and that anything we can do to bring about such reductions is worth even enormous costs. In other words, if it takes ten thousand years to develop social or other innovations that would reduce the risk of terminal catastrophe by even 1% when AGI is ...

4Vaniver14y

That sounds like a goal, rather than a sequence of actions.

0[anonymous]14y

Sorry, I don't understand your point.

2Vaniver14y

Consider an alternative situation: "simply try to prevent your teenage daughter from having sex." Well, actually achieving that goal takes more than just trying, and effective plans (which don't cause massive collateral damage) are rarely simple.

0[anonymous]14y

But even averting massive collateral damage could be less important than mitigating existential risk. I think my above comment applies here.

4Vaniver14y

It could be less important! The challenge is navigating value disagreements. Some people are willing to wait a century to make sure the future happens correctly, and others discuss how roughly 2 people die every second, which might stop once we reach the future, and others would comment that, if we delay for a century, we will be condemning them to death since we will ruin their chance of reaching the deathless future. Even among those who only care about existential risk, there are tradeoffs between different varieties of existential risk: it may be that by slowing down technological growth, we decrease our AGI risk but increase our asteroid risk.

0[anonymous]14y

Value disagreements are no doubt important. It depends on the discount rate. However, Bostrom has said that the biggest existential risks right now stem from human technology, so I think asteroid risk is not such a huge factor for the next century. If we expand that to the next ten thousand years then one might have to do some calculations. If we assume a zero discount rate then the primary consideration becomes whether or not we can expect to have any impact on existential risk from AGI by putting it off. If we can lower the AGI-related existential risk by even 1% then it makes sense to delay AGI for even huge timespans assuming other risks are not increased too much. It therefore becomes very important to answer the question of whether such delays would in fact reduce AGI-related risk. Obviously it depends on the reasons for the delay. If the reason for the delay is a nuclear war that nearly annihilates humanity but we are lucky enough to slowly crawl back from the brink, I don't see any obvious reason why AGI-related risk would be reduced at all. But if the reason for the delay includes some conscious effort to focus first on SIRCS then some risk reduction seems likely.

0fubarobfusco14y

Would you mind switching to an example that doesn't assume so much about your audience?

4Vaniver14y

If you can come up with a good one, I'll switch. I'm having trouble finding something where the risk of collateral damage is obvious (and obviously undesirable) and there are other agents with incentives to undermine the goal.

3fubarobfusco14y

Sorry, your response indicates exactly in which way I should have been more clear. Using "teenage daughter having sex" to stand for something "obviously undesirable" assumes a lot about your audience. For one, it assumes that your audience does not contain any sexually-active teenage women; nor any sex-positive parents of teenage women; nor any sex-positive sex-educators or queer activists; nor anyone who has had positive (and thus not "obviously undesirable") experiences as (or with) a sexually active teenage woman. To any of the above folks, "teenage daughter having sex" communicates something not undesirable at all (assuming the sex is wanted, of course). Going by cultural tropes, your choice of example gives the impression that your audience is made of middle-aged, middle-class, straight, socially conservative men, or at least, people who take the views of that sort of person to be normal, everyday, and unmarked. On LW, a lot of your audience doesn't fit those assumptions: 25% of us are under 21; 17% of us are non-heterosexual; 38% of us grew up with non-theistic family values; and between 13% and 40% of us are non-monogamous (according to the 2011 survey, for instance). To be clear, I'm not concerned that you're offending or hurting anyone with your example. Rather, if you're trying to make a point to a general audience, you might consider drawing on examples that don't assume so much. As for alternatives: "Simply try to prevent your house from being robbed" perhaps? I suspect that a very small fraction of LWers are burglars or promoters of burglary.

I don't have the goal of preventing my teenage daughter from having sex (firstly because I have no daughter yet, and secondly because the kind of people who would have such a goal often have a similar goal about younger sisters, and I don't -- indeed, I sometimes introduce single males to her); but I had no problem with pretending I had that goal for the sake of argument. Hell, even if Vaniver had said "simply try to cause more paperclips to exist" I would have pretended I had that goal.

BTW, I don't think that is the real reason why people flinch at such examples. If Vaniver had said “try to win your next motorcycle race” -- a goal that probably even fewer people share -- would anyone have objected?

7GLaDOS14y

I agree. I find it annoying when people pretend otherwise.

Small correction: The term "obviously undesirable" referred to the potential collateral damage from trying to prevent the daughter from having sex, not to her having sex.

2fubarobfusco14y

Oh. Well, that does make a little more sense.

5Vaniver14y

I understand your perspective, and that's a large part of why I like it as an example. Is AGI something that's "obviously undesirable"?

3wedrifid14y

Burglary is an integral part of my family heritage. That's how we earned our passage to Australia. Specifically, burgling some items, including a copper kettle, getting a death sentence and having it commuted to life in the prison continent. With those kinds of circumstances in mind I say burglary is ethically acceptable when, say, your family is starving but usually far too risky to be practical or advisable.

3Kaj_Sotala14y

Here's one such argument, which I find quite persuasive. Also, look at how little success the environmentalists have had with trying to restrict carbon emissions, or how the US government eventually gave up its attempts to restrict cryptography: [...]

-1[anonymous]14y

Anyone know of anything more on deliberate relinquishment? I have seen some serious discussion by Bill McKibben in his book Enough but that's about it. In the linked post on the government controlling AGI development, the arguments say that it's hard to narrowly tailor the development of specific technologies. Information technology was advancing rapidly and cryptography proved impossible to control. The government putting specific restrictions on "soft AI" amid otherwise advancing IT similarly seems far-fetched. But there are other routes. Instead we could enact policies that would deliberately slow growth in broad sectors like IT, biotechnology, and anything leading to self-replicating nanotechnology. Or maybe slow economic growth entirely and have the government direct resources at SIRCS. One can hardly argue that it is impossible to slow or even stop economic growth. We are in the middle of a worldwide economic slowdown as we type. The United States has seen little growth for at least the past ten years. I think broad relinquishment certainly cannot be dismissed without extensive discussion and to me it seems the natural way to deal with existential risk.

0Kaj_Sotala14y

Yes, but most governments are doing their best to undo that slowdown: you'd need immense political power in order to make them encourage it.

-1[anonymous]14y

Given some of today's policy debates you might need less power than one might think. I think many governments, Europe being a clear case, are not doing their best to undo the slowdown. Rather, they are publicly proclaiming to be doing their best while actually setting very far from optimal policies. In a democracy you must always wear at least a cloak of serving the perceived public interest but that does not necessarily mean that you truly work in that perceived interest. So when your Global Stasis Party wins 1% of the vote, you do not have 1% of people trying to bring about stasis and 99% trying to increase growth. Instead, already 50% of politicians may publicly proclaim to want increased growth but actually pursue growth-reducing policies, and your 1% breaks the logjam and creates a 51% majority against growth. This assumes that you understand which parties are actually for and against growth, that is, you are wise enough to see through people's facades. I wonder how today's policymakers would react to challengers seriously favoring no-growth economics. Would this have the effect of shifting the Overton Window? This position is so radically different from anything I've heard of that perhaps a small dose would have outsized effects.

1Kaj_Sotala14y

You're right about that. And there is already the degrowth movement, plus lately I've been hearing even some less radical politicians talking about scaling down economic growth (due to it not increasing well-being in the developed countries anymore). So perhaps something could in fact be done about that.

0[anonymous]14y

And of course there is Bill Joy's essay. I forgot about that. But it seems like small potatoes.

2Strange714y

Please, oh please, think about this for five minutes. Coordination cannot happen without communication, and global communication depends very much on technology.

2wedrifid14y

Not technically true. True enough for humans though.

0[anonymous]14y

Well I agree that it is not as obvious as I made out. However, for this purpose it suffices to note that these innovations/social features could be greatly furthered without more technological advances.

0TheOtherDave14y

Do you see any reason to believe this argument wasn't equally sound (albeit with different scary technologies) thirty years ago, or a hundred?

0[anonymous]14y

Thirty years ago it may have still been valid although difficult to make since nobody knew about the risks of AGI or self-replicating assemblers. A hundred years ago it would not have been valid in this form since we lacked surveillance and space exploration technologies. Keep in mind that we have a certain bias on this question since we happen to have survived up until this point in history but there is no guarantee of that in the future.

SI and rationality

Paraphrasing:

Holden expects us to have epistemic and instrumental powers of rationality that would make us successful in Western society; however, this is a strawman. Being rational isn't succeeding in society, but succeeding at your own goals.

(Btw, I'm going to coin a new term for this: the straw-morra [a reference to the main character from Limitless]).

Now that being said, you shouldn't anticipate that the members of SI would be morra-like.

There's a problem with this: arguments made to support an individual are not nearly as c...

2lukeprog14y

I reject the paraphrase, and the test you link to involved a lot more than the CRT.

0siodine14y

Why? Direct quotes: [...] That is synonymous with success in Western society. His definition of superior general rationality or insight (read: instrumental and epistemic rationality) fits with my paraphrase of that direct quote. [...] You think his definition is wrong. [...] I.e., we shouldn't necessarily expect rational people to be successful. The only problem I see with my paraphrase is in explaining why some people aren't successful given that they're rational (per your definition), which is by having atypical goals. Well, that should make sense if they're instrumentally rational (of course, this discounts luck, but I don't think luck is an overriding factor on average here). [...] This isn't useful information unless you also link to the other tests and show why they're meaningful after training to do well on them. I would take it out of your argument, as is. (Also, it's a spider web of links, which I've read before.)

2lukeprog14y

Your paraphrase of me was: [...] But I didn't think that what Holden got wrong was a confusion between one's own goals and "success in Western society" goals. Many of SI's own goals include "success in Western society" goals like lots of accumulated wealth and power. Instead, what I thought Holden got wrong was his estimate of the relation between rationality and success. Re: the testing. LWers hadn't trained specifically for the battery of tests given them that day, but they outperformed every other group I know of who has taken those tests. I agree that these data aren't as useful as the data CFAR is collecting now about the impact of rationality training on measures of life success, but they are suggestive enough to support a weak, qualified claim like the one I made, that "it seems" like LWers are more rational than the general population.

It occurs to me that Holden's actual reasoning (never mind what he said) is perhaps not about rationality per se and instead may be along these lines: "Since SI staff haven't already accumulated wealth and power, they probably suffer from something like insufficient work ethic or high akrasia or not-having-inherited-billions, and thus will probably be ineffective at achieving the kind of extremely-ambitious goals they have set for themselves."

2lmm13y

It may or may not be Holden's, but I think you've put your finger on my real reasons for not wanting to donate to SI. I'd be interested to hear any counterpoint.

1siodine14y

Right, then I (correctly, I think) took your reasoning a step farther than you did. The SI's goals don't necessarily correspond with its members' goals. SIers may be there because they want to be around a lot of cool people, and may not have any particular desire for being successful (I suspect many of them do). But this discounts luck, like luck in being born with conscientiousness, the power to accomplish your goals. And like I said, poor luck like that is unconvincing when applied to a group of people. [...] When I say "it seems", being an unknown here, people will likely take me to be reporting an anecdote. When you, the executive director of SI and a researcher on this topic, say "it seems" I think people will take it as a weak impression of the available research. Scientists adept at communicating with journalists get around this by saying "I speculate" instead.
