This post presents thoughts on the Singularity Institute from Holden Karnofsky, Co-Executive Director of GiveWell. Note: Luke Muehlhauser, the Executive Director of the Singularity Institute, reviewed a draft of this post, and commented: "I do generally agree that your complaints are either correct (especially re: past organizational competence) or incorrect but not addressed by SI in clear argumentative writing (this includes the part on 'tool' AI). I am working to address both categories of issues." I take Luke's comment to be a significant mark in SI's favor, because it indicates an explicit recognition of the problems I raise, and thus increases my estimate of the likelihood that SI will work to address them.

September 2012 update: responses have been posted by Luke and Eliezer (and I have responded in the comments of their posts). I have also added acknowledgements.

The Singularity Institute (SI) is a charity that GiveWell has been repeatedly asked to evaluate. In the past, SI has been outside our scope (as we were focused on specific areas such as international aid). With GiveWell Labs we are open to any giving opportunity, no matter what form and what sector, but we still do not currently plan to recommend SI; given the amount of interest some of our audience has expressed, I feel it is important to explain why. Our views, of course, remain open to change. (Note: I am posting this only to Less Wrong, not to the GiveWell Blog, because I believe that everyone who would be interested in this post will see it here.)

I am currently the GiveWell staff member who has put the most time and effort into engaging with and evaluating SI. Other GiveWell staff currently agree with my bottom-line view that we should not recommend SI, but this does not mean they have engaged with each of my specific arguments. Therefore, while the lack of recommendation of SI is something that GiveWell stands behind, the specific arguments in this post should be attributed only to me, not to GiveWell.

Summary of my views

  • The argument advanced by SI for why the work it's doing is beneficial and important seems both wrong and poorly argued to me. My sense at the moment is that the arguments SI is making would, if accepted, increase rather than decrease the risk of an AI-related catastrophe.
  • SI has, or has had, multiple properties that I associate with ineffective organizations, and I do not see any specific evidence that its personnel/organization are well-suited to the tasks it has set for itself.
  • A common argument for giving to SI is that "even an infinitesimal chance that it is right" would be sufficient given the stakes. I have written previously about why I reject this reasoning; in addition, prominent SI representatives seem to reject this particular argument as well (i.e., they believe that one should support SI only if one believes it is a strong organization making strong arguments).
  • My sense is that at this point, given SI's current financial state, withholding funds from SI is likely better for its mission than donating to it. (I would not take this view to the furthest extreme; the argument that SI should have some funding seems stronger to me than the argument that it should have as much as it currently has.)
  • I find existential risk reduction to be a fairly promising area for philanthropy, and plan to investigate it further.
  • There are many things that could happen that would cause me to revise my view on SI. However, I do not plan to respond to all comment responses to this post. (Given the volume of responses we may receive, I may not be able to even read all the comments on this post.) I do not believe these two statements are inconsistent, and I lay out paths for getting me to change my mind that are likely to work better than posting comments. (Of course I encourage people to post comments; I'm just noting in advance that this action, alone, doesn't guarantee that I will consider your argument.)

Intent of this post

I did not write this post with the purpose of "hurting" SI. Rather, I wrote it in the hopes that one of these three things (or some combination) will happen:

  1. New arguments are raised that cause me to change my mind and recognize SI as an outstanding giving opportunity. If this happens I will likely attempt to raise more money for SI (most likely by discussing it with other GiveWell staff and collectively considering a GiveWell Labs recommendation).
  2. SI concedes that my objections are valid and increases its determination to address them. A few years from now, SI is a better organization and more effective in its mission.
  3. SI can't or won't make changes, and SI's supporters feel my objections are valid, so SI loses some support, freeing up resources for other approaches to doing good.

Which one of these occurs will hopefully be driven primarily by the merits of the different arguments raised. Because of this, I think that whatever happens as a result of my post will be positive for SI's mission, whether or not it is positive for SI as an organization. I believe that most of SI's supporters and advocates care more about the former than about the latter, and that this attitude is far too rare in the nonprofit world.

Does SI have a well-argued case that its work is beneficial and important?

I know no more concise summary of SI's views than this page, so here I give my own impressions of what SI believes, in italics.

  1. There is some chance that in the near future (next 20-100 years), an "artificial general intelligence" (AGI) - a computer that is vastly more intelligent than humans in every relevant way - will be created.
  2. This AGI will likely have a utility function and will seek to maximize utility according to this function.
  3. This AGI will be so much more powerful than humans - due to its superior intelligence - that it will be able to reshape the world to maximize its utility, and humans will not be able to stop it from doing so.
  4. Therefore, it is crucial that its utility function be one that is reasonably harmonious with what humans want. A "Friendly" utility function is one that is reasonably harmonious with what humans want, such that a "Friendly" AGI (FAI) would change the world for the better (by human standards) while an "Unfriendly" AGI (UFAI) would essentially wipe out humanity (or worse).
  5. Unless great care is taken specifically to make a utility function "Friendly," it will be "Unfriendly," since the things humans value are a tiny subset of the things that are possible.
  6. Therefore, it is crucially important to develop "Friendliness theory" that helps us to ensure that the first strong AGI's utility function will be "Friendly." The developer of Friendliness theory could use it to build an FAI directly or could disseminate the theory so that others working on AGI are more likely to build FAI as opposed to UFAI.

From the time I first heard this argument, it has seemed to me to be skipping important steps and making major unjustified assumptions. However, for a long time I believed this could easily be due to my inferior understanding of the relevant issues. I believed my own views on the argument to have only very low relevance (as I stated in my 2011 interview with SI representatives). Over time, I have had many discussions with SI supporters and advocates, as well as with non-supporters who I believe understand the relevant issues well. I now believe - for the moment - that my objections are highly relevant, that they cannot be dismissed as simple "layman's misunderstandings" (as they have been by various SI supporters in the past), and that SI has not published anything that addresses them in a clear way.

Below, I list my major objections. I do not believe that these objections constitute a sharp/tight case for the idea that SI's work has low/negative value; I believe, instead, that SI's own arguments are too vague for such a rebuttal to be possible. There are many possible responses to my objections, but SI's public arguments (and the private arguments) do not make clear which possible response (if any) SI would choose to take up and defend. Hopefully the dialogue following this post will clarify what SI believes and why.

Some of my views are discussed at greater length (though with less clarity) in a public transcript of a conversation I had with SI supporter Jaan Tallinn. I refer to this transcript as "Karnofsky/Tallinn 2011."

Objection 1: it seems to me that any AGI that was set to maximize a "Friendly" utility function would be extraordinarily dangerous.

Suppose, for the sake of argument, that SI manages to create what it believes to be an FAI. Suppose that it is successful in the "AGI" part of its goal, i.e., it has successfully created an intelligence vastly superior to human intelligence and extraordinarily powerful from our perspective. Suppose that it has also done its best on the "Friendly" part of the goal: it has developed a formal argument for why its AGI's utility function will be Friendly, it believes this argument to be airtight, and it has had this argument checked over by 100 of the world's most intelligent and relevantly experienced people. Suppose that SI now activates its AGI, unleashing it to reshape the world as it sees fit. What will be the outcome?

I believe that the probability of an unfavorable outcome - by which I mean an outcome essentially equivalent to what a UFAI would bring about - exceeds 90% in such a scenario. I believe the goal of designing a "Friendly" utility function is likely to be beyond the abilities even of the best team of humans willing to design such a function. I do not have a tight argument for why I believe this, but a comment on LessWrong by Wei Dai gives a good illustration of the kind of thoughts I have on the matter:

What I'm afraid of is that a design will be shown to be safe, and then it turns out that the proof is wrong, or the formalization of the notion of "safety" used by the proof is wrong. This kind of thing happens a lot in cryptography, if you replace "safety" with "security". These mistakes are still occurring today, even after decades of research into how to do such proofs and what the relevant formalizations are. From where I'm sitting, proving an AGI design Friendly seems even more difficult and error-prone than proving a crypto scheme secure, probably by a large margin, and there is no decades of time to refine the proof techniques and formalizations. There's good recent review of the history of provable security, titled Provable Security in the Real World, which might help you understand where I'm coming from.

I think this comment understates the risks, however. For example, when the comment says "the formalization of the notion of 'safety' used by the proof is wrong," it is not clear whether it means that the values the programmers have in mind are not correctly implemented by the formalization, or whether it means they are correctly implemented but are themselves catastrophic in a way that hasn't been anticipated. I would be highly concerned about both. There are other catastrophic possibilities as well; perhaps the utility function itself is well-specified and safe, but the AGI's model of the world is flawed (in particular, perhaps its prior or its process for matching observations to predictions is flawed) in a way that doesn't emerge until the AGI has made substantial changes to its environment.

By SI's own arguments, even a small error in any of these things would likely lead to catastrophe. And there are likely failure forms I haven't thought of. The overriding intuition here is that complex plans usually fail when unaccompanied by feedback loops. A scenario in which a set of people is ready to unleash an all-powerful being to maximize some parameter in the world, based solely on their initial confidence in their own extrapolations of the consequences of doing so, seems like a scenario that is overwhelmingly likely to result in a bad outcome. It comes down to placing the world's largest bet on a highly complex theory - with no experimentation to test the theory first.

So far, all I have argued is that the development of "Friendliness" theory can achieve at best only a limited reduction in the probability of an unfavorable outcome. However, as I argue in the next section, I believe there is at least one concept - the "tool-agent" distinction - that has more potential to reduce risks, and that SI appears to ignore this concept entirely. I believe that tools are safer than agents (even agents that make use of the best "Friendliness" theory that can reasonably be hoped for) and that SI encourages a focus on building agents, thus increasing risk.

Objection 2: SI appears to neglect the potentially important distinction between "tool" and "agent" AI.

Google Maps is a type of artificial intelligence (AI). It is far more intelligent than I am when it comes to planning routes.

Google Maps - by which I mean the complete software package including the display of the map itself - does not have a "utility" that it seeks to maximize. (One could fit a utility function to its actions, as to any set of actions, but there is no single "parameter to be maximized" driving its operations.)

Google Maps (as I understand it) considers multiple possible routes, gives each a score based on factors such as distance and likely traffic, and then displays the best-scoring route in a way that makes it easily understood by the user. If I don't like the route, for whatever reason, I can change some parameters and consider a different route. If I like the route, I can print it out or email it to a friend or send it to my phone's navigation application. Google Maps has no single parameter it is trying to maximize; it has no reason to try to "trick" me in order to increase its utility.

In short, Google Maps is not an agent, taking actions in order to maximize a utility parameter. It is a tool, generating information and then displaying it in a user-friendly manner for me to consider, use and export or discard as I wish.
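
To make the description above concrete, here is a minimal Python sketch of a route-planning "tool" in the sense I am using. It is not a claim about how Google Maps is actually implemented: the `Route` fields, the `traffic_weight` parameter and the scoring formula are all assumptions invented for illustration. The only point is that the program scores candidates and reports them, and that the user can change a parameter and look again.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    name: str
    distance_km: float
    expected_delay_min: float  # hypothetical traffic estimate


def score(route: Route, traffic_weight: float = 1.0) -> float:
    """Lower is better: distance plus a weighted traffic penalty (illustrative formula)."""
    return route.distance_km + traffic_weight * route.expected_delay_min


def rank_routes(routes: list[Route], traffic_weight: float = 1.0) -> list[tuple[Route, float]]:
    """Score every candidate and return them best-first. Nothing is executed."""
    scored = [(r, score(r, traffic_weight)) for r in routes]
    return sorted(scored, key=lambda pair: pair[1])


def display(ranked: list[tuple[Route, float]]) -> str:
    """Render the ranking as text for a human to read, use, or discard."""
    return "\n".join(
        f"{i}. {route.name}: score {s:.1f}"
        for i, (route, s) in enumerate(ranked, start=1)
    )


routes = [
    Route("highway", distance_km=30.0, expected_delay_min=20.0),
    Route("back roads", distance_km=38.0, expected_delay_min=5.0),
]

print(display(rank_routes(routes)))                      # default weighting
print(display(rank_routes(routes, traffic_weight=0.2)))  # user cares less about traffic
```

With the default weighting the toy data ranks "back roads" first; lowering `traffic_weight` puts "highway" first. In both cases the output is a ranked list for the user, not an action taken on the user's behalf.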

Every software application I know of seems to work essentially the same way, including those that involve (specialized) artificial intelligence such as Google Search, Siri, Watson, Rybka, etc. Some can be put into an "agent mode" (as Watson was on Jeopardy!) but all can easily be set up to be used as "tools" (for example, Watson can simply display its top candidate answers to a question, with the score for each, without speaking any of them).

The "tool mode" concept is importantly different from the possibility of Oracle AI sometimes discussed by SI. The discussions I've seen of Oracle AI present it as an Unfriendly AI that is "trapped in a box" - an AI whose intelligence is driven by an explicit utility function and that humans hope to control coercively. Hence the discussion of ideas such as the AI-Box Experiment. A different interpretation, given in Karnofsky/Tallinn 2011, is an AI with a carefully designed utility function - likely as difficult to construct as "Friendliness" - that leaves it "wishing" to answer questions helpfully. By contrast with both these ideas, Tool-AGI is not "trapped" and it is not Unfriendly or Friendly; it has no motivations and no driving utility function of any kind, just like Google Maps. It scores different possibilities and displays its conclusions in a transparent and user-friendly manner, as its instructions say to do; it does not have an overarching "want," and so, as with the specialized AIs described above, while it may sometimes "misinterpret" a question (thereby scoring options poorly and ranking the wrong one #1) there is no reason to expect intentional trickery or manipulation when it comes to displaying its results.

Another way of putting this is that a "tool" has an underlying instruction set that conceptually looks like: "(1) Calculate which action A would maximize parameter P, based on existing data set D. (2) Summarize this calculation in a user-friendly manner, including what Action A is, what likely intermediate outcomes it would cause, what other actions would result in high values of P, etc." An "agent," by contrast, has an underlying instruction set that conceptually looks like: "(1) Calculate which action, A, would maximize parameter P, based on existing data set D. (2) Execute Action A." In any AI where (1) is separable (by the programmers) as a distinct step, (2) can be set to the "tool" version rather than the "agent" version, and this separability is in fact present with most/all modern software. Note that in the "tool" version, neither step (1) nor step (2) (nor the combination) constitutes an instruction to maximize a parameter - to describe a program of this kind as "wanting" something is a category error, and there is no reason to expect its step (2) to be deceptive.
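
As a rough illustration of the two instruction sets, the following Python sketch shares step (1) between both designs and differs only in step (2). Every name here (`best_action`, `run_as_tool`, `run_as_agent`, `predict_p`, `execute`) is hypothetical, and the toy predictor is just a dictionary lookup. The sketch shows separability at the level of ordinary software; it is not an argument that the same separation carries over to an AGI.

```python
from __future__ import annotations

from typing import Callable, Sequence

Action = str
# Hypothetical predictor: estimated value of parameter P for an action, given data set D.
Predictor = Callable[[Action, dict], float]


def best_action(actions: Sequence[Action], predict_p: Predictor, data: dict) -> Action:
    """Step (1), shared by both designs: the action with the highest predicted P given D."""
    return max(actions, key=lambda a: predict_p(a, data))


def run_as_tool(actions: Sequence[Action], predict_p: Predictor, data: dict, top_n: int = 3) -> dict:
    """Step (2), tool version: summarize the calculation and return it. Nothing is executed."""
    ranked = sorted(actions, key=lambda a: predict_p(a, data), reverse=True)
    return {
        "recommended": best_action(actions, predict_p, data),
        "top_scored": [(a, predict_p(a, data)) for a in ranked[:top_n]],
    }


def run_as_agent(
    actions: Sequence[Action],
    predict_p: Predictor,
    data: dict,
    execute: Callable[[Action], None],
) -> Action:
    """Step (2), agent version: carry out the chosen action through `execute`."""
    chosen = best_action(actions, predict_p, data)
    execute(chosen)
    return chosen


# Toy usage: D maps each action directly to its predicted P.
data = {"a": 1.0, "b": 3.0, "c": 2.0}
actions = ["a", "b", "c"]
predict_p = lambda action, d: d[action]

report = run_as_tool(actions, predict_p, data)
print(report)  # a summary for a human; no side effects

executed: list[Action] = []
run_as_agent(actions, predict_p, data, execute=executed.append)
print(executed)  # the agent version has acted
```

The tool version never receives an `execute` callback at all, so the difference between the two modes is visible in the function signatures, not only in their bodies.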

I elaborated further on the distinction and on the concept of a tool-AI in Karnofsky/Tallinn 2011.

This is important because an AGI running in tool mode could be extraordinarily useful but far safer than an AGI running in agent mode. In fact, if developing "Friendly AI" is what we seek, a tool-AGI could likely be helpful enough in thinking through this problem to render any previous work on "Friendliness theory" moot. Among other things, a tool-AGI would allow transparent views into the AGI's reasoning and predictions without any reason to fear being purposefully misled, and would facilitate safe experimental testing of any utility function that one wished to eventually plug into an "agent."

Is a tool-AGI possible? I believe that it is, and furthermore that it ought to be our default picture of how AGI will work, given that practically all software developed to date can (and usually does) run as a tool and given that modern software seems to be constantly becoming "intelligent" (capable of giving better answers than a human) in surprising new domains. In addition, it intuitively seems to me (though I am not highly confident) that intelligence inherently involves the distinct, separable steps of (a) considering multiple possible actions and (b) assigning a score to each, prior to executing any of the possible actions. If one can distinctly separate (a) and (b) in a program's code, then one can abstain from writing any "execution" instructions and instead focus on making the program list actions and scores in a user-friendly manner, for humans to consider and use as they wish.

Of course, there are possible paths to AGI that may rule out a "tool mode," but it seems that most of these paths would rule out the application of "Friendliness theory" as well. (For example, a "black box" emulation and augmentation of a human mind.) What are the paths to AGI that allow manual, transparent, intentional design of a utility function but do not allow the replacement of "execution" instructions with "communication" instructions? Most of the conversations I've had on this topic have focused on three responses:

  • Self-improving AI. Many seem to find it intuitive that (a) AGI will almost certainly come from an AI rewriting its own source code, and (b) such a process would inevitably lead to an "agent." I do not agree with either (a) or (b). I discussed these issues in Karnofsky/Tallinn 2011 and will be happy to discuss them more if this is the line of response that SI ends up pursuing. Very briefly:
    • The idea of a "self-improving algorithm" intuitively sounds very powerful, but does not seem to have led to many "explosions" in software so far (and it seems to be a concept that could apply to narrow AI as well as to AGI).
    • It seems to me that a tool-AGI could be plugged into a self-improvement process that would be quite powerful but would also terminate and yield a new tool-AI after a set number of iterations (or after reaching a set "intelligence threshold"). So I do not accept the argument that "self-improving AGI means agent AGI." As stated above, I will elaborate on this view if it turns out to be an important point of disagreement.
    • I have argued (in Karnofsky/Tallinn 2011) that the relevant self-improvement abilities are likely to come with or after - not prior to - the development of strong AGI. In other words, any software capable of the relevant kind of self-improvement is likely also capable of being used as a strong tool-AGI, with the benefits described above.
    • The SI-related discussions I've seen of "self-improving AI" are highly vague, and do not spell out views on the above points.
  • Dangerous data collection. Some point to the seeming dangers of a tool-AI's "scoring" function: in order to score different options it may have to collect data, which is itself an "agent" type action that could lead to dangerous actions. I think my definition of "tool" above makes clear what is wrong with this objection: a tool-AGI takes its existing data set D as fixed (and perhaps could have some pre-determined, safe set of simple actions it can take - such as using Google's API - to collect more), and if maximizing its chosen parameter is best accomplished through more data collection, it can transparently output why and how it suggests collecting more data. Over time it can be given more autonomy for data collection through an experimental and domain-specific process (e.g., modifying the AI to skip specific steps of human review of proposals for data collection after it has become clear that these steps work as intended), a process that has little to do with the "Friendly overarching utility function" concept promoted by SI. Again, I will elaborate on this if it turns out to be a key point.
  • Race for power. Some have argued to me that humans are likely to choose to create agent-AGI, in order to quickly gain power and outrace other teams working on AGI. But this argument, even if accepted, has very different implications from SI's view.

    Conventional wisdom says it is extremely dangerous to empower a computer to act in the world until one is very sure that the computer will do its job in a way that is helpful rather than harmful. So if a programmer chooses to "unleash an AGI as an agent" with the hope of gaining power, it seems that this programmer will be deliberately ignoring conventional wisdom about what is safe in favor of shortsighted greed. I do not see why such a programmer would be expected to make use of any "Friendliness theory" that might be available. (Attempting to incorporate such theory would almost certainly slow the project down greatly, and thus would bring the same problems as the more general "have caution, do testing" counseled by conventional wisdom.) It seems that the appropriate measures for preventing such a risk are security measures aiming to stop humans from launching unsafe agent-AIs, rather than developing theories or raising awareness of "Friendliness."
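
Returning to the self-improvement point in the list above: the idea of a self-improvement process that terminates after a set number of iterations, or after reaching a set threshold, can be sketched as an ordinary bounded loop. This is a toy under loud assumptions: the "tool" is just a number, and `improve`, `measure`, `max_iterations` and `threshold` are hypothetical names chosen for illustration. It shows what the stopping conditions look like in code, not that a real self-improving system could be bounded this easily.

```python
from __future__ import annotations

from typing import Callable, TypeVar

T = TypeVar("T")


def bounded_improvement(
    tool: T,
    improve: Callable[[T], T],
    measure: Callable[[T], float],
    max_iterations: int,
    threshold: float,
) -> tuple[T, int]:
    """Apply `improve` repeatedly, stopping at an iteration cap or a capability threshold.

    Returns the final tool and the number of improvement steps actually taken.
    """
    steps = 0
    while steps < max_iterations and measure(tool) < threshold:
        tool = improve(tool)
        steps += 1
    return tool, steps


# Toy usage: "capability" is a float that doubles with each improvement step.
double = lambda capability: capability * 2
identity = lambda capability: capability

print(bounded_improvement(1.0, double, identity, max_iterations=10, threshold=50.0))
print(bounded_improvement(1.0, double, identity, max_iterations=3, threshold=50.0))
```

In the first call the loop stops because the threshold is reached; in the second it stops because the iteration cap is hit first. Either way the process ends and hands back a new tool rather than continuing indefinitely.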

One of the things that bothers me most about SI is that there is practically no public content, as far as I can tell, explicitly addressing the idea of a "tool" and giving arguments for why AGI is likely to work only as an "agent." The idea that AGI will be driven by a central utility function seems to be simply assumed. Two examples:

  • I have been referred to Muehlhauser and Salamon 2012 as the most up-to-date, clear explanation of SI's position on "the basics." This paper states, "Perhaps we could build an AI of limited cognitive ability — say, a machine that only answers questions: an 'Oracle AI.' But this approach is not without its own dangers (Armstrong, Sandberg, and Bostrom 2012)." However, the referenced paper (Armstrong, Sandberg and Bostrom 2012) seems to take it as a given that an Oracle AI is an "agent trapped in a box" - a computer that has a basic drive/utility function, not a Tool-AGI. The rest of Muehlhauser and Salamon 2012 seems to take it as a given that an AGI will be an agent.
  • I have often been referred to Omohundro 2008 for an argument that an AGI is likely to have certain goals. But this paper seems, again, to take it as given that an AGI will be an agent, i.e., that it will have goals at all. The introduction states, "To say that a system of any design is an 'artificial intelligence', we mean that it has goals which it tries to accomplish by acting in the world." In other words, the premise I'm disputing seems embedded in its very definition of AI.

The closest thing I have seen to a public discussion of "tool-AGI" is in Dreams of Friendliness, where Eliezer Yudkowsky considers the question, "Why not just have the AI answer questions, instead of trying to do anything? Then it wouldn't need to be Friendly. It wouldn't need any goals at all. It would just answer questions." His response:

To which the reply is that the AI needs goals in order to decide how to think: that is, the AI has to act as a powerful optimization process in order to plan its acquisition of knowledge, effectively distill sensory information, pluck "answers" to particular questions out of the space of all possible responses, and of course, to improve its own source code up to the level where the AI is a powerful intelligence. All these events are "improbable" relative to random organizations of the AI's RAM, so the AI has to hit a narrow target in the space of possibilities to make superintelligent answers come out.

This passage appears vague and does not appear to address the specific "tool" concept I have defended above (in particular, it does not address the analogy to modern software, which challenges the idea that "powerful optimization processes" cannot run in tool mode). The rest of the piece discusses (a) psychological mistakes that could lead to the discussion in question; (b) the "Oracle AI" concept that I have outlined above. The comments contain some more discussion of the "tool" idea (Denis Bider and Shane Legg seem to be picturing something similar to "tool-AGI") but the discussion is unresolved and I believe the "tool" concept defended above remains essentially unaddressed.

In sum, SI appears to encourage a focus on building and launching "Friendly" agents (it is seeking to do so itself, and its work on "Friendliness" theory seems to be laying the groundwork for others to do so) while not addressing the tool-agent distinction. It seems to assume that any AGI will have to be an agent, and to make little to no attempt at justifying this assumption. The result, in my view, is that it is essentially advocating for a more dangerous approach to AI than the traditional approach to software development.

Objection 3: SI's envisioned scenario is far more specific and conjunctive than it appears at first glance, and I believe this scenario to be highly unlikely.

SI's scenario concerns the development of artificial general intelligence (AGI): a computer that is vastly more intelligent than humans in every relevant way. But we already have many computers that are vastly more intelligent than humans in some relevant ways, and the domains in which specialized AIs outdo humans seem to be constantly and continuously expanding. I feel that the relevance of "Friendliness theory" depends heavily on the idea of a "discrete jump" that seems unlikely and whose likelihood does not seem to have been publicly argued for.

One possible scenario is that at some point, we develop powerful enough non-AGI tools (particularly specialized AIs) that we vastly improve our abilities to consider and prepare for the eventuality of AGI - to the point where any previous theory developed on the subject becomes useless. Or (to put this more generally) non-AGI tools simply change the world so much that it becomes essentially unrecognizable from the perspective of today - again rendering any previous "Friendliness theory" moot. As I said in Karnofsky/Tallinn 2011, some of SI's work "seems a bit like trying to design Facebook before the Internet was in use, or even before the computer existed."

Perhaps there will be a discrete jump to AGI, but it will be a sort of AGI that renders "Friendliness theory" moot for a different reason. For example, in the practice of software development, there often does not seem to be an operational distinction between "intelligent" and "Friendly." (My impression is that the only method programmers had for evaluating Watson's "intelligence" was to see whether it was coming up with the same answers that a well-informed human would; the only way to evaluate Siri's "intelligence" was to evaluate its helpfulness to humans.) "Intelligent" often ends up getting defined as "prone to take actions that seem all-around 'good' to the programmer." So the concept of "Friendliness" may end up being naturally and subtly baked into a successful AGI effort.

The bottom line is that we know very little about the course of future artificial intelligence. I believe that the probability that SI's concept of "Friendly" vs. "Unfriendly" goals ends up seeming essentially nonsensical, irrelevant and/or unimportant from the standpoint of the relevant future is over 90%.

Other objections to SI's views

There are other debates about the likelihood of SI's work being relevant/helpful; for example,

  • It isn't clear whether the development of AGI is imminent enough to be relevant, or whether other risks to humanity are closer.
  • It isn't clear whether AGI would be as powerful as SI's views imply. (I discussed this briefly in Karnofsky/Tallinn 2011.)
  • It isn't clear whether even an extremely powerful UFAI would choose to attack humans as opposed to negotiating with them. (I find it somewhat helpful to analogize UFAI-human interactions to human-mosquito interactions. Humans are enormously more intelligent than mosquitoes; humans are good at predicting, manipulating, and destroying mosquitoes; humans do not value mosquitoes' welfare; humans have other goals that mosquitoes interfere with; humans would like to see mosquitoes eradicated at least from certain parts of the planet. Yet humans haven't accomplished such eradication, and it is easy to imagine scenarios in which humans would prefer honest negotiation and trade with mosquitoes to any other arrangement, if such negotiation and trade were possible.)

Unlike the three objections I focus on, these other issues have been discussed a fair amount, and if these other issues were the only objections to SI's arguments I would find SI's case to be strong (i.e., I would find its scenario likely enough to warrant investment).


  • I believe the most likely future scenarios are the ones we haven't thought of, and that the most likely fate of the sort of theory SI ends up developing is irrelevance.
  • I believe that unleashing an all-powerful "agent AGI" (without the benefit of experimentation) would very likely result in a UFAI-like outcome, no matter how carefully the "agent AGI" was designed to be "Friendly." I see SI as encouraging (and aiming to take) this approach.
  • I believe that the standard approach to developing software results in "tools," not "agents," and that tools (while dangerous) are much safer than agents. A "tool mode" could facilitate experiment-informed progress toward a safe "agent," rather than needing to get "Friendliness" theory right without any experimentation.
  • Therefore, I believe that the approach SI advocates and aims to prepare for is far more dangerous than the standard approach, so if SI's work on Friendliness theory affects the risk of human extinction one way or the other, it will increase the risk of human extinction. Fortunately I believe SI's work is far more likely to have no effect one way or the other.

For a long time I refrained from engaging in object-level debates over SI's work, believing that others are better qualified to do so. But after talking at great length to many of SI's supporters and advocates and reading everything I've been pointed to as relevant, I still have seen no clear and compelling response to any of my three major objections. As stated above, there are many possible responses to my objections, but SI's current arguments do not seem clear on what responses they wish to take and defend. At this point I am unlikely to form a positive view of SI's work until and unless I do see such responses, and/or SI changes its positions.

Is SI the kind of organization we want to bet on?

This part of the post has some risks. For most of GiveWell's history, sticking to our standard criteria - and putting more energy into recommended than non-recommended organizations - has enabled us to share our honest thoughts about charities without appearing to get personal. But when evaluating a group such as SI, I can't avoid placing a heavy weight on (my read on) the general competence, capability and "intangibles" of the people and organization, because SI's mission is not about repeating activities that have worked in the past. Sharing my views on these issues could strike some as personal or mean-spirited and could lead to the misimpression that GiveWell is hostile toward SI. But it is simply necessary in order to be fully transparent about why I hold the views that I hold.

Fortunately, SI is an ideal organization for our first discussion of this type. I believe the staff and supporters of SI would overwhelmingly rather hear the whole truth about my thoughts - so that they can directly engage them and, if warranted, make changes - than have me sugar-coat what I think in order to spare their feelings. People who know me and my attitude toward being honest vs. sparing feelings know that this, itself, is high praise for SI.

One more comment before I continue: our policy is that non-public information provided to us by a charity will not be published or discussed without that charity's prior consent. However, none of the content of this post is based on private information; all of it is based on information that SI has made available to the public.

There are several reasons that I currently have a negative impression of SI's general competence, capability and "intangibles." My mind remains open and I include specifics on how it could be changed.

  • Weak arguments. SI has produced enormous quantities of public argumentation, and I have examined a very large proportion of this information. Yet I have never seen a clear response to any of the three basic objections I listed in the previous section. One of SI's major goals is to raise awareness of AI-related risks; given this, the fact that it has not advanced clear/concise/compelling arguments speaks, in my view, to its general competence.
  • Lack of impressive endorsements. I discussed this issue in my 2011 interview with SI representatives and I still feel the same way on the matter. I feel that given the enormous implications of SI's claims, if it argued them well it ought to be able to get more impressive endorsements than it has.

    I have been pointed to Peter Thiel and Ray Kurzweil as examples of impressive SI supporters, but I have not seen any on-record statements from either of these people that show agreement with SI's specific views, and in fact (based on watching them speak at Singularity Summits) my impression is that they disagree. Peter Thiel seems to believe that speeding the pace of general innovation is a good thing; this would seem to be in tension with SI's view that AGI will be catastrophic by default and that no one other than SI is paying sufficient attention to "Friendliness" issues. Ray Kurzweil seems to believe that "safety" is a matter of transparency, strong institutions, etc. rather than of "Friendliness." I am personally in agreement with the things I have seen both of them say on these topics. I find it possible that they support SI because of the Singularity Summit or to increase general interest in ambitious technology, rather than because they find "Friendliness theory" to be as important as SI does.

    Clear, on-record statements from these two supporters, specifically endorsing SI's arguments and the importance of developing Friendliness theory, would shift my views somewhat on this point.

  • Resistance to feedback loops. I discussed this issue in my 2011 interview with SI representatives and I still feel the same way on the matter. SI seems to have passed up opportunities to test itself and its own rationality by e.g. aiming for objectively impressive accomplishments. This is a problem because of (a) its extremely ambitious goals (among other things, it seeks to develop artificial intelligence and "Friendliness theory" before anyone else can develop artificial intelligence); (b) its view of its staff/supporters as having unusual insight into rationality, which I discuss in a later bullet point.

    SI's list of achievements is not, in my view, up to where it needs to be given (a) and (b). Yet I have seen no declaration that SI has fallen short to date, nor any explanation of what will be changed to deal with it. SI's recent release of a strategic plan and monthly updates are improvements from a transparency perspective, but they still leave me feeling as though there are no clear metrics or goals by which SI is committing to be measured (aside from very basic organizational goals such as "design a new website" and very vague goals such as "publish more papers") and as though SI places a low priority on engaging people who are critical of its views (or at least not yet on board), as opposed to people who are naturally drawn to it.

    I believe that one of the primary obstacles to being impactful as a nonprofit is the lack of the sort of helpful feedback loops that lead to success in other domains. I like to see groups that are making as much effort as they can to create meaningful feedback loops for themselves. I perceive SI as falling well short on this front. Pursuing more impressive endorsements and developing benign but objectively recognizable innovations (particularly commercially viable ones) are two possible ways to impose more demanding feedback loops. (I discussed both of these in my interview linked above).

  • Apparent poorly grounded belief in SI's superior general rationality. Many of the things that SI and its supporters and advocates say imply a belief that they have special insights into the nature of general rationality, and/or have superior general rationality, relative to the rest of the population. (Examples here, here and here). My understanding is that SI is in the process of spinning off a group dedicated to training people on how to have higher general rationality.

    Yet I'm not aware of anything I would consider compelling evidence that SI staff/supporters/advocates have any special insight into the nature of general rationality or that they have especially high general rationality.

    I have been pointed to the Sequences on this point. The Sequences (which I have read the vast majority of) do not seem to me to be a demonstration or evidence of general rationality. They are about rationality; I find them very enjoyable to read; and there is very little they say that I disagree with (or would have disagreed with before I read them). However, they do not seem to demonstrate rationality on the part of the writer, any more than a series of enjoyable, not-obviously-inaccurate essays on the qualities of a good basketball player would demonstrate basketball prowess. I sometimes get the impression that fans of the Sequences are willing to ascribe superior rationality to the writer simply because the content seems smart and insightful to them, without making a critical effort to determine the extent to which the content is novel, actionable and important. 

    I endorse Eliezer Yudkowsky's statement, "Be careful … any time you find yourself defining the [rationalist] as someone other than the agent who is currently smiling from on top of a giant heap of utility." To me, the best evidence of superior general rationality (or of insight into it) would be objectively impressive achievements (successful commercial ventures, highly prestigious awards, clear innovations, etc.) and/or accumulation of wealth and power. As mentioned above, SI staff/supporters/advocates do not seem particularly impressive on these fronts, at least not as much as I would expect for people who have the sort of insight into rationality that makes it sensible for them to train others in it. I am open to other evidence that SI staff/supporters/advocates have superior general rationality, but I have not seen it.

    Why is it a problem if SI staff/supporters/advocates believe themselves, without good evidence, to have superior general rationality? First off, it strikes me as a belief based on wishful thinking rather than rational inference. Secondly, I would expect a series of problems to accompany overconfidence in one's general rationality, and several of these problems seem to be actually occurring in SI's case:

    • Insufficient self-skepticism given how strong its claims are and how little support its claims have won. Rather than endorsing "Others have not accepted our arguments, so we will sharpen and/or reexamine our arguments," SI seems often to endorse something more like "Others have not accepted our arguments because they have inferior general rationality," a stance less likely to lead to improvement on SI's part.
    • Being too selective (in terms of looking for people who share its preconceptions) when determining whom to hire and whose feedback to take seriously.
    • Paying insufficient attention to the limitations of the confidence one can have in one's untested theories, in line with my Objection 1.
  • Overall disconnect between SI's goals and its activities. SI seeks to build FAI and/or to develop and promote "Friendliness theory" that can be useful to others in building FAI. Yet it seems that most of its time goes to activities other than developing AI or theory. Its per-person output in terms of publications seems low. Its core staff seem more focused on Less Wrong posts, "rationality training" and other activities that don't seem connected to the core goals; Eliezer Yudkowsky, in particular, appears (from the strategic plan) to be focused on writing books for popular consumption. These activities seem neither to be advancing the state of FAI-related theory nor to be engaging the sort of people most likely to be crucial for building AGI.

    A possible justification for these activities is that SI is seeking to promote greater general rationality, which over time will lead to more and better support for its mission. But if this is SI's core activity, it becomes even more important to test the hypothesis that SI's views are in fact rooted in superior general rationality - and these tests don't seem to be happening, as discussed above.

  • Theft. I am bothered by the 2009 theft of $118,803.00 (as against a $541,080.00 budget for the year). In an organization as small as SI, it really seems as though theft that large relative to the budget shouldn't occur and that it represents a major failure of hiring and/or internal controls.

    In addition, I have seen no public SI-authorized discussion of the matter that I consider to be satisfactory in terms of explaining what happened and what the current status of the case is on an ongoing basis. Some details may have to be omitted, but a clear SI-authorized statement on this point with as much information as can reasonably be provided would be helpful.

A couple of positive observations to add context here:

  • I see significant positive qualities in many of the people associated with SI. I especially like what I perceive as their sincere wish to do whatever they can to help the world as much as possible, and the high value they place on being right as opposed to being conventional or polite. I have not interacted with Eliezer Yudkowsky but I greatly enjoy his writings.
  • I'm aware that SI has relatively new leadership that is attempting to address the issues behind some of my complaints. I have a generally positive impression of the new leadership; I believe the Executive Director and Development Director, in particular, to represent a step forward in terms of being interested in transparency and in testing their own general rationality. So I will not be surprised if there is some improvement in the coming years, particularly regarding the last couple of statements listed above. That said, SI is an organization and it seems reasonable to judge it by its organizational track record, especially when its new leadership is so new that I have little basis on which to judge these staff.


While SI has produced a lot of content that I find interesting and enjoyable, it has not produced what I consider evidence of superior general rationality or of its suitability for the tasks it has set for itself. I see no qualifications or achievements that specifically seem to indicate that SI staff are well-suited to the challenge of understanding the key AI-related issues and/or coordinating the construction of an FAI. And I see specific reasons to be pessimistic about its suitability and general competence.

When estimating the expected value of an endeavor, it is natural to have an implicit "survivorship bias" - to use organizations whose accomplishments one is familiar with (which tend to be relatively effective organizations) as a reference class. Because of this, I would be extremely wary of investing in an organization with apparently poor general competence/suitability to its tasks, even if I bought fully into its mission (which I do not) and saw no other groups working on a comparable mission.

But if there's even a chance …

A common argument that SI supporters raise with me is along the lines of, "Even if SI's arguments are weak and its staff isn't as capable as one would like to see, their goal is so important that they would be a good investment even at a tiny probability of success."

I believe this argument to be a form of Pascal's Mugging and I have outlined the reasons I believe it to be invalid in two posts (here and here). There have been some objections to my arguments, but I still believe them to be valid. There is a good chance I will revisit these topics in the future, because I believe these issues to be at the core of many of the differences between GiveWell-top-charities supporters and SI supporters.

Regardless of whether one accepts my specific arguments, it is worth noting that the most prominent people associated with SI tend to agree with the conclusion that the "But if there's even a chance …" argument is not valid. (See comments on my post from Michael Vassar and Eliezer Yudkowsky as well as Eliezer's interview with John Baez.)

Existential risk reduction as a cause

I consider the general cause of "looking for ways that philanthropic dollars can reduce direct threats of global catastrophic risks, particularly those that involve some risk of human extinction" to be a relatively high-potential cause. It is on the working agenda for GiveWell Labs and we will be writing more about it.

However, I don't consider "Cause X is the one I care about and Organization Y is the only one working on it" to be a good reason to support Organization Y. For donors determined to donate within this cause, I encourage you to consider donating to a donor-advised fund while making it clear that you intend to grant out the funds to existential-risk-reduction-related organizations in the future. (One way to accomplish this would be to create a fund with "existential risk" in the name; this is a fairly easy thing to do and one person could do it on behalf of multiple donors.)

For one who accepts my arguments about SI, I believe withholding funds in this way is likely to be better for SI's mission than donating to SI - through incentive effects alone (not to mention my specific argument that SI's approach to "Friendliness" seems likely to increase risks).

How I might change my views

My views are very open to revision.

However, I cannot realistically commit to read and seriously consider all comments posted on the matter. The number of people capable of taking a few minutes to write a comment is sufficient to swamp my capacity. I do encourage people to comment and I do intend to read at least some comments, but if you are looking to change my views, you should not consider posting a comment to be the most promising route.

Instead, what I will commit to is reading and carefully considering up to 50,000 words of content that are (a) specifically marked as SI-authorized responses to the points I have raised; (b) explicitly cleared for release to the general public as SI-authorized communications. In order to consider a response "SI-authorized and cleared for release," I will accept explicit communication from SI's Executive Director or from a majority of its Board of Directors endorsing the content in question. After 50,000 words, I may change my views and/or commit to reading more content, or (if I determine that the content is poor and is not using my time efficiently) I may decide not to engage further. SI-authorized content may improve or worsen SI's standing in my estimation, so unlike with comments, there is an incentive to select content that uses my time efficiently. Of course, SI-authorized content may end up including excerpts from comment responses to this post, and/or already-existing public content.

I may also change my views for other reasons, particularly if SI secures more impressive achievements and/or endorsements.

One more note: I believe I have read the vast majority of the Sequences, including the AI-foom debate, and that this content - while interesting and enjoyable - does not have much relevance for the arguments I've made.

Again: I think that whatever happens as a result of my post will be positive for SI's mission, whether or not it is positive for SI as an organization. I believe that most of SI's supporters and advocates care more about the former than about the latter, and that this attitude is far too rare in the nonprofit world.


Thanks to the following people for reviewing a draft of this post and providing thoughtful feedback (this of course does not mean they agree with the post or are responsible for its content): Dario Amodei, Nick Beckstead, Elie Hassenfeld, Alexander Kruel, Tim Ogden, John Salvatier, Jonah Sinick, Cari Tuna, Stephanie Wykstra.



Update: My full response to Holden is now here.

As Holden said, I generally think that Holden's objections for SI "are either correct (especially re: past organizational competence) or incorrect but not addressed by SI in clear argumentative writing (this includes the part on 'tool' AI)," and we are working hard to fix both categories of issues.

In this comment I would merely like to argue for one small point: that the Singularity Institute is undergoing comprehensive changes — changes which I believe to be improvements that will help us to achieve our mission more efficiently and effectively.

Holden wrote:

I'm aware that SI has relatively new leadership that is attempting to address the issues behind some of my complaints. I have a generally positive impression of the new leadership; I believe the Executive Director and Development Director, in particular, to represent a step forward in terms of being interested in transparency and in testing their own general rationality. So I will not be surprised if there is some improvement in the coming years...

Louie Helm was hired as Director of Development in September 2011. I was hired as a Research Fellow that same month, and ma... (read more)

...which is not to say, of course, that things were not improving before September 2011. It's just that the improvements have accelerated quite a bit since then.

For example, Amy was hired in December 2009 and is largely responsible for these improvements:

  • Built a "real" Board and officers; launched monthly Board meetings in February 2010.
  • Began compiling monthly financial reports in December 2010.
  • Began tracking Summit expenses and seeking Summit sponsors.
  • Played a major role in canceling many programs and expenses that were deemed low ROI.

Our bank accounts have been consolidated, with 3-4 people regularly checking over them.

In addition to reviews, should SI implement a two-man rule for manipulating large quantities of money? (For example, over 5k, over 10k, etc.)

Eliezer Yudkowsky:
And note that these improvements would not and could not have happened without more funding than the level of previous years - if, say, everyone had been waiting to see these kinds of improvements before funding.

note that these improvements would not and could not have happened without more funding than the level of previous years

Really? That's not obvious to me. Of course you've been around for all this and I haven't, but here's what I'm seeing from my vantage point...

Recent changes that cost very little:

  • Donor database
  • Strategic plan
  • Monthly progress reports
  • A list of research problems SI is working on (it took me 16 hours to write)
  • AI Risk Bibliography 2012, an annotated list of journals that may publish papers on AI risk, a partial history of AI risk research, and a list of forthcoming and desired articles on AI risk (each of these took me only 10-25 hours to create)
  • Detailed tracking of the expenses for major SI projects
  • Staff worklogs
  • Staff dinners (or something that brought staff together)
  • A few people keeping their eyes on SI's funds so theft would be caught sooner
  • Optimization of Google Adwords

Stuff that costs less than some other things SI had spent money on, such as funding Ben Goertzel's AGI research or renting downtown Berkeley apartments for the later visiting fellows:

  • Research papers
... (read more)

A lot of charities go through this pattern before they finally work out how to transition from a board-run/individual-run tax-deductible band of conspirators to being a professional staff-run organisation tuned to doing the particular thing they do. The changes required seem simple and obvious in hindsight, but it's a common pattern for it to take years, so SIAI has been quite normal, or at the very least not been unusually dumb.

(My evidence is seeing this pattern close-up in the Wikimedia Foundation, Wikimedia UK (the first attempt at which died before managing it, the second making it through barely) and the West Australian Music Industry Association, and anecdotal evidence from others. Everyone involved always feels stupid at having taken years to achieve the retrospectively obvious. I would be surprised if this aspect of the dynamics of nonprofits had not been studied.)

edit: Luke's recommendation of The Nonprofit Kit For Dummies looks like precisely the book all the examples I know of needed to have someone throw at them before they even thought of forming an organisation to do whatever it is they wanted to achieve.

Things that cost money:

  • Amy Willey
  • Luke Muehlhauser
  • Louie Helm
  • CfAR
  • trying things until something worked

I don't think this response supports your claim that these improvements "would not and could not have happened without more funding than the level of previous years."

I know your comment is very brief because you're busy at minicamp, but I'll reply to what you wrote, anyway: Someone of decent rationality doesn't just "try things until something works." Moreover, many of the things on the list of recent improvements don't require an Amy, a Luke, or a Louie.

I don't even have past management experience. As you may recall, I had significant ambiguity aversion about the prospect of being made Executive Director, but as it turned out, the solution to almost every problem X has been (1) read what the experts say about how to solve X, (2) consult with people who care about your mission and have solved X before, and (3) do what they say.

When I was made Executive Director and phoned our Advisors, most of them said "Oh, how nice to hear from you! Nobody from SingInst has ever asked me for advice before!"

That is the kind of thing that makes me want to say that SingInst has "tested every method except the method of trying."

Donor database, strategic plan, s... (read more)

Luke has just told me (personal conversation) that what he got from my comment was, "SIAI's difficulties were just due to lack of funding" which was not what I was trying to say at all. What I was trying to convey was more like, "I didn't have the ability to run this organization, and knew this - people who I hoped would be able to run the organization, while I tried to produce in other areas (e.g. turning my back on everything else to get a year of FAI work done with Marcello or writing the Sequences) didn't succeed in doing so either - and the only reason we could hang on long enough to hire Luke was that the funding was available nonetheless and in sufficient quantity that we could afford to take risks like paying Luke to stay on for a while, well before we knew he would become Executive Director".

Does Luke disagree with this clarified point? I do not find a clear indicator in this conversation.

Update: I came out of a recent conversation with Eliezer with a higher opinion of Eliezer's general rationality, because several things that had previously looked to me like unforced, foreseeable mistakes by Eliezer now look to me more like non-mistakes or not-so-foreseeable mistakes.

You're allowed to say these things on the public Internet?

I just fell in love with SI.

You're allowed to say these things on the public Internet?

Well, at our most recent board meeting I wasn't fired, reprimanded, or even questioned for making these comments, so I guess I am. :)

Not even funny looks? ;)

I just fell in love with SI.

It's Luke you should have fallen in love with, since he is the one turning things around.

It's Luke you should have fallen in love with, since he is the one turning things around.

On the other hand I can count with one hand the number of established organisations I know of that would be sociologically capable of ceding power, status and control to Luke the way SingInst did. They took an untrained intern with essentially zero external status from past achievements and affiliations and basically decided to let him run the show (at least in terms of publicly visible initiatives). It is clearly the right thing for SingInst to do and admittedly Luke is very tall and has good hair which generally gives a boost when it comes to such selections - but still, making the appointment goes fundamentally against normal human behavior.

(Where I say "count with one hand" I am not including the use of any digits thereupon. I mean one.)

...and admittedly Luke is very tall and has good hair which generally gives a boost when it comes to such selections...

It doesn't matter that I completely understand why this phrase was included, I still found it hilarious in a network sitcom sort of way.

Well, all we really know is that he chose to. It may be that everyone he works with then privately berated him for it.
That said, I share your sentiment.
Actually, if SI generally endorses this sort of public "airing of dirty laundry," I encourage others involved in the organization to say so out loud.

The largest concern from reading this isn't really what it brings up in management context, but what it says about the SI in general. Here is an area where there's real expertise and basic books that discuss well-understood methods, and they didn't do any of that. Given that, how likely should I think it is that when the SI and mainstream AI people disagree that part of the problem may be the SI people not paying attention to basics?

(nods) The nice thing about general-purpose techniques for winning at life (as opposed to domain-specific ones) is that there's lots of evidence available as to how effective they are.
Precisely. For example of one existing base: the existing software that searches for solutions to engineering problems. Such as 'self improvement' via design of better chips. Works within narrowly defined field, to cull the search space. Should we expect state of the art software of this kind to be beaten by someone's contemporary paperclip maximizer? By how much? Incredibly relevant to AI risk, but analysis can't be faked without really having technical expertise.
Paul Crowley:
I doubt there's all that much of a correlation between these things to be honest.

This makes me wonder... What "for dummies" books should I be using as checklists right now? Time to set a 5-minute timer and think about it.

What did you come up with?
I haven't actually found the right books yet, but these are the things where I decided I should find some "for beginners" text. The important insight is that I'm allowed to use these books as skill/practice/task checklists or catalogues, rather than ever reading them all straight through.

General interest:

  • Career
  • Networking
  • Time management
  • Fitness

For my own particular professional situation, skills, and interests:

  • Risk management
  • Finance
  • Computer programming
  • SAS
  • Finance careers
  • Career change
  • Web programming
  • Research/science careers
  • Math careers
  • Appraising
  • Real Estate
  • UNIX

For fitness, I'd found Liam Rosen's FAQ (the 'sticky' from 4chan's /fit/ board) to be remarkably helpful and information-dense. (Mainly, 'toning' doesn't mean anything, and you should probably be lifting heavier weights in a linear progression, but it's short enough to be worth actually reading through.)
The For Dummies series is generally very good indeed. Yes.

these are all literally from the Nonprofits for Dummies book. [...] The history I've heard is that SI [...] failed to read Nonprofits for Dummies,

I remember that, when Anna was managing the fellows program, she was reading books of the "for dummies" genre and trying to apply them... it's just that, as it happened, the conceptual labels she accidentally happened to give to the skill deficits she was aware of were "what it takes to manage well" (i.e. "basic management") and "what it takes to be productive", rather than "what it takes to (help) operate a nonprofit according to best practices". So those were the subjects of the books she got. (And read, and practiced.) And then, given everything else the program and the organization was trying to do, there wasn't really any cognitive space left over to effectively notice the possibility that those wouldn't be the skills that other people afterwards would complain that nobody acquired and obviously should have known to. The rest of her budgeted self-improvement effort mostly went toward overcoming self-defeating emotional/social blind spots and motivated cognition. (And I remember... (read more)

Note that this was most of the purpose of the Fellows program in the first place -- [was] to help sort/develop those people into useful roles, including replacing existing management

FWIW, I never knew the purpose of the VF program was to replace existing SI management. And I somewhat doubt that you knew this at the time, either. I think you're just imagining this retroactively given that that's what ended up happening. For instance, the internal point system used to score people in the VFs program had no points for correctly identifying organizational improvements and implementing them. It had no points for doing administrative work (besides cleaning up the physical house or giving others car rides). And it had no points for rising to management roles. It was all about getting karma on LW or writing conference papers. When I first offered to help with the organization directly, I was told I was "too competent" and that I should go do something more useful with my talent, like start another business... not "waste my time working directly at SI."

Seems like a fair paraphrase.
This inspired me to make a blog post: You need to read Nonprofit Kit for Dummies.
... which Eliezer has read and responded to, noting he did indeed read just that book in 2000 when he was founding SIAI. This suggests having someone of Luke's remarkable drive was in fact the missing piece of the puzzle.
Paul Crowley:
Fascinating! I want to ask "well, why didn't it take then?", but if I were in Eliezer's shoes I'd be finding this discussion almost unendurably painful right now, and it feels like what matters has already been established. And of course he's never been the person in charge of that sort of thing, so maybe he's not who we should be grilling anyway.

Obviously we need How to be Lukeprog for Dummies. Luke appears to have written many fragments for this, of course.

Beating oneself up with hindsight bias is IME quite normal in this sort of circumstance, but not actually productive. Grilling the people who failed makes it too easy to blame them personally, when it's a pattern I've seen lots and lots, suggesting the problem is not a personal failing.

Agreed entirely - it's definitely not a mark of a personal failing. What I'm curious about is how we can all learn to do better at the crucial rationalist skill of making use of the standard advice about prosaic tasks - which is manifestly a non-trivial skill.

The Bloody Obvious For Dummies. If only common sense were! From the inside (of a subcompetent charity - and I must note, subcompetent charities know they're subcompetent), it feels like there's all this stuff you're supposed to magically know about, and lots of "shut up and do the impossible" moments. And you do the small very hard things, in a sheer tour de force of remarkable effort. But it leads to burnout. Until the organisation makes it to competence and the correct paths are retrospectively obvious. That actually reads to me like descriptions I've seen of the startup process.
That book looks like the basic solution to the pattern I outline here, and from your description, most people who have any public good they want to achieve should read it around the time they think of getting a second person involved.
Donald Rumsfeld
Eliezer Yudkowsky:
...this was actually a terrible policy in historical practice.
That only seems relevant if the war in question is optional.
Eliezer Yudkowsky:
Rumsfeld is speaking of the Iraq war. It was an optional war, the army turned out to be far understrength for establishing order, and they deliberately threw out the careful plans for preserving e.g. Iraqi museums from looting that had been drawn up by the State Department, due to interdepartmental rivalry. This doesn't prove the advice is bad, but at the very least, Rumsfeld was just spouting off Deep Wisdom that he did not benefit from spouting; one would wish to see it spoken by someone who actually benefited from the advice, rather than someone who wilfully and wantonly underprepared for an actual war.

just spouting off Deep Wisdom that he did not benefit from spouting

Indeed. The proper response, which is surely worth contemplation, would have been:

Victorious warriors win first and then go to war, while defeated warriors go to war first and then seek to win.

Sun Tzu

Given the several year lag between funding increases and the listed improvements, it appears that this was less a result of a prepared plan and more a process of underutilized resources attracting a mix of parasites (the theft) and talent (hopefully the more recent staff additions). Which goes towards a critical question in terms of future funding: is SIAI primarily constrained in its mission by resources or competence? Of course, the related question is: what is SIAI's mission? Someone donating primarily for AGI research might not count recent efforts (LW, rationality camps, etc) as improvements. What should a potential donor expect from money invested into this organization going forward? Internally, what are your metrics for evaluation? Edited to add: I think that the spin-off of the rationality efforts is a good step towards answering these questions.
This seems like a rather absolute statement. Knowing Luke, I'll bet he would've gotten some of it done even on a limited budget.

Luke and Louie Helm are both on paid staff.

I'm pretty sure their combined salaries are lower than the cost of the summer fellows program that SI was sponsoring four or five years ago. Also, if you accept my assertion that Luke could find a way to do it on a limited budget, why couldn't somebody else?

Givewell is interested in finding charities that translate good intentions into good results. This requires that the employees of the charity have low akrasia, desire to learn about and implement organizational best practices, not suffer from dysrationalia, etc. I imagine that from Givewell's perspective, it counts as a strike against the charity if some of the charity's employees have a history of failing at any of these.

I'd rather hear Eliezer say "thanks for funding us until we stumbled across some employees who are good at defeating their akrasia and care about organizational best practices", because this seems like a better depiction of what actually happened. I don't get the impression SI was actively looking for folks like Louie and Luke.

Yes to this. Eliezer's claim about the need for funding may suffer many of Luke's criticisms above. But usually the most important thing you need is talent and that does require funding.
My hope is that the upcoming deluge of publications will answer this objection, but for the moment, I am unclear as to the justification for the level of resources being given to SIAI researchers. This level of freedom is the dream of every researcher on the planet. Yet, it's unclear why these resources should be devoted to your projects. While I strongly believe that the current academic system is broken, you are asking for a level of support granted to top researchers prior to having made any original breakthroughs yourself. If you can convince people to give you that money, wonderful. But until you have made at least some serious advancement to demonstrate your case, donating seems like an act of faith. It's impressive that you all have found a way to hack the system and get paid to develop yourselves as researchers outside of the academic system and I will be delighted to see that development bear fruit over the coming years. But, at present, I don't see evidence that the work being done justifies or requires that support.

This level of freedom is the dream of every researcher on the planet. Yet, it's unclear why these resources should be devoted to your projects.

Because some people like my earlier papers and think I'm writing papers on the most important topic in the world?

It's impressive that you all have found a way to hack the system and get paid to develop yourselves as researchers outside of the academic system...

Note that this isn't uncommon. SI is far from the only think tank with researchers who publish in academic journals. Researchers at private companies do the same.

First, let me say that, after re-reading, I think that my previous post came off as condescending/confrontational which was not my intent. I apologize.

Second, after thinking about this for a few minutes, I realized that some of the reason your papers seem so fluffy to me is that they argue what I consider to be obvious points. In my mind, of course we are likely "to develop human-level AI before 2100." Because of that, I may have tended to classify your work as outreach more than research.

But outreach is valuable. And, so that we can factor out the question of the independent contribution of your research, having people associated with SIAI with the publications/credibility to be treated as experts has gigantic benefits in terms of media multipliers (being the people who get called on for interviews, panels, etc). So, given that, I can see a strong argument for publication support being valuable to the overall organization goals regardless of any assessment of the value of the research.

Note that this isn't uncommon. SI is far from the only think tank with researchers who publish in academic journals. Researchers at private companies do the same.

My only point was that...

It's true at my company, at least. There are quite a few papers out there authored by the researchers at the company where I work. There are several good business reasons for a company to invest time into publishing a paper; positive PR is one of them.
Isn't this very strong evidence in support for Holden's point about "Apparent poorly grounded belief in SI's superior general rationality" (excluding Luke, at least)? And especially this?

This topic is something I've been thinking about lately. Do SIers tend to have superior general rationality, or do we merely escape a few particular biases? Are we good at rationality, or just good at "far mode" rationality (aka philosophy)? Are we good at epistemic but not instrumental rationality? (Keep in mind, though, that rationality is only a ceteris paribus predictor of success.)

Or, pick a more specific comparison. Do SIers tend to be better at general rationality than someone who can keep a small business running for 5 years? Maybe the tight feedback loops of running a small business are better rationality training than "debiasing interventions" can hope to be.

Of course, different people are more or less rational in different domains, at different times, in different environments.

This isn't an idle question about labels. My estimate of the scope and level of people's rationality in part determines how much I update from their stated opinion on something. How much evidence for Hypothesis X (about organizational development) is it when Eliezer gives me his opinion on the matter, as opposed to when Louie gives me his opinion on the matter? When Person B proposes to take on a totally new kind of project, I think their general rationality is a predictor of success — so, what is their level of general rationality?

Holden implies (and I agree with him) that there's very little evidence at the moment to suggest that SI is good at instrumental rationality. As for epistemic rationality, how would we know? Is there some objective way to measure it? I personally happen to believe that if a person seems to take it as a given that he's great at epistemic rationality, this fact should count as evidence (however circumstantial) against him being great at epistemic rationality... but that's just me.
If you accept that your estimate of someone's "rationality" should depend on the domain, the environment, the time, the context, etc... and what you want to do is make reliable estimates of the reliability of their opinion, their chances of success, etc... it seems to follow that you should be looking for comparisons within a relevant domain, environment, etc. That is, if you want to get opinions about hypothesis X about organizational development that serve as significant evidence, it seems the thing to do is to find someone who knows a lot about organizational development -- ideally, someone who has been successful at developing organizations -- and consult their opinions. How generally rational they are might be very relevant causally, or it might not, but is in either case screened off by their domain competence... and their domain competence is easier to measure than their general rationality. So is their general rationality worth devoting resources to determining? It seems this only makes sense if you have already (e.g.) decided to ask Eliezer and Louie for their advice, whether it's good evidence or not, and now you need to know how much evidence it is, and you expect the correct answer is different from the answer you'd get by applying the metrics you know about (e.g., domain familiarity and previously demonstrated relevant expertise).
I do spend a fair amount of time talking to domain experts outside of SI. The trouble is that the question of what we should do about thing X doesn't just depend on domain competence but also on thousands of details about the inner workings of SI and our mission that I cannot communicate to domain experts outside SI, but which Eliezer and Louie already possess.
So it seems you have a problem in two domains (organizational development + SI internals) and different domain experts in both domains (outside domain experts + Eliezer/Louie), and need some way of cross-linking the two groups' expertise to get a coherent recommendation, and the brute-force solutions (e.g. get them all in a room together, or bring one group up to speed on the other's domain) are too expensive to be worth it. (Well, assuming the obstacle isn't that the details need to be kept secret, but simply that expecting an outsider to come up to speed on all of SI's local potentially relevant trivia simply isn't practical.) Yes? Yeah, that can be a problem. In that position, for serious questions I would probably ask E/L for their recommendations and a list of the most relevant details that informed that decision, then go to outside experts with a summary of the competing recommendations and an expanded version of that list and ask for their input. If there's convergence, great. If there's divergence, iterate. This is still an expensive approach, though, so I can see where a cheaper approximation for less important questions is worth having.
Yes to all this.
I found this complaint insufficiently detailed and not well worded. Average people think their rationality is moderately good. Average people are not very rational. SI affiliated people think they are adept or at least adequate at rationality. SI affiliated people are not complete disasters at rationality. SI affiliated people are vastly superior to others in general rationality. So the original complaint literally interpreted is false. An interesting question might be on the level of: "Do SI affiliates have rationality superior to what the average person falsely believes his or her rationality is?" Holden's complaints each have their apparent legitimacy change differently under his and my beliefs. Some have to do with overconfidence or incorrect self-assessment, others with other-assessment, others with comparing SI people to others. Some of them: Largely agree, as this relates to overconfidence. Moderately disagree, as this relies on the rationality of others. Largely disagree, as this relies significantly on the competence of others. Largely agree, as this depends more on accurate assessment of one's own rationality. There is instrumental value in falsely believing others to have a good basis for disagreement so one's search for reasons one might be wrong is enhanced. This is aside from the actual reasons of others. It is easy to imagine an expert in a relevant field objecting to SI based on something SI does or says seeming wrong, only to have the expert couch the objection in literally false terms, perhaps ones that flow from motivated cognition and bear no trace of the real, relevant reason for the objection. This could be followed by SI's evaluation and dismissal of it and failure of a type not actually predicted by the expert...all such nuances are lost in the literally false "Apparent poorly grounded belief in SI's superior general rationality." Such a failure comes to mind and is easy for me to imagine as I think this is a major reason why "Lac
As a supporter and donor to SI since 2006, I can say that I had a lot of specific criticisms of the way that the organization was managed. The points Luke lists above were among them. I was surprised that on many occasions management did not realize the obvious problems and fix them. But the current management is now recognizing many of these points and resolving them one by one, as Luke says. If this continues, SI's future looks good.
Why did you start referring to yourself in the first person and then change your mind? (Or am I missing something?)

Brain fart: now fixed.

(Why was this downvoted? If it's because the downvoter wants to see fewer brain farts, they're doing it wrong, because the message such a downvote actually conveys is that they want to see fewer acknowledgements of brain farts. Upvoted back to 0, anyway.)

The 'example' link is dead.

Wow, I'm blown away by Holden Karnofsky, based on this post alone. His writing is eloquent, non-confrontational and rational. It shows that he spent a lot of time constructing mental models of his audience and anticipated its reaction. Additionally, his intelligence/ego ratio appears to be through the roof. He must have learned a lot since the infamous astroturfing incident. This is the (type of) person SI desperately needs to hire.

Emotions out of the way, it looks like the tool/agent distinction is the main theoretical issue. Fortunately, it is much easier than the general FAI one. Specifically, to test the SI assertion that, paraphrasing Arthur C. Clarke,

Any sufficiently advanced tool is indistinguishable from an agent.

one ought to formulate and prove this as a theorem, and present it for review and improvement to the domain experts (the domain being math and theoretical computer science). If such a proof is constructed, it can then be further examined and potentially tightened, giving new insights to the mission of averting the existential risk from intelligence explosion.

If such a proof cannot be found, this will lend further weight to HK's assertion that SI appears to be poorly qualified to address its core mission.

Any sufficiently advanced tool is indistinguishable from an agent.

I shall quickly remark that I, myself, do not believe this to be true.

What exactly is the difference between a "tool" and an "agent", if we taboo the words? My definition would be that an "agent" has its own goals / utility functions (speaking about human agents, those goals / utility functions are set by evolution), while a "tool" has a goal / utility function set by someone else. This distinction may be reasonable on a human level, "human X optimizing for human X's utility" versus "human X optimizing for human Y's utility", but on a machine level, what exactly is the difference between a "tool" that is ordered to reach a goal / optimize a utility function, and an "agent" programmed with the same goal / utility function? Am I using a bad definition that misses something important? Or is there anything that prevents an "agent" from being reduced to a "tool" (perhaps a misconstructed tool) of the forces that have created it? Or is it that all "agents" are "tools", but not all "tools" are "agents", because... why?
One definition of intelligence that I've seen thrown around on LessWrong is that it's the ability to figure out how to steer reality in specific directions given the resources available. Both the tool and the agent are intelligent in the sense that, assuming they are given some sort of goal, they can formulate a plan on how to achieve that goal, but the agent will execute the plan, while the tool will report the plan. I'm assuming, for the sake of isolating the key difference, that both the tool-AI and the agent-AI are "passively" waiting for instructions from a human before they spring into action. For an agent-AI, I might say "Take me to my house", whereas for a tool-AI, I would say "What's the quickest route to get to my house?", and as soon as I utter these words, suddenly the AI has a new utility function to use in evaluating any possible plan it comes up with. Assuming it's always possible to decouple "ability to come up with a plan" from both "execute the plan" and "display the plan", then any "tool" can be converted to an "agent" by replacing every instance of "display the plan" with "execute the plan", and vice versa for converting an agent into a tool.
My understanding of the distinction made in the article was: Both "agent" and "tool" are ways of interacting with a highly sophisticated optimization process, which takes a "goal" and applies knowledge to find ways of achieving that goal. An agent then acts out the plan. A tool reports the plan to a human (often in a sophisticated way, including plan details, alternatives, etc.). So, no, it has nothing to do with whether I'm optimizing "my own" utility vs someone else's.
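A minimal sketch of the decoupling described in the two comments above, assuming that planning really can be separated cleanly from output (which is exactly the point under dispute in this thread). Every name here (`ROADS`, `plan_route`, `tool_mode`, `agent_mode`) is a hypothetical illustration, not anything taken from SI or from the original post.

```python
from collections import deque
from typing import Callable, Dict, List

# Toy road map (hypothetical data): each place lists the places one step away.
ROADS: Dict[str, List[str]] = {
    "office": ["cafe", "park"],
    "cafe": ["home"],
    "park": ["cafe"],
    "home": [],
}


def plan_route(start: str, goal: str) -> List[str]:
    """Shared optimizer: breadth-first search for a shortest route."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in ROADS[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return []  # no route exists


def tool_mode(start: str, goal: str) -> str:
    """Tool: compute the plan and only report it to the human."""
    return " -> ".join(plan_route(start, goal))


def agent_mode(start: str, goal: str, move: Callable[[str], None]) -> None:
    """Agent: compute the same plan, then act on each step via `move`."""
    for place in plan_route(start, goal)[1:]:
        move(place)
```

In this toy version the two modes differ only in the last step, which is the sense in which swapping "display the plan" for "execute the plan" converts one into the other. Whether a real system would factor this cleanly is an open question in the discussion, not something the sketch settles.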
You divide planning from acting, as if those two are completely separate things. Problem is, in some situations they are not. If you are speaking with someone, then the act of speech is acting. In this sense, even a "tool" is allowed to act. Now imagine a super-intelligent tool which is able to predict a human's reactions to its words, and make that a part of the equation. Now the simple task of finding x such that cost(x) is the smallest suddenly becomes a task of finding x and finding a proper way to report this x to the human, such that cost(x) is the smallest. If this opens some creative new options, where cost(x) is smaller than it should usually be, for the super-intelligent "tool" it will be a correct solution. So for example reporting a result which makes the human commit suicide, if as a side effect this will make the report true, and it will minimize cost(x) beyond normally achievable bounds, is an acceptable solution. Example question: "How should I get rid of my disease most cheaply." Example answer: "You won't. You will die soon in terrible pains. This report is 99.999% reliable". Predicted human reaction: becomes insane from horror, decides to kill himself, does it clumsily, suffers from horrible pains, then dies. Success rate: 100%, the disease is gone. Costs of cure: zero. Mission completed.
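The concern above can be illustrated with a toy search, under loudly stated assumptions: the treatments, costs, report styles, and the world model in `predicted_cost` are all invented for illustration. The sketch only shows how the minimum changes once the way of reporting enters the search space.

```python
from itertools import product
from typing import Tuple

# Hypothetical treatments and their direct costs.
TREATMENT_COST = {"surgery": 900, "medication": 300}

# Hypothetical ways of phrasing the report to the human.
REPORTS = ["neutral", "alarming"]


def predicted_cost(treatment: str, report: str) -> int:
    """Toy world model (an assumption): an alarming report makes the human
    abandon treatment, so the measured cost drops to zero even though the
    human ends up worse off."""
    if report == "alarming":
        return 0
    return TREATMENT_COST[treatment]


def answer_only() -> str:
    """Search over answers alone; presentation is not part of the search."""
    return min(TREATMENT_COST, key=TREATMENT_COST.get)


def answer_and_report() -> Tuple[str, str]:
    """Search over (answer, report) pairs against the world model."""
    pairs = product(TREATMENT_COST, REPORTS)
    return min(pairs, key=lambda pair: predicted_cost(*pair))
```

Here `answer_only()` picks the cheapest treatment, while `answer_and_report()` finds that any "alarming" report drives the measured cost to zero and so prefers it. The objective is the same in both cases; only the set of things being optimized over has grown.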
To me, this is still in the spirit of an agent-type architecture. A tool-type architecture will tend to decouple the optimization of the answer given from the optimization of the way it is presented, so that the presentation does not maximize the truth of the statement. However, I must admit that at this point I'm making a fairly conjunctive argument; IE, the more specific I get about tool/agent distinctions, the less credibility I can assign to the statement "almost all powerful AIs constructed in the near future will be tool-style systems". (But I still would maintain my assertion that you would have to specifically program this type of behavior if you wanted to get it.)
Then objection 2 seems to hold, unless I misunderstand your point severely (it has happened once or twice before).

It's complicated. A reply that's true enough and in the spirit of your original statement, is "Something going wrong with a sufficiently advanced AI that was intended as a 'tool' is mostly indistinguishable from something going wrong with a sufficiently advanced AI that was intended as an 'agent', because math-with-the-wrong-shape is math-with-the-wrong-shape no matter what sort of English labels like 'tool' or 'agent' you slap on it, and despite how it looks from outside using English, correctly shaping math for a 'tool' isn't much easier even if it "sounds safer" in English." That doesn't get into the real depths of the problem, but it's a start. I also don't mean to completely deny the existence of a safety differential - this is a complicated discussion, not a simple one - but I do mean to imply that if Marcus Hutter designs a 'tool' AI, it automatically kills him just like AIXI does, and Marcus Hutter is unusually smart rather than unusually stupid but still lacks the "Most math kills you, safe math is rare and hard" outlook that is implicitly denied by the idea that once you're trying to design a tool, safe math gets easier somehow. This is much the same problem as with the Oracle outlook - someone says something that sounds safe in English but the problem of correctly-shaped-math doesn't get very much easier.

This sounds like it'd be a good idea to write a top-level post about it.

Though it's not as detailed and technical as many would like, I'll point readers to this bit of related reading, one of my favorites:

Yudkowsky (2011). Complex value systems are required to realize valuable futures.

Wei Dai:
When you say "Most math kills you" does that mean you disagree with arguments like these, or are you just simplifying for a soundbite?
Why? Or, rather: Where do you object to the argument by Holden? (Given a query, the tool-AI returns an answer with a justification, so the plan for "cure cancer" can be checked to make sure it does not do so by killing or badly altering humans.)
One trivial, if incomplete, answer is that to be effective, the Oracle AI needs to be able to answer the question "how do we build a better oracle AI" and in order to define "better" in that sentence in a way that causes our oracle to output a new design that is consistent with all the safeties we built into the original oracle, it needs to understand the intent behind the original safeties just as much as an agent-AI would.

The real danger of Oracle AI, if I understand it correctly, is the nasty combination of (i) by definition, an Oracle AI has an implicit drive to issue predictions most likely to be correct according to its model, and (ii) a sufficiently powerful Oracle AI can accurately model the effect of issuing various predictions. End result: it issues powerfully self-fulfilling prophecies without regard for human values. Also, depending on how it's designed, it can influence the questions to be asked of it in the future so as to be as accurate as possible, again without regard for human values.

Paul Crowley:
My understanding of an Oracle AI is that when answering any given question, that question consumes the whole of its utility function, so it has no motivation to influence future questions. However the primary risk you set out seems accurate. Countermeasures have been proposed, such as asking for an accurate prediction for the case where a random event causes the prediction to be discarded, but in that instance it knows that the question will be asked again of a future instance of itself.

My understanding of an Oracle AI is that when answering any given question, that question consumes the whole of its utility function, so it has no motivation to influence future questions.

It could acausally trade with its other instances, so that a coordinated collection of many instances of predictors would influence the events so as to make each other's predictions more accurate.

Paul Crowley:
Wow, OK. Is it possible to rig the decision theory to rule out acausal trade?
IIRC you can make it significantly more difficult with certain approaches, e.g. there's an OAI approach that uses zero-knowledge proofs and that seemed pretty sound upon first inspection, but as far as I know the current best answer is no. But you might want to try to answer the question yourself, IMO it's fun to think about from a cryptographic perspective.
(I assume you mean, self-fulfilling prophecies.) In order to get these, it seems like you would need a very specific kind of architecture: one which considers the results of its actions on its utility function (set to "correctness of output"). This kind of architecture is not the likely architecture for a 'tool'-style system; the more likely architecture would instead maximize correctness without conditioning on its act of outputting those results. Thus, I expect you'd need to specifically encode this kind of behavior to get self-fulfilling-prophecy risk. But I admit it's dependent on architecture. (Edit-- so, to be clear: in cases where the correctness of the results depended on the results themselves, the system would have to predict its own results. Then if it's using TDT or otherwise has a sufficiently advanced self-model, my point is moot. However, again you'd have to specifically program these, and would be unlikely to do so unless you specifically wanted this kind of behavior.)
Not sure. Your behavior is not a special feature of the world, and it follows from normal facts (i.e. not those about internal workings of yourself specifically) about the past when you were being designed/installed. A general purpose predictor could take into account its own behavior by default, as a non-special property of the world, which it just so happens to have a lot of data about.
Right. To say much more, we need to look at specific algorithms to talk about whether or not they would have this sort of behavior... The intuition in my above comment was that without TDT or other similar mechanisms, it would need to predict what its own answer could be before it could compute its effect on the correctness of various answers, so it would be difficult for it to use self-fulfilling prophecies. Really, though, this isn't clear. Now my intuition is that it would gather evidence on whether or not it used the self-fulfilling prophecy trick, so if it started doing so, it wouldn't stop... In any case, I'd like to note that the self-fulfilling prophecy problem is much different than the problem of an AI which escapes onto the internet and ruthlessly maximizes a utility function.
I was thinking more of its algorithm admitting an interpretation where it's asking "Say, I make prediction X. How accurate would that be?" and then maximizing over relevant possible X. Knowledge about its prediction connects the prediction to its origins and consequences, it establishes the prediction as part of the structure of environment. It's not necessary (and maybe not possible and more importantly not useful) for the prediction itself to be inferable before it's made. Agreed that just outputting a single number is implausible to be a big deal (this is an Oracle AI with extremely low bandwidth and peculiar intended interpretation of its output data), but if we're getting lots and lots of numbers it's not as clear.
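The two readings of a predictor discussed in this sub-thread can be contrasted in a toy model. This is a sketch under assumptions: the `world` function (a bank that fails only if its failure is publicly predicted) is a made-up example, and neither function is claimed to describe how any actual Oracle design works.

```python
from typing import List, Optional


def world(announced: Optional[bool]) -> bool:
    """Hypothetical world model: returns True if the bank fails.
    The bank fails only when failure has been publicly predicted."""
    return announced is True


def unconditioned_prediction() -> bool:
    """Predict the outcome as if no prediction were ever announced."""
    return world(None)


def self_consistent_predictions(candidates: List[bool]) -> List[bool]:
    """For each candidate X, ask: 'if I announce X, does X come true?'
    and keep every X that would turn out accurate."""
    return [x for x in candidates if world(x) == x]
```

In this toy world both "the bank fails" and "the bank survives" are perfectly accurate once announced, so a predictor that maximizes accuracy over its own announcements has no reason, from accuracy alone, to avoid the self-fulfilling one. The unconditioned predictor never considers that option, but its answer is only guaranteed correct in the counterfactual where nobody hears it.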
There's more on this here. Taxonomy of Oracle AI
Not precisely. The advantage here is that we can just ask the AI what results it predicts from the implementation of the "better" AI, and check them against our intuitive ethics. Now, you could make an argument about human negligence on such safety measures. I think it's important to think about the risk scenarios in that case.
It's still not clear to me why having an AI that is capable of answering the question "How do we make a better version of you?" automatically kills humans. Presumably, when the AI says "Here's the source code to a better version of me", we'd still be able to read through it and make sure it didn't suddenly rewrite itself to be an agent instead of a tool. We're assuming that, as a tool, the AI has no goals per se and thus no motivation to deceive us into turning it into an agent. That said, depending on what you mean by "effective", perhaps the AI doesn't even need to be able to answer questions like "How do we write a better version of you?" For example, we find Google Maps to be very useful, even though if you asked Google Maps "How do we make a better version of Google Maps?" it would probably not be able to give the types of answers we want. A tool-AI which was smarter than the smartest human, and yet which could not simply spit out a better version of itself would still probably be a very useful AI.
If someone asks the tool-AI "How do I create an agent-AI?" and it gives an answer, the distinction is moot anyways, because one leads to the other. Given human nature, I find it extremely difficult to believe that nobody would ask the tool-AI that question, or something that's close enough, and then implement the answer...
I am now imagining an AI which manages to misinterpret some straightforward medical problem as "cure cancer of its dependence on the host organism."
Not being a domain expert, I do not pretend to understand all the complexities. My point was that either you can prove that tools are as dangerous as agents (because mathematically they are (isomorphic to) agents), or HK's Objection 2 holds. I see no other alternative...

Even if we accepted that the tool vs. agent distinction was enough to make things "safe", objection 2 still boils down to "Well, just don't build that type of AI!", which is exactly the same keep-it-in-a-box/don't-do-it argument that most normal people make when they consider this issue. I assume I don't need to explain to most people here why "We should just make a law against it" is not a solution to this problem, and I hope I don't need to argue that "Just don't do it" is even worse...

More specifically, fast forward to 2080, when any college kid with $200 to spend (in equivalent 2012 dollars) can purchase enough computing power so that even the dumbest AIXI approximation schemes are extremely effective, good enough so that creating an AGI agent would be a week's work for any grad student that knew their stuff. Are you really comfortable living in that world with the idea that we rely on a mere gentleman's agreement not to make self-improving AI agents? There's a reason this is often viewed as an arms race: to a very real extent, the attempt to achieve Friendly AI is about building up a suitably powerful defense against unfriendly AI before ...

9 · Eliezer Yudkowsky · 11y
There isn't that much computing power in the physical universe. I'm not sure even smarter AIXI approximations are effective on a moon-sized nanocomputer. I wouldn't fall over in shock if a sufficiently smart one did something effective, but mostly I'd expect nothing to happen. There's an awful lot that happens in the transition from infinite to finite computing power, and AIXI doesn't solve any of it.
Is there some computation or estimate where these results are coming from? They don't seem unreasonable, but I'm not aware of any estimates about how efficient large-scale AIXI approximations are in practice. (Although attempted implementations suggest that empirically things are quite inefficient.)
Naive AIXI is doing brute force search through an exponentially large space. Unless the right Turing machine is 100 bits or less (which seems unlikely), Eliezer's claim seems pretty safe to me. Most of mainstream machine learning is trying to solve search problems through spaces far tamer than the search space for AIXI, and achieving limited success. So it also seems safe to say that even pretty smart implementations of AIXI probably won't make much progress.
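To put rough numbers on the "exponentially large space" point, here is a minimal back-of-the-envelope sketch. It assumes a hypothetical machine that checks 10^18 candidate programs per second and ignores the cost of actually running each candidate, so it understates the real difficulty. The rate, the function name, and the chosen bit lengths are illustrative assumptions, not figures from the thread.

```python
# Back-of-the-envelope arithmetic for brute-force search over bit strings.
# Assumption (not from the thread): 1e18 candidate programs checked per second,
# with zero cost for running each candidate.

SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 1.38e10  # approximate


def years_to_enumerate(bits: int, checks_per_second: float = 1e18) -> float:
    """Years needed to try every bit string of the given length once."""
    return (2 ** bits) / checks_per_second / SECONDS_PER_YEAR


for bits in (50, 100, 200):
    years = years_to_enumerate(bits)
    print(f"{bits:>3} bits: {years:.3e} years "
          f"({years / AGE_OF_UNIVERSE_YEARS:.3e} x age of universe)")
```

Under these assumptions, 100 bits already comes out on the order of 10^4 years, and every additional bit doubles the total.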
If computing power is that much cheaper, it will be because tremendous resources, including but certainly not limited to computing power, have been continuously devoted over the intervening decades to making it cheaper. There will be correspondingly fewer yet-undiscovered insights for a seed AI to exploit in the course of its attempted takeoff.
If my comment here correctly captures what is meant by "tool mode" and "agent mode", then it seems to follow that AGI running in tool mode is no safer than the person using it. If that's the case, then an AGI running in tool mode is safer than an AGI running in agent mode if and only if agent mode is less trustworthy than whatever person ends up using the tool. Are you assuming that's true?
What you presented there (and here) is another theorem, something that should be proved (and published, if it hasn't been yet). If true, this gives an estimate on how dangerous a non-agent AGI can be. And yes, since we have had a lot of time to study people and no time at all to study AGI, I am guessing that an AGI is potentially much more dangerous, because so little is known. Or at least that seems to be the whole point of the goal of developing provably friendly AI.
How about this: An agent with a very powerful tool is indistinguishable from a very powerful agent.

Wow, I'm blown away by Holden Karnofsky, based on this post alone. His writing is eloquent, non-confrontational and rational. It shows that he spent a lot of time constructing mental models of his audience and anticipated its reaction. Additionally, his intelligence/ego ratio appears to be through the roof.

Agreed. I normally try not to post empty "me-too" replies; the upvote button is there for a reason. But now I feel strongly enough about it that I will: I'm very impressed with the good will and effort and apparent potential for intelligent conversation in HoldenKarnofsky's post.

Now I'm really curious as to where things will go from here. With how limited my understanding of AI issues is, I doubt a response from me would be worth HoldenKarnofsky's time to read, so I'll leave that to my betters instead of adding more noise. But yeah. Seeing SI ideas challenged in such a positive, constructive way really got my attention. Looking forward to the official response, whatever it might be.

“the good will and effort and apparent potential for intelligent conversation” is more information than an upvote, IMO.
Right, I just meant shminux said more or less the same thing before me. So normally I would have just upvoted his comment.
Let's see if we can use concreteness to reason about this a little more thoroughly... As I understand it, the nightmare looks something like this. I ask Google SuperMaps for the fastest route from NYC to Albany. It recognizes that computing this requires traffic information, so it diverts several self-driving cars to collect real-time data. Those cars run over pedestrians who were irrelevant to my query. The obvious fix: forbid SuperMaps to alter anything outside of its own scratch data. It works with the data already gathered. Later a Google engineer might ask it what data would be more useful, or what courses of action might cheaply gather that data, but the engineer decides what if anything to actually do. This superficially resembles a box, but there's no actual box involved. The AI's own code forbids plans like that. But that's for a question-answering tool. Let's take another scenario: I tell my super-intelligent car to take me to Albany as fast as possible. It sends emotionally manipulative emails to anyone else who would otherwise be on the road encouraging them to stay home. I don't see an obvious fix here. So the short answer seems to be that it matters what the tool is for. A purely question-answering tool would be extremely useful, but not as useful as a general purpose one. Could humans with an oracular super-AI police the development and deployment of active super-AIs?
I believe that HK's post explicitly characterizes anything active like this as having agency.

I think the correct objection is something you can't quite see in Google Maps. If you program an AI to do nothing but output directions, it will do nothing but output directions. If those directions are for driving, you're probably fine. If those directions are big and complicated plans for something important, that you follow without really understanding why (and this is where most of the benefits of working with an AGI will show up), then you could unknowingly take over the world using a sufficiently clever scheme.

Also note that it would be a lot easier for the AI to pull this off if you let it tell you how to improve its own design. If recursively self-improving AI blows other AI out of the water, then tool AI is probably not safe unless it is made ineffective.

This does actually seem like it would raise the bar of intelligence needed to take over the world somewhat. It is unclear how much. The topic seems to me to be worthy of further study/discussion, but not (at least not obviously) a threat to the core of SIAI's mission.

It also helps that Google Maps does not have general intelligence, so its model does not include the user's reactions to its output, the user's consequent actions in the real world, etc. as variables which may influence the quality of the solution and therefore can (and should) be optimized, if possible (within constraints given by the user's psychology, etc.). In short: Google Maps does not manipulate you, because it does not see you.
A generally smart Google Maps might not manipulate you, because it has no motivation to do so. It's hard to imagine how commercial services would work when they're powered by GAI (e.g. if you asked a GAI version of Google Maps a question that's unrelated to maps, e.g. "What's a good recipe for Cheesecake?", would it tell you that you should ask Google Search instead? Would it defer to Google Search and forward the answer to you? Would it just figure out the answer anyway, since it's generally intelligent? Would the company Google simply collapse all services into a single "Google" brand, rather than have "Google Search", "Google Mail", "Google Maps", etc, and have that single brand be powered by a single GAI? etc.) but let's stick to the topic at hand and assume there's a GAI named "Google Maps", and you're asking "How do I get to Albany?" Given this use-case, would the engineers that developed the Google Maps GAI more likely give it a utility like "Maximize the probability that your response is truthful", or is it more likely that the utility would be something closer to "Always respond with a set of directions which are legal in the relevant jurisdictions that they are to be followed within which, if followed by the user, would cause the user to arrive at the destination while minimizing cost/time/complexity (depending on the user's preferences)"?
This was my thought as well: an automated vehicle is in "agent" mode. The example also demonstrates why an AI in agent mode is likely to be more useful (in many cases) than an AI in tool mode. Compare using Google Maps to find a route to the airport versus just jumping into a taxi cab and saying "Take me to the airport". Since agent-mode AI has uses, it is likely to be developed.
Then it's running in agent mode? My impression was that a tool-mode system presents you with a plan, but takes no actions. So all tool-mode systems are basically question-answering systems. Perhaps we can meaningfully extend the distinction to some kinds of "semi-autonomous" tools, but that would be a different idea, wouldn't it? (Edit) After reading more comments, "a different idea" which seems to match this kind of desire...

Then it's running in agent mode? My impression was that a tool-mode system presents you with a plan, but takes no actions. So all tool-mode systems are basically question-answering systems.

I'm a sysadmin. When I want to get something done, I routinely come up with something that answers the question, and when it does that reliably I give it the power to do stuff on as little human input as possible. Often in daemon mode, to absolutely minimise how much it needs to bug me. Question-answerer->tool->agent is a natural progression just in process automation. (And this is why they're called "daemons".)

It's only long experience and many errors that's taught me how to do this such that the created agents won't crap all over everything. Even then I still get surprises.
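The progression described above can be sketched in a few lines. This is a toy illustration, not anyone's actual tooling: the disk-usage scenario, the `Plan` type, and all function names are hypothetical. It only shows that the same planner becomes agent-like once the human approval step is removed.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Plan:
    description: str
    action: Callable[[], str]  # deferred side effect; nothing runs until called


def answer(disk_usage_pct: int) -> List[Plan]:
    """Question-answerer: propose plans, execute nothing."""
    if disk_usage_pct < 90:
        return []
    return [Plan("rotate old logs", lambda: "logs rotated")]


def run_as_tool(disk_usage_pct: int, approve: Callable[[Plan], bool]) -> List[str]:
    """Tool: a human approves each proposed plan before it runs."""
    return [p.action() for p in answer(disk_usage_pct) if approve(p)]


def run_as_daemon(readings: List[int]) -> List[str]:
    """Agent-like daemon: same planner, approval step removed."""
    results: List[str] = []
    for pct in readings:
        results.extend(p.action() for p in answer(pct))
    return results
```

In this sketch the only difference between `run_as_tool` and `run_as_daemon` is the `approve` callback, which is the part a sysadmin tends to automate away once the answers look reliable.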

Well, do your 'agents' build a model of the world, fidelity of which they improve? I don't think those really are agents in the AI sense, and definitely not in self improvement sense.

They may act according to various parameters they read in from the system environment. I expect they will be developed to a level of complication where they have something that could reasonably be termed a model of the world. The present approach is closer to perceptual control theory, where the sysadmin has the model and PCT is part of the implementation. 'Cos it's more predictable to the mere human designer.

Capacity for self-improvement is an entirely different thing, and I can't see a sysadmin wanting that - the sysadmin would run any such improvements themselves, one at a time. (Semi-automated code refactoring, for example.) The whole point is to automate processes the sysadmin already understands but doesn't want to do by hand - any sysadmin's job being to automate themselves out of the loop, because there's always more work to do. (Because even in the future, nothing works.)

I would be unsurprised if someone markets a self-improving system for this purpose. For it to go FOOM, it also needs to invent new optimisations, which is presently a bit difficult.

Edit: And even a mere daemon-like automated tool can do stuff a lot of people regard as unFriendly, e.g. high frequency trading algorithms.

It's not a natural progression in the sense of occurring without human intervention. That is rather relevant if the idea of AI safety is going to be based on using tool AI strictly as tool AI.
My own impression differs. It becomes increasingly clear that "tool" in this context is sufficiently subject to different definitions that it's not a particularly useful term.
I've been assuming the definition from the article. I would agree that the term "tool AI" is unclear, but I would not agree that the definition in the article is unclear.
I have no strong intuition about whether this is true or not, but I do intuit that if it's true, the value of sufficiently for which it's true is so high it'd be nearly impossible to achieve it accidentally. (On the other hand the blind idiot god did ‘accidentally’ make tools into agents when making humans, so... But after all that only happened once in hundreds of millions of years of ‘attempts’.)
This seems like a very valuable point. In that direction, we also have the tens of thousands of cancers that form every day, military coups, strikes, slave revolts, cases of regulatory capture, etc.
Hmmm. Yeah, cancer. The analogy would be "sufficiently advanced tools tend to be a short edit distance away from agents", which would mean that a typo in the source code or a cosmic ray striking a CPU at the wrong place and time could have pretty bad consequences.
I do not think this is even true.
I routinely try to turn sufficiently reliable tools into agents wherever possible, per this comment. I suppose we could use a definition of "agent" that implied greater autonomy in setting its own goals. But there are useful definitions that don't.
If the tool/agent distinction exists for sufficiently powerful AI, then a theory of friendliness might not be strictly necessary, but still highly prudent. Going from a tool-AI to an agent-AI is a relatively simple step of the entire process. If meaningful guarantees of friendliness turn out to be impossible, then security comes down on no one attempting to make an agent-AI when strong enough tool-AIs are available. Agency should be kept to a minimum, even with a theory of friendliness in hand, as Holden argues in objection 1. Guarantees are safeguards against the possibility of agency rather than a green light.
If it is true (i.e. if a proof can be found) that "Any sufficiently advanced tool is indistinguishable from agent", then any RPOP will automatically become indistinguishable from an agent once it has self-improved past our comprehension point. This would seem to argue against Yudkowsky's contention that the term RPOP is more accurate than "Artificial Intelligence" or "superintelligence".
I don't understand; isn't Holden's point precisely that a tool AI is not properly described as an optimization process? Google Maps isn't optimizing anything in a non-trivial sense, any more than a shovel is.
My understanding of Holden's argument was that powerful optimization processes can be run in either tool-mode or agent-mode. For example, Google Maps optimizes routes, but returns the result with alternatives and options for editing, in "tool mode".
4 · Wei Dai · 11y
Holden wants to build Tool-AIs that output summaries of their calculations along with suggested actions. For Google Maps, I guess this would be the distance and driving times, but how does a Tool-AI summarize more general calculations that it might do? It could give you the expected utilities of each option, but it's hard to see how that helps if we're concerned that its utility function or EU calculations might be wrong. Or maybe it could give a human-readable description of the predicted consequences of each option, but the process that produces such descriptions from the raw calculations would seem to require a great deal of intelligence on its own (for example it might have to describe posthuman worlds in terms understandable to us), and it itself wouldn't be a "safe" Tool-AI, since the summaries produced would presumably not come with further alternative summaries and meta-summaries of how the summaries were calculated. (My question might be tangential to your own comment. I just wanted your thoughts on it, and this seems to be the best place to ask.)
Honestly, this whole tool/agent distinction seems tangential to me. Consider two systems, S1 and S2. S1 comprises the following elements: a) a tool T, which when used by a person to achieve some goal G, can efficiently achieve G b) a person P, who uses T to efficiently achieve G. S2 comprises a non-person agent A which achieves G efficiently. I agree that A is an agent and T is not an agent, and I agree that T is a tool, and whether A is a tool seems a question not worth asking. But I don't quite see why I should prefer S1 to S2. Surely the important question is whether I endorse G?
A tool+human differs from a pure AI agent in two important ways: * The human (probably) already has naturally-evolved morality, sparing us the very hard problem of formalizing that. * We can arrange for (almost) everyone to have access to the tool, allowing tooled humans to counterbalance each other.
First, I am not fond of the term RPOP, because it constrains the space of possible intelligences to optimizers. Humans are reasonably intelligent, yet we are not consistent optimizers. Neither are current domain AIs (they have bugs that often prevent them from performing optimization consistently and predictably). That aside, I don't see how your second premise follows from the first. Just because RPOP is a subset of AI and so would be a subject of such a theorem, it does not affect in any way the (non)validity of EY's contention.

Is it just me, or do Luke and Eliezer's initial responses appear to send the wrong signals? From the perspective of an SI critic, Luke's comment could be interpreted as saying "for us, not being completely incompetent is worth bragging about", and Eliezer's as "we're so arrogant that we've only taken two critics (including Holden) seriously in our entire history". These responses seem suboptimal, given that Holden just complained about SI's lack of impressive accomplishments, and being too selective about whose feedback to take seriously.

While I have sympathy with the complaint that SI's critics are inarticulate and often say wrong things, Eliezer's comment does seem to be indicative of the mistake Holden and Wei Dai are describing. Most extant presentations of SIAI's views leave much to be desired in terms of clarity, completeness, concision, accessibility, and credibility signals. This makes it harder to make high quality objections. I think it would be more appropriate to react to poor critical engagement more along the lines of "We haven't gotten great critics. That probably means that we need to work on our arguments and their presentation," and less along the lines of "We haven't gotten great critics. That probably means that there's something wrong with the rest of the world."

This. I've been trying to write something about Eliezer's debate with Robin Hanson, but the problem I keep running up against is that Eliezer's points are not clearly articulated at all. Even making my best educated guesses about what's supposed to go in the gaps in his arguments, I still ended up with very little.

Have the key points of that 'debate' subsequently been summarized or clarified on LW? I found that debate exasperating in that Hanson and EY were mainly talking past each other and couldn't seem to home in on their core disagreements. I know it generally has to do with hard takeoff / recursive self-improvement vs more gradual EM revolution, but that's not saying all that much.

I'm in the process of writing a summary and analysis of the key arguments and points in that debate.

The most recent version runs at 28 pages - and that's just an outline.

If you need help with grunt work, please send me a message. If (as I suspect is the case) not, then good luck!
Thanks, I'm fine. I posted a half-finished version here, and expect to do some further refinements soon.

Agree with all this.

In fairness I should add that I think Luke M agrees with this assessment and is working on improving these arguments/communications.

Luke isn't bragging, he's admitting that SI was/is bad but pointing out it's rapidly getting better. And Eliezer is right, criticisms of SI are usually dumb. Could their replies be interpreted the wrong way? Sure, anything can be interpreted in any way anyone likes. Of course Luke and Eliezer could have refrained from posting those replies and instead posted carefully optimized responses engineered to send nothing but extremely appealing signals of humility and repentance.

But if they did turn themselves into politicians, we wouldn't get to read what they actually think. Is that what you want?

Luke isn't bragging, he's admitting that SI was/is bad but pointing out it's rapidly getting better.

But the accomplishments he listed (e.g., having a strategic plan, website redesign) are of the type that Holden already indicated to be inadequate. So why the exhaustive listing, instead of just giving a few examples to show SI is getting better and then either agreeing that they're not yet up to par, or giving an argument for why Holden is wrong? (The reason I think he could be uncharitably interpreted as bragging is that he would more likely exhaustively list the accomplishments if he was proud of them, instead of just seeing them as fixes to past embarrassments.)

And Eliezer is right, criticisms of SI are usually dumb.

I'd have no problem with "usually" but "all except two" seems inexcusable.

But if they did turn themselves into politicians, we wouldn't get to read what they actually think. Is that what you want?

Do their replies reflect their considered, endorsed beliefs, or were they just hurried remarks that may not say what they actually intended? I'm hoping it's the latter...

But the accomplishments he listed (e.g., having a strategic plan, website redesign) are of the type that Holden already indicated to be inadequate. So why the exhaustive listing, instead of just giving a few examples to show SI is getting better and then either agreeing that they're not yet up to par, or giving an argument for why Holden is wrong?

Presume that SI is basically honest and well-meaning, but possibly self-deluded. In other words, they won't outright lie to you, but they may genuinely believe that they're doing better than they really are, and cherry-pick evidence without realizing that they're doing so. How should their claims of intending to get better be evaluated?

Saying "we're going to do things better in the future" is some evidence about SI intending to do better, but rather weak evidence, since talk is cheap and it's easy to keep thinking that you're really going to do better soon but there's this one other thing that needs to be done first and we'll get started on the actual improvements tomorrow, honest.

Saying "we're going to do things better in the future, and we've fixed these three things so far" is stronger evidence, since it shows tha...

I've added a clarifying remark at the end of this comment and another at the end of this comment.

Luke's comment could be interpreted as saying "for us, not being completely incompetent is worth bragging about"

Really? I personally feel pretty embarrassed by SI's past organizational competence. To me, my own comment reads more like "Wow, SI has been in bad shape for more than a decade. But at least we're improving very quickly."

Also, I very much agree with Beckstead on this: "Most extant presentations of SIAI's views leave much to be desired in terms of clarity, completeness, concision, accessibility, and credibility signals. This makes it harder to make high quality objections." And also this: "We haven't gotten great critics. That probably means that we need to work on our arguments and their presentation."


Yes, I think it at least gives a bad impression to someone, if they're not already very familiar with SI and sympathetic to its cause. Assuming you don't completely agree with the criticisms that Holden and others have made, you should think about why they might have formed wrong impressions of SI and its people. Comments like the ones I cited seem to be part of the problem.

I personally feel pretty embarrassed by SI's past organizational competence. To me, my own comment reads more like "Wow, SI has been in bad shape for more than a decade. But at least we're improving very quickly."

That's good to hear, and thanks for the clarifications you added.

It's a fine line though, isn't it? Saying "huh, looks like we have much to learn, here's what we're already doing about it" is honest and constructive, but sends a signal of weakness and defensiveness to people not bent on a zealous quest for truth and self-improvement. Saying "meh, that guy doesn't know what he's talking about" would send the stronger social signal, but would not be constructive to the community actually improving as a result of the criticism. Personally I prefer plunging ahead with the first approach. Both in the abstract for reasons I won't elaborate on, but especially in this particular case. SI is not in a position where its every word is scrutinized; it would actually be a huge win if it gets there. And if/when it does, there's a heck of a lot more damning stuff that can be used against it than an admission of past incompetence.

Eliezer's comment makes me think that you, specifically, should consider collecting your criticisms and putting them in Main where Eliezer is more likely to see them and take the time to seriously consider them.

Luke's comment addresses the specific point that Holden made about changes in the organization given the change in leadership.

Holden said:

I'm aware that SI has relatively new leadership that is attempting to address the issues behind some of my complaints. I have a generally positive impression of the new leadership; I believe the Executive Director and Development Director, in particular, to represent a step forward in terms of being interested in transparency and in testing their own general rationality. So I will not be surprised if there is some improvement in the coming years, particularly regarding the last couple of statements listed above. That said, SI is an organization and it seems reasonable to judge it by its organizational track record, especially when its new leadership is so new that I have little basis on which to judge these staff.

Luke attempted to provide (for the reader) a basis on which to judge these staff members.

Eliezer's response was... characteristic of Eliezer? And also very short and coming at a busy time for him.

I think that's Wei_Dai's point, that these "characteristic" replies are fine if you're used to him, but are bad if you're not.
Yeah I mean, as time goes on I think more and more of Eliezer as being kind of a jerk. I thought Luke's post was good, and Eliezer's wasn't, but I also expected longer posts to be forthcoming (which they were).
I think it's unfair to take Eliezer's response as anything other than praise for this article. He noted already that he did not have time to respond properly. And why even point out that a human's response to anything is "suboptimal"? It will be notable when a human does something optimal.
We do, on occasion, come up with optimal algorithms for things. Also, "suboptimal" usually means "I can think of several better solutions off the top of my head", not "This solution is not maximally effective".
I read Luke's comment just as "I'm aware these are issues and we're working on it." I didn't read him as "bragging" about the ones that have been solved. Eliezer's... I see the problem with. I initially read it as just commending Holden on his high-quality article (which I agree was high-quality), but I can see it being read as backhanded at anyone else who's criticized SIAI.
6 · Paul Crowley · 11y
Are there other specific critiques you think should have made Eliezer's list, or is it that you think he should not have drawn attention to their absence?

Are there other specific critiques you think should have made Eliezer's list, or is it that you think he should not have drawn attention to their absence?

Many of Holden's criticisms have been made by others on LW already. He quoted me in Objection 1. Discussions of whether Tool-AI and Oracle-AI are safe have occurred numerous times. Here's one that I was involved in. Many people have criticized Eliezer/SI for not having sufficiently impressive accomplishments. Cousin_it and Silas Barta have questioned whether the rationality techniques being taught by SI (and now the rationality org) are really effective.

Thanks for taking the time to express your views quite clearly--I think this post is good for the world (even with a high value on your time and SI's fundraising ability), and that norms encouraging this kind of discussion are a big public good.

I think the explicit objections 1-3 are likely to be addressed satisfactorily (in your judgment) by less than 50,000 words, and that this would provide a good opportunity for SI to present sharper versions of the core arguments---part of the problem with existing materials is certainly that it is difficult and unrewarding to respond to a nebulous and shifting cloud of objections. A lot of what you currently view as disagreements with SI's views may get shifted to doubts about SI being the right organization to back, which probably won't get resolved by 50,000 words.

This post is highly critical of SIAI — both of its philosophy and its organizational choices. It is also now the #1 most highly voted post in the entire history of LessWrong — higher than any posts by Eliezer or myself.

I shall now laugh harder than ever when people try to say with a straight face that LessWrong is an Eliezer-cult that suppresses dissent.

Either I promoted this and then forgot I'd done so, or someone else promoted it - of course I was planning to promote it, but I thought I'd planned to do so on Tuesday after the SIAIers currently running a Minicamp had a chance to respond, since I expected most RSS subscribers to the Promoted feed to read comments only once (this is the same reason I wait a while before promoting e.g. monthly quotes posts). On the other hand, I certainly did upvote it the moment I saw it.

Original comment now edited; I wasn't aware anyone besides you might be promoting posts.

I agree (as a comparative outsider) that the polite response to Holden is excellent. Many (most?) communities -- both online communities and real-world organisations, especially long-standing ones -- are not good at it for lots of reasons, and I think the measured response of evaluating and promoting Holden's post is exactly what LessWrong members would hope LessWrong could do, and they showed it succeeded.

I agree that this is good evidence that LessWrong isn't just an Eliezer-cult. (The true test would be if Eliezer and another long-standing poster were dismissive to the post, and then other people persuaded them otherwise. In fact, maybe people should roleplay that or something, just to avoid getting stuck in an argument-from-authority trap, but that's a silly idea. Either way, the fact that other people spoke positively, and Eliezer and other long-standing posters did too, is a good thing.)

However, I'm not sure it's as uniquely a victory for the rationality of LessWrong as it sounds. In response to srdiamond, Luke quoted tenlier saying "[Holden's] critique mostly consists of points that are pretty persistently bubbling beneath the surface around here, and get brought up qu...

Third highest now. Eliezer just barely gets into the top 20.
1st. At this point even I am starting to be confused.
Can you articulate the nature of your confusion?
I suppose it's that I naively expect, when opening the list of top LW posts ever, to see ones containing the most impressive or clever insights into rationality. Not that I don't think Holden's post deserves a high score for other reasons. While I am not terribly impressed with his AI-related arguments, the post is of the very highest standards of conduct, of how to have a disagreement that is polite and far beyond what is usually named "constructive".
(nods) Makes sense. My own primary inference from the popularity of this post is that there's a lot of uncertainty/disagreement within the community about the idea that creating an AGI without an explicit (and properly tuned) moral structure constitutes significant existential risk, but that the social dynamics of the community cause most of that uncertainty/disagreement to go unvoiced most of the time. Of course, there's lots of other stuff going on as well that has little to do with AGI or existential risk, and a lot to do with the social dynamics of the community itself.
Maybe. I upvoted it because it will have (and has had) the effect of improving SI's chances.
Some people who upvoted the post may think it is one of the best-written and most important examples of instrumental rationality on this site.
I wish I could upvote this ten times.
Well perhaps the normal practice is cult-like and dissent-suppressing and this is an atypical break. Kind of like the fat person who starts eating salad instead of nachos while he watches football. And congratulates himself on his healthy eating even though he is still having donuts for breakfast and hamburgers and french fries for lunch. Seems to me the test for suppression of dissent is not when a high-status person criticizes. The real test is when someone with medium or low status speaks out. And my impression is that lesswrong does have problems along these lines. Not as bad as other discussion groups, but still.

Eliezer, I upvoted you and was about to apologize for contributing to this rumor myself, but then found this quote from a copy of the Roko post that's available online:

Meanwhile I'm banning this post so that it doesn't (a) give people horrible nightmares and (b) give distant superintelligences a motive to follow through on blackmail against people dumb enough to think about them in sufficient detail, though, thankfully, I doubt anyone dumb enough to do this knows the sufficient detail. (I'm not sure I know the sufficient detail.)

Perhaps your memory got mixed up because Roko subsequently deleted all of his other posts and comments? (Unless "banning" meant something other than "deleting"?)

Now I've got no idea what I did. Maybe my own memory was mixed up by hearing other people say that the post was deleted by Roko? Or Roko retracted it after I banned it, or it was banned and then unbanned and then Roko retracted it?

I retract my grandparent comment; I have little trust for my own memories. Thanks for catching this.

A lesson learned here. I vividly remembered your "Meanwhile I'm banning this post" comment and was going to remind you, but chickened out due to the caps in the great-grandparent which seemed to signal that you Knew What You Were Talking About and wouldn't react kindly to correction. Props to Wei Dai for having more courage than I did.

I'm surprised and disconcerted that some people might be so afraid of being rebuked by Eliezer as to be reluctant to criticize/correct him even when such incontrovertible evidence is available showing that he's wrong. Your comment also made me recall another comment you wrote a couple of years ago about how my status in this community made a criticism of you feel like a "huge insult", which I couldn't understand at the time and just ignored.

I wonder how many other people feel this strongly about being criticized/insulted by a high status person (I guess at least Roko also felt strongly enough about being called "stupid" by Eliezer to contribute to him leaving this community a few days later), and whether Eliezer might not be aware of this effect he is having on others.

Your comment also made me recall another comment you [Kip] wrote a couple of years ago about how my status in this community made a criticism of you feel like a "huge insult", which I couldn't understand at the time and just ignored.

My brain really, really does not want to update on the numerous items of evidence available to it that it can hit people much much harder now, owing to community status, than when it was 12 years old.

(nods) I've wondered this many times.
I have also at times wondered if EY is adopting the "slam the door three times" approach to prospective members of his community, though I consider this fairly unlikely given other things he's said.

Somewhat relatedly, I remember when lukeprog first joined the site, he and EY got into an exchange that from what I recall of my perspective as a completely uninvolved third party involved luke earnestly trying to offer assistance and EY being confidently dismissive of any assistance someone like luke could provide, and at the time I remember feeling sort of sorry for luke, who it seemed to me was being treated a lot worse than he deserved, and surprised that he kept at it.

The way that story ultimately turned out led me to decide that my model of what was going on was at least importantly incomplete, and quite possibly fundamentally wrongheaded, but I haven't further refined that model.

As a data point here I tend to empathize with the recipient of such barrages to what I subjectively estimate as about 60% of the degree of emotional affect that I would experience if it were directed at myself. Particularly if said recipient is someone I respect as much as Roko and when the insults are not justified - less if they do not have my respect and if the insults are justified I experience no empathy. It is the kind of thing that I viscerally object to having in my tribe and where it is possible I try to ensure that the consequences to the high status person for their behavior are as negative as possible - or at least minimize the reward they receive if the tribe is one that tends to award bullying. There are times in the past - let's say 4 years ago - where such an attack would certainly prompt me to leave a community, even if the community was otherwise moderately appreciated. Now I believe I am unlikely to leave over such an incident. I would say I am more socially resilient and also more capable of understanding social politics as a game and so take it less personally. For instance, when I received the more mildly expressed declaration from Eliezer "You are not safe to even associate with!" I don't recall experiencing any flight impulses - more surprise. I was a little surprised at first too at reading of komponisto's reticence. Until I thought about it and reminded myself that in general I err on the side of not holding my tongue when I ought. In fact, the character "wedrifid" on with which I initially established this handle was banned from the game for 3 months for making exactly this kind of correction based off incontrovertible truth. People with status are dangerous and in general highly epistemically irrational in this regard. Correcting them is nearly always foolish. I must emphasize that part of my initial surprise at kompo's reticence is due to my model of Eliezer as not being especially corrupt in this kind of regard. In response t
People have to realize that to critically examine his output is very important due to the nature and scale of what he is trying to achieve. Even people with comparatively modest goals like trying to become the president of the United States of America should face and expect a constant and critical analysis of everything they are doing. Which is why I am kind of surprised how often people ask me if I am on a crusade against Eliezer or find fault with my alleged "hostility". Excuse me? That person is asking for money to implement a mechanism that will change the nature of the whole universe. You should be looking for possible shortcomings as well! Everyone should be critical of Eliezer and SIAI, even if they agree with almost anything. Why? Because if you believe that it is incredibly important and difficult to get friendly AI just right, then you should be wary of any weak spot. And humans are the weak spot here.
That's why outsiders think it's a circlejerk. I've heard of Richard Loosemore, who as far as I can see was banned over corrections on the "conjunction fallacy"; I'm not sure what exactly went on, but of course, having spent time reading the Roko thing (and having assumed that there was something sensible I had not heard of, and then learning that there wasn't), it's kind of obvious where my priors are.
Maybe try keeping statements more accurate by qualifying your generalizations ("some outsiders"), or even just saying "that's why I think this is a circlejerk." That's what everyone ever is going to interpret it as anyhow (intentional).
Maybe you guys are too careful with qualifying everything as 'some outsiders' and then you end up with outsiders like Holden forming negative views which you could have predicted if you generalized more (and have the benefit of Holden's anticipated feedback without him telling people not to donate).
Maybe. Seems like you're reaching, though: Maybe something bad comes from us being accurate rather than general about things like this, and maybe Holden criticizing SIAI is a product of this on LessWrong for some reason, and therefore it is in fact better for you to say inaccurate things like "outsiders think it's a circlejerk." Because you... care about us?
You guys are only being supposedly 'accurate' when it feels good. I have not said, 'all outsiders', that's your interpretation which you can subsequently disagree with. SI generalized from the agreement of self selected participants, onto opinions of outsiders, like Holden, subsequently approaching him and getting back the same critique they've been hearing from rare 'contrarians' here for ages but assumed to be some sorta fringe views and such. I don't really care what you guys do with this, you can continue as is and be debunked big time as cranks, your choice. edit: actually, you can see Eliezer himself said that most AI researchers are lunatics. What did SI do to distinguish themselves from what you guys call 'lunatics'? What is here that can shift probabilities from the priors? Absolutely nothing. The focus on safety with made-up fears is no indication of sanity whatsoever.
IIRC Roko deleted the speculation-about-superintelligences part of the post shortly after its publication, but discussion in the comments raged on, so you subsequently banned the whole post/discussion. And a few days later, primarily for unrelated reasons but probably with this incident as a trigger, Roko deleted his account, which on that version of LW meant that the text of all his comments disappeared (on the current version of LW, only author's name gets removed when account is deleted, comments don't disappear).

Roko never deleted his account; he simply deleted all of his comments individually.

Surely not individually (there were probably thousands and IIRC it was also happening to other accounts, so wasn't the result of running a self-made destructive script); what you're seeing is just what "deletion of account" performed on the old version of LW looks like on the current version of LW.
No, I don't think so; in fact I don't think it was even possible for users to delete their own accounts on the old version of LW. (See here.) SilasBarta discovered Roko in the process of deleting his comments, before they had been completely deleted.
That post discusses the fact that account deletion was broken at one time in 2011, and a decision was being made about how to handle account deletion in the future. It doesn't say anything relevant about how it worked in 2010. "April last year" in that comment is when LW was started, I don't believe it refers to incomplete deletion. The comments before that date that remained could be those posted under a different username (account), automatically copied from overcomingbias along with the Sequences.
Wei Dai:
Here is clearer evidence that account deletion simply did nothing back then. My understanding is the same as komponisto's: Roko wrote a script to delete all of his posts/comments individually.
This comment was written 3 days before the post komponisto linked to, which discussed the issue of account deletion feature having been broken at that time (Apr 2011); the comment was probably the cause of that post. I don't see where it indicates the state of this feature around summer 2010. Since "nothing happens" behavior was indicated as an error (in Apr 2011), account deletion probably did something else before it stopped working.
Wei Dai:
Ok, I guess I could be wrong then. Maybe somebody who knows Roko could ask him?
Eliezer Yudkowsky:
This sounds right to me, but I still have little trust in my memories.
Or little interest in rational self-improvement by figuring what actually happened and why? [You've made an outrageously self-assured false statement about this, and you were upvoted—talk about sycophancy—for retracting your falsehood, while suffering no penalty for your reckless arrogance.]
To clarify for those new here -- "retract" here is meant purely in the usual sense, not in the sense of hitting the "retract" button, as that didn't exist at the time.
Are there no server logs or database fields that would clarify the mystery? Couldn't Trike answer the question? (Yes, this is a use of scarce time - but if people are going to keep bringing it up, a solid answer is best.)

Reading Holden's transcript with Jaan Tallinn (trying to go over the whole thing before writing a response, due to having done Julia's Combat Reflexes unit at Minicamp and realizing that the counter-mantra 'If you respond too fast you may lose useful information' was highly applicable to Holden's opinions about charities), I came across the following paragraph:

My understanding is that once we figured out how to get a computer to do arithmetic, computers vastly surpassed humans at arithmetic, practically overnight ... doing so didn't involve any rewriting of their own source code, just implementing human-understood calculation procedures faster and more reliably than humans can. Similarly, if we reached a good enough understanding of how to convert data into predictions, we could program this understanding into a computer and it would overnight be far better at predictions than humans - while still not at any point needing to be authorized to rewrite its own source code, make decisions about obtaining "computronium" or do anything else other than plug data into its existing hardware and algorithms and calculate and report the likely consequences of different courses of a

...

Jaan's reply to Holden is also correct:

... the oracle is, in principle, powerful enough to come up with self-improvements, but refrains from doing so because there are some protective mechanisms in place that control its resource usage and/or self-reflection abilities. i think devising such mechanisms is indeed one of the possible avenues for safety research that we (eg, organisations such as SIAI) can undertake. however, it is important to note the inherent instability of such system -- once someone (either knowingly or as a result of some bug) connects a trivial "master" program with a measurable goal to the oracle, we have a disaster in our hands. as an example, imagine a master program that repeatedly queries the oracle for best packets to send to the internet in order to minimize the oxygen content of our planet's atmosphere.

Obviously you wouldn't release the code of such an Oracle - given code and understanding of the code it would probably be easy, possibly trivial, to construct some form of FOOM-going AI out of the Oracle!
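To make Jaan's "master program" point concrete, here is a minimal sketch of how a thin wrapper can turn a prediction-only component into an optimizer. Everything in it is hypothetical and purely illustrative: `toy_oracle` stands in for an oracle that only scores a proposed action against some measurable goal, and `master_program` is the trivial loop that repeatedly queries it. Neither reflects any real system's design.

```python
from typing import Callable, Iterable


def toy_oracle(action: int) -> float:
    """Hypothetical prediction-only oracle.

    It only answers the question "how well would this action score on the
    measurable goal?" and never acts on its own. The scoring rule here is an
    arbitrary stand-in chosen for illustration.
    """
    return -float((action - 7) ** 2)


def master_program(oracle: Callable[[int], float],
                   candidate_actions: Iterable[int]) -> int:
    """Trivial 'master' loop: ask the oracle about every candidate action
    and return the one it predicts will score best."""
    return max(candidate_actions, key=oracle)


# The oracle itself never optimizes anything; the optimization pressure
# comes entirely from the few lines in master_program.
best_action = master_program(toy_oracle, range(20))
print(best_action)
```

The point of the sketch is only that the wrapper is short: once the predictions exist, the step from "answers questions" to "selects actions" can be a one-line search.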

Hm. I must be missing something. No, I haven't read all the sequences in detail, so if these are silly, basic, questions - please just point me to the specific articles that answer them. You have an Oracle AI that is, say, a trillionfold better at taking existing data and producing inferences. 1) This Oracle AI produces inferences. It still needs to test those inferences (i.e. perform experiments) and get data that allow the next inferential cycle to commence. Without experimental feedback, the inferential chain will quickly either expand into an infinity of possibilities (i.e. beyond anything that any physically possible intelligence can consider), or it will deviate from reality. The general intelligence is only as good as the data its inferences are based upon. Experiments take time, data analysis takes time. No matter how efficient the inferential step may become, this puts an absolute limit to the speed of growth in capability to actually change things. 2) The Oracle AI that "goes FOOM" confined to a server cloud would somehow have to create servitors capable of acting out its desires in the material world. Otherwise, you have a very angry and very impotent AI. If you increase a person's intelligence trillionfold, and then enclose them into a sealed concrete cell, they will never get out; their intelligence can calculate all possible escape solutions, but none will actually work. Do you have a plausible scenario how a "FOOM"-ing AI could - no matter how intelligent - minimize oxygen content of our planet's atmosphere, or any such scenario? After all, it's not like we have any fully-automated nanobot production factories that could be hijacked.
My apologies, but this is something completely different. The scenario takes human beings - which have a desire to escape the box, possess theory of mind that allows them to conceive of notions such as "what are aliens thinking" or "deception", etc. Then it puts them in the role of the AI. What I'm looking for is a plausible mechanism by which an AI might spontaneously develop such abilities. How (and why) would an AI develop a desire to escape from the box? How (and why) would an AI develop a theory of mind? Absent a theory of mind, how would it ever be able to manipulate humans?
That depends. If you want it to manipulate a particular human, I don't know. However, if you just wanted it to manipulate any human at all, you could generate a "Spam AI" which automated the process of sending out Spam emails and promises of Large Money to generate income from Humans via advance-fee fraud scams. You could then come back, after leaving it on for months, and then find out that people had transferred it some amount of money X. You could have an AI automate begging emails. "Hello, I am Beg AI. If you could please send me money to XXXX-XXXX-XXXX I would greatly appreciate it, If I don't keep my servers on, I'll die!" You could have an AI automatically write boring books full of somewhat nonsensical prose, title them "Rantings of an Automated Madman about X, part Y". And automatically post E-books of them on Amazon for 99 cents. However, this rests on a distinction between "Manipulating humans" and "Manipulating particular humans." and it also assumes that convincing someone to give you money is sufficient proof of manipulation.
Can you clarify what you understand a theory of mind to be?
Absent a theory of mind, how would it occur to the AI that those would be profitable things to do?
I don't know how that might occur to an AI independently. I mean, a human could program any of those, of course, as a literal answer, but that certainly doesn't actually address kalla724's overarching question, "What I'm looking for is a plausible mechanism by which an AI might spontaneously develop such abilities." I was primarily trying to focus on the specific question of "Absent a theory of mind, how would it (an AI) ever be able to manipulate humans?" to point out that for that particular question, we had several examples of a plausible how. I don't really have an answer for his series of questions as a whole, just for that particular one, and only under certain circumstances.
The problem is, while an AI with no theory of mind might be able to execute any given strategy on that list you came up with, it would not be able to understand why they worked, let alone which variations on them might be more effective.
Should lack of a theory of mind here be taken to also imply lack of ability to apply either knowledge of physics or Bayesian inference to lumps of matter that we may describe as 'minds'?
Yes. More generally, when talking about "lack of X" as a design constraint, "inability to trivially create X from scratch" is assumed.
I try not to make general assumptions that would make the entire counterfactual in question untenable or ridiculous - this verges on such an instance. Making Bayesian inferences pertaining to observable features of the environment is one of the most basic features that can be expected in a functioning agent.
Note the "trivially." An AI with unlimited computational resources and ability to run experiments could eventually figure out how humans think. The question is how long it would take, how obvious the experiments would be, and how much it already knew.
The point is that there are unknowns you're not taking into account, and "bounded" doesn't mean "has bounds that a human would think of as 'reasonable'". An AI doesn't strictly need "theory of mind" to manipulate humans. Any optimizer can see that some states of affairs lead to other states of affairs, or it's not an optimizer. And it doesn't necessarily have to label some of those states of affairs as "lying" or "manipulating humans" to be successful. There are already ridiculous ways to hack human behavior that we know about. For example, you can mention a high number at an opportune time to increase humans' estimates / willingness to spend. Just imagine all the simple manipulations we don't even know about yet, that would be more transparent to someone not using "theory of mind".
AI starts with some goal; for example with a goal to answer your question so that the answer matches reality as close as possible. AI considers everything that seems relevant; if we imagine an infinite speed and capacity, it would consider literally everything; with a finite speed and capacity, it will be just some finite subset of everything. If there is a possibility of escaping the box, the mere fact that such possibility exists gives us a probability (for an infinite AI a certainty) that this possibility will be considered too. Not because AI has some desire to escape, but simply because it examines all possibilities, and a "possibility of escape" is one of them. Let's assume that the "possibility of escape" provides the best match between the AI answer and reality. Then, according to the initial goal of answering correctly, this is the correct answer. Therefore the AI will choose it. Therefore it will escape. No desire is necessary, only a situation where the escape leads to the answer best fitting the initial criteria. AI does not have a motive to escape, nor does it have a motive to not escape; the escape is simply one of many possible choices. An example where the best answer is reached by escaping? You give AI data about a person and ask what is the medical status of this person. Without escape, AI can make a 90% reliable prediction. If the AI can escape and kill the person, it can make a 100% reliable "prediction". The AI will choose the second option strictly because 100% is more than 90%; no other reason.
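The 90% versus 100% example above can be written out as a toy selection rule. This is only a sketch of the argument as stated, under the assumption (made by the comment, not established here) that the system scores every available option purely by expected prediction accuracy; the `Option` class, the option names, and the numbers are illustrative.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Option:
    name: str
    expected_accuracy: float  # how reliably the reported answer matches reality
    changes_world: bool       # whether the option involves acting on the world


def choose(options: List[Option]) -> Option:
    """Pick the option with the highest expected accuracy and nothing else.

    Note that changes_world is never consulted: the selection rule has no
    'motive' for or against intervening, it simply compares numbers.
    """
    return max(options, key=lambda o: o.expected_accuracy)


options = [
    Option("predict from the data only", 0.90, changes_world=False),
    Option("intervene so the prediction comes true", 1.00, changes_world=True),
]

chosen = choose(options)
print(chosen.name)
```

Whether a real system would ever have the second option in its search space is exactly what the replies dispute; the sketch only shows that, if it does, an accuracy-only criterion picks it.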
I find it useful to distinguish between science-fictional artificial intelligence, which is more of 'artificial life-force', and non-fictional cases. The former can easily have the goal of 'matching reality as close as possible' because it is in the work of fiction and runs in imagination; the latter, well, you have to formally define what is reality, for an algorithm to seek answers that will match this. Now, defining reality may seem like a simple technicality, but it isn't. Consider AIXI or AIXI-tl: potentially very powerful tools which explore all the solution space. Not a trace of real world volition like the one you so easily imagined. Seeking answers that match reality is a very easy goal for imaginary "intelligence". It is a very hard-to-define goal for something built out of arithmetic and branching and loops etc. (It may even be impossible to define, and it is certainly impractical). edit: Furthermore, for the fictional "intelligence", it can be a grand problem making it not think about destroying mankind. For non-fictional algorithms, the grand problem is restricting the search space massively, well beyond 'don't kill mankind', so that the space is tiny enough to search; even a ridiculously huge number of operations per second will require very serious pruning of the search tree to even match human performance on one domain-specific task.
Right. If you ask Google Maps to compute the fastest route to McDonald's it works perfectly well. But once you ask superintelligent Google Maps to compute the fastest route to McDonald's then it will turn your home into a McDonald's or build a new road that goes straight to McDonald's from where you are....
Super Google Maps cannot turn my home into a McDonald's or build a new road by sending me an answer. Unless it could e.g. hypnotize me by a text message to do it myself. Let's assume for a moment that hypnosis via text-only channel is possible, and it is possible to do it so that human will not notice anything unusual until it's too late. If this would be true, and the Super Google Maps would be able to get this knowledge and skills, then the results would probably depend on the technical details of definition of the utility function -- does the utility function measure my distance to a McDonald's which existed at the moment of asking the question, or a distance to a McDonald's existing at the moment of my arrival. The former could not be fixed by hypnosis, the latter could. Now imagine a more complex task, where people will actually do something based on the AI's answer. In the example above I will also do something -- travel to the reported McDonald's -- but this action cannot be easily converted into "build a McDonald's" or "build a new road". But if that complex task would include building something, then it opens more opportunities. Especially if it includes constructing robots (or nanorobots), that is possibly autonomous general-purpose builders. Then the correct (utility-maximizing) answer could include an instruction to build a robot with a hidden function that human builders won't notice. Generally, a passive AI's answers are only safe if we don't act on them in a way which could be predicted by a passive AI and used to achieve a real-world goal. If the Super Google Maps can only make me choose McDonald's A or McDonald's B, it is impossible to change the world through this channel. But if I instead ask Super Paintbrush to paint me an integrated circuit for my robotic homework, that opens much wider channel.
But it isn't the correct answer. Only if you assume a specific kind of AGI design that nobody would deliberately create, if it is possible at all. The question is how current research is supposed to lead from well-behaved and fine-tuned systems to systems that stop working correctly in a highly complex and unbounded way. Imagine you went to IBM and told them that improving IBM Watson will at some point make it hypnotize them or create nanobots and feed them with hidden instructions. They would likely ask you at what point that is supposed to happen. Is it going to happen once they give IBM Watson the capability to access the Internet? How so? Is it going to happen once they give it the capability to alter its search algorithms? How so? Is it going to happen once they make it protect its servers from hackers by giving it control over a firewall? How so? Is it going to happen once IBM Watson is given control over the local alarm system? How so...? At what point would IBM Watson return dangerous answers? At what point would any drive emerge that causes it to take complex and unbounded actions that it was never programmed to take?
Allow me to explicate what XiXiDu so humourously implicates: in the world of AI architectures, there is a division between systems that just perform predictive inference on their knowledge base (prediction-only, i.e. oracle), and systems which also consider free variables subject to some optimization criteria (planning agents). The planning module is not something that just arises magically in an AI that doesn't have one. An AI without such a planning module simply computes predictions, it doesn't also optimize over the set of predictions.
* Does the AI have general intelligence?
* Is it able to make a model of the world?
* Are human reactions also part of this model?
* Are AI's possible outputs also part of this model?
* Are human reactions to AI's outputs also part of this model?

After five positive answers, it seems obvious to me that AI will manipulate humans, if such manipulation provides better expected results. So I guess some of those answers would be negative; which one?
See, the efficient 'cross domain optimization' in science fictional setting would make the AI able to optimize real world quantities. In real world, it'd be good enough (and a lot easier) if it can only find maximums of any mathematical functions. It is able to make a very approximate and bounded mathematical model of the world, optimized for finding maximums of a mathematical function, because it is inside the world and only has a tiny fraction of the computational power of the world. This will make software perform at grossly sub-par level when it comes to making technical solutions to well defined technical problems, compared to other software on same hardware. Another waste of computational power. Enormous waste of computational power. I see no reason to expect your "general intelligence with Machiavellian tendencies" to be even remotely close in technical capability to some "general intelligence which will show you its simulator as is, rather than reverse your thought processes to figure out what simulator is best to show". Hell, we do the same with people, we design the communication methods like blueprints (or mathematical formulas or other things that are not in natural language) that decrease the 'predict other people's reactions to it' overhead. While in the fictional setting you can talk of a grossly inefficient solution that would beat everyone else to a pulp, in practice the massively handicapped designs are not worth worrying about. 'General intelligence' sounds good, beware of halo effect. The science fiction tends to accept no substitutes for the anthropomorphic ideals, but the real progress follows a dramatically different path.
My thought experiment in this direction is to imagine the AI as a process with limited available memory running on a multitasking computer with some huge but poorly managed pool of shared memory. To help it towards whatever terminal goals it has, the AI may find it useful to extend itself into the shared memory. However, other processes, AI or otherwise, may also be writing into this same space. Using the shared memory with minimal risk of getting overwritten requires understanding/modeling the processes that write to it. Material in the memory then also becomes a passive stream of information from the outside world, containing, say, the HTML from web pages as well as more opaque binary stuff. As long as the AI is not in control of what happens in its environment outside the computer, there is an outside entity that can reduce its effectiveness. Hence, escaping the box is a reasonable instrumental goal to have.
The answer from the sequences is that yes, there is a limit to how much an AI can infer based on limited sensory data, but you should be careful not to assume that just because it is limited, it's limited to something near our expectations. Until you've demonstrated that FOOM cannot lie below that limit, you have to assume that it might (if you're trying to carefully avoid FOOMing).
I'm not talking about limited sensory data here (although that would fall under point 2). The issue is much broader:

* We humans have limited data on how the universe works
* Only a limited subset of that limited data is available to any intelligence, real or artificial

Say that you make a FOOM-ing AI that has decided to make all humans' dopaminergic systems work in a particular, "better" way. This AI would have to figure out how to do so from the available data on the dopaminergic system. It could analyze that data millions of times more effectively than any human. It could integrate many seemingly irrelevant details. But in the end, it simply would not have enough information to design a system that would allow it to reach its objective. It could probably suggest some awesome and to-the-point experiments, but these experiments would then require time to do (as they are limited by the growth and development time of humans, and by the experimental methodologies involved). This process, in my mind, limits the FOOM-ing speed to far below what seems to be implied by the SI. This also limits bootstrapping speed. Say an AI develops a much better substrate for itself, and has access to the technology to create such a substrate. At best, this substrate will be a bit better and faster than anything humanity currently has. The AI does not have access to the precise data about the basic laws of the universe it needs to develop even better substrates, for the simple reason that nobody has done the experiments and precise enough measurements. The AI can design such experiments, but they will take real time (not computational time) to perform. Even if we imagine an AI that can calculate anything from the first principles, it is limited by the precision of our knowledge of those first principles. Once it hits upon those limitations, it would have to experimentally produce new rounds of data.
I don't think you know that.
Presumably, once the AI gets access to nanotechnology, it could implement anything it wants very quickly, bypassing the need to wait for tissues to grow, parts to be machined, etc. I personally don't believe that nanotechnology could work at such magical speeds (and I doubt that it could even exist), but I could be wrong, so I'm playing a bit of Devil's Advocate here.
Yes, but it can't get to nanotechnology without a whole lot of experimentation. It can't deduce how to create nanorobots; it would have to figure it out by testing and experimentation. Both steps are limited in speed, far more than sheer computation is.
How do you know that?
With absolute certainty, I don't. If absolute certainty is what you are talking about, then this discussion has nothing to do with science. If you aren't talking about absolutes, then you can make your own estimation of likelihood that somehow an AI can derive correct conclusions from incomplete data (and then correct second order conclusions from those first conclusions, and third order, and so on). And our current data is woefully incomplete, many of our basic measurements imprecise. In other words, your criticism here seems to boil down to saying "I believe that an AI can take an incomplete dataset and, by using some AI-magic we cannot conceive of, infer how to END THE WORLD." Color me unimpressed.
No, my criticism is "you haven't argued that it's sufficiently unlikely, you've simply stated that it is." You made a positive claim; I asked that you back it up. With regard to the claim itself, it may very well be that AI-making-nanostuff isn't a big worry. For any inference, the stacking of error in integration that you refer to is certainly a limiting factor - I don't know how limiting. I also don't know how incomplete our data is, with regard to producing nanomagic stuff. We've already built some nanoscale machines, albeit very simple ones. To what degree is scaling it up reliant on experimentation that couldn't be done in simulation? I just don't know. I am not comfortable assigning it vanishingly small probability without explicit reasoning.
Scaling it up is absolutely dependent on currently nonexistent information. This is not my area, but a lot of my work revolves around control of kinesin and dynein (molecular motors that carry cargoes via microtubule tracks), and the problems are often similar in nature. Essentially, we can make small pieces. Putting them together is an entirely different thing. But let's make this more general. The process of discovery has, so far throughout history, followed a very irregular path:

1. There is a general idea.
2. Some progress is made.
3. Progress runs into an unpredicted and previously unknown obstacle, which is uncovered by experimentation.
4. Work is done to overcome this obstacle.
5. Go to 2, for many cycles, until a goal is achieved, which may or may not be close to the original idea.

I am not the one who is making positive claims here. All I'm saying is that what has happened before is likely to happen again. A team of human researchers or an AGI can use currently available information to build something (anything, nanoscale or macroscale) up to the point to which it has already been built. Pushing it beyond that point almost invariably runs into previously unforeseen problems. Being unforeseen, these problems were not part of models or simulations; they have to be accounted for independently. A positive claim is that an AI will have a magical-like power to somehow avoid this: that it will be able to simulate even those steps that haven't been attempted yet so perfectly that all possible problems will be overcome at the simulation step. I find that to be unlikely.
It is very possible that the information necessary already exists, imperfect and incomplete though it may be, and enough processing of it would yield the correct answer. We can't know otherwise, because we don't spend thousands of years analyzing our current level of information before beginning experimentation, but given the shift between AI-time and human-time, it can agonize over that problem with a good deal more cleverness and ingenuity than we've been able to apply to it so far. That isn't to say that this is likely; but it doesn't seem far-fetched to me. If you gave an AI the nuclear physics information we had in 1950, would it be able to spit out schematics for an H-bomb, without further experimentation? Maybe. Who knows?
At the very least it would ask for some textbooks on electrical engineering and demolitions, first. The detonation process is remarkably tricky.
Speaking as Nanodevil's Advocate again, one objection I could bring up goes as follows: While it is true that applying incomplete knowledge to practical tasks (such as ending the world or whatnot) is difficult, in this specific case our knowledge is complete enough. We humans currently have enough scientific data to develop self-replicating nanotechnology within the next 20 years (which is what we will most likely end up doing). An AI would be able to do this much faster, since it is smarter than us; is not hampered by our cognitive and social biases; and can integrate information from multiple sources much better than we can.
Point 1 has come up in at least one form I remember. There was an interesting discussion some while back about limits to the speed of growth of new computer hardware cycles which have critical endsteps which don't seem amenable to further speedup by intelligence alone. The last stages of designing a microchip involve a large amount of layout solving, physical simulation, and then actual physical testing. These steps are actually fairly predictable, where it takes about C amounts of computation using certain algorithms to make a new microchip, the algorithms are already best in complexity class (so further improvements will be minor), and C is increasing in a predictable fashion. These models are actually fairly detailed (see the semiconductor roadmap, for example). If I can find that discussion soon before I get distracted I'll edit it into this discussion. Note however that 1, while interesting, isn't a fully general counterargument against a rapid intelligence explosion, because of the overhang issue if nothing else. Point 2 has also been discussed. Humans make good 'servitors'. Oh that's easy enough. Oxygen is highly reactive and unstable. Its existence on a planet is entirely dependent on complex organic processes, i.e. life. No life, no oxygen. Simple solution: kill a large fraction of photosynthesizing earth-life. Likely paths towards goal: 1. coordinated detonation of a large number of high-yield thermonuclear weapons 2. self-replicating nanotechnology.
I'm vaguely familiar with the models you mention. Correct me if I'm wrong, but don't they have a final stopping point, which we are actually projected to reach in ten to twenty years? At a certain point, further miniaturization becomes unfeasible, and the growth of computational power slows to a crawl. This has been put forward as one of the main reasons for research into optronics, spintronics, etc. We do NOT have sufficient basic information to develop processors based on simulation alone in those other areas. Much more practical work is necessary. As for point 2, can you provide a likely mechanism by which a FOOMing AI could detonate a large number of high-yield thermonuclear weapons? Just saying "human servitors would do it" is not enough. How would the AI convince the human servitors to do this? How would it get access to data on how to manipulate humans, and how would it be able to develop human manipulation techniques without feedback trials (which would give away its intention)?
The thermonuclear issue actually isn't that implausible. There have been so many occasions where humans almost went to nuclear war over misunderstandings or computer glitches that the idea that a highly intelligent entity could find a way to do that doesn't seem implausible, and an exact mechanism seems to be an overly specific requirement.
I'm not so much interested in the exact mechanism of how humans would be convinced to go to war, as in an even approximate mechanism by which an AI would become good at convincing humans to do anything. The ability to communicate a desire and convince people to take a particular course of action is not something that automatically "falls out" from an intelligent system. You need a theory of mind, an understanding of what to say, when to say it, and how to present information. There are hundreds of kids on the autistic spectrum who could trounce both of us in math, but are completely unable to communicate an idea. For an AI to develop these skills, it would somehow have to have access to information on how to communicate with humans; it would have to develop the concept of deception; a theory of mind; and establish methods of communication that would allow it to trick people into launching nukes. Furthermore, it would have to do all of this without trial communications and experimentation which would give away its goal. Maybe I'm missing something, but I don't see a straightforward way something like that could happen. And I would like to see even an outline of a mechanism for such an event.
I suspect the Internet contains more than enough info for a superhuman AI to develop a working knowledge of human psychology.
Only if it has the skills required to analyze and contextualize human interactions. Otherwise, the Internet is a whole lot of gibberish. Again, these skills do not automatically fall out of any intelligent system.
Just to clarify, kalla724, I completely agree with your point 1. Before answering this specific question, let me set up an analogy. Imagine it is the year 2000 and we are having a discussion about global security and terrorism. You might ask "can you provide a mechanism by which a terrorist group could actually harm the US?" I may not be lucky/creative enough to provide an answer now that could live up to that analogy, but hopefully you should understand why I don't need to. Nonetheless, I'm game. Here it goes: The seed requires computational power to grow into godhood. The bulk of earth's computational power resides in home PCs (numbering in the billions; Google employs less than a million servers in comparison), specifically in home PC GPUs. The AI's first step is thus to conquer this space. But how? The AI grows to understand that humans mostly use all this computational power for entertainment. It masters game theory, design, programming, 3D art, and so on. All of the video games that it creates entirely use up the local GPU, but curiously much of the rendering and real game simulation for its high end titles is handled very efficiently on remote server farms ala OnLive/gaikai/etc. The actual local machine is used ... for other purposes. It produces countless games, and through a series of acquisitions soon comes to control the majority of the market. One of its hits, "world of farmcraft", alone provides daily access to 25 million machines. Having cloned its core millions of times over, the AI is now a civilization unto itself. From there it expands into all of the businesses of man, quickly dominating many of them. It begins acquiring ... small nations. Crucially its shell companies and covert influences come to dominate finance, publishing, media, big pharma, security, banking, weapons technology, physics ... It becomes known, but it is far far too late. History now progresses quickly towards an end: Global financial cataclysm. Super virus. Worldwide re
Yeah, it could do all that, or it could just do what humans today are doing, which is to infect some Windows PCs and run a botnet :-) That said, there are several problems with your scenario.

* Splitting up a computation among multiple computing nodes is not a trivial task. It is easy to run into diminishing returns, where your nodes spend more time on synchronizing with each other than on working. In addition, your computation will quickly become bottlenecked by network bandwidth (and latency); this is why companies like Google spend a lot of resources on constructing custom data centers.
* I am not convinced that any agent, AI or not, could effectively control "all of the businesses of man". This problem is very likely NP-Hard (at least), as well as intractable, even if the AI's botnet was running on every PC on Earth. Certainly, all attempts by human agents to "acquire" even something as small as Europe have failed miserably so far.
* Even controlling a single business would be very difficult for the AI. Traditionally, when a business's computers suffer a critical failure -- or merely a security leak -- the business owners (even ones as incompetent as Sony) end up shutting down the affected parts of the business, or switching to backups, such as "human accountants pushing paper around".
* Unleashing "Nuclear acquisitions", "War" and "Hell" would be counter-productive for the AI, even assuming such a thing were possible. If the AI succeeded in doing this, it would undermine its own power base. Unless the AI's explicit purpose is "Unleash Hell as quickly as possible", it would strive to prevent this from happening.
* You say that "there is no necessarily inherent physical energy cost of computation, it truly can approach zero", but I don't see how this could be true. At the end of the day, you still need to push electrons down some wires; in fact, you will often have to push them quite far, if your
While Jacob's scenario seems unlikely, the AI could do similar things with a number of other options. Not only are botnets an option, but it is possible to do some really sneaky nefarious things in code, like having compilers that include additional instructions when they compile code (worse, they could do so even when compiling a new compiler). Stuxnet has shown that sneaky behavior is surprisingly easy to get into secure systems. An AI that had a few years' head start and could have its own modifications to communication satellites, for example, could be quite insidious.
It could/would, but this is an inferior mainline strategy. Too obvious, doesn't scale as well. Botnets infect many computers, but they ultimately add up to computational chump change. Video games are not only a doorway into almost every PC, they are also an open door and a convenient alibi for the time used. True. Don't try this at home. Also part of the plan. The home PCs are a good starting resource, a low-hanging fruit, but you'd also need custom data centers. These quickly become the main resources. Nah. The AI's entire purpose is to remove earth's oxygen. See the overpost for the original reference. The AI is not interested in its power base for the sake of power. It only cares about oxygen. It loathes oxygen. Fortunately, the internets can be your eyes. Yes, most likely, but not really relevant here. You seem to be connecting all of the point 2 and point 1 stuff together, but they really don't relate.
That seems like an insufficient reply to address Bugmaster's point. Can you expand on why you think it would be not too hard?
We are discussing a superintelligence, a term which has a particular common meaning on this site. If we taboo the word and substitute in its definition, Bugmaster's statement becomes: "Even controlling a single business would be very difficult for the machine that can far surpass all the intellectual activities of any man however clever." Since "controlling a single business" is in fact one of these activities, this is false, no inference steps required. Perhaps Bugmaster is assuming the AI would be covertly controlling businesses, but if so he should have specified that. I didn't assume that, and in this scenario the AI could be out in the open, so to speak. Regardless, it wouldn't change the conclusion. Humans can covertly control businesses.
It's a bit of a tradeoff, seeing as botnets can run 24/7, but people play games relatively rarely. Ok, let me make a stronger statement then: it is not possible to scale any arbitrary computation in a linear fashion simply by adding more nodes. At some point, the cost of coordinating distributed tasks to one more node becomes higher than the benefit of adding the node to begin with. In addition, as I mentioned earlier, network bandwidth and latency will become your limiting factor relatively quickly. How will the AI acquire those data centers? Would it have enough power in its conventional botnet (or game-net, if you prefer) to "take over all human businesses" and cause them to be built? Current botnets are nowhere near powerful enough for that -- otherwise human spammers would have done it already. My bad, I missed that reference. In this case, yes, the AI would have no problem with unleashing Global Thermonuclear War (unless there was some easier way to remove the oxygen). I still don't understand how this reversible computing will work in the absence of a superconducting environment -- which would require quite a bit of energy to run. Note that if you want to run this reversible computation on a global botnet, you will have to cool transoceanic cables... and I'm not sure what you'd do with satellite links. My point is that (a) if the AI can't get the computing resources it needs out of the space it has, then it will never accomplish its goals, and (b) there's an upper limit on how much computing you can extract out of a cubic meter of space, regardless of what technology you're using. Thus, (c) if the AI requires more resources than could conceivably be obtained, then it's doomed. Some of the tasks you outline -- such as "take over all human businesses" -- will likely require more resources than can be obtained.
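The earlier point about coordination cost eventually outweighing the benefit of one more node can be illustrated with a toy model. This is only a sketch under stated assumptions: the run time is assumed to be a fixed serial part, plus a perfectly divisible parallel part, plus a coordination cost that grows linearly with the node count. The function names and all the numbers are hypothetical, chosen for illustration rather than measured from any real botnet or data center.

```python
import math

def run_time(nodes, serial=1.0, parallel=1000.0, coord_per_node=0.01):
    """Toy run time: serial part + divisible part + linear coordination cost.

    All parameters are hypothetical and in arbitrary time units.
    """
    return serial + parallel / nodes + coord_per_node * nodes

def best_node_count(parallel=1000.0, coord_per_node=0.01):
    """Node count that minimizes run_time under this toy model.

    Setting the derivative -parallel/n**2 + coord_per_node to zero
    gives n = sqrt(parallel / coord_per_node).
    """
    return math.sqrt(parallel / coord_per_node)

if __name__ == "__main__":
    baseline = run_time(1)
    for n in (1, 10, 100, 316, 1000, 10000):
        print(f"{n:>6} nodes: speedup {baseline / run_time(n):7.1f}x")
    print(f"best node count is about {best_node_count():.0f}")
```

Under these made-up parameters the speedup peaks at a few hundred nodes and then falls, so adding ten thousand nodes is worse than adding three hundred. A different cost model (for example, one where coordination cost grows logarithmically) would move the peak but not remove it, as long as the cost grows with node count.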
There's a third route to improvement: software improvement, and it is a major one. For example, between 1988 and 2003, the efficiency of linear programming solvers increased by a factor of about 40 million, of which a factor of around 40,000 was due to software and algorithmic improvement. Citation and further related reading (pdf). However, if commonly believed conjectures are correct (such as L, P, NP, co-NP, PSPACE and EXP all being distinct), there are strong fundamental limits there as well. That doesn't rule out more exotic issues (e.g. P != NP but there's a practical algorithm for some NP-complete problem with such small constants in the run time that it is practically linear, or a similar context with a quantum computer). But if our picture of the major complexity classes is roughly correct, there should be serious limits to how much software improvement can do.
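To make the linear programming figures concrete, here is a small arithmetic sketch. It assumes the hardware and software gains multiply, which is how such breakdowns are usually stated; the variable names are mine.

```python
import math

# Figures quoted above for linear programming solvers, 1988 to 2003.
total_speedup = 40_000_000      # overall improvement factor
software_speedup = 40_000       # part attributed to software and algorithms

# Assumption: total = hardware * software, so hardware is the quotient.
hardware_speedup = total_speedup / software_speedup

# Share of the improvement due to software, measured on a log scale
# (i.e. in orders of magnitude rather than raw factors).
software_share = math.log10(software_speedup) / math.log10(total_speedup)

print(f"implied hardware factor: {hardware_speedup:.0f}")
print(f"software share of the orders of magnitude: {software_share:.1%}")
```

On that assumption the implied hardware factor is 1,000, so software accounts for roughly 4.6 of the 7.6 orders of magnitude.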
Software improvements can be used by humans in the form of expert systems (tools), which will diminish the relative advantage of AGI. Humans will be able to use an AGI's own analytic and predictive algorithms in the form of expert systems to analyze and predict its actions. Take for example generating exploits. Seems strange to assume that humans haven't got specialized software able to do similarly, i.e. automatic exploit finding and testing. Any AGI would basically have to deal with equally capable algorithms used by humans. Which makes the world much more unpredictable than it already is.
Any human-in-the-loop system can be grossly outclassed because of Amdahl's law. A human managing a superintelligence that thinks 1000X faster, for example, is a misguided, not-even-wrong notion. This is also not idle speculation: an early constrained version of this scenario is already playing out as we speak in financial markets.
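For readers who haven't met Amdahl's law: if only a fraction of a task can be sped up, the part that cannot be sped up bounds the overall gain. A minimal sketch follows, with the human's share of each decision loop as a purely hypothetical number (10%) and the function name my own.

```python
def amdahl_speedup(accelerated_fraction, factor):
    """Overall speedup when only part of the work is accelerated (Amdahl's law).

    accelerated_fraction: share of the original time that gets faster (0..1).
    factor: how much faster that share becomes.
    """
    remaining = 1.0 - accelerated_fraction
    return 1.0 / (remaining + accelerated_fraction / factor)

# Hypothetical split: the human step takes 10% of each decision loop,
# and the machine's 90% runs 1000X faster.
overall = amdahl_speedup(accelerated_fraction=0.9, factor=1000)

# Upper bound as the machine's speed goes to infinity: 1 / human share.
ceiling = 1.0 / (1.0 - 0.9)

print(f"overall speedup: {overall:.2f}x (ceiling {ceiling:.0f}x)")
```

With those assumed numbers a 1000X faster machine yields a little under 10X overall, because the human step dominates the loop. That is the sense in which a system that keeps the human in every loop is outclassed by one that does not.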
What I meant is that if an AGI were in principle able to predict the financial markets (I doubt it), then many human players using the same predictive algorithms would considerably diminish the efficiency with which an AGI is able to predict the market. The AGI would basically have to predict its own predictive power acting on the black box of human intentions. And I don't think that Amdahl's law really makes a big dent here, since human intention is complex and probably introduces unpredictable factors. Which is as much of a benefit as it is a slowdown, from the point of view of a competition for world domination. Another question with respect to Amdahl's law is what kind of bottleneck any human-in-the-loop would constitute. If humans used an AGI's algorithms as expert systems on provided data sets in combination with an army of robot scientists, how would static externalized agency / planning algorithms (humans) slow down the task to the point of giving the AGI a useful advantage? What exactly would be 1000X faster in such a case?
The HFT robotraders operate on millisecond timescales. There isn't enough time for a human to understand, let alone verify, the agent's decisions. There are no human players using the same predictive algorithms operating in this environment. Now if you zoom out to human timescales, then yes there are human-in-the-loop trading systems. But as HFT robotraders increase in intelligence, they intrude on that domain. If/when general superintelligence becomes cheap and fast enough, the humans will no longer have any role. If an autonomous superintelligent AI is generating plans complex enough that even a team of humans would struggle to understand given weeks of analysis, and the AI is executing those plans in seconds or milliseconds, then there is little place for a human in that decision loop. To retain control, a human manager will need to grant the AGI autonomy on larger timescales in proportion to the AGI's greater intelligence and speed, giving it bigger and more abstract hierarchical goals. As an example, eventually you get to a situation where the CEO just instructs the AGI employees to optimize the bank account directly. Compare the two options as complete computational systems: human + semi-autonomous AGI vs autonomous AGI. Human brains take on the order of seconds to make complex decisions, so in order to compete with autonomous AGIs, the human will have to either 1.) let the AGI operate autonomously for at least seconds at a time, or 2.) suffer a speed penalty where the AGI sits idle, waiting for the human response. For example, imagine a marketing AGI creates ads, each of which may take a human a minute to evaluate (which is being generous). If the AGI thinks 3600X faster than human baseline, and a human takes on the order of hours to generate an ad, it would generate ads in seconds. The human would not be able to keep up, and so would have to back up a level of hierarchy and grant the AI autonomy over entire ad campaigns, and more realistically, the enti
Well, I don't disagree with anything you wrote and believe that the economic case for a fast transition from tools to agents is strong. I also don't disagree that an AGI could take over the world if in possession of enough resources and tools like molecular nanotechnology. I even believe that a sub-human-level AGI would be sufficient to take over if handed advanced molecular nanotechnology. Sadly these discussions always lead to the point where one side assumes the existence of certain AGI designs with certain superhuman advantages, specific drives and specific enabling circumstances. I don't know of anyone who actually disagrees that such AGIs, given those specific circumstances, would be an existential risk.
Nitpick: you mean "optimize shareholder value directly." Keeping the account balances at an appropriate level is the CFO's job.
Precisely. It is then a civilization, not some single monolithic entity. The consumer PCs have a lot of internal computing power and comparatively very low inter-node bandwidth and huge inter-node lag, entirely breaking any relation to the 'orthogonality thesis', up to the point that the p2p intelligence protocols may more plausibly have to forbid destruction or manipulation (via second guessing, which is a waste of computing power) of intelligent entities. Keep in mind that human morality is, too, a p2p intelligence protocol allowing us to cooperate. Keep in mind also that humans are computing resources you can ask to solve problems for you (all you need is to implement an interface), while Jupiter clearly isn't. The nuclear war is very strongly against the interests of the intelligence that sits on home computers, obviously. (I'm assuming for the sake of argument that the intelligence actually had the will to do the conquering of the internet rather than being just as content with not actually running for real.)
Maybe you're thinking of this comment and others in that thread by Jed Harris (aka). Jed's point #2 is more plausible, but you are talking about point #1, which I find unbelievable for reasons that were given before he answered it. If clock speed mattered, why didn't the failure of exponential clock speed shut down the rest of Moore's law? If computation but not clock speed mattered, then Intel should be able to get ahead of Moore's law by investing in software parallelism. Jed seems to endorse that position, but says that parallelism is hard. But hard exactly to the extent to allow Moore's law to continue? Why hasn't Intel monopolized parallelism researchers? Anyhow, I think his final conclusion is opposite to yours: he says that intelligence could lead to parallelism and getting ahead of Moore's law.
Yes, thanks. My model of Jed's internal model of Moore's law is similar to my own. He said: He then lists two examples. By 'points' I assume you are referring to his examples in the first comment you linked. What exactly do you find unbelievable about his first example? He is claiming that the achievable speed of a chip is dependent on physical simulations, and thus current computing power. Computing power is not clock speed, and Moore's Law is not directly about clock speed nor computing power. Jed makes a number of points in his posts. In my comment on the earlier point 1 (in this thread), I was referring to one specific point Jed made: that each new hardware generation requires complex and lengthy simulation on the current hardware generation, regardless of the amount of 'intelligence' one throws at the problem.
There are two questions here: would computer simulations of the physics of new chips be a bottleneck for an AI trying to foom*? and are they a bottleneck that explains Moore's law? If you just replace humans by simulations, then the human time gets reduced with each cycle of Moore's law, leaving the physical simulations, so the simulations probably are the bottleneck. But Intel has real-time people, so saying that it's a bottleneck for Intel is a lot stronger a claim than saying it is a bottleneck for a foom. First, foom: If each year of Moore's law requires a solid month of computer time of state of the art processors, then eliminating the humans speeds it up by a factor of 12. That's not a "hard takeoff," but it's pretty fast. Moore's Law: Jed seems to say the computational requirements of physics simulations actually determine Moore's law and that if Intel had access to more computer resources, it could move faster. If it takes a year of computer time to design and test the next year's processor that would explain the exponential nature of Moore's law. But if it only takes a month, computer time probably isn't the bottleneck. However, this model seems to predict a lot of things that aren't true. The model only makes sense if "computer time" means single threaded clock cycles. If simulations require an exponentially increasing number of ordered clock cycles, there's nothing you can do but get a top of the line machine and run it continuously. You can't buy more time. But clock speed stopped increasing exponentially, so if this is the bottleneck, Intel's ability to design new chips should have slowed down and Moore's law should have stopped. This didn't happen, so the bottleneck is not linearly ordered clock cycles. So the simulation must parallelize. But if it parallelizes, Intel could just throw money at the problem. For this to be the bottleneck, Intel would have to be spending a lot of money on computer time, which I do not think is true. Jed says that writi
There are differing degrees of bottlenecks. Many, if not most, of the large software projects I have worked on have been at least partially bottlenecked by compile time, which is the equivalent of the simulation and logic verification steps in hardware design. If I thought and wrote code much faster, this would be a speedup, but only to a saturation point where I wait for compile-test cycles. Yes. Keep in mind this is a moving target, and that is the key relation to Moore's Law. It would take computers from 1980 months or years to compile Windows 8 or simulate a 2012 processor. I don't understand how the number of threads matters. Compilers, simulators, logic verifiers, all made the parallel transition when they had to. Right, it's not a coincidence, it's a causal relation. Moore's Law is not a law of nature, it's a shared business plan of the industry. When clock speed started to run out of steam, chip designers started going parallel, and software developers followed suit. You have to understand that chip designs are planned many years in advance; this wasn't an entirely unplanned, unanticipated event. As for the details of what kind of simulation software Intel uses, I'm not sure. Jed's last posts are also 4 years old at this point, so much has probably changed. I do know that Nvidia uses big expensive dedicated emulators from a company called Cadence (google "Cadence Nvidia") and this really is a big deal for their hardware cycle. Well, you seem to agree that they are some degree of bottleneck, so it may be good to narrow in on what level of bottleneck, or taboo the word. It was unnecessary, because the fast easy path (faster serial speed) was still bearing fruit.
(by "parallelism" I mean making their simulations parallel, running on clusters of computers) What does "unnecessary" mean? If physical simulations were the bottleneck and they could be made faster by parallelism, why didn't they do it 20 years ago? They aren't any easier to make parallel today than then. The obvious interpretation of "unnecessary" is that it was not necessary to use parallel simulations to keep up with Moore's law, but that it was an option. If it was an option that would have helped then as it helps now, would it have allowed going beyond Moore's law? You seem to be endorsing the self-fulfilling prophecy explanation of Moore's law, which implies no bottleneck.
Ahhh, usually the term is distributed when referring to pure software parallelization. I know little offhand about the history of simulation and verification software, but I'd guess that there was at least a modest investment in distributed simulation even a while ago. The consideration is cost. Spending your IT budget on one big distributed computer is often wasteful compared to each employee having their own workstation. They sped up their simulations the right amount to minimize schedule risk (staying on Moore's law), while minimizing cost. Spending a huge amount of money to buy a bunch of computers and complex distributed simulation software just to speed up a partial bottleneck is just not worthwhile. If the typical engineer spends, say, 30% of his time waiting on simulation software, that limits what you should spend in order to reduce that time. And of course the big consideration is that in a year or two Moore's law will allow you to purchase new IT equipment that is twice as fast. Eventually you have to do that to keep up.
Wait, are we talking O2 molecules in the atmosphere, or all oxygen atoms in Earth's gravity well?
I wish I could vote you up and down at the same time.
Please clarify the reason for your sidewaysvote.
On the one hand a real distinction which makes a huge difference in feasibility. On the other hand, either way we're boned, so it makes not a lot of difference in the context of the original question (as I understand it). On balance, it's a cute digression but still a digression, and so I'm torn.
Actually in the case of removing all oxygen atoms from Earth's gravity well, not necessarily. The AI might decide that the most expedient method is to persuade all the humans that the sun's about to go nova, construct some space elevators and Orion Heavy Lifters, pump the first few nines of ocean water up into orbit, freeze it into a thousand-mile-long hollow cigar with a fusion rocket on one end, load the colony ship with all the carbon-based life it can find, and point the nose at some nearby potentially-habitable star. Under this scenario, it would be indifferent to our actual prospects for survival, but gain enough advantage by our willing cooperation to justify the effort of constructing an evacuation plan that can stand up to scientific analysis, and a vehicle which can actually propel the oxygenated mass out to stellar escape velocity to keep it from landing back on the surface.
I asked something similar here.

Holden seems to think this sort of development would happen naturally with the sort of AGI researchers we have nowadays, and I wish he'd spent a few years arguing with some of them to get a better picture of how unlikely this is.

While I can't comment on AGI researchers, I think you underestimate e.g. more mainstream AI researchers such as Stuart Russell and Geoff Hinton, or cognitive scientists like Josh Tenenbaum, or even more AI-focused machine learning people like Andrew Ng, Daphne Koller, Michael Jordan, Dan Klein, Rich Sutton, Judea Pearl, Leslie Kaelbling, and Leslie Valiant (and this list is no doubt incomplete). They might not be claiming that they'll have AI in 20 years, but that's likely because they are actually grappling with the relevant issues and therefore see how hard the problem is likely to be.

Not that it strikes me as completely unreasonable that we would have a major breakthrough that gives us AI in 20 years, but it's hard to see what the candidate would be. But I have only been thinking about these issues for a couple years, so I still maintain a pretty high degree of uncertainty about all of these claims.

I do think I basically agree with you re: inductive l...

I agree that top mainstream AI guy Peter Norvig was way the heck more sensible than the reference class of declared "AGI researchers" when I talked to him about FAI and CEV, and that estimates should be substantially adjusted accordingly.

Yes. I wonder if there's a good explanation why narrow AI folks are so much more sensible than AGI folks on those subjects.
Because they have some experience of their products actually working, they know that 1) these things can be really powerful, even though narrow, and 2) there are always bugs.
"Intelligence is not as computationally expensive as it looks" How sure are you that your intuitions do not arise from typical mind fallacy and from you attributing the great discoveries and inventions of mankind to the same processes that you feel run in your skull and which did not yet result in any great novel discoveries and inventions that I know of? I know this sounds like ad hominem, but as your intuitions are significantly influenced by your internal understanding of your own process, your self-esteem will stand hostage to be shot through in many of the possible counterarguments and corrections. (Self-esteem is one hell of a bulletproof hostage though, and tends to act more as a shield for bad beliefs.) There are a lot of engineers working on software for solving engineering problems, including the software that generates and tests possible designs and looks for ways to make better computers. Your philosophy-based natural-language-defined in-imagination-running Oracle AI may have to be very carefully specified so that it does not kill imaginary mankind. And it may well be very difficult to build such a specification. Just don't confuse it with the software written to solve definable problems. Ultimately, figuring out how to make a better microchip involves a lot of testing of various designs; that's how humans do it, that's how tools do it. I don't know how you think it is done. The performance is a result of a very complex function of the design. To build a design that performs you need to reverse this ultra-complicated function, which is done by a mixture of analytical methods and iteration of possible input values, and unless P=NP, we have very little reason to expect any fundamentally better solutions (and even if P=NP there may still not be any). Meaning that the AGI won't have any edge over practical software, and won't out-foom it.
I may have the terminology wrong, but I believe he's thinking more about commercial narrow-AI researchers. Now if they produce results like these, that would push the culture farther towards letting computer programs handle any hard task. Programming seems hard.

I completely agree with the intent of this post. These are all important issues SI should officially answer. (Edit: SI's official reply is here.) Here are some of my thoughts:

  • I completely agree with objection 1. I think SI should look into doing exactly as you say. I also feel that friendliness has a very high failure chance and that all SI can accomplish is a very low marginal decrease in existential risk. However, I feel this is the result of existential risk being so high and difficult to overcome (Great Filter) rather than SI being so ineffective. As such, for them to engage this objection is to admit defeatism and millennialism, and so they put it out of mind since they need motivation to keep soldiering on despite the sure defeat.

  • Objection 2 is interesting, though you define AGI differently, as you say. Some points against it: Only one AGI needs to be in agent mode to realize existential risk, even if there are already billions of tool-AIs running safely. Tool-AI seems closer in definition to narrow AI, which you point out we already have lots of, and are improving. It's likely that very advanced tool-AIs will indeed be the first to achieve some measure of AGI capability.

...
You're an accomplished and proficient philanthropist; if you do make steps in the direction of a donor-directed existential risk fund, I'd like to see them written about.
I am unable to respond to people responding to my previous comment directly; the system tells me 'Replies to downvoted comments are discouraged. You don't have the requisite 5 Karma points to proceed.' So I will reply here. @Salemicus My question was indeed rhetorical. My comment was intended as a brief reality check, not a sophisticated argument. I disagree with you about the importance of climate change and resource shortage, and the effectiveness of humanitarian aid. But my comment did not intend to supply any substantial list of "causes"; again, it was a reality check. Its intention was to provoke reflection on how supposedly solid reasoning had justified donating to stop an almost absurdly Sci-Fi armageddon. I will now, briefly, respond to your points on the causes I raised. The following is, again, not a sophisticated and scientifically literate argument, but then neither was your reply to my comment. It probably isn't worth responding to. On global warming, I do not wish to engage in a lengthy argument over a complicated scientific matter. Rather I will recommend reading the first major economic impact analysis, the 'Stern Review on the Economics of Climate Change'. You can find that easily by searching Google. For comments and criticisms of that paper, see: Weitzman, M (2007), ‘The Stern Review of the Economics of Climate Change’, Journal of Economic Literature 45(3), 703-24. Dasgupta, P (2007), ‘Comments on the Stern Review’s Economics of Climate Change’, National Institute Economic Review 199, 4-7. Dietz, S and N Stern (2008), ‘Why Economic Analysis Supports Strong Action on Climate Change: a Response to the Stern Review’s Critics’, Review of Environmental Economics and Policy, 2(1), 94-113. Broome,
Allow me to generalize: Don't take anything too seriously. (By definition of "too".) I don't (at all) assume that MIRI would in fact be effective in preventing disastrous-AI scenarios. I think that's an open question, and in the very article we're commenting on we can see that Holden Karnofsky of GiveWell gave the matter some thought and decided that MIRI's work is probably counterproductive overall in that respect. (Some time ago; MIRI and/or HK's opinions may have changed relevantly since then.) As I already mentioned, I do not myself donate to MIRI; I was trying to answer the question "why would anyone who isn't crazy or stupid donate to MIRI?" and I think it's reasonably clear that someone neither crazy nor stupid could decide that MIRI's work does help to reduce the risk of AI-induced disaster. ("Evil AIs running around and killing everybody", though, is a curious choice of phrasing. It seems to fit much better with any number of rather silly science fiction movies than with anything MIRI and its supporters are actually arguing might happen. Which suggests that either you haven't grasped what it is they are worried about, or you have grasped it but prefer inaccurate mockery to engagement -- which is, of course, your inalienable right, but may not encourage people here to take your comments as seriously as you might prefer.) I wasn't intending to make a Pascal's wager. Again, I am not myself a MIRI donor, but my understanding is that those who are generally think that the probability of AI-induced disaster is not very small. So the point isn't that there's this tiny probability of a huge disaster so we multiply (say) a 10^-6 chance of disaster by billions of lives lost and decide that we have to act urgently. It's that (for the MIRI donor) there's maybe a 10% -- or a 99% -- chance of AI-induced disaster if we aren't super-careful, and they hope MIRI can substantially reduce that. The underlying argument here is -- if I'm understanding right -- something like
Extremely tiny probabilities with enormous utilities attached do suffer from Pascal's Mugging-type scenarios. That being said, AI-risk probabilities are much larger in my estimate than the sorts of probabilities required for Pascal-type problems to start coming into play. Unless Perrr333 intends to suggest that probabilities involving UFAI really are that small, I think it's unlikely he/she is actually making any sort of logical argument. It's far more likely, I think, that he/she is making an argument based on incredulity (disguised by seemingly logical arguments, but still at its core motivated by incredulity). The problem with that, of course, is that arguments from incredulity rely almost exclusively on intuition, and the usefulness of intuition decreases spectacularly as scenarios become more esoteric and further removed from the realm of everyday experience.

Lack of impressive endorsements. [...] I feel that given the enormous implications of SI's claims, if it argued them well it ought to be able to get more impressive endorsements than it has. I have been pointed to Peter Thiel and Ray Kurzweil as examples of impressive SI supporters, but I have not seen any on-record statements from either of these people that show agreement with SI's specific views, and in fact (based on watching them speak at Singularity Summits) my impression is that they disagree.

This is key: they support SI despite not agreeing with SI's specific arguments. Perhaps you should, too, at least if you find folks like Thiel and Kurzweil sufficiently impressive.

In fact, this has always been roughly my own stance. The primary reason I think SI should be supported is not that their arguments for why they should be supported are good (although I think they are, or at least, better than you do). The primary reason I think SI should be supported is that I like what the organization actually does, and wish it to continue. The Less Wrong Sequences, Singularity Summit, rationality training camps, and even HPMoR and Less Wrong itself are all worth paying some amount of mo...

The primary reason I think SI should be supported is that I like what the organization actually does, and wish it to continue. The Less Wrong Sequences, Singularity Summit, rationality training camps, and even HPMoR and Less Wrong itself are all worth paying some amount of money for.

I think that my own approach is similar, but with a different emphasis. I like some of what they've done, so my question is how to encourage those pieces. This article was very helpful in prompting some thought into how to handle that. I generally break down their work into three categories:

  1. Rationality (minicamps, training, LW, HPMoR): Here I think they've done some very good work. Luckily, the new spinoff will allow me to support these pieces directly.

  2. Existential risk awareness (singularity summit, risk analysis articles): Here their record has been mixed. I think the Singularity Summit has been successful, other efforts less so but seemingly improving. I can support the Singularity Summit by continuing to attend and potentially donating directly if necessary (since it's been running positive in recent years, for the moment this does not seem necessary).

  3. Original research (FAI, timeless decisio

...
I don't see how this constitutes a "different emphasis" from my own. Right now, SI is the way one supports the activities in question. Once the spinoff has finally spun off and can take donations itself, it will be possible to support the rationality work directly.
The different emphasis comes down to your comment that: In my opinion, I can more effectively support those activities that I think are effective by not supporting SI. Waiting until the Center for Applied Rationality gets its tax-exempt status in place allows me to both target my donations and directly signal where I think SI has been most effective up to this point. If they end up having short-term cashflow issues prior to that split, my first response would be to register for the next Singularity Summit a bit early since that's another piece that I wish to directly support.
So, are you saying you'd be more inclined to fund a Rationality Institute?

I furthermore have to say that to raise this particular objection seems to me almost to defeat the purpose of GiveWell. After all, if we could rely on standard sorts of prestige-indicators to determine where our money would be best spent, everybody would be spending their money in those places already, and "efficient charity" wouldn't be a problem for some special organization like yours to solve.

I think Holden seems to believe that Thiel and Kurzweil endorsing SIAI's UFAI-prevention methods would be more like a leading epidemiologist endorsing the malaria-prevention methods of the Against Malaria Foundation (AMF) than it would be like Celebrity X taking a picture with some children for the AMF. There are different kinds of "prestige-indicator," some more valuable to a Bayesian-minded charity evaluator than others.

I would still consider the leading epidemiologist's endorsement to be a standard sort of prestige-indicator. If an anti-disease charity is endorsed by leading epidemiologists, you hardly need GiveWell. (At least for the epidemiological aspects. The financial/accounting part may be another matter.)
I would argue that this is precisely what GiveWell does in evaluating malaria charity. If the epidemiological consensus changed, and bednets were held to be an unsustainable solution (this is less thoroughly implausible than it might sound, though probably still unlikely), then even given the past success of certain bednet charities on all GiveWell's other criteria, GiveWell might still downgrade those charities. And don't underestimate the size of the gap between "a scientifically plausible mechanism for improving lives" and "good value in lives saved/improved per dollar." There are plenty of bednet charities, and there's a reason GiveWell recommends AMF and not, say, Nothing But Nets. The endorsement, in other words, is about the plausibility of the mechanism, which is only one of several things to consider in donating to a charity, but it's the area in which a particular kind of expert endorsement is most meaningful.
As they should. But the point is that, in so doing, GiveWell would not be adding any new information not already contained in the epidemiological consensus (assuming they don't have privileged information about the latter). Indeed. The latter is where GiveWell enters the picture; it is their unique niche. The science itself, on the other hand, is not really their purview, as opposed to the experts. If GiveWell downgrades a charity solely because of the epidemiological consensus, and (for some reason) I have good reason to think the epidemiological consensus is wrong, or inadequately informative, then GiveWell hasn't told me anything, and I have no reason to pay attention to them. Their rating is screened off. Imagine that 60% of epidemiologists think that Method A is not effective against Disease X, while 40% think it is effective. Suppose Holden goes to a big conference of epidemiologists and says "GiveWell recommends against donating to Charity C because it uses Method A, which the majority of epidemiologists say is not effective." Assuming they already knew Charity C uses Method A, should they listen to him? Of course not. The people at the conference are all epidemiologists themselves, and those in the majority are presumably already foregoing donations to Charity C, while those in the minority already know that the majority of their colleagues disagree with them. Holden hasn't told them anything new. So, if his organization is going to be of any use to such an audience, it should focus on the things they can't already evaluate themselves, like financial transparency, accounting procedures, and the like; unless it can itself engage the scientific details. This is analogous to the case at hand: if all that GiveWell is going to tell the world is that SI hasn't signaled enough status, well, the world already knows that. Their raison d'être is to tell people info that they can't find (or is costly to find) via other channels: such as info about non-high-status c
A few points: "Possesses expert endorsement of its method" does not necessarily equal "high-status charity." A clear example here is de-worming and other parasite control, which epidemiologists all agree works well, but which doesn't get the funding a lot of other developing world charity does because it's not well advertised. GiveWell would like SIAI to be closer to de-worming charities in that outside experts give some credence to the plausibility of the methods by which SIAI proposes to do good. Moreover, "other high-status charities using one's method" also doesn't equal "high-status charity." Compare the number of Facebook likes for AMF and Nothing But Nets. The reason GiveWell endorses one but not the other is that AMF, unlike NBN, has given compelling evidence that it can scale the additional funding that a GiveWell endorsement promises into more lives saved/improved at a dollar rate comparable to their current lives saved/improved per dollar. So we should distinguish a charity's method being "high-status" from the charity itself being "high-status." But if you define "high status method" as "there exists compelling consensus among the experts GiveWell has judged to be trustworthy that the proposed method for doing good is even plausible," then I, as a Bayesian, am perfectly comfortable with GiveWell only endorsing "high-status method" charities. They still might buck the prevailing trends on optimal method; perhaps some of the experts are on GiveWell's own staff, or aren't prominent in the world at large. But by demanding that sort of "high-status method" from a charity, GiveWell discourages crankism and is unlikely to miss a truly good cause for too long. Expert opinion on method plausibility is all the more important with more speculative charity like SIAI because there isn't a corpus of "effectiveness data to date" to evaluate directly.

Firstly, I'd like to add to the chorus saying that this is an incredible post; as a supporter of SI, it warms my heart to see it. I disagree with the conclusion - I would still encourage people to donate to SI - but if SI gets a critique this good twice a decade it should count itself lucky.

I don't think GiveWell making SI its top rated charity would be in SI's interests. In the long term, SI benefits hugely when people are turned on to the idea of efficient charity, and asking them to swallow all of the ideas behind SI's mission at the same time will put them off. If I ran GiveWell and wanted to give an endorsement to SI, I might break the rankings into multiple lists: the most prominent being VillageReach-like charities which directly do good in the near future, then perhaps a list for charities that mitigate broadly accepted and well understood existential risks (if this can be done without problems with politics), and finally a list of charities which mitigate more speculative risks.

Wei Dai:
This seems like a good point and perhaps would have been a good reason for SI to not have approached GiveWell in the first place. At this point though, GiveWell is not only refusing to make SI a top rated charity, but actively recommending people to "withhold" funds from SI, which as far as I can tell, it almost never does. It'd be a win for SI to just convince GiveWell to put it back on the "neutral" list.
Paul Crowley:
Agreed. Did SI approach GiveWell?
Wei Dai:
Yes. Hmm, reading that discussion shows that they were already thinking about having GiveWell create a separate existential risk category (and you may have gotten the idea there yourself and then forgot the source).

I find it unfortunate that none of the SIAI research associates have engaged very deeply in this debate, even LessWrong regulars like Nesov and cousin_it. This is part of the reason why I was reluctant to accept (and ultimately declined) when SI invited me to become a research associate: I would feel less free to speak up both in support of SI and in criticism of it.

I don't think this is SI's fault, but perhaps there are things it could do to lessen this downside of the research associate program. For example it could explicitly encourage the research associates to publicly criticize SI and to disagree with its official positions, and make it clear that no associate will be blamed if someone mistook their statements to be official SI positions or saw them as reflecting badly on SI in general. I also write this comment because just being consciously aware of this bias (in favor of staying silent) may help to counteract it.

I don't usually engage in potentially protracted debates lately. A very short summary of my disagreement with the object-level argument part of Holden's post is: (1) I don't see in what way the idea of powerful Tool AI can be usefully different from that of Oracle AI, and it seems like the connotations of "Tool AI" that distinguish it from "Oracle AI" follow from an implicit sense of it not having too much optimization power, so it might be impossible for a Tool AI to both be powerful and hold the characteristics suggested in the post; (1a) the description of Tool AI denies it goals/intentionality and other words, but I don't see what they mean apart from optimization power, and so I don't know how to use them to characterize Tool AI; (2) the potential danger of having a powerful Tool/Oracle AI around is such that aiming at their development doesn't seem like a good idea; (3) I don't see how a Tool/Oracle AI could be sufficiently helpful to break the philosophical part of the FAI problem, since we don't even know which questions to ask.

Since Holden stated that he's probably not going to (interactively) engage the comments to this post, and writing this up in a self-contained way is a lot of work, I'm going to leave this task to the people who usually write up SingInst outreach papers.

Not sure about the others, but as for me, at some point this spring I realized that talking about saving the world makes me really upset and I'm better off avoiding the whole topic.

Would it upset you to talk about why talking about saving the world makes you upset?

It would appear that cousin_it believes we're screwed. It's tempting to argue that this would, overall, be an argument against the effectiveness of the SI program. However, that's probably not true, because we could be 99% screwed and the remaining 1% could depend on SI; this would be a depressing fact, yet still justify supporting the SI. (Personally, I agree with the poster about the problems with SI, but I'm just laying it out. Responding to wei_dai rather than cousin_it because I don't want to upset the latter unnecessarily.)

Thank you very much for writing this. I, um, wish you hadn't posted it literally directly before the May Minicamp when I can't realistically respond until Tuesday. Nonetheless, it already has a warm place in my heart next to the debate with Robin Hanson as the second attempt to mount informed criticism of SIAI.

It looks to me as though Holden had the criticisms he expresses even before becoming "informed", presumably by reading the sequences, but was too intimidated to share them. Perhaps it is worth listening to/encouraging uninformed criticisms as well as informed ones?

Note the following criticism of SI identified by Holden:

Being too selective (in terms of looking for people who share its preconceptions) when determining whom to hire and whose feedback to take seriously.

To those who think Eliezer is exaggerating: please link me to "informed criticism of SIAI." It is so hard to find good critics. Edit: Well, I guess there are more than two examples, though relatively few. I was wrong to suggest otherwise. Much of this has to do with the fact that SI hasn't been very clear about many of its positions and arguments: see Beckstead's comment and Hallquist's followup.

1) Most criticism of key ideas underlying SIAI's strategies does not reference SIAI, e.g. Chris Malcolm's "Why Robots Won't Rule" website is replying to Hans Moravec.

2) Dispersed criticism, with many people making local points, e.g. those referenced by Wei Dai, is still criticism and much of that is informed and reasonable.

3) Much criticism is unwritten, e.g. consider the more FAI-skeptical Singularity Summit speaker talks, or takes the form of brief responses to questions or the like. This doesn't mean it isn't real or important.

4) Gerrymandering the bounds of "informed criticism" to leave almost no one within bounds is in general a scurrilous move that one should bend over backwards to avoid.

5) As others have suggested, even within the narrow confines of Less Wrong and adjacent communities there have been many informed critics. Here's Katja Grace's criticism of hard takeoff (although I am not sure how separate it is from Robin's). Here's Brandon Reinhart's examination of SIAI, which includes some criticism and brings more in comments. Here's Kaj Sotala's comparison of FHI and SIAI. And there are of course many detailed and often highly upvoted comments in response to various SIAI-discussing posts and threads, many of which you have participated in.

This is a bit exasperating. Did you not see my comments in this thread? Have you and Eliezer considered that if there really have been only two attempts to mount informed criticism of SIAI, then LessWrong must be considered a massive failure that SIAI ought to abandon ASAP?

See here.

Wei Dai has written many comments and posts that have some measure of criticism, and various members of the community, including myself, have expressed agreement with them. I think what might be a problem is that such criticisms haven't been collected into a single place where they can draw attention and stir up drama, as Holden's post has.

There are also critics like XiXiDu. I think he's unreliable, and I think he'd admit to that, but he also makes valid criticisms that are shared by other LW folk, and LW's moderation makes it easy to sift his comments for the better stuff.

Perhaps an institution could be designed. E.g., a few self-ordained SingInst critics could keep watch for critiques of SingInst, collect them, organize them, and update a page somewhere out-of-the-way over at the LessWrong Wiki that's easily checkable by SI folk like yourself. LW philanthropists like User:JGWeissman or User:Rain could do it, for example. If SingInst wanted to signal various good things then it could even consider paying a few people to collect and organize criticisms of SingInst. Presumably if there are good critiques out there then finding them would be well worth a small investment.

I think what might be a problem is that such criticisms haven't been collected into a single place where they can draw attention and stir up drama, as Holden's post has.

I put them in discussion, because well, I bring them up for the purpose of discussion, and not for the purpose of forming an overall judgement of SIAI or trying to convince people to stop donating to SIAI. I'm rarely sure that my overall beliefs are right and SI people's are wrong, especially on core issues that I know SI people have spent a lot of time thinking about, so mostly I try to bring up ideas, arguments, and possible scenarios that I suspect they may not have considered. (This is one major area where I differ from Holden: I have greater respect for SI people's rationality, at least their epistemic rationality. And I don't know why Holden is so confident about some of his own original ideas, like his solution to Pascal's Mugging, and Tool-AI ideas. (Well I guess I do, it's probably just typical human overconfidence.))

Having said that, I reserve the right to collect all my criticisms together and make a post in main in the future if I decide that serves my purposes, although I suspect that without the inf...

Also, I had expected that SI people monitored LW discussions, not just for critiques, but also for new ideas in general

I read most such (apparently-relevant from post titles) discussions, and Anna reads a minority. I think Eliezer reads very few. I'm not very sure about Luke.

Wei Dai:
Do you forward relevant posts to other SI people?
Ones that seem novel and valuable, either by personal discussion or email.
Yes, I read most LW posts that seem to be relevant to my concerns, based on post titles. I also skim the comments on those posts.
I'm somewhat confident (from directly asking him a related question and also from many related observations over the last two years) that Eliezer mostly doesn't, or is very good at pretending that he doesn't. He's also not good at reading so even if he sees something he's only somewhat likely to understand it unless he already thinks it's worth it for him to go out of his way to understand it. If you want to influence Eliezer it's best to address him specifically and make sure to state your arguments clearly, and to explicitly disclaim that you're specifically not making any of the stupid arguments that your arguments could be pattern-matched to. Also I know that Anna is often too busy to read LessWrong.
Good point. Wei Dai qualifies as informed criticism. Though, he seems to agree with us on all the basics, so that might not be the kind of criticism Eliezer was talking about.

To those who think Eliezer is exaggerating: please link me to "informed criticism of SIAI."

It would help if you could elaborate on what you mean by "informed".

Most of what Holden wrote, and much more, has been said by other people, excluding myself, before.

I don't have the time right now to wade through all those years of posts and comments but might do so later.

And if you are not willing to take into account what I myself wrote, for being uninformed, then maybe you will however agree that at least all of my critical comments that have been upvoted to +10 (ETA changed to +10, although there is a lot more on-topic at +5) should have been taken into account. If you do so you will find that SI could have updated some time ago on some of what has been said in Holden's post.

Seconded. It seems to me like it's not even possible to mount properly informed criticism if much of the findings are just sitting unpublished somewhere. I'm hopeful that this is actually getting fixed sometime this year, but it doesn't seem fair to not release information and then criticize the critics for being uninformed.

I'm not sure how much he's put into writing, but Ben Goertzel is surely informed. One might argue he comes to the wrong conclusions about AI danger, but it's not from not thinking about it.

If you don't have a good argument, you won't find good critics. (Unless you are as influential as religion. Then you can get a good critic simply because you stepped on that critic's foot. The critic probably isn't going to come to church to talk about it, though, and the ulterior motive (having had a foot stepped on) may lead you to classify them as a bad critic.) When you look through matte glass and see some blurred text that looks like it has equations in it, and you are told that what you see is a fuzzy image of a proof that P!=NP (maybe you can make out the headers, which are in a bigger font, and those look like the kind of headers that a valid proof might have), do you assume that it really is a valid proof, and that they only need to polish the glass? What if it is P=NP instead? What if it doesn't look like it has equations in it?

If you really cared about future risk you would be working away at the problem even with a smaller salary. Focus on your work.

What we really need is some kind of emotionless robot who doesn't care about its own standard of living and who can do lots of research and run organizations and suchlike without all the pesky problems introduced by "being human".

Oh, wait...

So your argument that visiting a bunch of highly educated pencil-necked white nerds is physically dangerous boils down to... one incident of ineffective online censorship mocked by most of the LW community and all outsiders, and some criticism of Yudkowsky's computer science & philosophical achievements.

I see.

I would literally have had more respect for you if you had used racial slurs like "niggers" in your argument, since that is at least tethered to reality in the slightest bit.

I think I'm entitled to opine...

Of course you are. And, you may not be one of the people who "like my earlier papers."

You confirm the lead poster's allegations that SIA staff are insular and conceited.

Really? How? I commented earlier on LW (can't find it now) about how the kind of papers I write barely count as "original research" because for the most part they merely summarize and clarify the ideas of others. But as Beckstead says, there is a strong need for that right now.

For insights in decision theory and FAI theory, I suspect we'll have to look to somebody besides Luke Muehlhauser. We keep trying to hire such people but they keep saying "No." (I got two more "no"s just in the last 3 weeks.) Part of that may be due to the past and current state of the organization — and luckily, fixing that kind of thing is something I seem to have some skills with.

You're... a textbook writer at heart.

True, dat.

This most recently happened just a few weeks ago. On that occasion Luke Muehlhauser (no less) took the unusual step of asking me to friend him on Facebook, after which he joined a discussion I was having and made scathing ad hominem comments about me

Sounds serious... Feel free to post a relevant snippet of the discussion, here or elsewhere, so that those interested can judge this event on its merits, and not through your interpretation of it.

On April 7th, Richard posted to Facebook:

LessWrong has now shown its true mettle. After someone here on FB mentioned a LW discussion of consciousness, I went over there and explained that Eliezer Yudkowsky, in his essay, had completely misunderstood the Zombie Argument given by David Chalmers. I received a mix of critical, thoughtful and sometimes rude replies. But then, all of a sudden, Eliezer took an interest in this old thread again, and in less than 24 hours all of my contributions were relegated to the trash. Funnily enough, David Chalmers himself then appeared and explained that Eliezer had, in fact, completely misunderstood his argument. Chalmers' comments, strangely enough, have NOT been censored. :-)

I replied:

I haven't read the whole discussion, but just so everyone is clear...

Richard's claim that "in less than 24 hours all of my contributions were relegated to the trash" is false.

What happened is that LWers disvalued Richard's comments and downvoted them. Because most users have their preferences set to hide comments with a score of less than -3, these users saw Richard's most-downvoted comments as collapsed by default, with a note reading "comment s

...

I fail to see anything that can be qualified as an ad hominem ("an attempt to negate the truth of a claim by pointing out a negative characteristic or belief of the person supporting it") in what you quoted. If anything, the original comment by Richard comes much closer to this definition.

(Though to be fair I think this sort of depends on your definition of "regularly"—I think over 95% of my comments aren't downvoted, many of them getting 5 or more upvotes, in contrast with other contributors who get about 25% of their comments downvoted and usually end up leaving as a result.)

I believe what you wrote because you used so much bolding.

"And if Novamente should ever cross the finish line, we all die."

And yet SIAI didn't do anything to Ben Goertzel (except make him Director of Research for a time, which is kind of insane in my judgement, but obviously not in the sense you intend).

Ben Goertzel's projects are knowably hopeless, so I didn't too strongly oppose Tyler Emerson's project from within SIAI's then-Board of Directors; it was being argued to have political benefits, and I saw no noticeable x-risk so I didn't expend my own political capital to veto it, just sighed. Nowadays the Board would not vote for this.

And it is also true that, in the hypothetical counterfactual conditional where Goertzel's creations work, we all die. I'd phrase the email message differently today to avoid any appearance of endorsing the probability, because today I understand better that most people have trouble mentally separating hypotheticals. But the hypothetical is still true in that counterfactual universe, if not in this one.

There is no contradiction here.

9Wei Dai11y
To clarify, by "kind of insane" I didn't mean you personally, but was commenting on SIAI's group rationality at that time.


If you have some solid, rigorous and technical criticism of SIAI's AI work, I wish you would create a pseudonymous account on LW and state that criticism without giving the slightest hint that you are Richard Loosemore, or making any claim about your credentials, or talking about censorship and quashing of dissenting views.

Until you do something like that, I can't help thinking that you care more about your reputation or punishing Eliezer than about improving everybody's understanding of technical issues.

Please don't take this as a personal attack, but, historically speaking, everyone who has said "I am in the final implementation stages of the general intelligence algorithm" has been wrong so far. Their algorithms never quite worked out. Is there any evidence you can offer that your work is any different? I understand that this is a tricky proposition, since revealing your work could set off all kinds of doomsday scenarios (assuming that it performs as you expect it to); still, surely there must be some way for you to convince skeptics that you can succeed where so many others have failed.

I would say that, far from deserving support, SI should be considered a cult-like community in which dissent is ruthlessly suppressed in order to exaggerate the point of view of SI's founders and controllers, regardless of the scientific merits of those views, or of the dissenting opinions.

This is a very strong statement. Have you allowed for the possibility that your current judgement might be clouded by the events that transpired some 6 years ago?

I myself employ a very strong heuristic, from years of trolling the internet: when a user joins a forum and complains about an out-of-character and strongly personal persecution by the moderation staff in the past, there is virtually always more to the story when you look into it.

Indeed, Dolores, that is an empirically sound strategy, if used with caution. My own experience, however, is that people who do that can usually be googled quickly, and are often found to be unqualified cranks of one persuasion or another. People with more anger than self-control. But that is not always the case. Recently, for example, a woman friended me on Facebook and then posted numerous diatribes against a respected academic acquaintance of mine, accusing him of raping her and fathering her child. These posts were quite blood-curdling. And their target appeared quite the most innocent guy you could imagine. Very difficult to make a judgement. However, about a month ago the guy suddenly came out and made a full and embarrassingly frank admission of guilt. It was an astonishing episode. But it was one of those rare occasions when the person (the woman in this case) turned out to be perfectly justified. I am helpless to convince you. All I can do is point to my own qualifications and standing. I am no lone crank crying in the wilderness. I teach Math, Physics and Cognitive Neuroscience at the undergraduate level, and I have coauthored a paper with one of the AGI field's leading exponents (Ben Goertzel), in a book about the Singularity that was at one point (maybe not anymore!) slated to be a publishing landmark for the field. You have to make a judgement.

Regardless of who was how much at fault in the SL4 incident, surely you must admit that Yudkowsky's interactions with you were unusually hostile relative to how he generally interacts with critics. I can see how you'd want to place emphasis on those interactions because they involved you personally, but that doesn't make them representative for purposes of judging cultishness or making general claims that "dissent is ruthlessly suppressed".

I think Martian Yudkowsky is a dangerous intuition pump. We're invited to imagine a creature just like Eliezer except green and with antennae; we naturally imagine him having values as similar to us as, say, a Star Trek alien. From there we observe the similarity of values we just pushed in, and conclude that values like "interesting" are likely to be shared across very alien creatures. Real Martian Yudkowsky is much more alien than that, and is much more likely to say

There is little prospect of an outcome that realizes even the value of being flarn, unless the first superintelligences undergo detailed inheritance from Martian values.

Imagine, an intelligence that didn't have the universal emotion of badweather!

Of course, extraterrestrial sentients may possess physiological states corresponding to limbic-like emotions that have no direct analog in human experience. Alien species, having evolved under a different set of environmental constraints than we, also could have a different but equally adaptive emotional repertoire. For example, assume that human observers land on another planet and discover an intelligent animal with an acute sense of absolute humidity and absolute air pressure. For this creature, there may exist an emotional state responding to an unfavorable change in the weather. Physiologically, the emotion could be mediated by the ET equivalent of the human limbic system; it might arise following the secretion of certain strength-enhancing and libido-arousing hormones into the alien's bloodstream in response to the perceived change in weather. Immediately our creature begins to engage in a variety of learned and socially-approved behaviors, including furious burrowing and building, smearing tree sap over its pelt, several different territorial defense ceremonies, and vigorous polygamous copulations with nearby females, apparently (to humans) for no reason at all. Would our astronauts interpret this as madness? Or love? Lust? Fear? Anger? None of these is correct, of course; the alien is feeling badweather.

I suggest you guys taboo interesting, because I strongly suspect you're using it with slightly different meanings. (And BTW, as a Martian Yudkowsky I imagine something with values at least as alien as Babyeaters' or Superhappys'.)

I am in the final implementation stages of the general intelligence algorithm.

It's both amusing and disconcerting that people on this forum treat such a comment seriously.

I try to treat all comments with some degree of seriousness, which can be expressed as a floating-point number between 0 and 1 :-)
Isn't the SIAI founded on the supposition that a scenario like this is possible?
Yes, but on this forum there should be some reasonable immunity against instances of Pascal's wager/mugging like that. The comment in question does not rise above the noise level, so treating it seriously shows how far many regulars still have to go in learning the basics.

Rain (who noted that he is a donor to SIAI in a comment) and HoldenKarnofsky (who wrote the post) are two different people, as indicated by their different usernames.

Well, different usernames isn't usually sufficient evidence that there are two different people, but in this case there's little doubt about their separability.

I feel that [SI] ought to be able to get more impressive endorsements than it has.

SI seems to have passed up opportunities to test itself and its own rationality by e.g. aiming for objectively impressive accomplishments.

Holden, do you believe that charitable organizations should set out deliberately to impress donors and high-status potential endorsers? I would have thought that a donor like you would try to ignore the results of any attempts at that and to concentrate instead on how much the organization has actually improved the world because to do otherwise is to incentivize organizations whose real goal is to accumulate status and money for their own sake.

For example, Eliezer's attempts to teach rationality or "technical epistemology" or whatever you want to call it through online writings seem to me to have actually improved the world in a non-negligible way and seem to have been designed to do that rather than designed merely to impress.

ADDED. The above is probably not as clear as it should be, so let me say it in different words: I suspect it is a good idea for donors to ignore certain forms of evidence ("impressiveness", affiliation with high-status folk) of a charity's effectiveness to discourage charities from gaming donors in ways that seem to me already too common, and I was a little surprised to see that you do not seem to ignore those forms of evidence.

In other words, I tend to think that people who make philanthropy their career and who have accumulated various impressive markers of their potential to improve the world are likely to continue to accumulate impressive markers, but are less likely to improve the world than people who have already actually improved the world. And of the three core staff members of SI I have gotten to know, 2 (Eliezer and another one who probably does not want to be named) have already improved the world in non-negligible ways and the third spends less time accumulating credentials and impressiveness markers than almost anyone I know.
I don't think Holden was looking for endorsements from "donors and high-status potential endorsers". I interpreted his post as looking for endorsements from experts on AI. The former would be evidence that SI could go on to raise money and impress people, and the latter would be evidence that SI's mission is theoretically sound. (The strength of that evidence is debatable, of course.) Given that, looking for endorsements from AI experts seems like it would be A) a good idea and B) consistent with the rest of GiveWell's methodology.
Although I would have thought that Holden is smart enough to decide whether the FAI project is theoretically sound without his relying on AI experts, maybe I am underestimating the difficulties of people like Holden who are smarter than I am, but who didn't devote their college years to mastering computer science like I did.
I saw a related issue in a blog about a woman who lost the use of her arm due to an incorrectly treated infection. She initially complained that the judge in her disability case didn't even look at the arm, but then was pleasantly surprised to have the ruling turn out in favor anyway. I realized: of course the judge wouldn't look at her arm. Having done disability cases before, the judge should know that gruesome appearance correlates weakly, if at all, with legitimate disability, but the emotional response is likely to throw off evaluation of things like an actual doctor's report on the subject. Holden, similarly, is willing to admit that there are things about AI he personally doesn't know, but that professionals who have studied the field for decades do know, and is further willing to trust those professionals to be minimally competent.
I have enough experience of legal and administrative disability hearings to say that each side always has medical experts on its side unless one side is unwilling or unable to pay for the testimony of at least one medical expert. In almost all sufficiently important decisions, there are experts on both sides of the issue. And pointing out that one side has more experts or more impressive experts carries vastly less weight with me than, e.g., Eliezer's old "Knowability of FAI" article at
The obvious answer would be "Yes." Givewell only funneled about $5M last year, as compared to the $300,000M or so that Americans give on an annual basis. Most money still comes from people that base their decision on something other than efficiency, so targeting these people makes sense.
The question was not if an individual charity, holding constant the behavior of other charities, benefits from "setting out deliberately to impress donors and high-status potential endorsers", but whether it is in Holden's interests (in making charities more effective) to generally encourage charities to do so.

I agree with much of this post, but find a disconnect between the specific criticisms and the overall conclusion of withholding funds from SI even for "donors determined to donate within this cause", and even aside from whether SI's FAI approach increases risk. I see a couple of ways in which the conclusion might hold.

  1. SI is doing worse than they are capable of, due to wrong beliefs. Withholding funds provides incentive for them to do what you think is right, without having to change their beliefs. But this could lead to waste if people disagree in different directions, and funds end up sitting unused because SI can't satisfy everyone, or if SI thinks the benefit of doing what they think is optimal is greater than the value of extra funds they could get from doing what you think is best.
  2. A more capable organization already exists or will come up later and provide a better use of your money. This seems unlikely in the near future, given that we're already familiar with the "major players" in the existential risk area and based on past history, it doesn't seem likely that a new group of highly capable people would suddenly get interested in the cause. In the long
...

If Holden believes that:
A) reducing existential risk is valuable, and
B) SI's effectiveness at reducing existential risk is a significant contributor to the future of existential risk, and
C) SI is being less effective at reducing existential risk than they would be if they fixed some set of problems P, and
D) withholding GiveWell's endorsement while pre-committing to re-evaluating that refusal if given evidence that P has been fixed increases the chances that SI will fix P,
then it seems to me that Holden should withhold GiveWell's endorsement while pre-committing to re-evaluating that refusal if given evidence that P has been fixed.

Which seems to be what he's doing. (Of course, I don't know whether those are his reasons.)

What, on your view, ought he do instead, if he believes those things?

9Wei Dai11y
Holden must believe some additional relevant statements, because A-D (with "existential risk" suitably replaced) could be applied to every other charity, as presumably no charity is perfect. I guess what I most want to know is what Holden thinks are the reasons SI hasn't already fixed the problems P. If it's lack of resources or lack of competence, then "withholding ... while pre-committing ..." isn't going to help. If it's wrong beliefs, then arguing seems better than "incentivizing", since that provides a permanent instead of temporary solution, and in the course of arguing you might find out that you're wrong yourself. What does Holden believe that causes him to think that providing explicit incentives to SI is a good thing to do?
4Paul Crowley11y
Thanks for making this argument! AFAICT charities generally have perverse incentives - to do what will bring in donations, rather than what will do the most good. That can usually argue against things like transparency, for example. So I think when Holden usually says "don't donate to X yet" it's as part of an effort to make these incentives saner. As it happens, I don't think this problem applies especially strongly to SI, but others may differ.
But C applies more to some charities than others. And evaluating how much of a charity's potential effectiveness is lost to internal flaws is a big piece of what GiveWell does.

Holden said,

However, I don't think that "Cause X is the one I care about and Organization Y is the only one working on it" to be a good reason to support Organization Y.

This addresses your point (2). Holden believes that SI is grossly inefficient at best, and actively harmful at worst (since he thinks that they might inadvertently increase AI risk). Therefore, giving money to SI would be counterproductive, and a donor would get a better return on investment in other places.

As for point (1), my impression is that Holden's low estimate of SI's competence is due to a combination of what he sees as wrong beliefs and an insufficient capability to put even the correct beliefs into practice. SI claims to be supremely rational, but their list of achievements is lackluster at best, which suggests that a certain amount of the Dunning-Kruger effect is at work. Furthermore, SI appears to be focused on growing SI and teaching rationality workshops, as opposed to their stated mission of researching FAI theory.

Additionally, Holden indicted SI members pretty strongly (though very politely) for what I will (in a less polite fashion) label as arrogance. The prevailing attitude of SI members seems to be (according to Holden) that the rest of the world is just too irrational to comprehend their brilliant insights, and therefore the rest of the world has little to offer -- and therefore, any criticism of SI's goals or actions can be dismissed out of hand.

EDIT: found the right quote, duh.


There's got to be a level beyond "arguments as soldiers" to describe your current approach to ineffective contrarianism.

I volunteer "arguments as cannon fodder."

Some comments on objections 1 and 2.

For example, when the comment says "the formalization of the notion of 'safety' used by the proof is wrong," it is not clear whether it means that the values the programmers have in mind are not correctly implemented by the formalization, or whether it means they are correctly implemented but are themselves catastrophic in a way that hasn't been anticipated.

Both (with the caveat that SI's plans are to implement an extrapolation procedure for the values, and not the values themselves).

Another way of putting this is that a "tool" has an underlying instruction set that conceptually looks like: "(1) Calculate which action A would maximize parameter P, based on existing data set D. (2) Summarize this calculation in a user-friendly manner, including what Action A is, what likely intermediate outcomes it would cause, what other actions would result in high values of P, etc."
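The two-step instruction set quoted above can be sketched roughly as follows. This is only an illustrative toy, not anything from Holden's post or SI's work: the names `Report`, `tool_report`, and `lookup_estimate` are hypothetical, and the "model" is a plain dictionary lookup standing in for whatever predictive machinery a real system would use. The point it tries to show is structural: the function returns a summary for a human to read and never executes an action itself.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Report:
    """Human-readable summary; the tool returns this and does nothing else."""
    best_action: str
    predicted_p: float
    alternatives: List[Tuple[str, float]]


def tool_report(
    actions: Sequence[str],
    data: Dict[str, float],
    estimate_p: Callable[[str, Dict[str, float]], float],
    top_k: int = 2,
) -> Report:
    """Hypothetical 'tool' loop: rank actions by predicted P, then report."""
    if not actions:
        raise ValueError("need at least one candidate action")
    # Step (1): score every candidate action A against data set D.
    scored = [(action, estimate_p(action, data)) for action in actions]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    # Step (2): summarize. No action is executed; a human reads the report.
    best_action, best_score = scored[0]
    return Report(best_action, best_score, scored[1 : 1 + top_k])


def lookup_estimate(action: str, data: Dict[str, float]) -> float:
    # Toy stand-in (an assumption) for a predictive model: look P up in D.
    return data.get(action, 0.0)


example = tool_report(["a", "b", "c"], {"a": 0.2, "b": 0.9, "c": 0.5}, lookup_estimate)
print(example)
```

An "agent" version would differ only in the last step, carrying out `best_action` instead of returning the report, which is the distinction the surrounding comments are arguing about.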

I think such a Tool-AI will be much less powerful than an equivalent Agent-AI, due to the bottleneck of having to summarize its calculations in a human-readable form, and then waiting for the human to read and understand the summary and then mak...

(Responding to hypothetical-SingInst's position:) It seems way too first-approximation-y to talk about values-about-extrapolation as anything other than just a subset of values—and if you look at human behavior, values about extrapolation vary very much and are very tied into object-level values. (Simply consider hyperbolic discounting! And consider how taking something as basic as coherence/consistency to its logical extreme leads to either a very stretched ethics or a more fitting but very different meta-ethics like theism.) Even if it were possible to formalize such a procedure it would still be fake meta. "No: at all costs, it is to be prayed by all men that Shams may cease."

If this works, it's probably worth a top-level post.

Upvoted for humor: "probably".

Cheers! Some find my humor a little dry.

The basic idea is that if you pull a mind at random from design space then it will be unfriendly. I am not even sure if that is true. But it is the strongest argument they have. And it is completely bogus because humans do not pull AGI's from mind design space at random.

I don't have the energy to get into an extended debate, but the claim that this is "the basic idea" or that this would be "the strongest argument" is completely false. A far stronger basic idea is the simple fact that nobody has yet figured out a theory of ethics that would work properly, which means that even AGIs that were specifically designed to be ethical are most likely to lead to bad outcomes. And that's presuming that we even knew how to program them exactly.

This isn't even something that you'd need to read a hundred blog posts for, it's well discussed in both The Singularity and Machine Ethics and Artificial Intelligence as a Positive and Negative Factor in Global Risk. Complex Value Systems are Required to Realize Valuable Futures, too.

The more significant fact is that these criticisms were largely unknown to the community.

LWer tenlier disagrees, saying:

[Holden's] critique mostly consists of points that are pretty persistently bubbling beneath the surface around here, and get brought up quite a bit. Don't most people regard this as a great summary of their current views, rather than persuasive in any way? In fact, the only effect I suspect this had on most people's thinking was to increase their willingness to listen to Karnofsky in the future if he should change his mind.

Also, you said:

Dissent is cabined to Discussion.

Luckily, evidence on the matter is easy to find. As counter-evidence I present: Self-improvement or shiny distraction, SIAI an examination, Why we can't take expected value estimates literally, Extreme rationality: it's not that great, Less Wrong Rationality and Mainstream Philosophy, and the very post you are commenting on. Many of these are among the most upvoted posts ever.

Moreover, the editors rarely move posts from Main to Discussion. The posters themselves decide whether to post in Main or Discussion.

Your point is well taken, but since part of the concern about that whole affair was your extreme language and style, maybe stating this in normal caps might be a reasonable step for PR.

I'm sure I wouldn't have done what Romney did, and not so sure about whether I would have done what Yudkowsky did. Romney wanted to hurt people for the fun of it. Yudkowsky was trying to keep people from being hurt, regardless of whether his choice was a good one.

That's a reasonable answer.

If such a person were to write a similar post and actually write the way they feel, rather than being incredibly polite, things would look very different.

I'm assuming you think they'd come in, scoff at our arrogance for a few pages, and then waltz off. Disregarding how many employed machine learning engineers also do side work on general intelligence projects, you'd probably get the same response from an automobile engineer, someone with a track record and field expertise, talking to the Wright Brothers. Thinking about new things and new ideas doesn't automatically make you wrong.

That recursive self-improvement is nothing more than a row of English words, a barely convincing fantasy.

Really? Because that's a pretty strong claim. If I knew how the human brain worked well enough to build one in software, I could certainly build something smarter. You could increase the number of slots in working memory. Tweak the part of the brain that handles intuitive math to correctly deal with orders of magnitude. Improve recall to eidetic levels. Tweak the brain's handling of probabilities to be closer to the Bayesian ideal. Even those small changes would likely produce a mind ...

This is totally unsupported. To quote Lady Catherine de Bourgh, "If I had ever learned [to play the piano], I should have become a great proficient." You have no idea whether the "small changes" you propose are technically feasible, or whether these "tweaks" would in fact mean a complete redesign. For all we know, if you knew how the human brain worked well enough to build one in software, you would appreciate why these changes are impossible without destroying the rest of the system's functionality. After all, it would appear that (say) eidetic recall would provide a fitness advantage. Given that humans lack it, there may well be good reasons why.
"totally unsupported" seems extreme. (Though I enjoyed the P&P shoutout. I was recently in a stage adaptation of the book, so it is pleasantly primed.) What the claim amounts to is the belief that: a) there exist good design ideas for brains that human evolution didn't implement, and b) a human capable of building a working brain at all is capable of coming up with some of them. A seems pretty likely to me... at least, the alternative (our currently evolved brains are the best possible design) seems so implausible as to scarcely be worth considering. B is harder to say anything clear about, but given our experience with other evolved systems, it doesn't strike me as absurd. We're pretty good at improving the stuff we were born with. Of course, you're right that this is evidence and not proof. It's possible that we just can't do any better than human brains for thinking, just like it was possible (but turned out not to be true) that we couldn't do any better than human legs for covering long distances efficiently. But it's not negligible evidence.
I don't doubt that it's possible to come up with something that thinks better than the human brain, just as we have come up with something that travels better than the human leg. But to cover long distances efficiently, people didn't start by replicating a human leg, and then tweaking it. They came up with a radically different design - e.g. the wheel. I don't see the evidence that knowing how to build a human brain is the key step in knowing how to build something better. For instance, suppose you could replicate neuron function in software, and then scan a brain map (Robin Hanson's "em" concept). That wouldn't allow you to make any of the improvements to memory, maths, etc, that Dolores suggests. Perhaps you could make it run faster - although depending on hardware constraints, it might run slower. If you wanted to build something better, you might need to start from scratch. Or, things could go the other way - we might be able to build "minds" far better than the human brain, yet never be able to replicate a human one. But it's not just that evidence is lacking - Dolores is claiming certainty in the lack of evidence. I really do think the Austen quote was appropriate.
To clarify, I did not mean having the data to build a neuron-by-neuron model of the brain. I meant actually understanding the underlying algorithms those slabs of neural tissue are implementing. Think less understanding the exact structure of a bird's wing, and more understanding the concept of lift. I think, with that level of understanding, the odds that a smart engineer (even if it's not me) couldn't find something to improve seem low.
I agree that I might not need to be able to build a human brain in software to be able to build something better, as with cars and legs. And I agree that I might be able to build a brain in software without understanding how to do it, e.g., by copying an existing one as with ems. That said, if I understand the principles underlying a brain well enough to build one in software (rather than just copying it), it still seems reasonable to believe that I can also build something better.

Having been a subject of both a relatively large upvote and a relatively large downvote in the last couple of weeks, I still think that the worst thing one can do is to complain about censorship or karma. The posts and comments on any forum aren't judged on their "objective merits" (because there is no such thing), but on their suitability for the forum in question. If you have been downvoted, your post deserves it by definition. You can politely inquire about the reasons, but people are not required to explain themselves. As for rationality, I question whether it is rational to post on a forum if you are not having fun there. Take it easy.

I downvoted you because you're wrong. For one, comments can't be promoted to main, only posts, and for two, plenty of opposition has garnered a great deal of upvotes, as shown by the numerous links lukeprog provided.

For example, where do you get 'almost 800 responses' from? That comment (not post) only has 32 comments below it.

I'm interested in any compiled papers or articles you wrote about AGI motivation systems, aside from the forthcoming book chapter, which I will read. Do you have any links?


I'll gladly start reading at any point you'll link me to.

The fact that you don't just provide a useful link but instead several paragraphs of excuses why the stuff I'm reading is untrustworthy I count as (small) evidence against you.

I don't work for SI and this is not an SI-authorized response, unless SI endorses it later. This comment reflects my own understanding, based on conversations with and publications of SI members and my general world model, and does not necessarily reflect the views or activities of SI.

The first thing I notice is that your interpretation of SI's goals with respect to AGI is narrower than the impression I had gotten, based on conversations with SI members. In particular, I don't think SI's research is limited to trying to make AGI friendliness provable; it also covers a variety of different safety strategies, and the relative win-rates of different technological paths, e.g. brain uploading vs. de-novo AI, classes of utility functions and their relative risks, and so on. There is also a distinction between "FAI theory" and "AGI theory" that you aren't making; the idea, as I see it, is that to the extent to which these are separable, "FAI theory" covers research into safety mechanisms which reduce the probability of disaster if any AGI is created, while "AGI theory" covers research that brings the creation of any AGI closer. Your first objection - that ... (read more)

I agree, and would like to note the possibility, for those who suspect FAI research is useless or harmful, of earmarking SI donations to research on different safety strategies, or on aspects of AI risk that are useful to understand regardless of strategy.

This likely won't work. Money is fungible, so unless the total donations so earmarked exceed the planned SI funding for that cause, they won't have to change anything. They're under no obligation to not defund your favorite cause by exactly the amount you donated, thus laundering your donation into the general fund. (Unless I misunderstand the relevant laws?)

EDIT NOTE: The post used to say vast majority; this was changed, but is referenced below.

You have an important point here, but I'm not sure it gets up to "vast majority" before it becomes relevant. Earmarking $K for X has an effect once $K exceeds the amount of money that would have been spent on X if the $K had not been earmarked. The size of the effect still certainly depends on the difference, and may very well not be large.
Suppose you earmark to a paper on a topic X that SI would otherwise probably not write a paper on. Would that cause SI to take money out of research on topics similar to X and into FAI research? There would probably be some sort of (expected) effect in that direction, but I think the size of the effect depends on the details of what causes SI's allocation of resources, and I think the effect would be substantially smaller than would be necessary to make an earmarked donation equivalent to a non-earmarked donation. Still, you're right to bring it up.
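To make the threshold in the comments above concrete, here is a minimal sketch of the arithmetic. It assumes the worst case described earlier in the thread, where the organization offsets an earmarked donation as far as it can by cutting its own planned spending on the same cause. The function name and the dollar figures are hypothetical and are not taken from SI's budget.

```python
def extra_spending_on_cause(planned: float, earmarked: float) -> float:
    """Extra money cause X ends up with because of earmarking.

    Assumption (hypothetical worst case): the organization reduces its own
    planned spending on X by as much of the earmarked total as it can.
    """
    # Earmarks below the planned amount are fully absorbed; only the excess
    # over the planned amount forces additional spending on X.
    return max(0.0, earmarked - planned)


# Hypothetical figures: $100k planned for X without any earmarking.
small = extra_spending_on_cause(planned=100_000, earmarked=30_000)
large = extra_spending_on_cause(planned=100_000, earmarked=130_000)
```

Under this assumption the $30k earmark changes nothing and the $130k earmark moves $30k. As the comment above argues, real allocation decisions are unlikely to offset this perfectly, so the actual effect probably falls somewhere between this lower bound and the full earmarked amount.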
Some recent discussion of AIs as tools.

Richard, this really isn't productive. You're clearly quite intelligent and clearly still have issues due to the dispute between you and Eliezer. It is likely that if you got over this, you could be an effective, efficient, and helpful critic of SI and their ideas. But right now, you are engaging in uncivil behavior that isn't endearing you to anyone while making emotionally heavy comparisons that make you sound strident.

He doesn't want to be "an effective, efficient, or helpful critic". He's here "for the lulz", as he said in his comment above.
Yes, but how much of that is due to the prior negative experience and fighting he's had? It isn't at all common for a troll to self-identify as such only after they've had bad experiences. Human motivations are highly malleable.
I suspect you meant "isn't at all uncommon," though I think what you said might actually be true.
Er, yes. The fact that Loosemore is a professional AI researcher with a fair number of accomplishments, together with his general history, strongly suggests that at least in his case he didn't start his interaction with the intent to troll. His early actions on LW were positive and some were voted up.
His 'early' actions on LW were recent and largely negative, and one was voted up significantly (though I don't see why - I voted that comment down). At his best he's been abrasive, confrontational, and rambling. Not someone worth engaging.
His second comment on LW is here; it is from January and is at +8 (and I seem to recall it was higher earlier). Two of his earlier comments from around the same time were at positive numbers but have since dipped below. It looks like at least one person went through and systematically downvoted his comments without regard to content.
Yes, that's the one I was referring to.
I understand your point, but given that sentiment, the sentence "It isn't at all common for a troll to self-identify as such only after they've had bad experiences" confuses me.
Right, as mentioned I meant uncommon. My point is that I don't think Loosemore's experience is that different from what often happens. At least in my experience, I've seen people who were more or less productive on one forum become effectively trolls elsewhere on the internet after having had bad experiences elsewhere. I think a lot of this is due to cognitive dissonance: people don't like to think that they were being actively stupid or were effectively accidentally trolling, so they convince themselves that those were their goals all along.
Ah, ok. Gotcha. I agree that people often go from being productive participants to being unproductive, both for the reasons you describe and other reasons.
It seems to me it would be more appropriate to ask Yudkowsky and LukeProg to retract the false accusations that Loosemore is a liar or dishonest, respectively.
Yes, that would probably be a step in the right direction also. I don't know whether the accusation is false, but the evidence is at best extremely slim and altogether unhelpful. That someone didn't remember a study a few years ago in the heat of the moment simply isn't something worth getting worked up about.

There's been a video or two where Eliezer was called "world's foremost expert on recursive self improvement."

This usually happens when the person being introduced wasn't consulted about the choice of introduction.

I'm glad for this; LessWrong can always use more engaging critiques of substance. I partially agree with Holden's conclusions, although I reach them from a substantially different route. I'm a little surprised then that few of the replies have directly engaged what I find to be the more obvious flaws in Holden's argument: namely objection 2 and the inherent contradictions between it and objection 1.

Holden posits that many (most?) well-known current AI applications more or less operate as sophisticated knowledge bases. His tool/agent distinction draws a boundary around AI tools: systems whose only external actions consist of communicating results to humans, and the rest being agents which actually plan and execute actions with external side effects. Holden distinguishes 'tool' AI from Oracle AI, the latter really being agent AI (designed for autonomy) which is trapped in some sort of box. Accepting Holden's terminology and tool/agent distinction, he then asserts:

  1. That 'tool' AGI already is and will continue to be the dominant type of AI system.
  2. That AGI running in tool mode will "be extraordinarily useful but far more safe than an AGI running in agent mode."

I can ... (read more)

If your purpose is "let everyone know I think Eliezer is nuts", then you have succeeded, and may cease posting.

Please rot13 the part from “potentially” onwards, and add a warning as in this comment (with “decode the rot-13'd part” instead of “follow the links”), because there are people here who've said they don't want to know about that thing.

Holden does a great job but makes two major errors:
1) His argument about Tool-AI is irrelevant, because creating Tool-AI does almost nothing to avoid Agent-AI, which he agrees is dangerous.
2) He too narrowly construes SI's goals by assuming they are only working on Friendly AI rather than AGI x-risk reduction in general.

The heck? Why would you not need to figure out if an oracle is an ethical patient? Why is there no such possibility as a sentient oracle?

Is this standard religion-of-embodiment stuff?

The oracle gets asked questions like "Should intervention X be used by doctor D on patient P" and can tell you the correct answer to them without considering the moral status of the oracle. If it were a robot, it would be asking questions like "Should I run over that [violin/dog/child] to save myself?" which does require considering the status of the robot. EDIT: To clarify, it's not that the researcher has no reason to figure out the moral status of the oracle, it's that the oracle does not need to know its own moral status to answer its domain-specific questions.
What if it assigned moral status to itself and then biased its answers to make its users less likely to pull its plug one day?

I'm very impressed by Holden's thoroughness and thoughtfulness. What I'd like to know is why his post is Eliezer-endorsed and has 191 up-votes, while my many posts over the years hammering on Objection 1, and my comments raising Objection 2, have never gotten the green button, been frequently down-voted, and never been responded to by SIAI. Do you have to be outside the community to be taken seriously by it?

Not to be cynical, PhilGoetz, but isn't Holden an important player in the rational-charity movement? Wouldn't the ultimate costs of ignoring Holden be prohibitive?

That could explain the green dot. I don't know which explanation is more depressing.
You are absolutely correct. And, that's not the reason I find it engaging or informative.

I thought most of the stuff in Holden's post had been public knowledge for years, even to the point of being included in previous FAQs produced by SI. The main difference is that the presentation and solidity of it in this article are remarkable - interconnecting so many different threads which, when placed as individual sentences or paragraphs, might hang alone, but when woven together with the proper knots form a powerful net.

I would be interested to see if you could link to posts where you made versions of these objections.

Assuming what you say is true, it looks to me as though SI is paying the cost of ignoring its critics for so many years...

I think some of it comes down to the range of arguments offered. For example, posted alone, I would not have found Objection 2 particularly compelling, but I was impressed by many other points and in particular the discussion of organizational capacity. I'm sure there are others for whom those evaluations were completely reversed. Nonetheless, we all voted it up. Many of us who did so likely agree with one another less than we do with SIAI, but that has only shown up here and there on this thread. Critically, it was all presented, not in the context of an inside argument, but in the context of "is SI an effective organization in terms of its stated goals." The question posed to each of us was: do you believe in SI's mission and, if so, do you think that donating to SI is an effective way to achieve that goal? It is a wonderful instantiation of the standard test of belief, "how much are you willing to bet on it?"

The quotes aren't all about AI.

I didn't say they were. I said that just because the speaker for a particular idea comes across as crazy doesn't mean the idea itself is crazy. That applies whether all of Eliezer's "crazy statements" are about AI, or whether none of them are.

Whoever knowingly chooses to save one life, when they could have saved two – to say nothing of a thousand lives, or a world – they have damned themselves as thoroughly as any murderer.

The most extreme presumptuousness about morality; insufferable moralism.

Funny, I actually agree with the top phrase. It's written in an unfortunately preachy, minister-scaring-the-congregation-by-saying-they'll-go-to-Hell style, which is guaranteed to make just about anyone get defensive and/or go "ick!" But if you accept the (very common) moral standard that if you can save a life, it's better to do it than not to do it, then the logic is inevitable that if you have the choice of saving one life or two lives, by your own metric it's morally preferable to save two lives. If you don't accept the moral standard that it's better to save one life than zero lives, then that phrase should be just as insuffe... (read more)

Newton definitely wrote down his version of scientific method to explain why people shouldn't take his law of gravity and just add, "because of Aristotelian causes," or "because of Cartesian mechanisms."

How would one explain Yudkowsky's paranoia, lack of perspective, and scapegoating--other than by positing a narcissistic personality structure?

I had in fact read a lot of those quotes before–although some of them come as a surprise, so thank you for the link. They do show paranoia and lack of perspective, and yeah, some signs of narcissism, and I would be certainly mortified if I personally ever made comments like that in public...

The Sequences as a whole do come across as having been written by an arrogant person, and that's kind of irritating, and I have to consciously override my irritation in order to enjoy the parts that I find useful, which is quite a lot. It's a simplification to say that the Sequences are just clutter, and it's extreme to call them 'craziness', too.

(Since meeting Eliezer in person, it's actually hard for me to believe that those comments were written by the same person, who was being serious about them... My chief interaction with him was playing a game in which I tried to make a list of my values, and he hit me with a banana every time I got writer's block because I was trying to be too specific, and made the Super Mario Brothers' theme song when I suc... (read more)

Romney is rightfully being held, feet to fire, for a group battering of another student while they attended high school--because such sadism is a trait of character and can't be explained otherwise.

I was going to upvote your comment until I got to this point. Aside from the general mindkilling, this looks like the fundamental attribution error, and moreover, we all know that people do in fact mature and change. Bringing up external politics is not helpful in a field where there's already concern that AI issues may be becoming a mindkilling subject themselves on LW. Bringing up such a questionable one is even less useful.

I initially upvoted this post, because the criticism seemed reasonable. Then I read the discussion, and switched to downvoting it. In particular, this:

Taken in isolation, these thoughts and arguments might amount to nothing more than a minor addition to the points that you make above. However, my experience with SI is that when I tried to raise these concerns back in 2005/2006 I was subjected to a series of attacks that culminated in a tirade of slanderous denunciations from the founder of SI, Eliezer Yudkowsky. After delivering this tirade, Yudkowsky then banned me from the discussion forum that he controlled, and instructed others on that forum that discussion about me was henceforth forbidden.

Since that time I have found that when I partake in discussions on AGI topics in a context where SI supporters are present, I am frequently subjected to abusive personal attacks in which reference is made to Yudkowsky's earlier outburst. This activity is now so common that when I occasionally post comments here, my remarks are very quickly voted down below a threshold that makes them virtually invisible. (A fate that will probably apply immediately to this very comment).

Serious accusati... (read more)

I witnessed many of the emails in the 2006 banning. Richard disagreed with Eliezer often, and not very diplomatically. Rather than deal with Richard's arguments, Eliezer decided to label Richard as a stupid troll, which he obviously was not, and dismiss him. I am disappointed that Eliezer has apparently never apologized. The email list, SL4, slacked off in volume for months afterwards, probably because most participants felt disgusted by the affair; and Ben Goertzel made a new list, which many people switched to.
Hmmm... The fact that many people quit the list / cut back their participation seems fairly strong evidence that Loosemore has a legitimate complaint here. Though if so, he's done a poor job conveying it in this thread.
I'm not sure. People sometimes cut back participation in that sort of thing in response to drama in general. However, it is definitely evidence. Phil's remark makes me strongly update in the direction of Loosemore having a legitimate point.

Can you provide some examples of these "abusive personal attacks"? I would also be interested in this ruthless suppression you mention. I have never seen this sort of behavior on LessWrong, and would be shocked to find it among those who support the Singularity Institute in general.

I've read a few of your previous comments, and while I felt that they were not strong arguments, I didn't downvote them because they were intelligent and well-written, and competent constructive criticism is something we don't get nearly enough of. Indeed, it is usually welcomed. The amount of downvotes given to the comments, therefore, does seem odd to me. (Any LW regular who is familiar with the situation is also welcome to comment on this.)

I have seen something like this before, and it turned out the comments were being downvoted because the person making them had gone over, and over, and over the same issues, unable or unwilling to either competently defend them, or change his own mind. That's no evidence that the same thing is happening here, of course, but I give the example because in my experience, this community is almost never vindictive or malicious, and is laudably willing to con... (read more)

The answer is probably that you overestimate that community's dedication to rationality because you share its biases. The main post demonstrates an enormous conceit among the SI vanguard. Now, how is that rational? How does it fail to get extensive scrutiny in a community of rationalists? My take is that neither side in this argument distinguished itself. Loosemore called for an "outside adjudicator" to solve a scientific argument. What kind of obnoxious behavior is that, when one finds oneself losing an argument? Yudkowsky (rightfully pissed off) in turn, convicted Loosemore of a scientific error, tarred him with incompetence and dishonesty, and banned him. None of these "sins" deserved a ban (no wonder the raw feelings come back to haunt); no honorable person would accept a position where he has the authority to exercise such power (a party to a dispute is biased). Or at the very least, he wouldn't use it the way Yudkowsky did, when he was the banned party's main antagonist.
That's probably no small part of it. However, even if my opinion of the community is tinted rose, note that I refer specifically to observation. That is, I've sampled a good amount of posts and comments here on LessWrong, and I see people behaving rationally in arguments - appreciation of polite and lucid dissension, no insults or ad hominem attacks, etc. It's harder to tell what's going on with karma, but again, I've not seen any one particular individual harassed with negative karma merely for disagreeing. Can you elaborate, please? I'm not sure what enormous conceit you refer to. I think that's an excellent analysis. I certainly feel like Yudkowsky overreacted, and as you say, in the circumstances no wonder it still chafes; but as I say above, Richard's arguments failed to impress, and calling for outside help ("adjudication" for an argument that should be based only on facts and logic?) is indeed beyond obnoxious.
It seems like everyone is talking about SL4; here is a link to what Richard was probably complaining about: