Another take with more links: AI: A Reason to Worry, A Reason to Donate

I have made a $10,000 donation to the Machine Intelligence Research Institute (MIRI) as part of their winter fundraiser. This is the best organization I know of to donate money to, by a wide margin, and I encourage others to also donate. This belief comes from a combination of public information, private information and my own analysis. This post will share some of my private information and analysis to help others make the best decisions.

I consider AI Safety the most important, urgent and under-funded cause. If your private information and analysis say another AI Safety organization is a better place to give, give there. I believe many AI Safety organizations do good work. If you have the talent and skills, and can get involved directly, or get others who have the talent and skills involved directly, that’s even better than donating money.

If you do not know about AI Safety and unfriendly artificial general intelligence, I encourage you to read about them. If you’re up for a book, read this one.

If you decide you care about other causes more, donate to those causes instead, in the way your analysis says is most effective. Think for yourself, do and share your own analysis, and contribute as directly as possible.


I am very confident in the following facts about artificial general intelligence. None of my conclusions in this section require my private information.

Humanity is likely to develop artificial general intelligence (AGI) vastly smarter and more powerful than humans. We are unlikely to know very far in advance when this is about to happen. There is wide disagreement and uncertainty on how long this will take, but certainly there is a substantial chance this happens within our lifetimes.

Whatever your previous beliefs, the events of the last year, including AlphaGo Zero, should convince you that AGI is more likely to happen, and more likely to happen soon.

If we do build an AGI, its actions will determine what is done with the universe.

If the first such AGI we build turns out to be an unfriendly AI that is optimizing for something other than humans and human values, all value in the universe will be destroyed. We are made of atoms that could be used for something else.

If the first such AGI we build turns out to care about humans and human values, the universe will be a place of value many orders of magnitude greater than it is now.

Almost all AGIs that could be constructed care about something other than humans and human values, and would create a universe with zero value. Mindspace is deep and wide, and almost all of it does not care about us.

The default outcome, if we do not work hard and carefully now on AGI safety, is for AGI to wipe out all value in the universe.

AI Safety is a hard problem on many levels. Solving it is much harder than it looks even with the best of intentions, and circumstances are likely to conspire to give those involved very bad personal incentives. Without security mindset, value alignment and tons of advance work, chances of success are very low.

We are currently spending ludicrously little time, attention and money on this problem.

For space reasons I am not further justifying these claims here. Jacob’s post has more links.


In these next two sections I will share what I can of my own private information and analysis.

I know many principals at MIRI, including senior research fellow Eliezer Yudkowsky and executive director Nate Soares. They are brilliant, and are as dedicated as one can be to the cause of AI Safety and ensuring a good future for the universe. I trust them, based on personal experience with them, to do what they believe is best to achieve these goals.

I believe they have already done much exceptional and valuable work. I have also read many of their recent papers and found them excellent.

MIRI has been invaluable in laying the groundwork for this field. This is true both on the level of the field existing at all, and also on the level of thinking in ways that might actually work.

Even today, most who talk about AI Safety suggest strategies that have essentially no chance of success, but at least they are talking about it at all. MIRI is a large part of why they’re talking at all. I believe that something as simple as these DeepMind AI Safety test environments is good, helping researchers understand there is a problem much more deadly than algorithmic discrimination. The risk is that researchers will realize a problem exists, then think ‘I’ve solved these problems, so I’ve done the AI Safety thing’ when we need the actual thing the most.

From the beginning, MIRI understood the AI Safety problem is hard, requiring difficult high-precision thinking, and long term development of new ideas and tools. MIRI continues to fight to turn concern about ‘AI Safety’ into concern about AI Safety.

AI Safety is so hard to understand that Eliezer Yudkowsky decided he needed to teach the world the art of rationality so we could then understand AI Safety. He did exactly that, which is why this blog exists.

MIRI is developing techniques to make AGIs we can understand and predict and prove things about. MIRI seeks to understand how agents can and should think. If AGI comes from such models, this is a huge boost to our chances of success. MIRI is also working on techniques to make machine learning based agents safer, in case that path leads to AGI first. Both tasks are valuable, but I am especially excited by MIRI’s work on logic.


Eliezer’s model was that if we teach people to think, then they can think about AI.

What I’ve come to realize is that when we try to think about AI, we also learn how to think in general.

The paper that convinced OpenPhil to increase its grant to MIRI was about Logical Induction. That paper was impressive and worth understanding, but even more impressive and valuable in my eyes is MIRI’s work on Functional Decision Theory. This is vital to creating an AGI that makes decisions, and has been invaluable to me as a human making decisions. It gave me a much better way to understand, work with and explain how to think about making decisions.

Our society believes in and praises Causal Decision Theory, dismissing other considerations as irrational. This has been a disaster on a level hard to comprehend. It destroys the foundations of civilization. If we could spread practical, human use of Functional Decision Theory, and debate on that basis, we could get out of much of our current mess. Thanks to MIRI, we have a strong formal statement of Functional Decision Theory.

Whenever I think about AI or AI Safety, read AI papers or try to design AI systems, I learn how to think as a human. As a side effect of MIRI’s work, my thinking, and especially my ability to formalize, explain and share my thinking, has been greatly advanced. Their work even this year has been a great help.

MIRI does basic research into how to think. We should expect such research to continue to pay large and unexpected dividends, even ignoring its impact on AI Safety.


I believe it is always important to use strategies that are cooperative and information creating, rather than defecting and information destroying, and that preserve good incentives for all involved. If we’re not using a decision algorithm that cares more about such considerations than maximizing revenue raised, even when raising for a cause as good as ‘not destroying all value in the universe,’ it will not end well.

This means that I need to do three things. I need to share my information, as best I can. I need to include my own biases, so others can decide whether and how much to adjust for them. And I need to avoid using strategies that would distort or mislead.

I have not been able to share all my information above, due to a combination of space, complexity and confidentiality considerations. I have done what I can. Beyond that, I will simply say that what remaining private information I have on net points in the direction of MIRI being a better place to donate money.

My own biases here are clear. The majority of my friends come from the rationality community, which would not exist except for Eliezer Yudkowsky. I met my wife Laura at a community meetup. I know several MIRI members personally, consider them friends, and even ran a strategy meeting for them several years back at their request. It would not be surprising if such considerations influenced my judgment somewhat. Such concerns go hand in hand with being in a position to do extensive analysis and acquire private information. This is all the more reason to do your own thinking and analysis of these issues.

To avoid distortions, I am giving the money directly, without qualifications or gimmicks or matching funds. My hope is that this will be a costly signal that I have thought long and hard about such questions, and reached the conclusion that MIRI is an excellent place to donate money. OpenPhil has a principle that they will not fund more than half of any organization’s budget. I think this is an excellent principle. There is more than enough money in the effective altruist community to fully fund MIRI and other such worthy causes, but these funds represent a great temptation. They risk causing great distortions, and tying up action with political considerations, despite everyone’s best intentions.

As small givers (at least, relative to some) our biggest value lies not in the use of the money itself, but in the information value of the costly signal our donations give and in the virtues we cultivate in ourselves by giving. I believe MIRI can efficiently utilize far more money than it currently has, but more than that this is me saying that I know them, I know their work, and I believe in and trust them. I vouch for MIRI.








A related thought:

There's an issue with AI Safety funding: many people agree it's important for there to be an AI Safety field, and such a field should grow, evaluate the problem from different angles, build up a body of knowledge and cultivate people who think seriously about the problem full-time.

Non-safety AI Researchers should know that such a field exists, and that it is worth serious attention.

This idea is popular both because it is true and (I think) because it is a tempting way to pass the buck.

If you're a funder who thinks AI Alignment is important, well, you're not sure what actually needs doing, but you can at least agree to fund something generic that "builds the field." A specific approach might be wrong, a generic approach can't "be wrong", and so your bullshit detectors don't see any concrete red flags.

But... at some point, to actually make progress, you need to get into the weeds and make a plan that involves specific assumptions and try to implement that plan.

MIRI is (relatively) unique among AI Safety organizations in that it focuses on particular types of approaches. The researchers at MIRI aren't a monolith, but they tend to share particular assumptions about the constraints of the problem and what solving the problem would look like, and then they try to actually create that solution.

And they may be wrong in some of their assumptions. But someone has to have a plan that is specific enough that it might be wrong and try to implement that plan.

For this reason, I expect MIRI (and things like it) to be generally underfunded relative to more "generic seeming" organizations that aren't making explicit claims.

If true, this claim should probably push you to be more willing to fund things that are attempting concrete solutions.

(I also happen to think they are not wrong in their assumptions, and one of the things that's impressed me over the years is seeing their strategy shift over time as the landscape of AI changed, which I've found reassuring. I trust them to continue to update as new information comes out)

I should note that I'm not sure whether OpenAI is a point against this claim or not (I think not but for complicated reasons). My vague impression is that they do tend to have their own set of assumptions, and are working on reasonably concrete things (I think those assumptions are wrong but am not that confident).

I do lean towards thinking that OpenAI and MIRI should both be fully funded; OpenAI just seems to be getting a lot more funding due to Elon's involvement and generally being more "traditionally prestigious".

Further thoughts here:

Insofar as I think OpenAI shouldn't be funded, it's because I think it might be actively harmful.

(epistemic status: I am not very informed about the current goings on at OpenAI, this is a random person rambling hearsay and making the best guesses they can without doing a thorough review of their blog, let alone talking to them)

The reason it might be actively harmful is that a lot of their work seems to be more like actually developing AI than AI Safety, and sharing AI developments with the world might accelerate progress.

MIRI is the only organization I know of working directly on AI safety that I've heard talk extensively about differential technological development, i.e., doing research that would only help build Aligned AGI and doesn't accelerate generic AI that might feed into Unaligned AGI.

Zvi, it's great that you vouch for MIRI! But I'm not sure it's always a good idea to use logical decision theories in real life. It's kind of an open problem actually. Wei had a great counterexample called 2TDT-1CDT. Imagine a tournament of blind one-shot prisoner's dilemma played among many TDT agents and one CDT agent. The TDT agents will cooperate because the opponent is likely to be TDT, while the CDT agent will defect and win. The relevant math idea is evolutionarily stable strategy, I think.

I agree that it's an open problem how best to make decisions in life. I do think that it's obvious that FDT>CDT as a policy, even if (or especially if!) you're a human and doing either one approximately. To put it more simply, ignoring the things FDT cares about and CDT does not care about seems like a pretty obvious mistake, and also a pretty easy way, if adopted by too many people, to have things go quite badly.

Interesting example there! But I don't think that example shows CDT>FDT even in this isolated designed-to-counterexample-me case. And I think that the fact that we're trying to construct an example where CDT>FDT should in and of itself be strong evidence of which you should choose.

One of these two things happens, depending on your perspective:

  1. Zvi runs FDT. Omega approaches Zvi and presents this situation. I think to myself "Is my decision to submit an FDT bot correlated to Omega's decision to submit FDT bots?" And my response is, no. So I submit CDT, or probably just "D". So CDT=FDT here because we both submit the same bot.
  2. Same setup. Zvi submits an FDT bot with the knowledge that there are two bots submitted by Omega that were chosen first. This bot realizes it isn't correlated to Omega's bots (since they cooperate even if they know my bot defects), so it defects. Omega's bots realize they are correlated, so they cooperate. Same result. Setup matters.

The FDT agents are still making the right decision and still winning. Yes, the CDT agent doing better is annoying, but that's by definition not in your utility function, and you don't lose because someone else did better. The paperclip maximizer maximized paperclips, the staple maximizer maximized staples, versus either of them defecting or self-modifying to CDT or what not, which goes very badly for them. The FDT agents are doing the right thing once the tournament starts. Note that if they submit one CDT and one FDT, you are forced to submit FDT, which is now correlated to the other FDT agent, which means the two of you now cooperate. So Omega isn't maximizing by submitting two FDT agents, but Omega rarely does. Omega is a shifty one. Odd motivations.
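The payoff comparison in this exchange can be made concrete with a toy calculation. What follows is only a rough sketch under assumptions that are not in the thread: the standard prisoner's dilemma payoffs (5 for defecting against a cooperator, 3 for mutual cooperation, 1 for mutual defection, 0 for cooperating against a defector), a round-robin where every agent plays every other agent once, and agents modeled as fixed moves rather than as real decision procedures. The names `PAYOFF`, `tournament_totals` and `scenario` are hypothetical.

```python
# Toy round-robin model of the "2TDT-1CDT" tournament discussed above.
# Assumptions (not from the thread): standard prisoner's dilemma payoffs
# and every agent plays every other agent exactly once.

PAYOFF = {  # (my move, their move) -> my payoff
    ("C", "C"): 3,
    ("C", "D"): 0,
    ("D", "C"): 5,
    ("D", "D"): 1,
}


def tournament_totals(moves):
    """Return each agent's total payoff when agent i always plays moves[i]."""
    totals = [0] * len(moves)
    for i, mine in enumerate(moves):
        for j, theirs in enumerate(moves):
            if i != j:
                totals[i] += PAYOFF[(mine, theirs)]
    return totals


def scenario(n_tdt, tdt_move):
    """n_tdt correlated agents all play tdt_move; one CDT agent always defects."""
    totals = tournament_totals([tdt_move] * n_tdt + ["D"])
    return {"tdt_each": totals[0], "cdt": totals[-1]}


cooperate = scenario(2, "C")  # the correlated agents cooperate
defect = scenario(2, "D")     # counterfactual: the correlated agents all defect
print(cooperate, defect)
```

Under these assumed numbers the CDT agent outscores each cooperating agent (10 versus 3), which is the point of the counterexample. The correlated agents nonetheless each score more by cooperating (3) than they would if they all defected (2), which is the sense in which they can be said to still be making the right decision.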

I'm with Calvin. I think misaligned agi is more likely to do local bad things when first invented. There are more partially rational agis than fully rational ones, so I expect those to be found first.

I do think it is important who develops safe agi first, and whether or not there is the chance for a last minute power grab during the development process.

Can you expand on this point:

"If we do build an AGI, its actions will determine what is done with the universe.

If the first such AGI we build turns out to be an unfriendly AI that is optimizing for something other than humans and human values, all value in the universe will be destroyed. We are made of atoms that could be used for something else."

Especially in the sense of why the first unfriendly AI we built will immediately be uncontainable and surpass current human civilization's ability to destroy it?

Sorry I took awhile to find this. I think the best short answer to this question is Scott Alexander's Superintelligence FAQ.

The arguments involved here deal with a fair number of areas where human intuitions just aren't very good (e.g. we're used to dealing with linear systems, not exponential ones). So if Scott's post isn't persuasive to you, the next best answer I have is unfortunately "read the sequences."

Thanks for asking! This has been the subject of much virtual ink, much of it here (e.g. the book Superintelligence, and much of Eliezer's writings, including the whole "Foom" debate with Robin Hanson). Rather than try to summarize that debate, I'll encourage others to chime in with the links and explanations they think are best, since this is an important thing to get right and I'd like to see what our community currently thinks is the best way to explain this. I'm not thrilled that I don't have a great go-to here.

If no one comes up with anything I'm happy with within the week, I'll see if I can do better.
