Announcement: AI alignment prize winners and next round



We (Zvi Mowshowitz, Vladimir Slepnev and Paul Christiano) are happy to announce that the AI Alignment Prize is a success. From November 3 to December 31 we received over 40 entries representing an incredible amount of work and insight. That's much more than we dared to hope for, in both quantity and quality.

In this post we name six winners who will receive $15,000 in total, an increase from the originally planned $5,000.

We're also kicking off the next round of the prize, which will run from today until March 31, under the same rules as before.

The winners

First prize of $5,000 goes to Scott Garrabrant (MIRI) for his post Goodhart Taxonomy, an excellent write-up detailing the possible failures that can arise when optimizing for a proxy instead of the actual goal. Goodhart’s Law is simple to understand, impossible to forget once learned, and applies equally to AI alignment and everyday life. While Goodhart’s Law is widely known, breaking it down in this new way seems very valuable.

Five more participants receive $2,000 each:

  • Tobias Baumann (FRI) for his post Using Surrogate Goals to Deflect Threats. Adding failsafes to the AI's utility function is a promising idea and we're happy to see more detailed treatments of it.
  • Vadim Kosoy (MIRI) for his work on Delegative Reinforcement Learning (1, 2, 3). Proving performance bounds for agents that learn goals from each other is obviously important for AI alignment.
  • John Maxwell (unaffiliated) for his post Friendly AI through Ontology Autogeneration. We aren't fans of John's overall proposal, but the accompanying philosophical ideas are intriguing on their own.
  • Alex Mennen (unaffiliated) for his posts on legibility to other agents and learning goals of simple agents. The first is a neat way of thinking about some decision theory problems, and the second is a potentially good step for real-world AI alignment.
  • Caspar Oesterheld (FRI) for his post and paper studying which decision theories would arise from environments like reinforcement learning or futarchy. Caspar's angle of attack is new and leads to interesting results.

We'll be contacting each winner by email to arrange transfer of money.

We would also like to thank everyone who participated. Even if you didn't get one of the prizes today, please don't let that discourage you!

The next round

We are now announcing the next round of the AI Alignment Prize.

As before, we're looking for technical, philosophical and strategic ideas for AI alignment, posted publicly between now and March 31, 2018. You can submit your entries in the comments here or by email. We may give feedback on early entries to allow improvement, though our ability to do this may become limited by the volume of entries.

The minimum prize pool this time will be $10,000, with a minimum first prize of $5,000. If the entries once again surpass our expectations, we will again increase that pool.

Thank you!

(Addendum: I've written a post summarizing the typical feedback we've sent to participants in the previous round.)