We (Zvi Mowshowitz and Vladimir Slepnev) are happy to announce the results of the second round of the AI Alignment Prize, funded by Paul Christiano. From January 15 to April 1 we received 37 entries, and once again there was an abundance of worthy ones. In this post we name five winners who receive $15,000 in total, an increase from the planned $10,000.
We are also announcing the next round of the prize, which will run until June 30th, largely under the same rules as before.
First prize of $5,000 goes to Tom Everitt (Australian National University and DeepMind) and Marcus Hutter (Australian National University) for the paper The Alignment Problem for History-Based Bayesian Reinforcement Learners. We're happy to see such a detailed and rigorous write-up of possible sources of misalignment, tying together a lot of previous work on the subject.
Second prize of $4,000 goes to Scott Garrabrant (MIRI) for these LW posts:
- Robustness to Scale
- Sources of Intuitions and Data on AGI
- Don't Condition on no Catastrophes
- Knowledge is Freedom
Each of these represents a small but noticeable step forward, adding up to a sizeable overall contribution. Scott also won first prize in the previous round.
Third prize of $3,000 goes to Stuart Armstrong (FHI) for his post Resolving human values, completely and adequately, and for other LW posts during this round. Human values can be under-defined in many possible ways, and Stuart has been very productive at teasing them out and suggesting workarounds.
Fourth prize of $2,000 goes to Vanessa Kosoy (MIRI) for the post Quantilal control for finite MDPs. The idea of quantilization might help mitigate the drawbacks of extreme optimization, and it's good to see a rigorous treatment of it. Vanessa is also a second-time winner.
Fifth prize of $1,000 goes to Alex Zhu (unaffiliated) for these LW posts:
- Reframing misaligned AGI's: well-intentioned non-neurotypical assistants
- Metaphilosophical competence can't be disentangled from alignment
- Corrigible but misaligned: a superintelligent messiah
- My take on agent foundations: formalizing metaphilosophical competence
Alex's posts have good framings of several problems related to AI alignment, and led to a surprising amount of good discussion.
We will contact each winner by email to arrange the transfer of the prize money.
We would also like to thank everyone else who sent in their work! The only way to make progress on AI alignment is by working on it, so your participation is the whole point.
The next round
We are now also announcing the third round of the AI Alignment Prize.
We're looking for technical, philosophical and strategic ideas for AI alignment, posted publicly between January 1, 2018 and June 30, 2018, and not submitted for previous iterations of the AI Alignment Prize. You can submit your entries in the comments here or by email to firstname.lastname@example.org. We may give feedback on early entries to allow improvement, though our ability to do this may become limited by the volume of entries.
The minimum prize pool will again be $10,000, with a minimum first prize of $5,000.