Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

We (Zvi Mowshowitz and Vladimir Slepnev) are happy to announce the results of the third round of the AI Alignment Prize, funded by Paul Christiano. From April 15 to June 30 we received entries from 12 participants, and are awarding $10,000 to two winners.

We are also announcing the fourth round of the prize, which will run until December 31 of this year under slightly different rules. More details below.

The winners

First prize of $7,500 goes to Vanessa Kosoy for The Learning-Theoretic AI Alignment Research Agenda. We feel this is much more accessible than previous writing on this topic, and gives a lot of promising ideas for future research. Most importantly, it explains why she is working on the problems she’s working on, in concrete enough ways to encourage productive debate and disagreement.

Second prize of $2,500 goes to Alexander Turner for the posts Worrying About the Vase: Whitelisting and Overcoming Clinginess in Impact Measures. We are especially happy with the amount of good discussion these posts generated.

We will contact each winner by email to arrange transfer of money. Many thanks to everyone else who sent in their work!

The next round

We are now announcing the fourth round of the AI Alignment Prize. Due to the drop in the number of entries, we feel that 2.5 months might be too short, so this round will run until the end of this year.

We are looking for technical, philosophical and strategic ideas for AI alignment, posted publicly between July 15 and December 31, 2018. You can submit links to entries by leaving a comment below, or by email. We will try to give feedback on all early entries to allow improvement. Another change from previous rounds is that we ask each participant to submit only one entry (though possibly in multiple parts), rather than a list of several entries on different topics.

The minimum prize pool will again be $10,000, with a minimum first prize of $5,000.

Thank you!

7 comments

Woop! Congratulations to both :D

Submitting my post for early feedback in order to improve it further:

Exponentially diminishing returns and conjunctive goals: Mitigating Goodhart’s law with common sense. Towards corrigibility and interruptibility.


Utility-maximising agents have been the Gordian knot of AI safety. Here a concrete VNM-rational formula is proposed for satisficing agents, which can be contrasted with the hitherto over-discussed and too-general approach of naive maximisation strategies. For example, the 100-paperclip scenario is easily solved by the proposed framework, since infinitely rechecking whether exactly 100 paperclips were indeed produced yields diminishing returns. The formula provides a framework for specifying how we want agents to simultaneously fulfil, or at least trade off between, the many different common-sense considerations, possibly enabling them to even surpass the relative safety of humans. A comparison with the formula introduced in the "Low Impact Artificial Intelligences" paper by S. Armstrong and B. Levinstein is included.
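To make the diminishing-returns idea concrete, here is a minimal sketch of one way such a utility could look. The function names, the exponential form, and the parameter k are my assumptions for illustration, not the actual formula from the linked post: each goal's term saturates exponentially toward its target, and the terms are multiplied so the goals are conjunctive.

```python
import math

def diminishing_return(x, target, k=5.0):
    # Hypothetical per-goal utility: rises toward 1 as x approaches
    # target, with exponentially smaller gains for overshooting.
    return 1.0 - math.exp(-k * x / target)

def conjunctive_utility(progress, targets, k=5.0):
    # Multiply per-goal terms so the agent must satisfy all goals
    # at once instead of maximising one of them without bound.
    return math.prod(diminishing_return(x, t, k)
                     for x, t in zip(progress, targets))

# In the 100-paperclip scenario, doubling the effort spent verifying
# the count buys almost no extra utility:
u_once  = conjunctive_utility([100, 1], [100, 1])
u_twice = conjunctive_utility([200, 1], [100, 1])
```

Under this toy formula the marginal value of the second verification pass (u_twice minus u_once) is on the order of e^-5, which is the sense in which endless rechecking "yields diminishing returns".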

Oh, right, this. My post wouldn't have stood a chance, right?