Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

We (Zvi Mowshowitz and Vladimir Slepnev) are happy to announce the results of the fourth round of the AI Alignment Prize, funded by Paul Christiano. From July 15 to December 31, 2018, we received 10 entries, and we are awarding four prizes for a total of $20,000.

The winners

We are awarding two first prizes of $7,500 each. One of them goes to Alexander Turner for Penalizing Impact via Attainable Utility Preservation; the other goes to Abram Demski and Scott Garrabrant for the Embedded Agency sequence.

We are also awarding two second prizes of $2,500 each: to Ryan Carey for Addressing three problems with counterfactual corrigibility, and to Wei Dai for Three AI Safety Related Ideas and Two Neglected Problems in Human-AI Safety.

We will contact each winner by email to arrange transfer of money. Many thanks to everyone else who participated!

Moving on

This concludes the AI Alignment Prize for now. It has stimulated a lot of good work during its year-long run, but participation has been slowing down from round to round, and we don't think it's worth continuing in its current form.

Once again, we'd like to thank everyone who sent us articles! And special thanks to Ben and Oliver from the LW2.0 team for their enthusiasm and help.

41 comments

It has stimulated a lot of good work during its year-long run, but participation has been slowing down from round to round, and we don't think it's worth continuing in its current form.

Any guesses why that's happening? For future prizes, I wonder if it would make sense to accept nominations instead of requiring authors to submit their own work.

Prizes are something people have suggested as providing better incentives than most current forms of funding, so it's disappointing to see existing prizes shut down. (If the upcoming write-up of lessons learned will talk about this, I can wait for that.)

I think our prize was relatively small in terms of both money and prestige. Offering more money was possible, but people usually won't work for the mere chance of money unless you offer a stupidly large sum, which would lead to other problems. The real solution is offering more prestige, but that's hard, unless you have a stash of it somewhere.

I think our prize was relatively small in terms of both money and prestige.

That's true, but given that, the results don't seem so bad? What would have counted as a success for this experiment in your view? Is there another way to spend money that seems clearly more cost-effective at this point, and if so what? If someone wanted to spend say 1x to 10x the amount that was given out by this prize per year, do you think prizes are still worth trying (maybe with some design changes) or should they look for something else?

Also this doesn't seem to explain why participation declined over time, so I'm still curious about that.

Offering more money was possible, but people usually won’t work for the mere chance of money unless you offer a stupidly large sum, which would lead to other problems.

I think maybe there's a tipping point where prizes could work if they collectively gave out enough money on a regular basis that someone who is sufficiently productive in AI safety research could expect to make a living from prizes alone. (I'm thinking that instead of having fixed periods, just give out a prize whenever a new advance comes in that meets a certain subjective threshold.) Would you consider that a "stupidly large sum" and if so what kind of problems do you think it leads to?

The real solution is offering more prestige, but that’s hard, unless you have a stash of it somewhere.

More prestige certainly helps, but I feel that more money hasn't been tried hard enough yet, unless you know something that I don't.

Is there another way to spend money that seems clearly more cost-effective at this point, and if so what?

To be clear, I still think this is a good way to spend money. I think the main cost is time.

Whose time do you mean? The judges? Your own time? The participants' time?

Is there another way to spend money that seems clearly more cost-effective at this point, and if so what?

In my opinion, AI safety camps, for example, were significantly more effective. I have maybe 2-3 ideas that would likely be more effective (sorry, but shareable in private only).

One possible factor is that there was initially a pool of people who wouldn't otherwise try to contribute to alignment research (~30 people, estimated as the number of submissions to round one minus the number of submissions to this round) who tried their hand early on, but then became discouraged because the winners' entries seemed more polished and productive than they felt they could realistically hope for. In fact, I felt this way in round two. I imagine I probably would've stopped if the alignment prize had been my sole motivation (i.e., totally ignoring how I feel about the necessity of work on this problem).

This and cousin_it's suggested novelty effect both make sense, but to me it just means that the prize givers got more than they bargained for in the first rounds and maybe it set people's expectations too high for what such a prize can accomplish. I failed to pay much attention to the first two rounds (should probably go back and look at them again) and to me the latter two rounds seem like a reasonable steady state result of the prize given the amount of money/prestige involved.

I wonder if another thing that discouraged people was a feeling that they had to compete with experienced professional researchers who already have funding from other sources. I think if I were to design a new prize with the experience of this one in mind, I'd split it into two prizes, one optimized for increasing prestige of the field, and one for funding people who otherwise couldn't get funding or to provide an alternative source of funding with better incentives. The former would look like conventional prestigious prizes in other fields, and the latter would run continuously and pay out as soon as some entry/nomination meets a certain subjective threshold of quality (which can be adjusted over time depending on the prize budget and quality of submissions), and the prize money would subtract out the amount of formal funding the recipient already received for the work (such as salaries, grants, or other prizes).

I agree with this point. Looking at the things that have won over time, it eventually came to feel like it wasn't worth bothering to submit anything, because the winners were mostly going to be folks who would have done their work anyway and who meet certain levels of prestige. In this way I do sort of feel like the prize failed: it was set up in a way that rewarded work that would have happened anyway and failed to motivate work that wouldn't have happened otherwise. Maybe it's only in my mind that the value of a prize like this is to increase work on the margin rather than to recognize outstanding work that would have been done anyway, but beyond the first round it has felt like a prize of the form "here's money for the best stuff on AI alignment in the last x months" rather than "here's money to make AI alignment research happen that would otherwise not have happened". That made me much less interested in it, to the point that I put the prize out of my mind until I saw this post reminding me of it today.

I disagree with the view that it's bad to spend the first few months prizing top researchers who would have done the work anyway. This _in and of itself_ is clearly burning cash, yet the point is to change incentives over a longer time frame.

If you think research output is heavy-tailed, what you should expect to observe is something like this happening for a while, until promising tail-end researchers realise there's a stable stream of value to be had here and put in the effort required to level up and contribute themselves. It's not implausible to me that this would take more than a year of prizes.

Expecting lots of important counterfactual work that beats the current best work to come out of the woodwork within ~6 months seems to assume that A) making progress on alignment is quite tractable, and B) the ability to do so is fairly widely distributed across people, both to a seemingly unjustified extent.

I personally think prizes should be announced together with precommitments to keep delivering them for a non-trivial amount of time. I believe this because changing incentives involves changing expectations in a way that changes medium-term planning. I expect people to have qualitatively different thoughts if their S1 reliably believes that fleshing out the-kinds-of-thoughts-that-take-6-months-to-flesh-out will be rewarded after those 6 months.

That's expensive, in terms of both money and trust.

As an anecdata point, it seems probable that I would not have written the essay about the learning-theoretic research agenda without the prize, or at least that it would have been significantly delayed. This is because I am usually reluctant to publish anything that doesn't contain non-trivial theorems, but for this prize it felt suitable (this preference is partially for objective reasons, but partially for entirely subjective motivation issues). In hindsight, I think spending the time to write that essay was the right decision regardless of the prize.

As another anecdata point, I considered writing more to pursue the prize pool but ultimately didn't do any more (counterfactual) work!

fwiw, thirding this perception (although my take is less relevant since I didn't feel like I was in the target reference class in the first place)

I observe that, of the 16 awards of money from the AI alignment prize, as far as I can see none of the winners had a full-time project that wasn't AI alignment (i.e. they either worked on alignment full time, or were financially supported in a way that gave them the space to devote their full attention to it for the purpose of the prize). Introspecting just now on why I didn't apply, I find that I didn't S1-expect to be able to produce anything prize-worthy without ~1 month of work, and I have to work on LessWrong. This suggests some natural interventions (e.g. somehow giving out smaller prizes for good efforts even if they weren't successful).

In round three, I was working on computational molecule design research and completing coursework; whitelisting was developed in my spare time.

In fact, during the school year I presently don't have research funding, so I spend some of my time as a teaching assistant.

Interesting. Can you talk a bit more about how much time you actually devoted to thinking about whitelisting in the lead up to the work that was awarded, and whether you considered it your top priority at the time?

Added: Was it the top idea in your mind for any substantial period of time?

Yes, it was the top idea on/off over a few months. I considered it my secret research and thought on my twice daily walks, in the shower, and in class when bored. I developed it for my CHAI application and extended it as my final Bayesian stats project. Probably 5-10 hours a week, plus more top idea time. However, the core idea came within the first hour of thinking about Concrete Problems.

The second piece, Overcoming Clinginess, was provoked by Abram’s comment that clinginess seemed like the most damning failure of whitelisting; at the time, I thought just finding a way to overcome clinginess would be an extremely productive use of my entire summer (lol). On an AMS - PDX flight, I put on some music and spent hours running through different scenarios to dissolve my confusion. I hit the solution after about 5 hours of work, spending 3 hours formalizing it a bit and 5 more making it look nice.

Yeah, this is similar to how I got into the game. Just thinking about it in my spare time for fun.

From your and others' comments, it sounds like a prize for best work isn't the best use of money. It's better to spend money on getting more people into the game. In that case it probably shouldn't be a competition: beginners need gradual rewards, not one-shot high stakes. Something like a more flat subsidy for studying and mentoring could work better. Thank you for making me realize that! I'll try to talk about it with folks.

I also think surveying applicants might be a good idea, since my experience may not be representative.

The first couple rounds attracted many people due to the novelty effect, but then it tapered off and we had no good ideas how to make it grow. Maybe offering 10x the money would solve that, but I think it would mostly attract bad entries.

Could there be some kind of mentorship incentive? Another problem at large in alignment research seems to be lack of mentors, since most of the people skilled enough to fill this role are desperately working against the clock. A naïve solution could be to offer a smaller prize to the mentor of a newer researcher if the newbie's submission details a significant amount of help on their part. Obviously, dishonest people could throw the name of their friend on the submission because "why not", but I'm not sure how serious this would be.

What would be nice would be some incentive for high quality mentorship / for bringing new people into the contest and research field, in a way that encourages the mentors to get their friends in the contest, even though that might end up increasing the amount of competition they have for their own proposal.

This might also modestly improve social incentives for mentors, since people like being associated with success and being seen as helpful / altruistic.

ETA: What about a flat prize (a few thousand dollars) that you can only win once, after which you can mentor others and receive a slightly more modest sum for prizes they win? It might help kickstart people's alignment careers if sufficiently selective, and give them the confidence to continue the work. We'd have to work out the details of what counts as mentorship, depending on how cheaty we think people would try to be.

As Raemon noted, the mentorship bottleneck is actually a bottleneck. Senior researchers who should mentor are the most bottlenecked resource in the field, and the problem is unlikely to be solved by financial or similar incentives. Motivating them too much is probably wrong, because mentoring competes with time for doing research, evaluating grants, etc. What can be done is to:

  • improve the utilization of time of the mentors (e.g. mentoring teams of people instead of individuals)
  • do what can be done on a peer-to-peer basis
  • use mentors from other fields to teach people generic skills, e.g. how to do research
  • prepare better materials for onboarding

Some bits from Critch's blog relevant to the "use mentors from other fields for generic skills" point include:

Leverage Academia

Using "Get into UC Berkeley" as a screening filter.

Deliberate Grad School.

I can probably spend some time (perhaps around 4 hours / week) on mentoring, especially for new researchers that want to contribute to the learning-theoretic research agenda or its vicinity. However, I am not sure how to make this known to the relevant people. Should I write a post that says "hey, who wants a mentor?" Is there a better way?

Important not to let the perfect be the enemy of the good. There's almost certainly a better way to find mentors, but this would be far better than not doing anything, so I'd say that if you can't find an actionable better option within (let's say) a month, you should just do it. Or just do it now and replace with better method when you find one.

Off-the-cuff: I think making that post is probably good. In the long term, hopefully we can come up with a more enduring solution.

I think the mentorship bottleneck is quite important, but my sense is it actually is a bottleneck, i.e. most people with the capacity to mentor people already are.

I want to post a marker here that if I don't write up my lessons learned from the prize process within the next month, people should bug me about that until I do.

I'm giving you a reminder about 12 hours early, just to signal how impatient I am to hear what lessons you learned. :) Also, can you please address my questions to cousin_it in your post? (I'm a bit confused about the relative lack of engagement on the part of all three people involved in this prize with people's questions and suggestions. If AI alignment is important, surely figuring out / iterating on better ways of funding AI alignment research should be a top priority?)

Gosh I'm so irritated that you gave the reminder before me, I was looking forward to showing off my basic calendar-use skills ;-)

Anyway, am also looking forward to Zvi's lessons and updates!

Reminder to do this.

(I will stop reminding you if you ask, but until then I am a fan of helping public commitment get acted on.)

Okay. I’ve added myself a calendar reminder.

Was this post ever published? (Looking at Zvi's posts since January, I don't see anything that looks like it.)

Reminder to do this.

Congratulations to all the winners!

I hadn't completed my intended submission in time for the last round and was looking forward to competing in the next one, so it's slightly disappointing. Oh, well. IMHO it could work in the long run if it were substantially less frequent: say, once every 2 years.

In any case, I think it was a great initiative and I sincerely thank the organizing team for making it happen!

Woop! All the winners are awesome and I’m glad they’re getting money for making progress on the most important problem.

Slightly disappointed that this isn't continuing (though I didn't submit to the prize, I submitted to Paul Christiano's call for possible problems with his approach, which was similarly structured). I was hoping that once I got further into my PhD, I'd have some more things worth writing up, and the recognition and a bit of prize money would provide some extra motivation to get them out the door.

What do you feel is the limiting resource that makes this not worth continuing in its current form?

I think in its current form it would keep getting fewer and fewer entries.

It seems like you don't have many data points to be able to say that with much confidence? Why not run it a bit longer and see what happens (maybe with some adjustments that people have suggested)? Also it doesn't seem like a linear extrapolation makes sense here because having fewer entries would give people a stronger incentive to participate (since they'd each have a greater chance of winning) so the number of entries ought to stabilize in the positive region. Even in the worst case scenario, if no one submits a prize-worthy entry, can't you cancel the prize at that point and not lose very much?

Also, I'm curious if this was mostly your and Zvi's decision, or Paul's, and how is Paul planning to use the money that is "saved" from cancelling this prize? (In case you're wondering about my motivations here, I've been thinking over some related questions about how to spend my own money and/or other people's that I might have influence over.)

I also second William's question.

Sorry for not replying for so long.

I don't think the money incentive is strong enough. Nobody will do good AI safety work just for a chance at 5K dollars. The prestige incentive is stronger, but if we get fewer entries over time, the prestige incentive falls and we get even fewer entries next time etc.

Canceling was my suggestion to which others agreed. Can't speak for others, but my main reason was that it's not fun to work on a project without growth, even if it's for an important cause. The choice was between canceling it and tweaking it to achieve growth, and I didn't have good ideas for tweaks.

I guess the question was more from this perspective: if the cost were zero, it seems like it would be worth running, so what part of the cost makes it not worth running? (I would think of the cost as probably the time to judge, or the availability of money to fund the contest.)