We are announcing a $20k bounty for publicly-understandable explainers of AI safety concepts. We are also releasing the results of the AI Safety Arguments competition.
Of the technologists, ML researchers, and policymakers thinking about AI, very few are seriously thinking about AI existential safety. This results in less high-quality research and could also pose difficulties for deployment of safety solutions in the future.
There is no single solution to this problem. However, an increase in the number of publicly accessible discussions of AI risk can help to shift the Overton window towards more serious consideration of AI safety.
Capability advancements have surprised many in the broader ML community. Just as they have made it easier to discuss AGI, they can also make it easier to discuss existential safety. Still, there are not many good introductory resources on the topic or its various subtopics. Someone with no background might need to read books or very long sequences of posts to understand why people are worried about AI x-risk. There are a few strong, short introductions to AI x-risk, but some of them are out of date, and they aren’t suited for all audiences.
Shane Legg, a co-founder of DeepMind, recently said the following about AGI:
If you go back 10-12 years ago the whole notion of Artificial General Intelligence was lunatic fringe. People [in the field] would literally just roll their eyes and just walk away. [I had that happen] multiple times. [...] [But] every year [the number of people who roll their eyes] becomes less.
We hope that the number of people rolling their eyes at AI safety can be reduced, too. In the case of AGI, increased AI capabilities and public relations efforts by major AI labs have fed more discussion. Conscious efforts to increase public understanding and knowledge of safety could have a similar effect.
The Center for AI Safety is announcing a $20,000 bounty for the best publicly-understandable explainers of topics in AI safety. Winners of the bounty will receive $2,000 each, for a total of up to ten possible bounty recipients. The bounty is subject to the Terms and Conditions below.
By publicly understandable, we mean understandable to somebody who has never read a book or technical paper on AI safety and who has never read LessWrong or the EA Forum. Work may or may not assume technical knowledge of deep learning and related math, but should make minimal assumptions beyond that.
By explainer, we mean that it digests existing research and ideas into a coherent and comprehensible piece of writing. This means that the work should draw from multiple sources. This is not a bounty for original research, and is intended for work that covers more ground at a higher level than the distillation contest.
Below are some examples of public materials that we value. This should not be taken as an exhaustive list of all existing valuable public contributions.
Note that many of the works above are quite different and do not always agree with each other. Listing them isn’t to say that we agree with everything in them, and we don’t expect to necessarily agree with all claims in the pieces we award bounties to. However, we will not award bounties to work we believe is false or misleading.
Here are some categories of work we believe could be valuable:
There is no particular length of submission we are seeking, but we expect most winning submissions will take less than 30 minutes for a reader/viewer/listener to digest.
Judging will be conducted on a rolling basis, and we may award bounties at any time. Judging is at the discretion of the Center for AI Safety. Winners of the bounty are required to allow for their work to be reprinted with attribution to the author but not necessarily with a link to the original post.
We thank the FTX Future Fund regranting program for the funding for this competition.
We will accept several kinds of submissions:
If your submission is released somewhere other than the forums above, you may submit a link to it here.
The competition will run from today, August 4th, 2022 until December 31st, 2022. The bounty will be awarded on a rolling basis. If funds run out before the end date, we will edit this post to indicate that and also notify everyone who filled out the interest form below.
If you are interested in potentially writing something for this bounty, please fill out this interest form! We may connect you with others interested in working on similar things.
In our previously-announced AI Safety arguments competition, we aimed to compile short arguments for the importance of AI safety. The main intention of the competition was to compile a collection of points to riff on in other work.
We received over 800 submissions, and in this post we are releasing the top ~10% here. These submissions were selected through an effort by 29 volunteers followed by manual review and fact checking by our team, and prizes will be distributed amongst them in varying proportions. Many of the submissions were drawn from previously existing work. The spreadsheet format is inspired by Victoria Krakovna’s very useful spreadsheet of examples of specification gaming.
We would like to note that it is important to be mindful of potential risks when doing any kind of broader outreach, and it’s especially important that those doing outreach are familiar with the audiences they plan to interact with. If you are thinking of using the arguments for public outreach, please consider reaching out to us beforehand. You can contact us at firstname.lastname@example.org.
We hope that the arguments we have identified can serve as a useful compilation of common points of public outreach for AI safety, and can be used in a wide variety of work including the kind of work we are seeking in the competition above.
This was part of one of the winning submissions in the AI safety arguments competition, detailed below.
All winners have been notified. If you haven’t received notice yet via email or LessWrong/EA Forum message, that unfortunately means you did not win.
For those who missed it, all of the AI Safety Arguments that won the competition can be found here, randomly ordered.
If you or anyone you know is ever having any sort of difficulty explaining anything about AI safety to anyone, these arguments are a good place to look for inspiration; other people have already done most of the work for you.
But if you really want to win this current contest, I highly recommend using bullet-pointed summaries of the 7 works stated in this post, as well as deeply reading the instructions instead of skimming them (because this post literally tells you how to win).
I'm not sure what you mean by "using bullet-pointed summaries of the 7 works stated in the post". If you mean the past examples of good materials, I'm not sure how good of an idea that is. We don't just want submissions to be rephrasings/"distillations" of single pieces of prior work.
I'm also not sure we literally tell you how to win, but yes, reading the instructions would be useful.
I meant reading them and making bullet-pointed lists of all valuable statements, in order to minimize the risk of forgetting something that could have been a valuable addition. You make a very good point that there are pitfalls with this strategy, like having a summary of too many details when the important thing is galaxy-brain framing that will demonstrate the problem to different types of influential people with the maximum success rate.
I think actually reading (and taking notes on) most/all of the 7 recommended papers that you guys listed is generally a winning strategy, both for winning the contest and for winning at solving alignment in time. But only for people who can do it without forgetting that they're making something optimal/inspirational for minimizing the absurdity heuristic, not fitting as many cohesive logic statements as they can onto a single sheet of paper.
In my experience, constantly thinking about the reader (and even getting test-readers) is a pretty fail-safe way to get that right.
It sure would be nice if the best talking points were ordered by how effective they were, or ranked at all really. Categorization could also be a good idea.
These are already the top ~10%; the vast majority of the submissions aren't included. We didn't feel we really had enough data to accurately rank within these top 80 or so, though some are certainly better than others. Also, it really depends on the point you're trying to make or the audience. I don't think there really exists an objective ordering.
We did do categorization at one point, but many points fall into multiple categories, and there are so many individual points that we didn't find the categorization very useful.
Does this contest still run, given that the FTX Future Fund doesn't exist anymore?
Yup! The bounty is still ongoing, now funded by a different source. We have been awarding prizes throughout the duration of the bounty and will post an update in January detailing the results.
I'm following up on Leon's question - have the results already been posted? If not, when will they be posted (if they will be)? I'm curious to know. Thanks!
Has this already been posted? I could not find the post.