Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

TLDR

We are announcing a $20k bounty for publicly-understandable explainers of AI safety concepts. We are also releasing the results of the AI Safety Arguments competition.

Background

Of the technologists, ML researchers, and policymakers thinking about AI, very few are seriously thinking about AI existential safety. This results in less high-quality safety research and could also pose difficulties for the deployment of safety solutions in the future.

There is no single solution to this problem. However, an increase in the number of publicly accessible discussions of AI risk can help to shift the Overton window towards more serious consideration of AI safety.

Capability advancements have surprised many in the broader ML community: just as they have made discussion of AGI more tenable, they can also help make discussion of existential safety more tenable. Still, there are not many good introductory resources on the topic or its various subtopics. Somebody with no background might need to read books or very long sequences of posts to understand why people are worried about AI x-risk. There are a few strong, short introductions to AI x-risk, but some of them are out of date, and they aren’t suited for all audiences.

Shane Legg, a co-founder of DeepMind, recently said the following about AGI:[1]

If you go back 10-12 years ago the whole notion of Artificial General Intelligence was lunatic fringe. People [in the field] would literally just roll their eyes and just walk away. [I had that happen] multiple times. [...] [But] every year [the number of people who roll their eyes] becomes less.

We hope that the number of people rolling their eyes at AI safety can be reduced, too. In the case of AGI, increased AI capabilities and public relations efforts by major AI labs have fed more discussion. Conscious efforts to increase public understanding and knowledge of safety could have a similar effect.

Bounty details

The Center for AI Safety is announcing a $20,000 bounty for the best publicly-understandable explainers of topics in AI safety. Winners will receive $2,000 each, for up to ten possible bounty recipients. The bounty is subject to the Terms and Conditions below.

By publicly understandable, we mean understandable to somebody who has never read a book or technical paper on AI safety and who has never read LessWrong or the EA Forum. Work may or may not assume technical knowledge of deep learning and related math, but should make minimal assumptions beyond that.

By explainer, we mean a piece that digests existing research and ideas into coherent, comprehensible writing. The work should therefore draw from multiple sources. This is not a bounty for original research, and it is intended for work that covers more ground at a higher level than the distillation contest.

Below are some examples of public materials that we value. This should not be taken as an exhaustive list of all existing valuable public contributions.

Note that many of the works above are quite different and do not always agree with each other. Listing them is not an endorsement of everything in them, and we don’t necessarily expect to agree with all claims in the pieces we award bounties to. However, we will not award bounties to work we believe is false or misleading.

Here are some categories of work we believe could be valuable:

  • Executive summaries that lay out a case for the overall importance of AI safety.
  • Work that explains considerations around a particular area in AI safety, summarizes existing work in the area, and discusses its relative importance. We are especially interested in writing regarding the topics below. More discussion and explanation of each can be found here.
    • Deception and deceptive alignment
    • Power-seeking behavior
    • Emergent goals, intrasystem goals, mesa-optimization
    • Weaponization of AI
    • The “enfeeblement problem”
    • Eroded epistemics caused by persuasive AI
    • Proxy misspecification
    • Value lock-in
  • Any other publicly understandable explainer that offers a valuable perspective on large-scale or existential risk from AI.

There is no particular length of submission we are seeking, but we expect most winning submissions will take less than 30 minutes for a reader/viewer/listener to digest.

Award Process

Judging will be conducted on a rolling basis, and we may award bounties at any time. Judging is at the discretion of the Center for AI Safety. Winners must allow their work to be reprinted with attribution to the author, though not necessarily with a link to the original post.

We thank the FTX Future Fund regranting program for the funding for this competition.

How to submit

We will accept several kinds of submissions:

  • New published work (originally published after August 4th, 2022). Could be published as a paper in an academic venue, a blog post, etc.
  • Posts on the EA Forum, Alignment Forum, and LessWrong tagged with the AI Safety Public Materials tag.
  • Other forms of media (for example, YouTube videos, podcasts, visual art, infographics) are also accepted.
  • Referrals to previously published public materials that weren’t already on our radar. Referrers of a winning entry will be given nominal prizes.

If your submission is released somewhere other than the forums above, you may submit a link to it here.

The competition will run from today, August 4th, 2022 until December 31st, 2022. The bounty will be awarded on a rolling basis. If funds run out before the end date, we will edit this post to indicate that and also notify everyone who filled out the interest form below.

If you are interested in potentially writing something for this bounty, please fill out this interest form! We may connect you with others interested in working on similar things.

AI Safety Arguments Competition Results

In our previously-announced AI Safety arguments competition, we aimed to compile short arguments for the importance of AI safety. The main intention of the competition was to compile a collection of points to riff on in other work.

We received over 800 submissions, and we are releasing the top ~10% here. These submissions were selected by 29 volunteers, followed by manual review and fact-checking by our team; prizes will be distributed amongst them in varying proportions.[2] Many of the submissions were drawn from previously existing work. The spreadsheet format is inspired by Victoria Krakovna’s very useful spreadsheet of examples of specification gaming.

We would like to note that it is important to be mindful of potential risks when doing any kind of broader outreach, and it’s especially important that those doing outreach are familiar with the audiences they plan to interact with. If you are thinking of using the arguments for public outreach, please consider reaching out to us beforehand. You can contact us at info@centerforaisafety.org.

We hope that the arguments we have identified can serve as a useful compilation of common points of public outreach for AI safety, and can be used in a wide variety of work including the kind of work we are seeking in the competition above. 

Terms and Conditions for Bounty

  1. Employees or current contractors of FTX and contest organizers are not eligible to win prizes.
  2. Entrants and Winners must be over the age of 18.
  3. By entering the contest, entrants agree to the Terms & Conditions.
  4. All taxes are the responsibility of the winners.
  5. Winners are responsible for ensuring the legality of accepting the prize in their country. Sponsor may confirm the legality of sending prize money to winners who are residents of countries outside of the United States.
  6. Winners will be notified by email (for submissions through the form) or direct message on the forum (for EA Forum, Alignment Forum, and LessWrong submissions).
  7. Winners grant to Sponsor the right to use their name and likeness for any purpose arising out of or related to the contest. Winners also grant to Sponsor a non-exclusive royalty-free license to reprint, publish and/or use the entry for any purpose arising out of or related to the contest, including linking to or re-publishing the work.
  8. Entrants warrant that they are eligible to receive the prize money from any relevant employer or from a contract standpoint.
  9. Entrants agree that FTX shall not be liable to entrants for any type of damages that arise out of or are related to the contest and/or the prizes.
  10. By submitting an entry, entrant represents and warrants that, consistent with the terms of the Terms and Conditions: (a) the entry is entrant’s original work; (b) entrant owns any copyright applicable to the entry; (c) the entry does not violate, in whole or in part, any existing copyright, trademark, patent or any other intellectual property right of any other person, organization or entity; (d) entrant has confirmed and is unaware of any contractual obligations entrant has which may be inconsistent with these Terms and Conditions and the rights entrant is required to have in the entry, including but not limited to any prohibitions, obligations or limitations arising from any current or former employment arrangement entrant may have; (e) entrant is not disclosing the confidential, trade secret or proprietary information of any other person or entity, including in connection with any obligation entrant may have arising from any current or former employment, without authorization or a license; and (f) entrant has full power and all legal rights to submit an entry in full compliance with these Terms and Conditions.
[1] This was part of one of the winning submissions in the AI safety arguments competition, detailed below.

[2] All winners have been notified; if you haven’t received notice via email or LessWrong/EA Forum message, then that unfortunately means you did not win.


Comments

For those who missed it, all of the AI Safety Arguments that won the competition can be found here, randomly ordered.

If you or anyone you know is ever having any sort of difficulty explaining anything about AI safety to anyone, these arguments are a good place to look for inspiration; other people have already done most of the work for you.

But if you really want to win this current contest, I highly recommend using bullet-pointed summaries of the 7 works stated in this post, as well as deeply reading the instructions instead of skimming them (because this post literally tells you how to win).

I'm not sure what you mean by "using bullet-pointed summaries of the 7 works stated in the post". If you mean the past examples of good materials, I'm not sure how good of an idea that is. We don't just want submissions to be rephrasings/"distillations" of single pieces of prior work.

I'm also not sure we literally tell you how to win, but yes, reading the instructions would be useful.

I meant reading them and making bullet-pointed lists of all valuable statements, in order to minimize the risk of forgetting something that could have been a valuable addition. You make a very good point that there are pitfalls with this strategy, like having a summary of too many details when the important thing is galaxy-brain framing that will demonstrate the problem to different types of influential people with the maximum success rate.

I think actually reading (and taking notes on) most/all of the 7 recommended papers that you listed is generally a winning strategy, both for winning the contest and for winning at solving alignment in time. But only for people who can do it without forgetting that they're making something optimal/inspirational for minimizing the absurdity heuristic, not fitting as many cohesive logic statements as they can onto a single sheet of paper.

In my experience, constantly thinking about the reader (and even getting test-readers) is a pretty fail-safe way to get that right.

It sure would be nice if the best talking points were ordered by how effective they were, or ranked at all really. Categorization could also be a good idea.

These are already the top ~10%; the vast majority of the submissions aren't included. We didn't feel we had enough data to accurately rank within these top 80 or so, though some are certainly better than others. Also, it really depends on the point you're trying to make and the audience; I don't think an objective ordering really exists.

We did do categorization at one point, but many points fall into multiple categories, and there are so many individual points that we didn't find the categorization very useful once we had it.