AI Safety Ideas: An Open AI Safety Research Platform

Esben Kran

TL;DR: We present AI Safety Ideas, a platform for AI safety research ideas, in open alpha. Add and explore research ideas on the website here: aisafetyideas.com.

AI Safety Ideas has been accessible for a while in an alpha state (4 months, on-and-off development) and we now publish it in open alpha to receive feedback and develop it continuously with the community of researchers and students in AI safety. All of the projects are either from public sources (e.g. AlignmentForum posts) or posted on the website itself.

The current website represents the first steps towards an accessible crowdsourced research platform for easier research collaboration and hypothesis testing.

The gap in AI safety

Research prioritization & development

Research prioritization is hard, and even more so in a pre-paradigmatic field like AI safety. We can grok the highest-karma post on the AlignmentForum, but is there another way?

With AI Safety Ideas, we introduce a collaborative way to prioritize and work on specific agendas together through social features. We hope this can become a scalable research platform for AI safety.

Successful examples of similar but less systematized collaborative online projects with high-quality output can be seen in Discord servers such as EleutherAI, CarperAI, Stability AI, and Yannic Kilcher’s Discord, in hackathons, and in competitions such as the inverse scaling competition.

We are also missing an empirically driven impact evaluation of AI safety projects. With the next steps of development described further down, we hope to make this easier and more available while facilitating more iteration in AI safety research. Systematized hypothesis testing with bounties can help funders directly fund specific results and enable open evaluation of agendas and research projects.

Mid-career & student newcomers

Novice and entrant participation in AI safety research is mostly present in two forms at the moment: 1) Active or passive part-time course participation with a capstone project (AGISF, ML Safety) and 2) flying to London or Berkeley for three months to participate in full-time paid studies and research (MLAB, SERI MATS, PIBBSS, Refine).

Both are highly valuable but a third option seems to be missing: 3) An accessible, scalable, open research opportunity with a low time commitment. Very few people work in AI safety, and enabling decentralized, volunteer, or bounty-driven research will let many more contribute to this growing field.

With this flexible research opportunity, we can attract people who cannot participate in option (2) because of visa, school / life / work commitments, location, rejection, or funding, and we can attract a more senior and active audience than option (1) does.

Next steps

Oct	Releasing and building up the user base and crowdsourced content. Create an insider build to test beta features. Apply to join the insider build here.
Nov	Implementing hypothesis testing features: Creating hypotheses, linking ideas and hypotheses, adding negative and positive results to hypotheses. Creating an email notification system.
Dec	Collaboration features: Contact others interested in the same idea and mentor ideas. A better commenting system with a results comment that can indicate whether the project has been finished, what the results are, and who did it.
Jan	Adding moderation features: Accepting results, moderating hypotheses, admin users. Add bounty features for the hypotheses and a simple user karma system.
Feb	Share with ML researchers and academics in EleutherAI and CarperAI. Implement the ability to create special pages with specific private and public ideas curated for a specific purpose (title and description included). This will help integrate with local events, e.g. the Alignment Jams.
Mar<	Allow editing and save editing history of hypotheses and ideas. Get DOIs for reviewed hypothesis result pages. Implement the EigenKarma karma system. Implement automatic auditing by NLP. Monitor the progress on different clusters of hypotheses and research ideas (research agendas). Release meta-science research on the projects that have come out of the platform and the general progress.
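
The hypothesis features planned above (creating hypotheses, linking ideas and hypotheses, adding negative and positive results, and accepting results through moderation) could be modelled roughly as follows. This is only a minimal sketch under our own assumptions, not the platform's actual schema: the names `Idea`, `Hypothesis`, `Result`, `Outcome`, `link`, `add_result`, and `tally`, as well as every field, are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Outcome(Enum):
    """Whether a result supports (positive) or contradicts (negative) a hypothesis."""
    POSITIVE = "positive"
    NEGATIVE = "negative"


@dataclass
class Result:
    # All fields are hypothetical; the real platform may store results differently.
    author: str
    outcome: Outcome
    summary: str
    accepted: bool = False  # assumed to be set by a moderator ("accepting results")


@dataclass
class Hypothesis:
    statement: str
    results: List[Result] = field(default_factory=list)

    def add_result(self, result: Result) -> None:
        """Attach a positive or negative result to this hypothesis."""
        self.results.append(result)

    def tally(self) -> Dict[Outcome, int]:
        """Count accepted results per outcome; unaccepted results are ignored."""
        counts = {Outcome.POSITIVE: 0, Outcome.NEGATIVE: 0}
        for result in self.results:
            if result.accepted:
                counts[result.outcome] += 1
        return counts


@dataclass
class Idea:
    title: str
    hypotheses: List[Hypothesis] = field(default_factory=list)

    def link(self, hypothesis: Hypothesis) -> None:
        """Link a hypothesis to this idea, skipping ones that are already linked."""
        if not any(h is hypothesis for h in self.hypotheses):
            self.hypotheses.append(hypothesis)
```

In this sketch, only results that a moderator has accepted count towards the tally, which is one possible way to combine the hypothesis testing and moderation steps of the roadmap.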

Risks

1) Wrong incentives on the AI Safety Ideas platform lead to people working on others’ agendas instead of working on their own inside view.
2) AI Safety Ideas does not gain traction and, by extension, becomes less useful than expected.
3) Some users who do alignment research without a profound understanding of why alignment is important discover ideas that have the potential to help AI capabilities, without being worried enough about info hazards to contain them properly.
4) Project bounties on the AI Safety Ideas platform are taken over by capabilities-first agendas and mislead new researchers.

Risk mitigation

Several of these are not implemented yet but will be as we develop the platform further.

1) Ensure that specific agendas do not get special attention compared to others, and implement incentives to work on new or updated projects and hypotheses. Have structured meetings and feedback sessions with leaders in AI safety field-building and conduct regular research about how the platform is used.
2) Do regular, live user interviews and ensure giving feedback is quick and easy. We have interviewed 18 users so far and have automated feedback monitoring on our server. We will embed a feedback form directly on the website. Evaluate the usefulness of features by creating an insider build.
3) Restrict themes within AI safety and nudge towards safety thinking in communication. It is also a risk if these capable researchers who grok capabilities work independently, and we might be able to shift their attitude towards safety by providing this platform.
4) Ensure vetting of the ideas and users. Make the purpose and policies of the website very clear. Invite admin users based on AlignmentForum karma and give them the ability to downvote ideas, which hides an idea until further evaluation.

Feedback

Give anonymous feedback on the website here or write your feedback in the comments. If you end up using the website, we also appreciate your in-depth feedback here (2-5 min). If you want any of your ideas removed or rephrased on the website, please send an email to operations@apartresearch.com.

PS: The platform is still very much in alpha and there might be mistakes in the research project descriptions. Please do point out any problems using "Report an issue".

Help out

The platform is open source and we appreciate any pull requests on the insider branch. Add any bugs or feature requests on the issues page.

Apply to join the insider builds here to give feedback for the next versions. Join our Discord to discuss the development.

Thanks to Plex, Maris Sala, Sabrina Zaki, Nonlinear, Thomas Steinthal, Michael Chen, Aqeel Ali, JJ Hepburn, Nicole Nohemi, and Jamie Bernardi.

LESSWRONG