List of technical AI safety exercises and projects

by JakubK
19th Jan 2023

This is a linkpost for https://docs.google.com/document/d/1-58zgC2lRMbMK-CXU44VR3ApGYbTI0aJKX-cKkxDeyo/edit?usp=sharing

Crossposted to the EA Forum.

EDIT 3/17/2023: I've reorganized the doc and added some governance projects.

I intend to maintain a list at this doc. I'll paste the current state of the doc (as of January 19th, 2023) below. I encourage people to comment with suggestions.

  • Levelling Up in AI Safety Research Engineering [Public] (LW)
    • Highly recommended list of AI safety research engineering resources for people at various skill levels.
  • AI Alignment Awards
  • Alignment jams / hackathons from Apart Research
    • Past / upcoming hackathons: LLM, interpretability 1, AI test, interpretability 2 
    • Projects on AI Safety Ideas: LLM, interpretability, AI test
    • Resources: black-box investigator of language models, interpretability playground (LW), AI test
    • Examples of past projects; interpretability winners
    • How to run one as an in-person event at your school
  • Neel Nanda: 200 Concrete Open Problems in Mechanistic Interpretability (doc and previous version)
  • Project page from AGI Safety Fundamentals and their Open List of Project ideas
  • AI Safety Ideas by Apart Research; EAF post
  • Most Important Century writing prize (Superlinear page)
  • Center for AI Safety
    • Competitions like SafeBench
    • Student ML Safety Research Stipend Opportunity – provides stipends for doing ML research. 
    • course.mlsafety.org projects – CAIS is looking for someone to add details about these projects on course.mlsafety.org
  • Distilling / summarizing / synthesizing / reviewing / explaining
  • Forming your own views on AI safety (without stress!) – also see Neel's presentation slides and "Inside Views Resources" doc
  • Answer some of the application questions from the winter 2022 SERI-MATS, such as Vivek Hebbar's problems
  • 10 exercises from Akash in “Resources that (I think) new alignment researchers should know about”
  • [T] Deception Demo Brainstorm has some ideas (message Thomas Larsen if these seem interesting)
  • Upcoming 2023 Open Philanthropy AI Worldviews Contest
  • Alignment research at ALTER – interesting research problems, many of which have a theoretical math flavor
  • Open Problems in AI X-Risk [PAIS #5]
  • Amplify creative grants (old)
  • Evan Hubinger: Concrete experiments in inner alignment, ideas someone should investigate further, sticky goals
  • Richard Ngo: Some conceptual alignment research projects, alignment research exercises
  • Buck Shlegeris: Some fun ML engineering projects that I would think are cool, The case for becoming a black box investigator of language models
  • Implement a key paper in deep reinforcement learning
  • “Paper replication resources” section in “How to pursue a career in technical alignment”
  • Daniel Filan idea
  • Summarize a reading from Reading What We Can

Comments

Steven Byrnes: Not sure if this is what you're looking for, but I have some wish-list big projects listed here.

JakubK: Thanks for sharing that; I just added it to the Google doc.

plex: Nice! Would you be up for putting this in the aisafety.info Google Drive folder too, with a question-shaped title?

JakubK: Done; the current title is "What are some exercises and projects I can try?"

plex: Great, thanks!