Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

This past semester, HAIST and MAIA (the Harvard and MIT AI safety student groups) ran an adapted version of Richard Ngo's AGI Safety Fundamentals alignment curriculum. This adaptation – which consists of eight 2-hour meetings, with all readings done during the meeting – is now available on the AGISF website.

In this post, we discuss the adapted curriculum and its intended use, and we recommend that other in-person reading groups following AGISF use this adaptation.[1]

The adapted curriculum and its intended use

The adapted curriculum was made by refining a slightly rustier first adaptation, with significant help from Richard Ngo and feedback from participants. The key differences between the adapted curriculum and the mainline AGISF alignment curriculum are:

  • Participants do all the core readings during the meeting; no reading is required in between meetings.
  • Participants meet for 2 hours per week instead of 1.5.
  • Readings, including further readings, tend to be more bite-sized (usually not longer than 20 minutes).
  • There are no projects, and certain topics are omitted (e.g. governance and inverse reinforcement learning).

The way that HAIST and MAIA used this curriculum, and the way we recommend other groups use it, is:

  • Alternate between silent reading and discussion.
    • So a typical meeting might look like: people arrive, everyone does reading 1, everyone discusses reading 1, everyone does reading 2, everyone discusses reading 2, etc.
    • With certain longer or more difficult readings (e.g. Toy models of superposition), it could be reasonable to occasionally pause for discussion in the middle of the reading.
  • Encourage faster readers to take a look at the further readings while they wait for others to catch up.
    • We found that reading speeds varied significantly, with slower readers taking ~1.5x as long to finish as faster readers.
    • This works especially well if the readings are printed (which we recommend doing).

We note that this format introduces some new challenges, especially when there are slower readers:

  • Facilitators need to manage discussion timing since discussions that go too long cut into time for reading and discussing other material.
    • Planning out how long to spend discussing each core reading ahead of time can be very useful.
    • Facilitators should feel comfortable cutting off discussions to make sure there’s time to read and discuss all the core readings. (On the other hand, if a discussion is very productive, it may be worth skipping certain readings; this is a judgment call that facilitators will need to make.)
  • Different reading speeds need to be managed.
    • At HAIST, we typically found it feasible to wait for the slowest reader to finish reading. We printed copies of the further readings for faster readers to peruse while they waited for others to finish.
    • On the other hand, this might not work well for groups with especially slow readers. In these cases, you may need to begin discussions before everyone is done reading and, going forward, encourage slower readers to take a look at the core readings ahead of future meetings.

To help with some of these challenges, Sam prepared a guide for HAIST and MAIA facilitators that included recommended discussion times, points of discussion, and advice about which readings to cut if necessary. That facilitator guide was for an outdated version of the curriculum, but we hope to have an updated facilitator guide in the next few weeks. We don’t want to make these public, but feel free to reach out to smarks@math.harvard.edu if you’re running a reading group and are interested in seeing the old or forthcoming facilitator guides.

Why we recommend the adapted curriculum

Sam and Xander generally felt that the in-session readings format worked better than the take-home readings format, which HAIST used for an AGISF reading group in spring 2022. In particular:

  • Reading comprehension generally seemed higher, possibly because take-home readings led to participants rushing through readings before meetings, or possibly because participants could ask their facilitator about core confusions they had at the start of readings.
  • Discussions generally went better, possibly because readings were fresh on participants’ minds.
  • Participants seemed more energetic and engaged, possibly because alternating between silent reading and discussion is more engaging than long discussion blocks.

Empirically, participants also thought that HAIST reading groups went well. Participants gave the program an overall rating of 8.6 on a 0-10 scale, and when asked “To what extent are you considering a career in AI safety?” the average response was 7.4, up from 4.7 before the program (though part of this movement was likely due to selection effects[2]). 

Of course, the curriculum isn’t the only factor impacting how well a reading group goes. We note that smooth operations, facilitator quality, a high admissions bar, and printing the readings (as an alternative to reading on devices) also seemed important for things going well at HAIST.[3]

  1. ^

    We don’t know whether the in-session readings format would work well for reading groups that meet virtually. If anyone experiments with this, we’d be very interested in hearing how it goes; you can contact Sam at smarks@math.harvard.edu.

  2. ^

    The start-of-program and end-of-program surveys had response rates around 45% and 65%, respectively, with an attrition rate of around 40%.

  3. ^

    See the HAIST update post for some relevant advice.

2 comments

You might want to post this on the Effective Altruism Forum too, if you haven't considered it. I think many groups interested in running similar AGISF programs don't read LessWrong but do skim the forum.

Thanks, that's a good suggestion! I've done so.