Update on Harvard AI Safety Team and MIT AI Alignment

Xander Davies; Sam Marks; kaivu; tlevin; leni; maxnadeau; Naomi Bashkansky

We help organize the Harvard AI Safety Team (HAIST) and MIT AI Alignment (MAIA), and are excited about our groups and the progress we’ve made over the last semester.

In this post, we’ve attempted to think through what worked (and didn’t work!) for HAIST and MAIA, along with more details about what we’ve done and what our future plans are. We hope this is useful for the many other AI safety groups that exist or may soon exist, as well as for others thinking about how best to build community and excitement around working to reduce risks from advanced AI.

Important things that worked:

Well-targeted outreach, which (1) focused on the technically interesting parts of alignment (rather than its altruistic importance), and (2) leveraged informal connections with networks and friend groups.
HAIST office space, which was well-located and very useful for running HAIST’s programming and co-working.
Well-contextualized leadership, with many of the people involved in running HAIST/MAIA programming having experience with AI safety research (including nearly all of the facilitators for our reading groups).
High-quality, scalable weekly reading groups, including 13 sections of introductory reading groups, 2 science of deep learning reading groups, 2 policy reading groups, and general member reading groups for HAIST and MAIA.
Significant time expenditure, including mostly full-time attention from several organizers.

Important things we got wrong:

Poor retention for MAIA programming, perhaps due to starting this programming too late in the semester.
Excessive focus on intro programming, which cut against ML engineering programming and advanced reading groups for more seasoned members.

If you’re interested in supporting the alignment community in our area, the Cambridge Boston Alignment Initiative is currently hiring.

What we’ve been doing

HAIST and MAIA are concluding a 3-month period during which we expanded from one group of about 15 Harvard and MIT students who read AI alignment papers together once a week to two large student organizations that:

Ran a large AI safety intro fellowship (organized by Sam Marks and adapted from the AGI Safety Fundamentals program) that attracted over 230 applicants and enrolled about 130 in 13 weekly reading groups, all facilitated by people with experience in AI safety research (some of whom are students). About 60 participants have continued attending as of this post, including undergraduates, grad students, and postdocs in math and computer science.
- This wound up taking up most of our focus, which was not the original plan (we planned to spend more time on ML up-skilling and supporting research). This pivot was mostly intentional (we got a higher number of great applicants than expected), but we are worried about continually prioritizing our introductory program in the future (which we discuss further below).
Opened the HAIST office (with the significant help of Kaleem Ahmid, Frances Lorenz, and Madhu Sriram), which has become a vibrant coworking space for alignment research and related work. We plan to open a MAIA office, officially starting in February 2023.
Launched the MAIA/HAIST Research Fellows program (organized by Oam Patel), which paired 20 undergraduate and graduate students with AI safety and governance research mentors.
Started a Science of Deep Learning reading group at MAIA (organized by Eric Michaud), with around 8 active participants (more information about the reading group here). This program ended up being good for experienced member engagement and generating research ideas, but didn’t perform as well as an outreach mechanism, which was its initial intention.
Ran two retreats (organized by Trevor Levin and Kuhan Jeyapragasan), with a total of 85 unique attendees, including many of our most engaged intro fellows, discussion group facilitators, research mentors, and guests from Redwood Research, OpenAI, Anthropic, Lightcone, Global Challenges Project, and Open Philanthropy. We think these retreats were unusually impactful (even compared to other retreats), with multiple participants at each indicating that they were significantly more likely to pursue careers in AI alignment research or related fields (governance/policy, outreach) after the retreat, and many expressing interest in and following up regarding continued involvement with (and in some cases organizing for) HAIST and MAIA.
Ran two weekly AI governance fellowships with 15 initial and 14 continuing participants.
Hosted Q&As with Daniel Ziegler, Tom Davidson, Chris Olah, Richard Ngo, and Daniel Kokotajlo.
Ran HAIST’s weekly member meetings, where we read alignment-relevant research, and began MAIA member meetings.
Facilitated a debate on the risks and benefits of research on Reinforcement Learning from Human Feedback, and are working on producing an adversarial collaboration document (headed by Adam Jermyn) summarizing our debate.
Added weekly socials (organized by Naomi Bashkansky) hosted at the HAIST office where people new to alignment mingle with more experienced people.
Started an AI forecasting group, with talks, workshops with AI forecaster Tamay Besiroglu, and friendly competitions on Fermi estimations and pastcasting.
Are organizing an MLAB-inspired ML bootcamp in January 2023 in partnership with the Cambridge Boston Alignment Initiative, to which current students should apply by December 4th if they are interested in AI safety but have little ML experience.

What worked

Communication & Outreach Strategy

Outreach targeting the most promising students with technical backgrounds (and leveraging informal friend networks).
- Both HAIST and MAIA actively promoted our programs on the course sites/Slacks/mailing lists of relevant advanced CS and math classes, undergrad majors/graduate programs, and relevant student groups.
- Getting help on our outreach strategy from well-positioned members of these social networks (created in part through shared problem-set groups and friend groups with similar majors/extra-curricular/research interests) and asking them to recommend our programs to their peers, especially through direct messages.
Emphasizing the technical aspect and interestingness of AI alignment (over just its ethical importance). As we noted in our announcement post, we want AI safety to be motivated not just by mitigating existential risk or effective altruist considerations, but also as one of the most interesting, exciting, and important problems humanity faces. We continued pitching our programs primarily as ways to explore unique and interesting technical problems that also happen to be extremely important (rather than primarily as a means to social impact). We think this worked well and should be replicated. That being said, we think it is important for group members to engage with the impacts advanced AI could have and its implications for humanity, and we address these in our programs, social contexts, and other programming.
Good digital communications and copy. We (mostly Xander) put substantial effort into nailing the wording of our emails and Slack messages, including customizing them for different audiences, and we’re happy with what we wound up with. We also like our websites. If you’re doing similar outreach, feel free to reach out to Xander at xanderlaserdavies@gmail.com for resources and advice.
Special attention to the most engaged and skilled (in relevant domains) participants. We put participants with especially high combinations of engagement/interest with the ideas and technical skills in touch with top professionals and organizations. Chatting with professionals (1:1 chats, talks/Q&As at retreats, external connections) has often been cited as highly important for newer students getting more involved.

Operations

Active office in convenient location: Getting collective buy-in to use the office regularly (for default working, socializing, and AI safety programming), and investing effort into making the space fun and convenient to use helped improve programming, social events, and sense of community. We think the office facilitated many more interactions between group members (and with intro fellows) than would have occurred without it.
Smooth participant experience (through high-effort background organizing costs). We put effort into making the participant experience in programs strong - ensuring that discussions take place with the necessary materials (food, printed readings) in place, the rooms booked, the facilitators on time, and especially engaged participants followed up with. Starting, advertising, and running these programs, opening an office, and running two large retreats involved over a dozen organizers contributing >10 hours a week. (We know smaller groups will not have this kind of capacity, so we should note that we think it was important to make one or two of our core programs great before we significantly expanded.)

Pedagogy

Finding excellent, well-contextualized facilitators. All but one of the facilitators for our reading groups have done research on the topic of the group. Most were PhD students; some had finished PhDs or were otherwise full-time professional researchers; some had worked in relevant research groups, orgs, or labs. We think this increased the educational quality of the groups, improved discussions, and lent substantial credibility and professionalism to the programs. This probably resulted in part from confirming most facilitators over the summer.
Basing our intro fellowship on Richard Ngo’s AGISF curriculum. Though we made various adaptations (see below), we were very fortunate to have been starting from an extremely high-quality baseline. Any curriculum we had tried to make from scratch would likely have been significantly worse.
Putting special effort into selecting pedagogically-valuable reading materials for intro fellowships. Not all explanations of the same idea are equally reader-friendly, especially for people learning about alignment for the first time. For intro fellowship sections that met early in the week (before the other sections met to cover the same material), we tried to pay close attention to how they responded to the readings. When they weren’t getting much out of a reading we did our best to substitute it with a clearer write-up of the same topic (or sometimes, of a new topic altogether) before other intro fellows had. We sometimes also ran experiments, giving one reading to some sections and a different reading to another. We thought that the additional boost in reading quality helped keep participants engaged.
Having reading groups do readings in-meeting. A standard way to run reading groups is to ask participants to read materials outside of meetings, and then spend the entire meeting discussing. We have found that having longer meetings which provide time for eating and doing readings in-person results in much better reading comprehension (possibly because participants aren’t rushing to finish readings before the meeting) and much higher-quality discussion (since readings are fresh in participants’ minds). (A common concern, which is that longer meetings tire out participants more, seems to not have materialized, possibly because alternating between reading and discussion helps keep participants alert.) To this end, we adapted the AGISF curriculum (originally 7 weeks of 1.5-hour meetings) to span 10 weeks of two-hour meetings, with all readings done in-meeting.

After we incorporate a final round of participant feedback, we’ll release our final adaptation of the AGISF curriculum, structured as 9 weeks of two-hour meetings, and with various minor curricular substitutions.

Mistakes/Areas for Improvement

High attrition rates in our MIT programs. Our MIT programs had significantly higher attrition rates than our Harvard programs. We’re still figuring out why, but reasons might include lack of office space, a later start, lower rates of friend-groups taking part together, and an MIT-specific aversion to the reading group format, each of which we will try to fix next semester.
Insufficient focus on programming exciting for group members/organizers, and too much focus on intro-friendly programs. For example, MAIA neglected running advanced meetings for already engaged members at MIT until late in the term, which also hindered the kind of strong community formation we saw at Harvard. This was understandable given that it was fall semester and the groups were new, but we somewhat fell into the trap of trying to appeal to newer students at the expense of making group involvement fun for experienced students interested in alignment (especially at MIT).
Inadequate task management and organizational structure. Next semester, we’ll plan this out more to reduce organizer stress, ambiguity, redundant work, and communication costs (e.g. organizers not knowing who was doing what).
Lack of office space at MIT. MAIA suffered substantially from not having a physical office. Whereas almost all HAIST meetings took place in the office, making the operations easy and the atmosphere professional and legitimate, MAIA meetings were scattered throughout MIT classrooms.
Late launch of the Research Fellowship. The research program did not officially launch until late October (since we weren’t planning on running one until late), meaning most students did not do much research. We also didn’t invest enough effort into getting existing HAIST/MAIA members to work on research during term.
Taking too long to get to more technical material during our intro fellowship. For example, many intro fellows identified the material on recent interpretability work (e.g. circuits, causal tracing, and superposition) as their favorite part of the intro fellowship. But this material didn’t come until the 7th and 8th weeks, after many participants had already dropped out! (Other fan favorites include specification gaming and goal misgeneralization.) In redrafting the intro fellowship curriculum, we’re looking for ways to introduce this technical material sooner.

Next Steps/Future Plans

At this stage, we’re most focused on addressing mistakes and opportunities for improvement on existing programming (see above). Concretely, some of our near-term top priorities are:

Setting up office space for MAIA.
Setting up and sharing infrastructure and resources for AI alignment university programming with organizers at other universities (e.g., our technical and governance curricula).
Improving our programming for already engaged students (e.g. paper implementation groups, an Alignment 201 program, research opportunities, ML skill-building opportunities, etc.).
Creating and sharing opportunities for extended engagement (strong overlap with the above), especially over winter and summer breaks.

How You Can Get Involved

Mentor + advise junior researchers/students (remotely and in person). Following this semester’s successes, we are likely to have many more junior members who are interested in and capable of helping with alignment and governance research than mentors to support them. Contact Xander or Kuhan to express interest (on the Forum, FB messenger, or email - xanderlaserdavies@gmail.com and kuhanjey@gmail.com).
Visit us, especially during retreats. Several retreat guests got very positive feedback from attendees and, we think, accelerated several careers. Other researchers who were not able to join for retreats did well-received Q&As at our office (see the Q&As bullet in the summary). Online talks + Q&As are also welcome.
Give us feedback, whether via the email addresses above, in the comments, or at EAGxBerkeley this weekend.
Apply to our MLAB-inspired ML bootcamp in January 2023 in partnership with the Cambridge Boston Alignment Initiative.
If you’re interested in supporting the alignment community in our area, the Cambridge Boston Alignment Initiative is currently hiring.

This is fantastic, thank you for sharing. I helped start USC AI Safety this semester and we're facing a lot of the same challenges. Some questions for you -- feel free to answer some but not all of them:

What does your Research Fellows program look like?
- In particular: How many different research projects do you have running at once? How many group members are involved in each project? Have you published any results yet?
- Also, in terms of hours spent or counterfactual likelihood of producing a useful result, how much of the research contributions come from students without significant prior research experience vs. people who've already published papers or otherwise have significant research experience?
- The motivation for this question is that we'd like to start our own research track, but we don't have anyone in our group with the research experience of your PhD students or PhD graduates. One option would be to have students lead research projects, hopefully with advising from senior researchers that can contribute ~1 hour / week or less. But if that doesn't seem likely to produce useful outputs or learning experiences, we could also just focus on skilling up and getting people jobs with experienced researchers at other institutions. Which sounds more valuable to you?
What about the general member reading group?
- Is there a curriculum you follow, or do you pick readings week-by-week based on discussion?
- It seems like there are a lot of potential activities for advanced members: reading groups, the Research Fellows program, facilitating intro groups, weekly social events, and participating in any opportunities outside of HAIST. Do you see a tradeoff where dedicated members are forced to choose which activities to focus on? Or is it more of a flywheel effect, where more engagement begets more dedication? For the typical person who finished your AGISF intro group and has good technical skills, which activities would you most want them to focus on? (My guess would be research > outreach and facilitation > participant in reading groups > social events.)
- Broadly I agree with your focus on the most skilled and engaged members, and I'd worry that the ease of scaling up intro discussions could distract us from prioritizing research and skill-building for those members. How do you plan to deeply engage your advanced members going forward?
Do you have any thoughts on the tradeoff between using AGISF vs. the ML Safety Scholars curriculum for your introductory reading group?
- MLSS requires ML skills as a prerequisite, which is both a barrier to entry and a benefit. Instead of conceptual discussions of AGI and x-risk, it focuses on coding projects and published ML papers on topics like robustness and anomaly detection.
- This semester we used a combination of both, and my impression is that the MLSS selections were better received, particularly the coding assignments. (We'll have survey results on this soon.) This squares with your takeaway that students care about "the technically interesting parts of alignment (rather than its altruistic importance)".
- MLSS might also be better from a research-centered approach if research opportunities in the EA ecosystem are limited but students can do safety-relevant work with mainstream ML researchers.
- On the other hand, AGISF seems better at making the case that AGI poses an x-risk this century. A good chunk of our members still are not convinced of that argument, so I'm planning to update the curriculum at least slightly towards more conceptual discussion of AGI and x-risks.
How valuable do you think your Governance track is relative to your technical tracks?
- Personally I think governance is interesting and important, and I wouldn't want the entire field of AI safety to be focused on technical topics. But thinking about our group, all of our members are more technically skilled than they are in philosophy, politics, or economics. Do you think it's worth putting in the effort to recruit non-technical members and running a Governance track next semester, or would that effort better be spent focusing on technical members?

Appreciate you sharing all these detailed takeaways, it's really helpful for planning our group's activities. Good luck with next semester!

These are all fantastic questions! I'll try to answer some of the ones I can. (Unfortunately a lot of the people who could answer the rest are pretty busy right now with EAGxBerkeley, getting set up for REMIX, etc., but I'm guessing that they'll start having a chance to answer some of these in the coming days.)

Regarding the research program, I'm guessing there are around 6-10 research projects ongoing, with between 1 and 3 students working on each; I'm guessing almost none of the participants have previous research experience. (Kuhan would have the actual numbers here.) This program just got started in late October, so certainly no published results yet.

I'm guessing the mentors are not all on the same page about how much of the value comes from doing object-level useful research vs. upskilling. My feeling is that it's mostly upskilling, with the exception of a few projects where the mentor was basically taking on a RA for a project they were already working on full-time. In fact, when pitching projects, I explicitly disclaimed for some of them that I thought they were likely not useful for alignment (but would be useful for learning research skills and ML upskilling).

It sounds like in your situation, there's a lack of experienced mentors. (Though I'll note that a mentor spending ~1 hour per week meeting with a group sounds like plenty to me.) If that's right, then I think I'd recommend focusing on ML upskilling programming instead of starting a research program. My thoughts here are: (1) I doubt participants will get much mileage out of working on projects that they came up with themselves, especially without mentors to help them shape their work; (2) poorly mentored research projects can be frustrating for the mentees, and might sour them on further engaging with your programming or AI safety as a whole; (3) ML upskilling programming seems almost as valuable to me and much easier to do well.

Regarding general member programming: for our weekly reading group, we pick readings week-by-week, usually based on someone messaging a group chat saying "I'd really love to read X this week." (X was often something that had come out in the last week or so.) I don't think this was an especially good way to do things, but we got lucky and it mostly worked out.

That said, I think most of the value here was from getting a bunch of aligned people in a room reading something and discussing with each other. If you don't already have a lot of people sold on AI x-risk and with a background similar to having completed AGISF, I think it'd be better to run a more structured reading group rather than doing something like this.

Like we mentioned in the post, we think that we actually underinvested in developing programming for our members to participate in (instead putting slightly too much work into making the intro fellowship go well). Most of our full members were too busy for the research program, and the bar for facilitating for our intro fellowship was relatively high (other than Xander, all of our facilitators were PhD students or people who worked full-time on AIS). So the only real things we had for full members were the weekly general member meetings and the retreats at the end of the semester.

For the typical person who finished your AGISF intro group and has good technical skills, which activities would you most want them to focus on? (My guess would be research > outreach and facilitation > participant in reading groups > social events.)

I think my ordering would be

research > further ML upskilling > reading groups > outreach

with social events not really mattering much to me, and facilitating not being an option for most of them, thanks to our wealth of over-qualified facilitators. I'm not sure how this should translate to your situation, sorry.

Regarding the intro fellowship, we hadn't really considered MLSS at all, and probably we should have. I think we were approaching things from a frame separating our programming into things that require coding (ML upskilling) and things that don't (AGISF), but this was potentially a mistake. The MLSS curriculum looks good, I agree that it seems better at getting people research-ready, and I'll think about whether it makes sense to incorporate some of this stuff for next semester -- thanks for this suggestion!

One dynamic to keep in mind is that when you advertise for an AI educational program, you'll get a whole bunch of people who are excited about AI and don't care much about the safety angle (it seems like lots of the people we attracted to our research program were like this). To some extent this is okay -- it gives a chance to persuade people who would have otherwise gone into AI capabilities work! -- but I think it's also worth trying not to spend resources teaching ML to people who will just go off and work in capabilities. One nice thing about AGISF is that it starts off with multiple weeks on safety, allowing people who aren't interested in safety to self-select out before the technical material. (And the technical content is mostly stuff that I'm not worried could advance capabilities anyway.) So if you've noticed that you have a lot of people sticking around to the end of your curriculum without really engaging with the safety angle, I might recommend front-loading some AGISF-style safety content.

Anyway, above-and-beyond anything I say above, I think my top piece of advice is to have a 1-1 call with Xander (or more if you've spoken with him already). I think Xander is really good at this stuff and consistently made really good judgement calls in the process of building HAIST and MAIA, and I expect he'd be really helpful in helping you think through the same issues in your context at USC.

Great to hear! Maybe I'll see some of you next year.

Congrats all, it seems like you were wildly successful in just 1 semester of this new strategy!

I have a couple of questions:

130 in 13 weekly reading groups

= 10 people per group, that feels like a lot and maybe contributed to the high drop rate. Do you think this size was ideal?

Ran two retreats, with a total of 85 unique attendees

These seem like huge retreats compared to other university EA retreats at least, and more like mini-conferences. Was this the right size, or do you think they would have been more valuable as more selective and smaller things where the participants perhaps got to know each other better?

two weekly AI governance fellowships with 15 initial and 14 continuing participants.

This retention rate seems very high, though I imagine maybe these were mostly people already into AI gov and not representative of what a scaled-up cohort would look like. Do you plan to also expand AI governance outreach/programming next term?

Overall, I'm really glad you're doing all these things and paving the way for others to follow--we'll seek to replicate some of your success at Stanford :)
