Great course! Thanks for the writeup, Roy.
I'm a grantmaker at cG, and we'd be excited to see more universities put on this seminar! If you're a faculty member or graduate student at a research university and interested in running a version of this course, please fill out this interest form. We may be able to help with administrative costs.
This fall Boaz Barak taught Harvard’s first AI safety course (course website). Boaz has done an excellent job organizing and promoting the material; you may have seen his original post on LW. I was the head TA and helped Boaz run the course (alongside Natalie Abreu, Sunny Qin, and Hanlin Zhang). I wanted to give an inside look into the course and share much more of our material, in order to help future iterations go well, both at Harvard and at other universities.
I think this course was a good thing, and more universities should run something like it. If you are in a position to get a similar course running at your university, this post is for you: it walks through the course logistics and the decisions we made. Feel free to reach out with questions or comments. The syllabus is public, and all of the lectures are shared on YouTube here.
This course was structured as a research seminar where students learned to replicate important AI safety papers and produce novel research. You can see their projects here.
Basic organization in bullet form:
Assignment Structure
Assignment 0 (Filtering for the course)
Filtering for the course was based on a combination of a “homework 0” score, stated interest, and background.
Midterm assignment
The midterm assignment was a recreation of one of five papers that we suggested, with a small open-ended twist. An example twist is testing how different forms of prompting affect a result, or checking whether the results hold up on a different model or dataset.
Final assignment
Everyone had to do a poster and a paper. Most people formed groups of 2-4.
Final projects had 2 flavors:
I solicited a lot of project ideas from different people, which I then shared with students (at the bottom of cs2881 Final Project Outline). One example was something I was personally interested in: a method for identifying which model generated a given piece of text (model fingerprinting), given only black-box access to the model; two groups did projects related to this (a toy sketch of one possible approach is below). Other ideas were solicited from other researchers, such as developing a benchmark for AI scams, or validating a theoretical model of how misinformation spreads.
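To make the fingerprinting idea concrete, here is a minimal, illustrative sketch of one possible black-box approach, assuming each candidate model is wrapped in a simple text-in, text-out `generate` callable. This is my own toy example, not the method any of the student groups actually used:

```python
# Toy black-box fingerprinting sketch: attribute a text to whichever
# candidate model best "continues" prefixes of that text.
# `generate` is a hypothetical wrapper around each model's API
# (text in, text out); all names here are illustrative.
from difflib import SequenceMatcher
from typing import Callable, Dict


def continuation_score(text: str, generate: Callable[[str], str], n_splits: int = 5) -> float:
    """Average similarity between the model's continuations and the true suffixes."""
    words = text.split()
    scores = []
    for i in range(1, n_splits + 1):
        cut = len(words) * i // (n_splits + 1)
        prefix, true_suffix = " ".join(words[:cut]), " ".join(words[cut:])
        continuation = generate(prefix)  # a single black-box query
        scores.append(SequenceMatcher(None, continuation, true_suffix).ratio())
    return sum(scores) / len(scores)


def attribute(text: str, candidates: Dict[str, Callable[[str], str]]) -> str:
    """Return the name of the candidate model that scores highest on the text."""
    return max(candidates, key=lambda name: continuation_score(text, candidates[name]))
```

In practice you would want calibrated statistics rather than raw string similarity, but this captures the basic query-and-compare loop that black-box access allows.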
About two weeks after receiving the assignment, all the students had to meet with a TA once and write a one-page proposal for their project, shared with the rest of the course. It didn’t seem like most students looked at each other’s proposals; most of the value was in forcing students to write something down. Realistically, I don’t see a good way to get students to engage across projects. One minor change that might help is requiring students to comment on one or two other projects, which would have been more feasible if we had started the final projects earlier.
To facilitate these meetings, the TAs had to post more office hours and keep a relatively strict schedule. Most groups met with me (because I was the most “AI safety pilled”), which meant that I had to enforce fairly strict 30-minute meetings with each group.
Upon reflection, I think this is a reasonably good structure. A required meeting forces students to come prepared and to do a lot of the thinking on their own, and when you have high context on the problem space it’s fairly fast to gauge how each group is doing.
Students submitted a 5-10 page NeurIPS-style paper on December 3rd, and then came to class on December 10th with a printed-out poster.
Weekly Optional Presentations
Nearly every week a group of students volunteered to give a ~15 minute presentation on experiments they ran relating to that week’s lecture. I think this structure is a very good idea, but it is also the part of the course that could have benefited most from more structure. These optional presentations clearly involved a lot of work, but they were not graded, nor did they substitute for other work. In the future, I would make them either worth extra credit or usable as a replacement for the midterm assignment. Separately, these presentations were extremely open-ended; to make them more productive for both the presenters and the audience, the course staff should have offered more guidance about what we were expecting.
Take a look at some of the LW articles that the students have posted - we had 11 posts by students!
General reflections
This course helped me solidify the role that academia can play in today’s climate: a lot of papers in the field of AI safety have the flavor of “we tried a thing and it worked.” These papers can be really valuable, but one role for academia is to pick these results up and inspect how robust they really are: when do they break, and what isn’t stated in the papers (i.e., how specific are their claims)?
I think this course would not have gone as well as it did if it had been an undergraduate course. A lot of the structure relied on students wanting to learn, and generally de-emphasized grades. Our ability to give students guidance for projects, and the course staff’s ability to grade, relied on students engaging with the material in earnest and giving our somewhat under-specified guidance the benefit of the doubt.
Grading is hard; in general we want assignments that are hard to do but easy to verify. Having students recreate a headline figure from a paper is a fairly decent way to keep things easy to verify while requiring meaningful engagement with the material. Requiring a “meaningful spin-off” also forces students to go a bit beyond running Claude Code on an existing code base. Perhaps in the future AI tools will make even this too easy for students, but in this iteration it worked decently.
I think we had the right number of assignments and the right general structure. The midterm mini-project helped students figure out whom they wanted to collaborate with, without being too binding.
A lot of the lectures were guest lectures. One might be concerned that this only works well for Boaz, who is extremely well connected in both academia and industry. I think this is partially true, but my reflection is that a very similar version of this course can be very meaningful even if the guest lecturers are not famous researchers.
One thing that I think was meaningful was making the students’ work legible to outside people: we put the papers and posters on our website, promoted the course, and encouraged the students to post on LW. This helps students build their resumes so they can do more work in the field of AI safety.
Lastly, I think it’s important to share that I spent a lot of time on this course. I don’t share this to receive a pat on the back, but because the course going well relies not only on a great professor but also on a course staff willing to spend a lot of time and energy making the course work. In future iterations this may become less important, because the general structure will be better specified.
Recommendations for future iterations:
I found that the general structure of our course was good, and in future iterations I would keep many things the same. Here are some things I would change or emphasize:
Aggregated Student Perspectives on the Course
We issued two sets of anonymous questionnaires (an official one through Harvard, which got 35 responses, and one through a Google form, which got 21 responses). Generally the course appeared to be very well received, and the criticisms were fairly mild and limited. The full "Q-report" from Harvard, with the students' anonymous feedback, is available here.
We are happy to share more information about how students received the course, but here are some highlights from the course evaluations:
The Google form received 21 of 69 possible responses, which I then ran through ChatGPT to summarize, with very minimal filtering afterward (a toy sketch of this kind of summarization step is below). The Harvard-issued report contained very similar kinds of responses.
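For concreteness, here is a minimal sketch of what such a summarization step could look like, assuming the OpenAI Python client; the file name, model name, and prompt are all illustrative, and this is not the exact script I used:

```python
# Toy sketch: summarize free-text survey responses with an LLM.
# Assumes the OpenAI Python client (pip install openai) and an API key
# in the OPENAI_API_KEY environment variable; names are illustrative.
from openai import OpenAI

client = OpenAI()

with open("responses.txt") as f:  # one survey response per line
    responses = [line.strip() for line in f if line.strip()]

prompt = (
    "Summarize the recurring themes in these anonymous course evaluations, "
    "grouped into strengths, critiques, and effects on students' future plans:\n\n"
    + "\n---\n".join(responses)
)

completion = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[{"role": "user", "content": prompt}],
)
print(completion.choices[0].message.content)
```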
Overall Experience
Students generally described the course as high quality, engaging, and well-structured, with a strong emphasis on exposure to real research and researchers.
Commonly cited strengths included:
Overall, the course was perceived as enjoyable, intellectually stimulating, and effective at exposing students to frontier ideas.
How the Course Influenced Students’ Future Plans
Shifts in Perspective
Many students reported that the course:
Some students noted that while their high-level stance did not radically change, they now felt more grounded and informed.
Future Intentions
Students’ stated plans varied, but common directions included:
Student Critiques
While feedback was largely positive, several recurring critiques emerged.
Overall, the course appears to have lowered barriers to engagement with the field, whether through direct research involvement or more informed participation in adjacent areas.
Conclusion
I'm extremely happy to have helped make this course a reality alongside Boaz, Natalie, Sunny, and Hanlin. I welcome comments on this post, and would be happy to engage with people trying to spin up similar courses at other universities/organizations - some conversations have already started.
Wrap-up: Summary of all the resources we make public: