Learnings from AI safety course so far

Thanks for teaching this and writing these updates!

it is easier for me to get speakers from OpenAI

FWIW, I’m a bit surprised by this. I’ve heard of many AI safety programs succeeding in getting a bunch of interesting speakers from across the field. In case you haven’t tried very hard, consider sending some cold emails, the hit rate is often high in this community.

[-]Ryan Kidd2mo20

I can help with this, if you like

[-]boazbarak2mo81

To be clear - I am getting some great lecturers from out of OpenAI - confirmed non-OAI guest lecturers in this course at this point are:

* Nicholas Carlini (Anthropic)

* Keri Warr (Anthropic)

* Joel Becker (METR)

* Buck Shlegeris (Redwood Research)

* Marius Hobbhahn (Apollo Research)

* Neel Nanda (Google DeepMind)

* Jack Lindsey (Anthropic)

It's just that my "hit rate" with OpenAI is higher, and I also have some more "insider knowledge" on who is likely to be a great fit. I am trying not to make the course a collection of external people giving their "canned talks" but rather have each lecture really be something coherent that fits with what the students saw before. This is also why I am not making the course fully reliant on guest lecturers.

[-]Bryce Robertson1mo10

Nice course, I've added the curriculum to AISafety.com/self-study.

[-]scronkfinkle2mo10

How to get involved in this?

Aspects that are working:

Experiments are working well! I am trying something new this semester - every lecture there is a short presentation by a group of students who are carrying out a small experiment related to this lecture. (For example, in lecture 1 there was an experiment on generalizations of emergent misalignment by @Valerio Pepe ). I was worried that the short time will not allow groups to do anything, but so far have been pleasantly surprised! Also, even when some approaches fail, I think it is very useful to present that. Too often we only hear presentations of the "final product" where actually hearing about the dead ends and failures is maybe even more useful for researchers.

Students are very thoughtful! I am learning a lot from discussions with the students. The class has a mix of students from a lot of backgrounds - undergraduate and graduates, scientists, engineers, law, etc.. It's also good to be outside of the "SF bubble." The typical experience in office hours is that no one shows up, and if they do they have a technical question about homework or exam. In this course, office hours often involve fascinating discussions on the future of AI and its impact. Yesterday we had a lecture on the model spec, and a group exercise on writing their own specs. The discussion was fascinating (and some of it might inform me in thinking of future changes to the AI model spec).

Pre-reading works well. We are using a platform called Perusall for pre-reading which enables students to comment on the pre-reading and discuss it with each other. Going through their comments and seeing what they find confusing or problematic helps inform me for future lectures. Not all students leave substantive comments but many do.

Zoom hasn't been as bad as I feared. I am teaching most lectures but we will have guest lecturers, and some are not able to travel. In lecture 3 we had the first remote guest lecturers - Nicholas Carlini and Keri Warr (Anthropic) - and it still managed to be interactive. I think the fact that all students watching are in the same classroom and I can bring over the mike to them makes it better than a pure zoom only seminar.

Aspects that perhaps could work better:

Time management is not my strong suit. On Thursday the group exercise took much longer than I planned for (in retrospect I was unrealistic) and we had to bump up the experiment to the next lecture and to skip some material. I am already trying to be more realistic for next lecture, and will move the material on responsible scaling and catastrophic risks to October 16.

Striking balance between technical and philosophical. I will need to see if we strike the right balance. While the projects/experiments are technical, while the lectures sometimes are more "philosophical" (though I did go into some of the math in reinforcement learning etc.. in lecture 2). It's hard to have weekly homework in a course like that and so we will just have a mini project and then a final project (plust the experiments but these are just one group per lecture). I always have a fear that students are not learning as much when we are too philosophical and they don't get enough hands on experience.

Breadth course means we can't go too deep into any topic. This is a "breadth course" where lectures include technical topics such as alignment training methods, or jailbreak attacks, more societal question such as economic impacts of AI, policy topics such as catastrophic risks, responsible scaling, and regulation, and more. Many of these topics deserve their own course, and so this is more of a "tasting menu" course. I am not sure I want to change that.

Aspects I am unsure of

There is obviously a potential conflict of interest with me teaching a course like that while also being a member of the alignment team at OpenAI. (I have a dual position which is part time at OpenAI and part time as a Catalyst Professor at Harvard.) I am trying to ensure a diversity of speakers and to ensure the course is not just from an OpenAI perspective. But of course it is challenging, since it is easier for me to get speakers from OpenAI, and also I am more deeply familiar with topics like deliberative alignment or our model spec. I am not yet sure how well I am navigating this issue, but I will hear from the students in the end.

Anyway, I encourage people who have feedback to comment here. (In particular if you are taking the course...)

LESSWRONG
LW

LESSWRONG
LW

106

Learnings from AI safety course so far

106

106

Aspects that are working:

Aspects that perhaps could work better:

Aspects I am unsure of