Thanks for the guide, ARENA is fantastic and I highly recommend it for people interested in learning interpretability!
I'm currently working through the ARENA course. I skipped week 0 entirely because I've covered similar content in other courses and at university, and I'm now on Week 1: Transformer Interpretability. I'm studying part time, so I'm hoping to get through most of the content in a few months.
Most importantly and maybe surprisingly, while the program is called "Alignment Research Engineering Accelerator", there is actually almost no content on how to align AI systems
Yeah, I think the assumption is that people can learn this through programs like AISF and then do ARENA just to focus on the skills.
If you're working independently, the answer is probably no.
Agreed. The ARENA content is developed for the bootcamp format. If you're not doing it as a bootcamp, you probably don't want to do all the content: 4 weeks of content at five days per week is 20 days, and at one day of content per week that would take 20 weeks, or nearly half a year. That's a long time to delay getting started on projects, which are what will actually unlock further AI safety opportunities, rather than just working through content. Furthermore, it'll be hard to keep up with the ARENA pace without the supporting infrastructure around you.
Some things are important to learn (like how to log with wandb), but there is nothing conceptually interesting about doing it. For such exercises, I think it's justified to look at the solution and simply remember how to do it.
Stuff with wandb, or plotting data, is prime work for an LLM to just do for you, saving valuable human time for the actual important code itself. I think it's totally valid to highlight the training loop in Cursor, ask "please rewrite to log loss/accuracy/etc. to wandb" and then eyeball the result.
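For anyone unsure what "log loss/accuracy/etc." amounts to, here is a minimal sketch of the pattern, not ARENA's actual code. The toy model, the `train` function, the `log_fn` parameter, and the metric names are all made up for illustration. The logger is passed in as a plain callable so the example runs without a wandb account.

```python
import random
from typing import Callable, Dict


def train(num_steps: int, log_fn: Callable[[Dict[str, float]], None], lr: float = 0.1) -> float:
    """Fit y = 3x with a single weight using SGD, logging metrics every step.

    `log_fn` is a hypothetical hook: anything that accepts a dict of metrics.
    """
    rng = random.Random(0)  # fixed seed so runs are reproducible
    w = 0.0
    for step in range(num_steps):
        x = rng.uniform(-1.0, 1.0)
        y = 3.0 * x                      # target from the "true" weight 3.0
        pred = w * x
        loss = (pred - y) ** 2           # squared error for this sample
        grad = 2.0 * (pred - y) * x      # d(loss)/d(w)
        w -= lr * grad
        # The only logging-specific line: hand a dict of metrics to the logger.
        log_fn({"step": step, "train/loss": loss, "train/weight": w})
    return w


# Stand-in logger: collect the metric dicts in a list instead of sending them anywhere.
history = []
final_w = train(num_steps=200, log_fn=history.append)
```

To send the same metrics to wandb, the usual approach is to call `wandb.init(project=...)` once with your own project name and then pass `log_fn=wandb.log`, since `wandb.log` accepts a dict of metrics. Everything else in the loop stays the same, which is why this is easy to delegate and then check by eye.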
I've recently completed the in-person ARENA program, which is a 5-week bootcamp teaching the basics of safety research engineering (with the 5th week being a capstone project). Sometimes, I talk to people who want to work through the program independently and who ask for advice. Even though I didn't attempt this, I think doing the program in-person gives me some insight into how to get the most out of the program when doing it independently, so here are my thoughts and tips:
On working speed
How should you approach each day?
Should you do all the material?
If you're working independently, the answer is probably no. Not all material is equally valuable to everyone. In an in-person program it makes sense to have everyone work through the same material at the same pace since otherwise it becomes difficult to pair people up for pair-programming, and for TAs to prepare for questions. But if you're alone, you probably want to skip more material that isn't as interesting to you personally.
Which days/sections are valuable to do?
This is very subjective, but here I'll give you my assessment of which days or sections in days are valuable to do. Probably other people's opinions will differ. Also, note that the program keeps evolving, so it's possible that my opinions and advice are out-of-date once you read this post.
What is missing?
There is some material that would be useful to learn but which is missing from the ARENA material. I hope some of this material will be added in the future.
Most importantly and maybe surprisingly, while the program is called "Alignment Research Engineering Accelerator", there is actually almost no content on how to align AI systems (except the policy-optimization stage of RLHF). There is no content on scalable oversight.
There is also not yet any content on AI control.
Should you work through the material alone?
I don't know; it depends on your ability to motivate yourself to work through a large amount of material on your own, and also on your prior skills. Here is a somewhat cheap test:
This should be a reasonably cheap test. Notice how long you need to complete each of the days (remembering that [0.0] likely takes more than a day) and use this to assess whether it's worth it for you to work through a larger chunk of the material alone. If you want to do the program in-person instead, apply to the next iteration by June 21st, 2025, or go to their website to check for later iterations.
One caveat here is that I was already somewhat familiar with much of the content on a conceptual/theoretical level. It's possible that other people gain more from actually reading the provided material.
When I say "completely" here or later, I mean everything except the bonus material.