"The content of the existing ARENA notebooks could be a prerequisite for the new program"
I don't think this would work very well. If you were super disciplined and took one day every two weeks to work through one notebook, you'd spend most of a year just to qualify for the program.
Also, shifting ARENA to focus on research sprints would, in a sense, reduce the diversity of the field, since most other programs focus more on research than on developing ML skills. If one program were to shift to doing research sprints, I suspect it'd actually be better for a program that already focuses on research to do that.
> If you were super disciplined and took one day every two weeks to work through one notebook, you'd spend most of a year just to qualify for the program
I believe: 1) you don't need to diligently work through a whole notebook to get most of its value, and 2) the majority of the value of ARENA is contained in a subset of the notebooks. Some reasons:
1a) The notebooks are often, by design, far more work than can be done in a day, even in ARENA, where you have pair programming, TAs on hand, a great co-working space, lunch and dinner provided, etc. Note that a 'day' here is roughly 5.5-6.5 hours (10am to 6pm, with a morning lecture at 10, lunch at 12, and a break at 3.30).
1b) Even the shorter notebooks are often only manageable to complete in a day if you skip some exercises, or cheat and peek at the solutions. (This is recommended by ARENA, and I agree with this recommendation given the time constraints.)
1c) There are 5 (or 6) LARGE mech interp notebooks for the final three days of mech interp week. One recommendation is to try two notebooks on the Wed and Thu, then continue with the one you prefer on Friday. So I saw 2 out of the 5 notebooks when I participated in ARENA. Despite this, I was still able to TA during mech interp. It was a bit frantic, but I would skim the start of each of the notebooks I didn't understand, enough that I could help unblock people or explain key ideas. I feel I got a good percentage of the value that the ARENA participants got out of those other notebooks without having done a single exercise in them.
2a) In ARBOx2, the schedule was (each comma separates a different day):
- Week 1: CNNs, Transformers, Induction circuit, IoI circuit, [another mech interp notebook; can't remember which, likely SAEs]
- Week 2: RL day 1, RL day 2, project, project, project.
The biggest thing missing from this IMO is the theory of impact exercise from the second half of ARENA evals day 1. Otherwise, for the calibre of people doing ARENA, a quick skim of the other notebooks gives the majority of the value.
I would recommend ARBOx over ARENA because of the time efficiency: you get a high percentage of the value of ARENA in 40% of the time.
> most other programs focus more on research than developing ML skills
I don't think ARENA focuses on ML skills. Week 0 has content directly on supervised ML, and only a small (but crucial!) part of ML, namely writing networks in PyTorch and creating training loops. Week 2 has content on RL. But given time constraints, many other parts of ML aren't covered in depth, e.g. how to do hyper-parameter tuning (most of the time you just use the hyper-parameters provided; there's no time to actually tune them), how to even tell if hyper-parameters are the issue, data collection and cleaning, cluster management, selecting GPUs, etc.
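To make concrete what that "small but crucial" part looks like, here is a minimal sketch (my own illustration, not ARENA's actual material) of writing a network in PyTorch and running a training loop; the toy data, architecture, and hyper-parameters are arbitrary placeholders:

```python
# Minimal sketch of "write a network in PyTorch and a training loop".
# The data, architecture, and hyper-parameters are illustrative placeholders.
import torch
from torch import nn

# Toy regression data: 256 examples with 10 features each.
x = torch.randn(256, 10)
y = torch.randn(256, 1)

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # lr chosen arbitrarily
loss_fn = nn.MSELoss()

for epoch in range(10):
    optimizer.zero_grad()        # clear gradients from the previous step
    loss = loss_fn(model(x), y)  # forward pass and loss
    loss.backward()              # backward pass
    optimizer.step()             # update parameters
    print(f"epoch {epoch}: loss = {loss.item():.4f}")
```

The point is that this core skill is fairly compact; the broader surrounding practice of ML (tuning, data work, infrastructure) is what there's no time to cover.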
I think this is a relatively interesting idea as a stand-alone thing (i.e. running more week-long research sprints). It seems good and like it would be pretty useful in the AI safety community, where this kind of thing is underserved at the moment.
I don't really think the framing of this as an alternative to ARENA makes much sense, because ARENA's main bottleneck to scaling hasn't really been TAs. I mean, it's a bottleneck, don't get me wrong, but I wouldn't call it the main one. I also think having a less-structured, more research-driven model would probably require more TA involvement? If you wanted to do it well and have it be accessible to early-stage people, at least.
I'm confused about the evidence given for the claim that ARENA functions primarily as a signaling mechanism. I think that is certainly an aspect of ARENA (as with MATS, as with the completion of any high-quality program). But the fact that some people have done AI safety research before ARENA is weak evidence of this to me (I could go into more detail about this in private, but I'm not willing to do so publicly since it pertains to ARENA's selection process, which I do not want to be gamed).
The fact that people from compressed versions of ARENA (ARBOx, which is a great program!) also go on to do great things doesn't seem like evidence of this to me. In fact, this seems like evidence that completing a structured curriculum just makes you a better applicant to other programmes. I'm not sure how this is being interpreted, since it depends on what you perceive the signaling value of ARBOx to be, which I think is probably not as high as ARENA's (I think it should be higher, because it's a great programme!).
Also, we do put out impact reports in which people self-assess as having improved their ability to do certain concrete tasks that we think are valuable in research. I won't go into it in detail here because we've done so in past impact reports; e.g. the report for ARENA 6 is here: https://www.lesswrong.com/posts/WiergDX4ufcLbvwzs/arena-6-0-impact-report (Side note: I would love to see more reports like this from other programs in the ecosystem, and on a more regular basis.)
I have more thoughts but will leave it there. I think this project is probably worth pursuing independently (although if you're getting early-stage people to do research sprints, as I said earlier, I think you do need high-quality and involved mentorship that looks similar to TAs). I also think there are a lot of people doing somewhat similar things, but maybe not quite 1-week sprints.
Thanks, James, for the detailed thoughts and for reading through the post. I'll respond once here. If we want further back and forth, it's better to have a chat in private so we can iron out our cruxes (and then summarize for community benefit). I'd also want to hear what others in the community think before committing to anything.
> Because ARENA's main bottleneck to scaling hasn't really been TAs
I am happy to defer to you regarding the scaling bottlenecks of ARENA. That's not a big crux for the proposal.
> I'm confused about the evidence given for the claim that ARENA functions primarily as a signaling mechanism
Maybe the word signaling isn't correct. Let me try to explain. When I point out that there are four people who did ARBOx and are now doing elite fellowship programs, my hunch is that those four had a very good chance of getting into those elite programs even if they hadn't done ARBOx. Furthermore, if ARBOx did provide a significant boost to their profile/skillset, then one needs to consider how much extra value the extra three weeks at ARENA are providing. Another way of saying this is that ARBOx, ARENA, and these elite programs have similar selection processes, so ARENA or ARBOx accepting someone is strongly correlated with their having high potential for future AI safety research, regardless of how much value the program adds on top.
> I also think having a less-structured, more research-driven model would probably require more TA involvement? If you wanted to do it well and have it be accessible to early-stage people, at least.
I do not consider participants of ARENA to be 'early stage'. In my mind they are mid-stage (i.e. in the middle of upskilling towards a full-time researcher role), and most participants would be able to do solid research sprints without having gone through ARENA. My proposal is based on helping such mid-stage researchers. I think something like BlueDot (at least BlueDot in 2024; I don't know about current BlueDot) or AISC targets early-stage researchers.
> Also we do put out impact reports where people self-assess as having improved their ability
My claim (which I have not really justified, except to defer to Neel Nanda's post) is that the counterfactual of doing four mini research sprints would be significantly higher impact. This could be the central crux.
> Side note that I would love to see more reports like this from other programs in the ecosystem, and on a more regular basis
100%. Thanks for doing this and being a role model!
Sure, I agree with most of that. I think this disagreement is probably mostly based on counterfactuals being hard to measure, in two senses:
The first is the counterfactual where participants aren't selected for ARENA: do they then go on to do good things? We've taken a look at this (unpublished) and found that, for people who are on the margin, attendance at ARENA has an effect. But that effect could be explained by signaling value; it's basically difficult to say. This is why we try to do start-of-program and end-of-program surveys to measure this. But different viewpoints are available here because it is difficult to measure definitively.
The second is the counterfactual where people spend 4 weeks doing research sprints. I basically do expect that to be more effective if you require the ARENA materials as prerequisites, but I think it would then be hard to actually get applicants to such a programme (since people generally struggle to work through the ARENA materials on their own). But maybe something else could work here. I actually kind of expect the counterfactual value of that to be pretty low due to margin-based reasoning: there already exist many research-oriented programmes, but relatively few upskilling-oriented programmes. But again, it's difficult to know definitively what's more valuable on current margins (though I do think current margins are the relevant question).
My guess is these are the two cruxes? But unsure.
> The first is the counterfactual where participants aren't selected for ARENA: do they then go on to do good things?
This is not a crux for me. I believe ARENA provides counterfactual value compared to not doing ARENA: you work much harder during ARENA than you otherwise would, in a great environment, with great support, etc.
> The second is the counterfactual where people spend 4 weeks doing research sprints.
This is the crux. And agreed, it is hard to measure!
Thanks for engaging thoughtfully. Useful to think things through.
Some brief feedback on the structure:
IMO ARENA is about bringing skilled coders to the research frontier and showing them how to quickly run experiments. If you instead make ARENA a prerequisite, you will lose out on many talented coders who don't have time to complete it independently. So I would consider this more of a follow-up to ARENA that teaches research skills than a replacement.
My main disagreement is that it does not deal with the reality of who actually participates in ARENA. If the AI safety community could magically coordinate perfectly, then ARENA would serve the role you're describing, but as of now, I think the participants who do ARENA are better served by doing research sprints than by the ARENA notebooks. See the comment by sturb below for one participant's perspective.
I've worked on admissions/strategy/evaluation for ARBOx2 and ARBOx3, and I'd broadly endorse JamesH's comments—I'd be very excited to see the programme you describe, but as a next step/complement to ARENA-type programmes rather than a substitute.
A few thoughts on the signalling/upskilling conversation:
Based on surveys and private anecdata, it seems like a large part of ARBOx's value comes from building confidence, motivation, and a sense of direction for participants. I think it's hard to replicate this benefit through self-studying the ARENA curriculum, and I'd say it's a significant component of ARBOx's past/ongoing impact. This could explain ARBOx's strong placement record despite a compressed runtime (since these benefits probably scale sub-linearly with more weeks).
Moreover, participants reported that ARBOx was helpful for them "developing technical skills in ML for AI safety" (avg 9.15/10 agreement, much higher than statements pertaining to perceived signalling benefits). Of course, different participants benefit from ARBOx in different ways, and end-of-programme self-reports can be misleading (we're doing a six-month follow-up now), but I think the "signalling" story is at best incomplete.
I also don't think the signalling story follows from many ARENA/ARBOx applicants having prior safety research experience. Eyeballing the survey data, participants with (more) prior safety research experience didn't find ARBOx any less useful for developing their technical skills (if anything, the opposite is true, although there are too many confounders to infer anything meaningful).
A few thoughts on time-efficiency:
I do think ARBOx is pretty time-efficient, which is very important to a lot of our participants (e.g. it fits in the two weeks between New Year and Oxford term starting). I don't dispute that a minority of ARENA notebooks comprise the majority of the curriculum's value, and that many ARBOx participants get >40% of the value they'd get from ARENA in 40% of the time. For this reason, I'd be excited to see more people put on ARBOx-style programmes.
In some cases, though, participants aren't so time-constrained, and gains per week aren't really the thing you care about: it seems like most participants would upskill faster from 5 weeks of ARENA + a few week-long sprints than from 2 weeks of ARBOx + 3 weeks of ??? + a few week-long sprints. For what it's worth: 14/19 ARBOx2 survey respondents reported they were happy with its length, 5/19 would have preferred a longer programme, and nobody wanted a shorter programme.
I would agree that as much as I enjoyed doing ARENA and think the materials are very high quality, most of the really valuable stuff could be compressed into a week (the content on learning how transformers work and a few parts of the mech interp material are significantly more important than the rest), and the mental training from solving specific problems has not transferred into doing better research for me to a significant extent. Research on transfer learning across domains has also shown remarkably poor results in general.
This might sound odd, because it might seem like the pedagogy of ARENA is really strong: the program is very hard, the material is very technical, it's made by really smart people, and really smart, capable people go through the program and benefit. Doing the program is almost certainly a significantly better counterfactual use of time than not doing the program for most AI safety people.
However, I think a lot of the benefits stem from being able to spend time around those sorts of people and engaging in research in a collaborative environment. I don't think a great deal of the benefit is derived from the pedagogy itself, because for most people, working through the notebooks doesn't meaningfully translate into research. I think this is doubly true in the age of agentic coding, where a lot of the nitty-gritty details that make up the bulk of the notebooks can be relatively safely abstracted away.
I can think of many excellent, highly technical interpretability researchers who never did anything like the material covered in the notebooks, but who are producing excellent research today. The really hard parts of research are developing taste about the field, having a sense of what good experimental design looks like, asking the right questions and sniffing out how to extract the most information, and to some extent the technical skills associated with writing code. ARENA primarily aims to fix the last, which is also the part most exposed to being addressed by agentic coding (of course having the underlying knowledge is important, but doing the research and reflecting with others will also impart this knowledge).
The most valuable things I received from ARENA were the confidence to pursue research, the validation of being accepted into the program, the relationships I developed with other people during the program, and the exposure to the people/environment at LISA. If the program shifted to help expose people to quickly forming teams, working together, developing an interesting question, and getting a deeper sense of a particular area through a research sprint, this would cover more of the key skills required to do research, provide all the key benefits listed above, and allow for technical exposure and upskilling. I am less certain of scrapping the in-person program, as maybe it could function in exactly the same way at first.
I could see this modified program being similarly valuable to me today, having already completed MATS, as to someone starting out much earlier. I would also probably be more likely to recommend someone starting out in the field to participate in such a program.
I prefer just-in-time learning over just-in-case learning because it's much more time efficient. Developing the core skills of research seems more likely to serve a young researcher well, and they'd also benefit more from the friendships and collaborations, and potentially have interesting threads to pull on after the program ended. If they desperately need to pick up the skills from one of the ARENA weeks, they could presumably pick up the core parts in around a week.
One is that many participants at ARENA have already done AI safety research before participating. The second is that at least four ARBOx participants (ARBOx being a 2-week compressed version of ARENA) are doing elite AI safety fellowships (1 Anthropic Fellows Program, 2 LASR Labs, 1 MATS).
I don't think this is evidence that ARENA is about signalling:
On 3., I think the four ARBOx fellows cited above did ARBOx first, then went on to AFP etc. I understand the argument to be "despite ARBOx being only two weeks, it has a good placement record, so why is ARENA 2.5x longer?"
1 and 2. This is subjective and more a gut feeling, but I think doing ARENA after having done LASR or MATS is not a good use of time, especially against the counterfactual of doing 4 research sprints. In my mind (and without more context), doing MATS and then ARENA would be a counter-signal - "How come you need to do ARENA after having done MATS?"
3. See James Lester's reply.
TLDR
I propose restructuring the current ARENA program, which primarily focuses on contained exercises, into a more scalable and research-engineering-focused model consisting of four one-week research sprints preceded by a dedicated "Week Zero" of fundamental research engineering training. The primary reasons are:
[Edit: as discussed in the comments, on reflection the scalability is not a primary issue or benefit.]
Context and disclaimers
The Core Problem with the Current ARENA
My primary concern is that the skills learned in the current ARENA program are not the bottleneck for the AI Safety ecosystem.
The Proposed Research Sprint Format
The alternative program structure would be a four-week sequence of mini-research sprints, with each week having a different AI safety theme, plus an introductory Week Zero. This aligns with the advice from researchers like Neel Nanda on upskilling in mechanistic interpretability—study the relevant material, then start mini-sprints.
Application Process: ARENA Knowledge as a Prerequisite
The content of the existing ARENA notebooks could be a prerequisite for the new program.
Program Structure
| Week | Theme/Focus | Goal |
| --- | --- | --- |
Week Zero: Dedicated Training
The goal for this (optional) week is to teach the actual skills needed for research.
The Software Engineering Week
A potential alternative for Week 4 is a pure Software Engineering Week, where participants contribute to open-source packages in collaboration with open-source maintainers. This is an excellent way to teach hard software engineering skills and build up "taste" for good software, which is a growing concern with the rise of LLM coding.
Partnership and Mentoring
To maximize value, ARENA could partner with research programs like MATS.
ML4Good best practices
Any new structure should embed the good practices of programs like ML4Good to create a positive learning environment, a sense of community, and a safe space for both personal and technical growth. For details, see my post about it.
Scalability
[Edit: I no longer think this is an important or defining feature.]
The new model is significantly easier to scale:
Potential Downsides
[Edit: the strongest downsides have been suggested by commenters. Humans still have an edge over AI.]
One potential downside is the reduced incentive for the ARENA team to create new ARENA-style notebooks (e.g., for control research). However, since the team is already heavily bottlenecked on time for new notebook development, this might not be a real disadvantage. Both systems suffer from the same staffing problem.
Another downside is the implication that this has to replace ARENA. This could just be a separate, parallel initiative. However, I do actually believe that the ARENA team and ARENA participants would be better served by moving more towards the model I am suggesting.
I am actually struggling to think of downsides. I asked Gemini and here are its thoughts along with my counters:
I asked Claude to review this post and it came up with some other downsides. Again, Claude's comments followed by mine.
Making It Happen
If you think this is a good idea, then the obvious question is how do we make this happen? Unfortunately, I probably don't have the time to make this happen, but I'd definitely like to be involved. Possible next steps include:
If you have any feedback or want to get involved, please share in the comments.