I think this is a relatively interesting idea as a stand-alone thing (i.e. running more week-long research sprints). This seems good and like it would be pretty useful in the AI safety community, where this kind of thing is underserved at the moment.
I don't really think the framing of this as an alternative to ARENA makes much sense, because ARENA's main bottleneck to scaling hasn't really been TAs. I mean it's a bottleneck, don't get me wrong, but I wouldn't call it the main one. I also think having a less-structured, more research-driven model would probably require more TA involvement? If you wanted to do it well and have it be accessible to early-stage people, at least.
I'm confused by the evidence given for the claim that ARENA functions primarily as a signaling mechanism. I think that is certainly an aspect of ARENA (as with MATS, as with the completion of any high-quality program). But the fact that some people have done AI safety research before ARENA is weak evidence of this to me (I could go into more detail about this in private, but am not willing to do so publicly since it pertains to ARENA's selection process, which I do not want to be gamed).
The fact that people from compressed versions of ARENA (ARBOx, which is a great program!) also go on to do great things doesn't seem like evidence of this to me either. In fact, it seems like evidence that completing a structured curriculum just makes you a better applicant to other programmes. I'm not sure how this is being interpreted, since it depends on what you perceive the signaling value of ARBOx to be, which I think is probably not as high as ARENA's (though it should be higher, because it's a great programme!).
Also, we do put out impact reports where people self-assess as having improved their ability to do certain concrete tasks that we think are valuable in research. I won't go into this in detail here because we've done so in past impact reports; e.g. the report for ARENA 6 is here: https://www.lesswrong.com/posts/WiergDX4ufcLbvwzs/arena-6-0-impact-report (Side note: I would love to see more reports like this from other programs in the ecosystem, and on a more regular basis.)
I have more thoughts but will leave it there. I think this project is probably worth pursuing independently (although, as I said earlier, if you're getting early-stage people to do research sprints, I think you do need high-quality and involved mentorship that looks similar to TAs). I also think there are a lot of people doing somewhat similar things, but maybe not quite 1-week sprints.
Thanks, James, for the detailed thoughts and for reading through the post. I'll respond once here. If we want further back and forth, it's better to have a chat in private so we can iron out our cruxes (and then summarize for community benefit). I'd also want to hear what others in the community think before committing to anything.
> Because ARENA's main bottleneck to scaling hasn't really been TAs
I am happy to defer to you regarding the scaling bottlenecks of ARENA. That's not a big crux for the proposal.
> I'm confused about the evidence given that ARENA functions primarily as a signaling mechanism
Maybe the word signaling isn't correct. Let me try to explain. When I point out that there are four people who did ARBOx and are now doing elite fellowship programs, my hunch is that those four had a very good chance of getting into those elite programs even if they hadn't done ARBOx. Furthermore, if ARBOx did provide a significant boost to their profile/skillset, then one needs to consider how much extra value the extra three weeks at ARENA are providing. Another way of saying this is that ARBOx, ARENA, and these elite programs have similar selection processes, so ARENA or ARBOx accepting someone is strongly correlated with their having high potential for future AI safety research, regardless of how much value the programs add on top.
> I also think having a less-structured, more research-driven model would probably require more TA involvement? If you wanted to do it well and have it be accessible to early-stage people, at least.
I do not consider participants of ARENA to be 'early stage'. In my mind they are mid-stage (i.e. in the middle of upskilling towards a full-time researcher role), and most participants would be able to do solid research sprints without having gone through ARENA. My proposal is based on helping such mid-stage researchers. I think something like BlueDot (at least, BlueDot in 2024; I don't know about current BlueDot) or AISC targets early-stage researchers.
> Also we do put out impact reports where people self-assess as having improved their ability
My claim (which I have not really justified, except to defer to Neel Nanda's post) is that the counterfactual of doing four mini research sprints would be significantly higher impact. This could be the central crux.
> Side note that I would love to see more reports like this from other programs in the ecosystem, and on a more regular basis
100%. Thanks for doing this and being a role model!
Sure, I agree with most of that. I think this is probably mostly based on counterfactuals being hard to measure, in two senses:
The first is the counterfactual where participants aren't selected for ARENA: do they then go on to do good things? We've taken a look at this (unpublished) and found that, for people who are on the margin, attendance at ARENA has an effect. But that effect could be explained by signaling value; it's basically difficult to say. This is why we try to do start-of-program and end-of-program surveys to measure this. But different viewpoints are available here, because it is difficult to measure definitively.
The second is the counterfactual where people spend 4 weeks doing research sprints. I basically do expect that to be more effective if you require the ARENA materials as prerequisites, but I think it would then be hard to actually get applicants to such a programme (since people generally struggle to work through the ARENA materials themselves). But maybe something else could work here. I actually kind of expect the counterfactual value of that to be pretty low due to margin-based reasoning: there exist many research-oriented programmes already, but relatively fewer upskilling-oriented programmes. But again, it's difficult to know definitively what's more valuable on current margins (though I do think current margins are the relevant question).
My guess is these are the two cruxes? But unsure.
> The first is the counterfactual where participants aren't selected for ARENA, do they then go on to do good things
This is not a crux for me. I believe ARENA provides counterfactual value compared to not doing ARENA: you work much harder during ARENA than you otherwise would, in a great environment, with great support, etc.
> The second is the counterfactual where people spend 4 weeks doing research sprints.
This is the crux. And agreed, it is hard to measure!
Thanks for engaging thoughtfully. Useful to think things through.
"The content of the existing ARENA notebooks could be a prerequisite for the new program"
I don't think this would work very well. If you were super disciplined and you took one day every two weeks to work through one notebook, you'd spend most of a year just to qualify for the program.
Also, shifting ARENA to focus on research sprints would, in a sense, reduce the diversity of the field, in that most other programs focus more on research than on developing ML skills. If one program were going to shift to doing research sprints, I suspect it'd actually be better for that to be a program that already focuses on research.
> If you were super disciplined and you took one day every two weeks to work through one notebook, you'd spend most of a year just to qualify for the program
I believe: 1) you don't need to diligently work through a whole notebook to get most of the value of the notebook and 2) the majority of the value of ARENA is contained in a subset of the notebooks. Some reasons:
1a) The notebooks are often, by design, far more work than is possible to do in a day. Even in ARENA, where you have pair programming, TAs on hand, a great co-working space, lunch and dinner provided, etc. Note that a 'day' here is roughly 5.5-6.5 hours (10am to 6pm, with a morning lecture at 10, lunch at 12, and a break at 3.30).
1b) Even the shorter notebooks are often only manageable to complete in a day if you skip some exercises, or cheat and peek at the solution. (This is recommended by ARENA, and I agree with this recommendation given the time constraints.)
1c) There are 5 (or 6) LARGE mech interp notebooks for the final three days of mech interp week. One recommendation is to try two notebooks on the Wednesday and Thursday, then continue with the one you prefer on Friday. So I saw 2 out of the 5 notebooks when I participated in ARENA. Despite this, I was still able to TA during mech interp. It was a bit frantic, but I would skim the start of each of the notebooks I didn't understand, enough that I could help people unblock or explain key ideas. I feel I got a good percentage of the value that the ARENA participants got out of those other notebooks without having done a single exercise in them.
2a) In ARBOx2, the schedule was (each comma represents a different day):
- Week 1: CNNs, Transformers, Induction circuit, IoI circuit, [another mech interp notebook; can't remember which, likely SAEs]
- Week 2: RL day 1, RL day 2, project, project, project.
The biggest thing missing from this, IMO, is the theory of impact exercise from the second half of ARENA's evals day 1. Otherwise, for the calibre of people doing ARENA, a quick skim of the other notebooks gives the majority of the value.
I would recommend ARBOx over ARENA because of the time efficiency: you get a high percentage of ARENA's value in 40% of the time.
> most other programs focus more on research than developing ML skills
I don't think ARENA focuses on ML skills. Week 0 has content directly on supervised ML, but only a small (albeit crucial!) part of it, namely writing networks in PyTorch and creating training loops. Week 2 has content on RL. But given time constraints, many other parts of ML aren't covered in depth, e.g. how to do hyper-parameter tuning (most of the time you just use the hyper-parameters provided; there's no time to actually tune them), how to even tell if hyper-parameters are the issue, data collection and cleaning, cluster management, selecting GPUs, etc.
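To illustrate the distinction I mean (this is my own hypothetical toy sketch, not ARENA material): the curriculum teaches you to write a loop like the one below, with the hyper-parameters handed to you, whereas choosing or diagnosing those values is the part that gets little time.

```python
import torch
from torch import nn, optim

# Toy model and fake data, purely illustrative.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
x, y = torch.randn(256, 10), torch.randn(256, 1)

# Hyper-parameters of the kind a notebook would typically hand you.
lr, epochs, batch_size = 1e-3, 5, 64
optimizer = optim.Adam(model.parameters(), lr=lr)
loss_fn = nn.MSELoss()

# The part that is taught: a basic training loop.
for epoch in range(epochs):
    for i in range(0, len(x), batch_size):
        xb, yb = x[i:i + batch_size], y[i:i + batch_size]
        optimizer.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()
        optimizer.step()

# The part there's usually no time for: sweeping lr/batch_size yourself,
# comparing validation loss across runs, or working out whether a bad
# result is a hyper-parameter problem at all.
```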
Some brief feedback on the structure:
IMO ARENA is about bringing skilled coders to the research frontier and showing them how to quickly run experiments. If you instead make ARENA a prereq, you will lose out on many talented coders who don't have time to complete it independently. So I would consider this more of a follow-up to ARENA that teaches research skills than a replacement.
TLDR
I propose restructuring the current ARENA program, which primarily focuses on contained exercises, into a more scalable and research-engineering-focused model consisting of four one-week research sprints preceded by a dedicated "Week Zero" of fundamental research engineering training. The primary reasons are:
Context and disclaimers
The Core Problem with the Current ARENA
My primary concern is that the skills learned in the current ARENA program are not the bottleneck for the AI Safety ecosystem.
The Proposed Research Sprint Format
The alternative program structure would be a four-week sequence of mini-research sprints, with each week having a different AI safety theme, plus an introductory Week Zero. This aligns with the advice from researchers like Neel Nanda on upskilling in mechanistic interpretability—study the relevant material, then start mini-sprints.
Application Process: ARENA Knowledge as a Prerequisite
The content of the existing ARENA notebooks could be a prerequisite for the new program.
Program Structure
| Week | Theme/Focus | Goal |
|------|-------------|------|
Week Zero: Dedicated Training
The goal for this (optional) week is to teach the actual skills needed for research.
The Software Engineering Week
A potential alternative for Week 4 is a pure Software Engineering Week, where participants contribute to open-source packages in collaboration with open-source maintainers. This is an excellent way to teach hard software engineering skills and build up "taste" for good software, which is a growing concern with the rise of LLM coding.
Partnership and Mentoring
To maximize value, ARENA could partner with research programs like MATS.
ML4Good best practices
Any new structure should embed the good practices of programs like ML4Good to create a positive learning environment, a sense of community, and a safe space for both personal and technical growth. For details, see my post about it.
Scalability
The new model is significantly easier to scale:
Potential Downsides
One potential downside is the reduced incentive for the ARENA team to create new ARENA-style notebooks (e.g., for control research). However, since the team is already heavily bottlenecked on time for new notebook development, this might not be a real disadvantage. Both systems suffer from the same staffing problem.
Another downside is the implication that this has to replace ARENA. This could just be a separate, parallel initiative. However, I do actually believe that the ARENA team and ARENA participants would be better served by moving more towards the model I am suggesting.
I am actually struggling to think of downsides. I asked Gemini and here are its thoughts along with my counters:
I asked Claude to review this post and it came up with some other downsides. Again, Claude's comments are followed by mine.
Making It Happen
If you think this is a good idea, then the obvious question is how do we make this happen? Unfortunately, I probably don't have the time to make this happen, but I'd definitely like to be involved. Possible next steps include:
If you have any feedback or want to get involved, please share in the comments.