ARENA 4.0 Impact Report

JamesH; James Fox

Congrats on your successes, and thank you for publishing this impact report.

It leaves me unsatisfied on cost-effectiveness, though. With no idea of how much money was invested in this project to get this outcome, I can't tell whether ARENA is cost-effective compared to other training programs and counterfactual opportunities. Would you mind sharing at least something about the amount of funding this got?

Re:

"Still, it is also positive if ARENA can help participants who want to pursue a career transition test their fit for alignment engineering in a comparatively low-cost way."

It doesn't strike me that a 5-week, all-expenses-paid program is a particularly low-cost way to find out that AI safety isn't for you (compared with, for example, participating in an Apart Hackathon).

James Fox (1y)

Thank you for your comment.

We are confident that ARENA's in-person programme is among the most cost-effective technical AI safety training programmes:
- ARENA is highly selective, and so all of our participants have the latent potential to contribute meaningfully to technical AI safety work
- The marginal cost per participant is relatively low compared to other AI safety programmes since we only cover travel and accommodation expenses for 4-5 weeks (we do not provide stipends)
- The outcomes set out in the above post seem pretty strong (4/33 immediate transitions to AI safety roles and 24/33 more actively pursuing them)
- There are lots of reasons why technical AI safety engineering is not the right career fit for everyone (even those with the ability). Therefore, I think that 2/33 people updating against working in AI safety after the programme is actually quite a low attrition rate.
- Apart Hackathons have quite a different theory of change compared with ARENA. While hackathons can be valuable for some initial exposure, ARENA provides 4 weeks of comprehensive training in cutting-edge AI safety research (e.g., mechanistic interpretability, LLM evaluations, and RLHF implementation) that leads to concrete outputs through week-long capstone projects.

JaimeRV (1y)

Thanks for sharing this! Great to see the impact of ARENA!

According to the OpenPhil public grant[1], this iteration of ARENA got £245,895, and with this you were able to achieve the points mentioned in this post, right?

Also, it is great to hear that there are 4 new people working in AIS thanks to the program! It would be nice to know how you managed it (and what the counterfactual was). Getting 4 people through full hiring processes within 4 weeks seems impressive. Did you manage it because they got jobs at orgs that were also at LISA? Or were there other networking effects or factors that made this possible?

[1] https://www.openphilanthropy.org/grants/alignment-research-engineer-accelerator-ai-safety-technical-program-2024/
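For anyone who wants to play with the numbers in this thread, here is a rough back-of-the-envelope sketch. It assumes that the £245,895 grant figure quoted above covers exactly this one iteration and that the cohort size is 33 (inferred from the x/33 outcomes quoted in the reply above); neither assumption is confirmed in the thread, and the function and variable names are purely illustrative.

```python
# Back-of-the-envelope figures taken from the comments above.
# ASSUMPTIONS (not confirmed): the grant is the full cost of one iteration,
# and the cohort size is 33.
GRANT_GBP = 245_895
COHORT = 33
OUTCOMES = {
    "immediate transition to an AI safety role": 4,
    "actively pursuing AI safety roles": 24,
    "updated against working in AI safety": 2,
}


def cost_per(count: int, total_gbp: float = GRANT_GBP) -> float:
    """Total spend divided by a head count."""
    if count <= 0:
        raise ValueError("count must be positive")
    return total_gbp / count


def summarise() -> dict:
    """Collect per-head costs and outcome shares in one dictionary."""
    summary = {"cost per participant (GBP)": round(cost_per(COHORT), 2)}
    for label, n in OUTCOMES.items():
        summary[f"share: {label}"] = round(n / COHORT, 3)
    transitions = OUTCOMES["immediate transition to an AI safety role"]
    summary["cost per immediate transition (GBP)"] = round(cost_per(transitions), 2)
    return summary


if __name__ == "__main__":
    for key, value in summarise().items():
        print(f"{key}: {value}")
```

Dividing the whole grant by the number of immediate transitions ignores every other outcome (and any counterfactual), so it is best read as an upper bound on cost per hire rather than a real cost-effectiveness estimate.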

Note: Some details of this section have been redacted so that key aspects of how we select participants remain private, which avoids potential issues in future selection rounds.

Note: “Conducting alignment research” only includes those who are currently working full-time on alignment research (independently, as mentee/intern, or employed), not those who have in the past or are working part-time on alignment. This was not self-reported by the participants, but annotated by us based on their CV, so there may be some inaccuracies.

