The impact report from ARENA’s prior iteration, ARENA 4.0, is available here.
The purpose of this report is to evaluate ARENA 5.0’s impact according to ARENA’s four success criteria, each of which we address in turn below.
Overall, this iteration of ARENA was highly successful according to our success criteria. We are delighted that our 28 in-person programme participants rated their overall enjoyment of the ARENA programme at 9.3/10, representing our highest satisfaction score to date.
Criterion 1: Our participants were of a strong calibre, coming from diverse backgrounds and bringing a wealth of different expertise with them. Notably, 11 participants either held or were pursuing doctoral degrees in technical fields, and 5 had over one year’s professional experience as software engineers. Other participants brought expertise from fields including data science, forecasting, computer engineering and neuroscience. Compared to ARENA 4.0, this iteration included fewer software engineers and professionals in similar roles but more participants holding or pursuing doctoral degrees. This was not a deliberate choice, but our selection process placed high value on those who could demonstrate substantial engagement with technical AI safety and safety-relevant topics, which may explain this demographic shift.
Criterion 2: Our in-person programme delivered substantial upskilling across all technical domains; when asked to rate out of 10 how satisfied they were that they had achieved their pre-programme goals, participants responded with an average of 8.8/10, with 9 of the 28 respondents giving a 10/10 score. After the programme’s conclusion, participants were significantly more confident in mechanistic interpretability (improving from 3.4 to 6.1 on average), reinforcement learning (3.2 to 6.4 on average), and LLM evaluations (3.9 to 7.5 on average) – see Figure 1 for these statistics. Most impressively, participants’ confidence in specific LLM evaluation design tasks increased from 4.2/10 to 9.0/10 (Figure 18), demonstrating our curriculum’s effectiveness in developing practical AI safety skills.
Our in-person taught programme lasts 4 weeks. On average, participants estimated their counterfactual time to learn the full ARENA content unsupervised would have been 9.3 weeks. Two participants felt the programme was too short; another two felt it was too long. The rest felt that the duration was ‘just right’.
Criterion 3: Participants rated the value of being in the LISA environment as 9.6/10 (Figure 22), our highest recorded score yet (compared to ARENA 4.0’s 8.9/10). This underscores the value of hosting ARENA at the LISA workspace, and we are grateful for the continued opportunity to host our programmes here. Participants’ post-programme feedback highlighted their appreciation of the connections, community-building, events and functions that LISA facilitates.
Criterion 4: Participants’ confidence that technical AI safety is right for them increased from 7.9/10 to 8.4/10 (Figure 23). Furthermore, participants reported an average of 8.9/10 agreement that the programme left them in a significantly stronger position to contribute to technical AI safety, indicating ARENA’s impact on career acceleration (Figure 24). At the end of the programme, 8 participants stated that they held confirmed offers to pursue (or to continue pursuing) full-time work in technical AI safety within four months, mainly as research scientists or doctoral students. It should be noted that, for the most part, these individuals already held these offers when they started the programme; in such cases, their desire to participate in ARENA was largely driven by the expectation that our programme’s content would better equip them to perform these roles. At the close of the programme, a further 19 participants were either actively applying or planning to apply to full-time roles in technical AI safety.
First, we outline when the programme took place, what topics were covered, and the main changes made to the programme compared with previous iterations. For more information about our curriculum’s content, see our website.
ARENA 5.0 ran from April 28th to May 30th 2025. The schedule of the programme was as follows:
The main changes for ARENA 5.0 compared with ARENA 4.0 (which ran in Q3 2024) were:
Staff Changes: James Hindmarch and Joly Scriven acted as Programme Lead and Operations Lead respectively for this iteration. David Quarel resumed his role as Head TA, while Nicky Pochinkov, Dennis Akar, Callum McDougall, and Adam Newgas acted as a rotating cast of TAs during the programme. James Fox adopted an advisory role for this iteration.
Increased Career Focus:
Acting on feedback from ARENA 4.0 that the offramping process from the programme was somewhat abrupt, the Capstone Week doubled up as a career-oriented week. When participants were not working on their Capstone Projects, we guided them to develop professional connections in the field and plan for their next steps post-ARENA. To this end, we organised three careers talks – delivered by Marius Hobbhahn (of Apollo Research), Joseph Bloom and Shannon Yang (both of UK AISI) – which included advising and Q&A sessions to address participants’ uncertainties ahead of navigating the job market in AI safety. Additionally, we gave participants the opportunity to have 1-1 discussions with ARENA alumni now working in the field, giving them further support and insight into what doors ARENA might open for them. These efforts were met with positive feedback, and we plan to continue these provisions in future iterations.
Additionally, we now provide a LinkedIn certification for completing the programme, which we hope will be useful to participants as they pursue careers in AI safety.
Improved Accommodation:
Accommodation was identified as an area requiring improvement after ARENA 4.0, and we addressed it on several fronts: we increased our accommodation budget, booked accommodation significantly earlier, and visited each property individually to ensure it was fit for purpose (comfort, cleanliness, and proximity to the LISA workspace). There was evidence that these efforts paid dividends: participants’ attendance during ARENA 5.0 was excellent, with an average daily attendance of 96% and all absences accounted for.
Automation of Processes:
As the programme has evolved since the previous iteration – with new branding, a new website, and so forth – its operations have evolved in step, notably through improved daily feedback mechanisms and automated travel and reimbursement software. This has proven effective: administrative burden was flagged as a pain point during ARENA 4.0, but no such issues materialised this time. As we hoped, this enabled a greater focus on the programme’s content, epitomised by one participant’s comment that ‘the overall feeling of being supported in every sense was very comforting, and it was so easy to concentrate on just the programme itself’. This is in keeping with our aspirations for ARENA’s continued development.
Order of Curriculum:
ARENA 5.0 saw the Reinforcement Learning curriculum delivered in Week 2 of the programme and the LLM Evaluations content delivered in Week 3, reversing the order used in ARENA 4.0. We did not observe a substantial difference in participants’ engagement with these areas as a result of the changed order.
We surveyed our participants at the programme’s start (prior to day 1) and at the end (on the last day). The surveys were intentionally constructed to test the same domains, enabling a detailed ‘before and after’ analysis to gauge the programme’s impact. We also conducted 2-minute daily feedback surveys – quickly accessible to participants via QR codes – at the end of each day of the programme, but the bulk of this impact report relies on the pre- and post-programme survey data.
We collected three types of responses:
We evaluated open-ended responses using thematic analysis. We highlighted keywords in each response, identified recurring themes and patterns across responses, reviewed the themes, and then counted the frequency of each theme across participant responses. Each count comes from a different participant, but a single participant can contribute to multiple theme counts if their response touches on more than one theme.
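As an illustration of this counting step, here is a minimal sketch in Python; the participant identifiers and theme labels below are hypothetical placeholders rather than our actual coded survey data.

```python
from collections import Counter

# Hypothetical coded responses: each participant's open-ended answer has been
# mapped to the set of themes it touches on. A participant can contribute to
# several themes, but at most once per theme.
coded_responses = {
    "participant_01": {"meeting like-minded people", "ML skills"},
    "participant_02": {"ML skills"},
    "participant_03": {"confidence", "ML skills", "meeting like-minded people"},
}

theme_counts = Counter()
for themes in coded_responses.values():
    theme_counts.update(themes)  # one count per theme per participant

n_respondents = len(coded_responses)
for theme, count in theme_counts.most_common():
    print(f"{theme}: {count} responses ({count / n_respondents:.0%})")
```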
ARENA’s primary selection criteria for participants remain (1) their likelihood to pursue AI safety rather than general AI development, despite teaching skills applicable to both, and (2) their technical aptitude relevant to AI safety research/engineering (especially their skill in programming). Building on lessons from ARENA 4.0, we selected more experienced participants on average, recognising the value of strengthening the technical skills and community connections of people who will be able to contribute to AI safety in the very near future.
Overall, our updated selection procedure worked effectively. ARENA 5.0 attracted a high-calibre cohort of 28 participants[1], who demonstrated strong foundational technical skills from the outset. One participant commented that they were especially impressed with the technical quality of their peers given ARENA’s relatively streamlined selection process.
Initial applications for ARENA 5.0 opened on January 30th and closed on February 15th 2025. The coding test ran from February 20th until March 1st 2025. Interviews ran from March 6th until March 13th 2025. Building on feedback from ARENA 4.0, where participants indicated that one month's notice was insufficient for their planning needs, we provided participants with 1.5 months' advance notice of acceptance to ARENA 5.0.
We selected 33 participants (of whom 5 were unable to attend) from ~350 applications. ARENA 5.0 had a geographically diverse cohort of high-calibre participants, coming from the UK, EU, USA, Canada, Australia, Georgia, China, the Philippines, India, and Singapore. Compared with ARENA 4.0, this cohort contained more participants pursuing or holding doctoral degrees. Our selection process favoured those who could demonstrate substantial engagement with technical AI safety and safety-relevant topics, which may explain this demographic shift.
Participants were high calibre and included:
Our participants also came from diverse fields, including data science, pure mathematics, neuroscience, and forecasting.
The quality of our participant selection was evidenced by their pre-programme technical competencies, first assessed by us during the coding test and interviews. Participants entered the programme with solid foundations across key domains:
These pre-programme scores reflect our successful identification of participants with the technical aptitude necessary to engage meaningfully with ARENA's challenging curriculum whilst having room for substantial skill development.
The technical baseline of our participants validated our selection methodology. Unlike complete beginners, our cohort possessed the foundational knowledge necessary to tackle technical AI safety concepts immediately, whilst still having room for growth in specialised domains relevant to AI safety:
This skill distribution – strong programming foundations with targeted, safety-relevant knowledge gaps – represents an ideal participant profile for maximising ARENA’s impact through focused upskilling in AI safety-specific technical domains.
Our core goal is to upskill participants to tackle technical problems in AI safety. The first four weeks of the ARENA in-person programme cover four technical topics (more detail on each topic is provided in the relevant sections):
Neural Networks Fundamentals (optional): Deep learning foundations and PyTorch proficiency
Mechanistic Interpretability: Transformer analysis, superposition, circuit identification, and other mechanistic interpretability techniques
Reinforcement Learning: Classical and deep RL methods, policy optimisation, and RLHF implementation
LLM Evaluations: Eval design and threat modelling, creating evals, infrastructure for evals and agent evaluations
The aim of this week is for participants to reinforce basic deep learning concepts. This week had 23 participants, as it was optional for those with significant deep-learning experience (5 participants chose to skip this week). Topics covered included PyTorch, the basics of neural networks, residual neural networks, CNNs, Weights & Biases, optimisation, and backpropagation.
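To give a flavour of this material, below is a minimal PyTorch sketch of the kind of training loop built up during this week, fitting a small MLP to a toy regression problem; the model and synthetic data are illustrative and not taken from the ARENA exercises.

```python
import torch
import torch.nn as nn

# Toy synthetic dataset: y = 3x + 1 plus noise (illustrative only)
x = torch.linspace(-1, 1, 256).unsqueeze(1)
y = 3 * x + 1 + 0.1 * torch.randn_like(x)

# A small MLP, an optimiser, and a loss function
model = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

for step in range(500):
    optimizer.zero_grad()        # clear gradients from the previous step
    loss = loss_fn(model(x), y)  # forward pass
    loss.backward()              # backpropagation
    optimizer.step()             # parameter update
    if step % 100 == 0:
        print(f"step {step}: loss = {loss.item():.4f}")
```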
At the end of the programme, participants self-assessed as having strengthened their foundational ML skills significantly:
The PyTorch improvement is particularly noteworthy, representing a significant increase in self-assessed competency. This foundational upskilling ensures participants can implement and experiment with AI safety research ideas effectively. However, the improvements here are more modest than in other areas of the curriculum; this is to be expected, as this week is an optional refresher covering the essential ML toolkit required later in the programme.
If left to their own devices, participants estimated that self-studying the Fundamentals content to the same degree of proficiency would have taken them 2.2 weeks on average, compared to the 1 week spent on the content with us.
The aim of this week is for participants to understand some of the methods that can be used to analyse model internals and replicate the results from key interpretability papers. Topics covered include the following: GPT models, training and sampling from transformers, TransformerLens, induction heads, indirect object identification, superposition, linear probes, inference-time intervention, and sparse autoencoders. We had three speakers during this week: Callum McDougall of Google DeepMind, Joseph Bloom of UK AISI, and Neel Nanda of Google DeepMind.
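For readers unfamiliar with the tooling, the sketch below shows the basic TransformerLens workflow this week builds on: loading a pretrained GPT-2 model, running it on a classic indirect-object-identification-style prompt, and caching activations for inspection. This is an illustrative sketch rather than an ARENA exercise.

```python
# Minimal TransformerLens workflow (illustrative sketch, not an ARENA exercise)
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")

prompt = "When Mary and John went to the store, John gave a drink to"
tokens = model.to_tokens(prompt)

# Run the model while caching every intermediate activation
logits, cache = model.run_with_cache(tokens)

# Attention patterns for layer 0: shape [batch, head, query_pos, key_pos]
attn_pattern = cache["blocks.0.attn.hook_pattern"]
print(attn_pattern.shape)

# The model's predicted next token at the final position
next_token_id = logits[0, -1].argmax().item()
print(model.tokenizer.decode([next_token_id]))
```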
Participants showed remarkable progress in mechanistic interpretability, one of the most technically demanding areas of AI safety research:
This represents one of our most successful upskilling outcomes, transforming participants from beginners into competent practitioners capable of contributing to interpretability research either independently or collaboratively.
If left to their own devices, participants estimated that self-studying the Transformers and Mech Interp content to the same degree of proficiency would have taken them 3.1 weeks on average – the largest such estimate of any week – compared to the 1 week spent on the content with us. This continues to underline that this week is arguably the most demanding section of our curriculum.
This week’s core aim is for participants to understand classical and deep RL methods, and how RLHF – the dominant alignment method in use today – is implemented on LLMs. Topics covered include the following: fundamentals of RL, gym and gymnasium environments, policy gradient optimisation, PPO, deep Q-learning, RLHF, HuggingFace, and fine-tuning LLMs. We had two guest speakers during this week, both specialising in RL methods: Roberto-Rafael Maura-Rivero, a PhD from LSE, and Liza Tennant, a PhD from UCL (and one of our participants)!
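To give a sense of the starting point for this material, here is a minimal gymnasium interaction loop with a random policy on CartPole – the interface on which the week’s policy-gradient, PPO and DQN implementations build. This is purely illustrative and not one of the ARENA exercises.

```python
# Minimal gymnasium interaction loop with a random policy (illustrative only)
import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset(seed=0)

episode_return = 0.0
done = False
while not done:
    action = env.action_space.sample()  # random policy as a placeholder
    obs, reward, terminated, truncated, info = env.step(action)
    episode_return += reward
    done = terminated or truncated

print(f"Episode return under a random policy: {episode_return}")
env.close()
```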
The RL curriculum delivered substantial competency gains:
These results demonstrate participants’ transformation from RL novices to practitioners capable of implementing sophisticated RL algorithms and understanding RLHF methodologies for AI alignment.
If left to their own devices, participants estimated that self-studying the RL content to the same degree of proficiency would have taken them 2.7 weeks on average, compared to the 1 week spent on the content with us.
ARENA’s evals content aims for participants to build alignment and dangerous capability evaluations in multiple-choice and agentic settings, and understand how to use these evaluations to gain information about current frontier LLMs. Topics covered include the following: threat-modelling, using LLM APIs, implementing a pipeline to generate questions using LLMs, UK AISI’s inspect library, implementing LLM agents, and scaffolding LLM agents. We had two guest speakers this week: Marius Hobbhahn and Teun van der Weij, both from Apollo Research.
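To illustrate the kind of tooling this week centres on, below is a minimal evaluation task in the style of the inspect library’s introductory examples: a single toy sample, a solver that simply queries the model, and a string-matching scorer. This is a hedged sketch rather than an ARENA exercise, and exact argument names may differ between inspect versions.

```python
# Minimal inspect-style evaluation task (illustrative sketch, not an ARENA
# exercise). A real evaluation would use a generated dataset of
# threat-model-driven questions rather than a single toy sample.
from inspect_ai import Task, task
from inspect_ai.dataset import Sample
from inspect_ai.scorer import match
from inspect_ai.solver import generate

@task
def toy_arithmetic_eval():
    return Task(
        dataset=[
            Sample(
                input="What is 2 + 2? Answer with just the number.",
                target="4",
            ),
        ],
        solver=[generate()],  # query the model once on each sample
        scorer=match(),       # score by matching the target string
    )

# Run from the command line, e.g.:
#   inspect eval toy_eval.py --model openai/gpt-4o-mini
```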
Our evals week achieved the most dramatic improvement of any section of the curriculum:
This exceptional outcome reflects both the curriculum’s effectiveness and the importance of evals skills for AI safety work. Participants now possess the capabilities to work on the design and implementation of safety evaluations for frontier AI systems.
If left to their own devices, participants estimated that self-studying the evals content to the same degree of proficiency would have taken them 1.3 weeks on average, compared to the 1 week spent on the content with us.
Furthermore, we asked participants how they found the ARENA materials overall. This helps us compare participant calibre across different ARENA cohorts and elicit feedback on the quality of our teaching methods.
We also asked all 28 participants what their favourite aspect of the ARENA curriculum was. They responded as follows:
Participants provided exceptionally positive feedback on the programme’s educational components:
These scores reflect both the high calibre of our participants and the effectiveness of our pedagogical approach. The teaching quality score of 9.3/10 is noteworthy, suggesting our curriculum design, TA expertise, and delivery methods have reached a high standard – our task now is to maintain it, and improve wherever possible.
The strong goal achievement rating (8.8/10) indicates that ARENA successfully met participants’ diverse learning expectations, despite the cohort’s varied backgrounds and objectives. This validates our curriculum’s flexibility and comprehensiveness in addressing the multifaceted skill requirements for AI safety work.
Our participants spent 4 to 5 weeks full-time in the LISA office in London. Overall, they really enjoyed their time in the LISA workspace. When asked to rate out of 10 how valuable it was to be based in LISA for the programme, our participants responded with an average score of 9.6/10. This is an exceptional score and highlights the unique value of LISA – with its speakers, researchers, events and non-stop activity – as a home for the ARENA programmes.
Similarly, when asked to rate ARENA’s logistical operations out of 10, participants responded with an average score of 9.5/10. We are delighted that we were able to provide our participants with the support they needed to concentrate on their learning outcomes, connect with their peers and integrate seamlessly into the AI safety community over these five weeks. LISA and its staff have been instrumental in ensuring a successful outcome on this front, and we are grateful for the continued opportunity to host our programmes here.
Beyond quantified scores, our participants’ comments provide more detail on specific themes.
Finally, ARENA aims to accelerate participants’ AI safety careers. We are really excited to see this cohort deploy the skills they acquired during ARENA in their next steps. At the conclusion of the programme, we observed encouraging patterns in our participants’ career plans: thirteen participants stated they were actively applying for AI alignment roles at the time of the survey, with a further six saying that they were planning to do so after the programme ended. Most excitingly, eight participants had a confirmed full-time position in AI safety beginning in the four months post-ARENA. We are delighted about these outcomes, and they make us optimistic about ARENA’s impact into the future. It is immensely rewarding to witness the programme’s impact in securing one of ARENA’s core goals: to provide talented individuals with the skills to go directly into technical AI safety work.
We also saw an increase in participants’ confidence that AI safety is the right field for them. Prior to the programme, when asked to score out of 10 “How confident are you that AI safety is the right field for you?”, participants gave an average of 7.9/10. By the end of the programme, the same question was met with an average response of 8.4/10. (The same two participants rated their confidence at 10/10 both before and after the programme.)
Below, we have provided selected feedback from participants highlighting the programme’s value as an in-person accelerator bootcamp; beyond the curriculum’s educational content, our programme’s in-person, collaborative structure makes it an equally valuable source of accountability, community and personal growth.
We asked the participants some key questions to gauge ARENA’s counterfactual impact and participants’ overall appreciation of the programme.
We asked participants ‘What was the most valuable thing you gained from the programme?’ and thematically analysed their open-ended responses. 23 participants gave responses to this question, and some responses fell into more than one of the categories outlined below (making for a total of 40 theme mentions).
Meeting talented and like-minded people: 9 responses (39%)
ML skills and knowledge: 13 responses (57%)
Confidence to take on technical AI safety work: 6 responses (26%)
Ability to quickly execute on a project: 4 responses (17%)
Building an impressive Capstone Project: 1 response (4%)
Given the nature of the programme – as an educational bootcamp with a central focus on collaboration through pair-programming – it is unsurprising that 96% of responses identify either technical skill acquisition or relationship-building as their most valuable gain from the programme. This speaks not only to the quality of the curriculum and ARENA’s TAs, but also to the extra-curricular elements of our in-person programme.
Notable examples of responses include:
Meeting talented and like-minded people
ML skills and knowledge
Confidence to take on ML work
Ability to quickly execute on an ML project idea
Ability to bootstrap their own learning in the future
As a team, we endeavour to use feedback to improve the quality of ARENA for participants. With each iteration, we learn how better to run the programme and scale its impact. Although this programme was overall successful according to our four success criteria, we identified some core improvements that would benefit future iterations. The key areas for improvement identified in this iteration are:
Development of New Materials
Due to the nature of the AI safety ecosystem, our materials need to be reviewed and updated regularly to retain their relevance. Techniques, methodologies and disciplines that are considered state-of-the-art in AI safety can become obsolete within a matter of months, such is the rate at which AI innovation progresses; naturally, the goalposts of technical safety work must shift to keep pace with this. To this end, and based on feedback from participants and third parties alike during the programme, we are currently scoping out new materials (e.g., AI Control and Technical Governance curricula) to supplement our current curriculum.
Added Support for External Programmes and Independent Learners
Overall, the data collected in this impact report paint an encouraging picture of our in-person programmes. The key pain points from ARENA 4.0 appear to have been addressed, and this has been reflected in our participants’ strong learning outcomes and overall enjoyment of the programme.
Though we continue to take seriously the feedback we received and its role in shaping improvements for future iterations, we now feel it would be prudent to improve the support we provide to external programmes teaching our materials – ‘ARENA clones’ – and individuals who are self-studying our materials. Rather than directing our limited resources at chasing marginal gains in the in-person programme, we feel these would be better mobilised in support of those who engage with our content without coming to join us in person at LISA. To this end, our long-term projects include:
We recognise that the bulk of our efforts, up until now, have focused on maximising the effectiveness of our in-person programmes whilst simply making the ARENA materials available for external use. Given that our end goal is to promote work and research in AI alignment through the ARENA materials, we feel that the most impactful use of our resources at this time would be to bolster our support for remote learners whilst maintaining (and improving, where possible) the high quality of our in-person programmes. Any direct feedback or suggestions on what ARENA should be doing would be appreciated – please use this form, or feel free to leave a comment on LessWrong!
Finally, if anyone is interested in running a version of the ARENA curriculum in the Bay Area, reach out to us at info@arena.education. We’d be very excited to discuss how we can best support and facilitate this!
Acknowledgements:
This report was produced by @JScriven and @JamesH at ARENA, and was reviewed with comments by @James Fox. We thank Callum McDougall and David Quarel who both acted as TAs for this iteration of ARENA and designed the bulk of the ARENA curriculum. We also thank Nicky Pochinkov, Dennis Akar, and Adam Newgas who acted as TAs during this iteration of the programme. Finally, we thank Open Philanthropy for their generous support of the ARENA programme.
For each iteration, we aim to accept 30 participants – in ARENA 5.0, fewer participants than this were ultimately able to attend owing to unforeseen issues with visas and availability.