The impact report from ARENA’s prior iteration, ARENA 4.0, is available here.
The purpose of this report is to evaluate ARENA 5.0’s impact according to ARENA’s four success criteria, each of which we address in turn below.
Overall, this iteration of ARENA was highly successful according to our success criteria. We are delighted that our 28 in-person programme participants rated their overall enjoyment of the ARENA programme at 9.3/10, representing our highest satisfaction score to date.
Criterion 1: Our participants were of a strong calibre, coming from diverse backgrounds and bringing a wealth of different expertise with them. Notably, 11 participants either held or were pursuing doctoral degrees in technical fields, and 5 had over one year’s professional experience as software engineers. Other participants brought expertise from fields including data science, forecasting, computer engineering and neuroscience. Compared to ARENA 4.0, this iteration included fewer software engineers and professionals in similar roles but more participants holding or pursuing doctoral degrees. This was not a deliberate choice, but our selection process placed high value on those who could demonstrate substantial engagement with technical AI safety and safety-relevant topics, which may explain this demographic shift.
Criterion 2: Our in-person programme delivered substantial upskilling across all technical domains; when asked to rate out of 10 how satisfied they were that they had achieved their pre-programme goals, participants responded with an average of 8.8/10, with 9 of the 28 respondents giving a 10/10 score. After the programme’s conclusion, participants were significantly more confident in mechanistic interpretability (improving from 3.4 to 6.1 on average), reinforcement learning (3.2 to 6.4 on average), and LLM evaluations (3.9 to 7.5 on average) – see Figure 1 for these statistics. Most impressively, participants’ confidence in specific LLM evaluation design tasks increased from 4.2/10 to 9.0/10 (Figure 18), demonstrating our curriculum’s effectiveness in developing practical AI safety skills.
Our in-person taught programme lasts 4 weeks. On average, participants estimated their counterfactual time to learn the full ARENA content unsupervised would have been 9.3 weeks. Two participants felt the programme was too short; another two felt it was too long. The rest felt that the duration was ‘just right’.
Criterion 3: Participants rated the value of being in the LISA environment as 9.6/10 (Figure 22), our highest recorded score yet (compared to ARENA 4.0’s 8.9/10). This underscores the value of hosting ARENA at the LISA workspace, and we are grateful for the continued opportunity to host our programmes here. Participants’ post-programme feedback highlighted their appreciation of the connections, community-building, events and functions that LISA facilitates.
Criterion 4: Participants’ confidence that technical AI safety is right for them increased from 7.9/10 to 8.4/10 (Figure 23). Furthermore, participants reported an average of 8.9/10 agreement that the programme left them in a significantly stronger position to contribute to technical AI safety, indicating ARENA’s impact on career acceleration (Figure 24). At the end of the programme, 8 participants stated that they held confirmed offers to pursue (or to continue pursuing) full-time work in technical AI safety within four months, mainly as research scientists or doctoral students. It should be noted that, for the most part, these individuals already held these offers when they started the programme; in such cases, their desire to participate in ARENA was largely driven by the expectation that our programme’s content would better equip them to perform these roles. At the close of the programme, a further 19 participants were either actively applying or planning to apply to full-time roles in technical AI safety.
First, we outline when the programme took place, what topics were covered, and the main changes made to the programme compared with previous iterations. For more information about our curriculum’s content, see our website.
ARENA 5.0 ran from April 28th to May 30th 2025. The schedule of the programme was as follows:
The main changes for ARENA 5.0 compared with ARENA 4.0 (which ran in Q3 2024) were:
Staff Changes: James Hindmarch and Joly Scriven acted as Programme Lead and Operations Lead respectively for this iteration. David Quarel resumed his role as Head TA, while Nicky Pochinkov, Dennis Akar, Callum McDougall, and Adam Newgas acted as a rotating cast of TAs during the programme. James Fox adopted an advisory role for this iteration.
Increased Career Focus:
Acting on feedback from ARENA 4.0 that the offramping process from the programme was somewhat abrupt, the Capstone Week doubled up as a career-oriented week. When participants were not working on their Capstone Projects, we guided them to develop professional connections in the field and plan for their next steps post-ARENA. To this end, we organised three careers talks – delivered by Marius Hobbhahn (of Apollo Research), Joseph Bloom and Shannon Yang (both of UK AISI) – which included advising and Q&A sessions to address participants’ uncertainties ahead of navigating the job market in AI safety. Additionally, we gave participants the opportunity to have 1-1 discussions with ARENA alumni now working in the field, giving them further support and insight into what doors ARENA might open for them. These efforts were met with positive feedback, and we plan to continue these provisions in future iterations.
Additionally, we now provide a LinkedIn certification for completing the programme, which we hope will be useful to participants as they pursue careers in AI safety.
Improved Accommodation:
Accommodation was identified as an area requiring improvement after ARENA 4.0, and we addressed it on several fronts: we increased our accommodation budget, booked accommodation significantly earlier, and visited each property individually to ensure it was fit for purpose (comfort, cleanliness, and proximity to the LISA workspace). There was evidence that these efforts paid dividends: participants’ attendance during ARENA 5.0 was excellent, with an average daily attendance of 96% and all absences accounted for.
Automation of Processes:
As the programme has evolved since the previous iteration – with new branding, a new website, and so forth – its operations have evolved in step, notably through improved daily feedback mechanisms and automated travel and reimbursement software. This has proven effective: administrative burden was flagged as a pain point during ARENA 4.0, but no such issues materialised this time. As we hoped, this enabled a greater focus on the programme’s content, epitomised by one participant’s comment that ‘the overall feeling of being supported in every sense was very comforting, and it was so easy to concentrate on just the programme itself’. This is in keeping with our aspirations for ARENA’s continued development.
Order of Curriculum:
ARENA 5.0 saw the Reinforcement Learning curriculum delivered in Week 2 of the programme and the LLM Evaluations content delivered in Week 3, reversing the order used in ARENA 4.0. We did not observe a substantial difference in participants’ engagement with these areas as a result of the changed order.
We surveyed our participants at the programme’s start (prior to day 1) and at the end (on the last day). The surveys were intentionally constructed to test the same domains, enabling a detailed ‘before and after’ analysis to gauge the programme’s impact. We also conducted 2-minute daily feedback surveys – quickly accessible to participants via QR codes – at the end of each day of the programme, but the bulk of this impact report relies on the pre- and post-programme survey data.
We collected three types of responses:
We evaluated open-ended responses using thematic analysis. We highlighted keywords in each response, identified recurring themes and patterns across responses, reviewed the themes, and then counted the frequency of each theme across participant responses. Each count comes from a different participant, but a single participant can contribute to multiple theme counts if their response touches on more than one theme.
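As an illustration of this counting step, here is a minimal sketch in Python; the participant identifiers and theme labels below are hypothetical placeholders rather than our actual coded survey data.

```python
from collections import Counter

# Hypothetical coded responses: each participant's open-ended answer has been
# mapped to the set of themes it touches on. A participant can contribute to
# several themes, but at most once per theme.
coded_responses = {
    "participant_01": {"meeting like-minded people", "ML skills"},
    "participant_02": {"ML skills"},
    "participant_03": {"confidence", "ML skills", "meeting like-minded people"},
}

theme_counts = Counter()
for themes in coded_responses.values():
    theme_counts.update(themes)  # one count per theme per participant

n_respondents = len(coded_responses)
for theme, count in theme_counts.most_common():
    print(f"{theme}: {count} responses ({count / n_respondents:.0%})")
```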
ARENA’s primary selection criteria for participants remain (1) their likelihood to pursue AI safety rather than general AI development, despite teaching skills applicable to both, and (2) their technical aptitude relevant to AI safety research/engineering (especially their skill in programming). Building on lessons from ARENA 4.0, we selected more experienced participants on average, recognising the value of strengthening the technical skills and community connections of people who will be able to contribute to AI safety in the very near future.
Overall, our updated selection procedure worked effectively. ARENA 5.0 attracted a high-calibre cohort of 28 participants[1], who demonstrated strong foundational technical skills from the outset. One participant commented that they were especially impressed with the technical quality of their peers given ARENA’s relatively streamlined selection process.
Initial applications for ARENA 5.0 opened on January 30th and closed on February 15th 2025. The coding test ran from February 20th until March 1st 2025. Interviews ran from March 6th until March 13th 2025. Building on feedback from ARENA 4.0, where participants indicated that one month's notice was insufficient for their planning needs, we provided participants with 1.5 months' advance notice of acceptance to ARENA 5.0.
We selected 33 participants (of whom 5 were unable to attend) from ~350 applications. ARENA 5.0 had a geographically diverse cohort of high-calibre participants, coming from the UK, EU, USA, Canada, Australia, Georgia, China, the Philippines, India, and Singapore. Compared with ARENA 4.0, this cohort contained more participants pursuing or holding doctoral degrees. Our selection process favoured those who could demonstrate substantial engagement with technical AI safety and safety-relevant topics, which may explain this demographic shift.
Participants were high calibre and included:
Our participants also came from diverse fields, including data science, pure mathematics, neuroscience, and forecasting.
The quality of our participant selection was evidenced by their pre-programme technical competencies, first assessed by us during the coding test and interviews. Participants entered the programme with solid foundations across key domains:
These pre-programme scores reflect our successful identification of participants with the technical aptitude necessary to engage meaningfully with ARENA's challenging curriculum whilst having room for substantial skill development.
The technical baseline of our participants validated our selection methodology. Unlike complete beginners, our cohort possessed the foundational knowledge necessary to tackle technical AI safety concepts immediately, whilst still having room for growth in specialised domains relevant to AI safety:
This skill distribution – strong programming foundations with targeted, safety-relevant knowledge gaps – represents an ideal participant profile for maximising ARENA’s impact through focused upskilling in AI safety-specific technical domains.
Our core goal is to upskill participants to tackle technical problems in AI safety. The first four weeks of the ARENA in-person programme cover four technical topics (more detail on each topic is provided in the relevant sections):
Neural Networks Fundamentals (optional): Deep learning foundations and PyTorch proficiency
Mechanistic Interpretability: Transformer analysis, superposition, circuit identification, and other mechanistic interpretability techniques
Reinforcement Learning: Classical and deep RL methods, policy optimisation, and RLHF implementation
LLM Evaluations: Eval design and threat modelling, creating evals, infrastructure for evals and agent evaluations
The aim of this week is for participants to reinforce basic deep learning concepts. This week had 23 participants, as it was optional for those with significant deep-learning experience (5 participants chose to skip this week). Topics covered included PyTorch, the basics of neural networks, residual neural networks, CNNs, Weights & Biases, optimisation, and backpropagation.
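To give a flavour of this material, below is a minimal PyTorch sketch of the kind of training loop built up during this week, fitting a small MLP to a toy regression problem; the model and synthetic data are illustrative and not taken from the ARENA exercises.

```python
import torch
import torch.nn as nn

# Toy synthetic dataset: y = 3x + 1 plus noise (illustrative only)
x = torch.linspace(-1, 1, 256).unsqueeze(1)
y = 3 * x + 1 + 0.1 * torch.randn_like(x)

# A small MLP, an optimiser, and a loss function
model = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

for step in range(500):
    optimizer.zero_grad()        # clear gradients from the previous step
    loss = loss_fn(model(x), y)  # forward pass
    loss.backward()              # backpropagation
    optimizer.step()             # parameter update
    if step % 100 == 0:
        print(f"step {step}: loss = {loss.item():.4f}")
```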
At the end of the programme, participants self-assessed as having strengthened their foundational ML skills significantly:
The PyTorch improvement is particularly noteworthy, representing a significant increase in self-assessed competency. This foundational upskilling ensures participants can implement and experiment with AI safety research ideas effectively. However, the improvements here are more modest than in other areas of the curriculum; this is to be expected, as this week is an optional refresher covering the essential ML toolkit required later in the programme.
If left to their own devices, participants estimated that self-studying the Fundamentals content to the same degree of proficiency would have taken them 2.2 weeks on average, compared to the 1 week spent on the content with us.
The aim of this week is for participants to understand some of the methods that can be used to analyse model internals and replicate the results from key interpretability papers. Topics covered include the following: GPT models, training and sampling from transformers, TransformerLens, induction heads, indirect object identification, superposition, linear probes, inference-time intervention, and sparse autoencoders. We had three speakers during this week: Callum McDougall of Google DeepMind, Joseph Bloom of UK AISI, and Neel Nanda of Google DeepMind.
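For readers unfamiliar with the tooling, the sketch below shows the basic TransformerLens workflow this week builds on: loading a pretrained GPT-2 model, running it on a classic indirect-object-identification-style prompt, and caching activations for inspection. This is an illustrative sketch rather than an ARENA exercise.

```python
# Minimal TransformerLens workflow (illustrative sketch, not an ARENA exercise)
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")

prompt = "When Mary and John went to the store, John gave a drink to"
tokens = model.to_tokens(prompt)

# Run the model while caching every intermediate activation
logits, cache = model.run_with_cache(tokens)

# Attention patterns for layer 0: shape [batch, head, query_pos, key_pos]
attn_pattern = cache["blocks.0.attn.hook_pattern"]
print(attn_pattern.shape)

# The model's predicted next token at the final position
next_token_id = logits[0, -1].argmax().item()
print(model.tokenizer.decode([next_token_id]))
```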
Participants showed remarkable progress in mechanistic interpretability, one of the most technically demanding areas of AI safety research:
This represents one of our most successful upskilling outcomes, transforming participants from beginners into competent practitioners capable of contributing to interpretability research either independently or collaboratively.
If left to their own devices, participants estimated that self-studying the Transformers and Mech Interp content to the same degree of proficiency would have taken them 3.1 weeks on average – the largest such estimate of any week – compared to the 1 week spent on the content with us. This continues to underline that this week is arguably the most demanding section of our curriculum.
This week’s core aim is for participants to understand classical and deep RL methods, and how RLHF – the dominant alignment method in use today – is implemented on LLMs. Topics covered include the following: fundamentals of RL, gym and gymnasium environments, policy gradient optimisation, PPO, deep Q-learning, RLHF, HuggingFace, and fine-tuning LLMs. We had two guest speakers during this week, both specialising in RL methods: Roberto-Rafael Maura-Rivero, a PhD from LSE, and Liza Tennant, a PhD from UCL (and one of our participants)!
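To give a sense of the starting point for this material, here is a minimal gymnasium interaction loop with a random policy on CartPole – the interface on which the week’s policy-gradient, PPO and DQN implementations build. This is purely illustrative and not one of the ARENA exercises.

```python
# Minimal gymnasium interaction loop with a random policy (illustrative only)
import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset(seed=0)

episode_return = 0.0
done = False
while not done:
    action = env.action_space.sample()  # random policy as a placeholder
    obs, reward, terminated, truncated, info = env.step(action)
    episode_return += reward
    done = terminated or truncated

print(f"Episode return under a random policy: {episode_return}")
env.close()
```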
The RL curriculum delivered substantial competency gains:
These results demonstrate participants’ transformation from RL novices to practitioners capable of implementing sophisticated RL algorithms and understanding RLHF methodologies for AI alignment.
If left to their own devices, participants estimated that self-studying the RL content to the same degree of proficiency would have taken them 2.7 weeks on average, compared to the 1 week spent on the content with us.
ARENA’s evals content aims for participants to build alignment and dangerous capability evaluations in multiple-choice and agentic settings, and understand how to use these evaluations to gain information about current frontier LLMs. Topics covered include the following: threat-modelling, using LLM APIs, implementing a pipeline to generate questions using LLMs, UK AISI’s inspect library, implementing LLM agents, and scaffolding LLM agents. We had two guest speakers this week: Marius Hobbhahn and Teun van der Weij, both from Apollo Research.
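To illustrate the kind of tooling this week centres on, below is a minimal evaluation task in the style of the inspect library’s introductory examples: a single toy sample, a solver that simply queries the model, and a string-matching scorer. This is a hedged sketch rather than an ARENA exercise, and exact argument names may differ between inspect versions.

```python
# Minimal inspect-style evaluation task (illustrative sketch, not an ARENA
# exercise). A real evaluation would use a generated dataset of
# threat-model-driven questions rather than a single toy sample.
from inspect_ai import Task, task
from inspect_ai.dataset import Sample
from inspect_ai.scorer import match
from inspect_ai.solver import generate

@task
def toy_arithmetic_eval():
    return Task(
        dataset=[
            Sample(
                input="What is 2 + 2? Answer with just the number.",
                target="4",
            ),
        ],
        solver=[generate()],  # query the model once on each sample
        scorer=match(),       # score by matching the target string
    )

# Run from the command line, e.g.:
#   inspect eval toy_eval.py --model openai/gpt-4o-mini
```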
Our evals week achieved the most dramatic improvement of any section of the curriculum:
This exceptional outcome reflects both the curriculum’s effectiveness and the importance of evals skills for AI safety work. Participants now possess the capabilities to work on the design and implementation of safety evaluations for frontier AI systems.
If left to their own devices, participants estimated that self-studying the evals content to the same degree of proficiency would have taken them 1.3 weeks on average, compared to the 1 week spent on the content with us.
Furthermore, we asked participants how they found the ARENA materials overall. This helps us compare participant calibre across different ARENA cohorts and elicit feedback on the quality of our teaching methods.
We also asked all 28 participants what their favourite aspect of the ARENA curriculum was. They responded as follows:
Participants provided exceptionally positive feedback on the programme’s educational components:
These scores reflect both the high calibre of our participants and the effectiveness of our pedagogical approach. The teaching quality score of 9.3/10 is noteworthy, suggesting our curriculum design, TA expertise, and delivery methods have reached a high standard – our task now is to maintain it, and improve wherever possible.
The strong goal achievement rating (8.8/10) indicates that ARENA successfully met participants’ diverse learning expectations, despite the cohort’s varied backgrounds and objectives. This validates our curriculum’s flexibility and comprehensiveness in addressing the multifaceted skill requirements for AI safety work.
Our participants spent 4 to 5 weeks full-time in the LISA office in London. Overall, they really enjoyed their time in the LISA workspace. When asked to rate out of 10 how valuable it was to be based in LISA for the programme, our participants responded with an average score of 9.6/10. This is an exceptional score and highlights the unique value of LISA – with its speakers, researchers, events and non-stop activity – as a home for the ARENA programmes.
Similarly, when asked to rate ARENA’s logistical operations out of 10, participants responded with an average score of 9.5/10. We are delighted that we were able to provide our participants with the support they needed to concentrate on their learning outcomes, connect with their peers and integrate seamlessly into the AI safety community over these five weeks. LISA and its staff have been instrumental in ensuring a successful outcome on this front, and we are grateful for the continued opportunity to host our programmes here.
Beyond quantified scores, our participants’ comments provide more detail on specific themes.
Finally, ARENA aims to accelerate participants’ AI safety careers. We are really excited to see this cohort deploy the skills they acquired during ARENA in their next steps. At the conclusion of the programme, we observed encouraging patterns in our participants’ career plans: thirteen participants stated they were actively applying for AI alignment roles at the time of the survey, with a further six saying that they were planning to do so after the programme ended. Most excitingly, eight participants had a confirmed full-time position in AI safety beginning in the four months post-ARENA. We are delighted about these outcomes, and they make us optimistic about ARENA’s impact into the future. It is immensely rewarding to witness the programme’s impact in securing one of ARENA’s core goals: to provide talented individuals with the skills to go directly into technical AI safety work.
We also saw an increase in participants’ confidence that AI safety is the right field for them. Prior to the programme, when asked to score out of 10 “How confident are you that AI safety is the right field for you?”, participants gave an average of 7.9/10. By the end of the programme, the same question was met with an average response of 8.4/10. (The same two participants rated their confidence at 10/10 both before and after the programme.)
Below, we have provided selected feedback from participants highlighting the programme’s value as an in-person accelerator bootcamp; beyond the curriculum’s educational content, our programme’s in-person, collaborative structure makes it an equally valuable source of accountability, community and personal growth.
We asked the participants some key questions to gauge ARENA’s counterfactual impact and participants’ overall appreciation of the programme.
We asked participants ‘What was the most valuable thing you gained from the programme?’ and thematically analysed their open-ended responses. 23 participants gave responses to this question, and some responses fell into more than one of the categories outlined below (making for a total of 40 theme mentions).
Meeting talented and like-minded people: 9 responses (39%)
ML skills and knowledge: 13 responses (57%)
Confidence to take on technical AI safety work: 6 responses (26%)
Ability to quickly execute on a project: 4 responses (17%)
Building an impressive Capstone Project: 1 response (4%)
Given the nature of the programme – as an educational bootcamp with a central focus on collaboration through pair-programming – it is unsurprising that 96% of responses identify either technical skill acquisition or relationship-building as their most valuable gain from the programme. This speaks not only to the quality of the curriculum and ARENA’s TAs, but also to the extra-curricular elements of our in-person programme.
Notable examples of responses include:
Meeting talented and like-minded people
ML skills and knowledge
Confidence to take on ML work
Ability to quickly execute on an ML project idea
Ability to bootstrap their own learning in the future
As a team, we endeavour to use feedback to improve the quality of ARENA for participants. With each iteration, we learn how better to run the programme and scale its impact. Although this programme was overall successful according to our four success criteria, we identified some core improvements that would benefit future iterations. The key areas for improvement identified in this iteration are:
Development of New Materials
Due to the nature of the AI safety ecosystem, our materials need to be reviewed and updated regularly to retain their relevance. Techniques, methodologies and disciplines that are considered state-of-the-art in AI safety can become obsolete within a matter of months, such is the rate at which AI innovation progresses; naturally, the goalposts of technical safety work must shift to keep pace with this. To this end, and based on feedback from participants and third parties alike during the programme, we are currently scoping out new materials (e.g., AI Control and Technical Governance curricula) to supplement our current curriculum.
Added Support for External Programmes and Independent Learners
Overall, the data collected in this impact report paint an encouraging picture of our in-person programmes. The key pain points from ARENA 4.0 appear to have been addressed, and this has been reflected in our participants’ strong learning outcomes and overall enjoyment of the programme.
Though we continue to take seriously the feedback we received and its role in shaping improvements for future iterations, we now feel it would be prudent to improve the support we provide to external programmes teaching our materials – ‘ARENA clones’ – and individuals who are self-studying our materials. Rather than directing our limited resources at chasing marginal gains in the in-person programme, we feel these would be better mobilised in support of those who engage with our content without coming to join us in person at LISA. To this end, our long-term projects include:
We recognise that the bulk of our efforts, up until now, have focused on maximising the effectiveness of our in-person programmes whilst simply making the ARENA materials available for external use. Given that our end goal is to promote work and research in AI alignment through the ARENA materials, we feel that the most impactful use of our resources at this time would be to bolster our support for remote learners whilst maintaining (and improving, where possible) the high quality of our in-person programmes. Any direct feedback or suggestions on what ARENA should be doing would be appreciated – please use this form, or feel free to leave a comment on LessWrong!
Finally, if anyone is interested in running a version of the ARENA curriculum in the Bay Area, reach out to us at info@arena.education. We’d be very excited to discuss how we can best support and facilitate this!
Acknowledgements:
This report was produced by @JScriven and @JamesH at ARENA, and was reviewed with comments by @James Fox. We thank Callum McDougall and David Quarel who both acted as TAs for this iteration of ARENA and designed the bulk of the ARENA curriculum. We also thank Nicky Pochinkov, Dennis Akar, and Adam Newgas who acted as TAs during this iteration of the programme. Finally, we thank Open Philanthropy for their generous support of the ARENA programme.
For each iteration, we aim to accept 30 participants – in ARENA 5.0, fewer participants than this were ultimately able to attend owing to unforeseen issues with visas and availability.