Please post the work test and grading template you used. A lot of people here might benefit from reading it.
This was useful, thanks.
One thing that stood out to me was the project of generating documents which fully describe a model’s behaviour, given only its behaviour.
How do people working on this think about cases where there may be multiple equally predictive descriptions of the same model?
For example, if two descriptions predict the behaviour equally well, is there a good way to decide which one is closer to the model’s actual motivation? Or is predictive adequacy basically the standard being aimed at here?
This post distills our takeaways from building a new research team over the past four months. We give some context about the team and how it came about, then describe the lessons we learned.
Since AI safety is becoming more and more entrepreneurial, we hope this is helpful for others trying to build teams or organisations of their own.
1. The team
We're a new alignment research team within Arcadia Impact, based in London. We’re a team of 8, working closely with members of the UK AISI alignment team. We currently have three main projects:
We're also hiring for two roles! More on this at the bottom.
2. Context about how the team came about
The rest of this post is written from the perspective of Andrew Draganov (research lead & current programme manager on the team) and Erin Robertson (co-director of Arcadia).
In short, Arcadia Impact had already been collaborating with AISI through LASR Labs and ASET. Our alignment team started with an application for AISI alignment project funding, in which we proposed hiring a team of researchers to collaborate with their alignment team. Andrew was taking part in LASR at the time and was brought in to help with the application. His remit then widened as the number of things to do kept growing. Once our AISI funding was approved, we began hiring researchers and also applied to Coefficient Giving for additional compute funding.
A bit about Andrew, since it bears on how replicable this is. In his words:
For anyone reading this post as a template, here are some things which may be specific to our situation and might not generalise cleanly:
3. Lessons learned
Given the above context, here is advice which we hope is immediately actionable by people looking to start AI safety orgs.
3.1 Hiring
[Written from Andrew’s perspective]
I feel like our hiring went very well and I’m really excited about the team. But I also wasted a lot of time chasing leads of varying usefulness.
For one thing, everyone wants to measure 'crackedness' but it’s unclear how to do it. On that axis, the two highest-signal parts of our process were the work test and the references; if we'd relied on only those two, I think we'd have assessed raw research ability roughly as well as we did. The interviews were helpful in addition to that, but mostly to vibecheck for fit rather than to gauge ability.
For the work test, we paid 50 applicants ~$200 each to write a research proposal. We gave them 4 hours to do this, and the deliverable was just a PDF. We then graded the proposals anonymously. This format feels in line with what the work actually looks like in the age of Claude Code. We’re happy to share the work test and grading template we used if anyone is interested.
Here are a few additional thoughts:
3.2 Networking
[Written from Andrew’s perspective]
Even though it’s clear that building a good team requires a lot of networking, it was often hard to tell which networking was “worth it” and which wasn’t. Here are the things I’d prioritise if I were doing it again:
3.3 Trying to build a good team culture
[Written from Erin’s perspective, with context from running LASR Labs for multiple years]
Since the team has only just started, we can’t yet claim the culture is good (and this is not really for us to say anyway). Instead, here is how we thought about establishing team culture before people joined. Parts are heavily influenced by how this is done for LASR cohorts:
Interested in working with us?
We're hiring! Specifically, we're looking for an Alignment Programme Manager, a senior generalist to help build and run the team. We're also hiring a Communications and Operations Associate to shape how our research reaches stakeholders and to keep the team's operations running. Both will be based at the LISA office in central London, with visa sponsorship available.
If your skills don’t fit neatly into either of these descriptions but you think you’d be a good fit, please apply. We are flexible on the exact role and care more about finding good candidates! The deadline for applications is June 23rd.
Similarly, if you're working on related topics, please reach out! The easiest option is to send an email to andrew[at]arcadiaimpact[dot]org. You can also follow our research updates on Twitter.
Disclosure: Erin is joining the Coefficient Giving Technical AIS team full time at the end of June and is currently part time there.