Cross-posted from the EA Forum.
Over the past few months I've privately talked with over 20 AI Safety researchers about how to transition into Theoretical AI Safety. Here's a distillation of their advice and opinions (both general consensus and controversial takes).
Some great posts already exist with general advice for transitioning into AI Safety. However, these and similar posts are mainly centered on technical Computer Science research and Machine Learning engineering. They don't delve into how more theoretical aptitudes and backgrounds (such as careers in Mathematics, theoretical Computer Science, or Philosophy) can be steered into the more abstract Alignment research that exploits them (except for Critch's short post). I think that's mainly because:
This post tries to fill that gap by serving as a helpful first read for graduates and researchers from abstract disciplines who are interested in AI Alignment. I'd recommend using it as a complement to the other, more general introductions and advice. The following two sections are just a summary of general community knowledge; the advice sections do include some new insights and opinions which I haven't seen comprehensively presented elsewhere.
I presuppose familiarity with the basic arguments for AI Alignment research. Given how little we understand the problem, this is a somewhat risky and volatile area in which to work for positive impact, so I recommend developing a good inside view of this field's theory of change (and our doubts about it) before committing hard to any path (and performing a ladder of tests to check your personal fit).
Of course, I do think careers in AI Safety have an expected positive impact absurdly larger than the median, matched only by those in other EA cause areas. Furthermore, if you're into an intellectual challenge, Alignment is one of the most exciting, foundational, and mind-bending problems humanity is facing right now!
There are sound arguments for the importance of theoretical research, since its methods allow for more general results that might scale beyond current paradigms and capabilities (which is what mainly worries us). But we are not even sure such general solutions exist, and carrying out this research faces some serious downsides, such as the scarcity of feedback loops.
Truth is, there's no consensus on whether applied or theoretical research is more helpful for Alignment. It's fairly safe to say we don't yet know, and so we need people working on all fronts. If you have especially good aptitudes for abstract thinking, mathematics, epistemics, and leading research agendas, theoretical research might be your best shot at impact. That is, given the uncertainty about each approach's impact, you should mainly maximize personal fit.
That said, I'd again encourage developing a good inside view to judge for yourself whether this kind of research can be useful. And of course, trying out some theoretical work early on doesn't lock you out of applied research.
In theoretical research I'm including both:
This research is mainly carried out in private organizations, in academia, or as independent grant-funded research. Any approach will need, or at least heavily benefit from, a basic understanding of the current paradigm for constructing these systems (Machine Learning theory), although some approaches abstract away some details of this paradigm.
As preliminary advice, the first step in your ladder of tests can be just reading through ELK and thinking about it for a non-trivial number of hours, trying to come up with strategies or poke holes in others'. If you enjoy that, then you probably wouldn't be miserable doing Theoretical AI Safety. If, furthermore, you have or can acquire the relevant theoretical background, and your aptitudes are well suited to it, then you're probably a good fit. Many more details can be weighed to assess your fit: how comfortable you'd be tackling a problem about which everyone is mostly confused, how good you are at autonomous study and self-directed research, whether you'd enjoy moving to another country, how anxious deadlines or academia make you feel...
The following advice presupposes that you have already reached the (working) conclusion that this field can be impactful enough and that you are a good fit for it.
These points were endorsed by almost everyone I talked to.
These points prompted radically different opinions among the researchers I talked to, and some reflect ongoing debates in the community.
If you're one of those researchers and would like to be credited by name in this post, let me know. I've defaulted to anonymity given the informal and private nature of some of the discussions.
"AI Safety Technical research" usually refers to any scientific research trying to ensure the default non-harmfulness of AI deployment, as opposed to the political research and action in AI Governance. It thus includes the Theoretical/Conceptual research I talk about in this post despite the applied connotations of the word "Technical".
The division between applied and theoretical research is not binary but gradual, and even theoretical researchers usually build upon some empirical data.
I've been positively surprised by how many prominent researchers have been kind enough to answer some of my specific questions.