This diagram summarizes the requirements for independent AI alignment research and how they are connected.

In this post I'll outline my four-year-long attempt at becoming an AI alignment researcher. It's an ‘I did X [including what I did wrong], and here's how it went’ post (see also jefftk's More writeups!). I'm not complaining about how people treated me – they treated me well. And I'm not trying to convince you to abandon AI alignment research – you shouldn't. I'm not saying that anyone should have done anything differently – except myself.



Funding is the main requirement, because it enables everything else. Thanks to Paul Christiano I had funding for nine months between January 2019 and January 2020. Thereafter I applied to the EA Foundation Fund (now Center on Long-Term Risk Fund) and Long-Term Future Fund for a grant and they rejected my applications. Now I don't know of any other promising sources of funding. I also don't know of any AI alignment research organisation that would hire me as a remote worker.

How much funding you need varies. I settled on 5 kUSD per month, which sounds like a lot when you're a student, and which sounds like not a lot when you look at market rates for software developers/ML engineers/ML researchers. On top of that, I'm essentially a freelancer who has to pay social insurance by himself, take time off to do accounting and taxes, and build runway for dry periods.

Results and relationships

In any job you must get results and build relationships. If you don't, you don't earn your pay. (Manager Tools talks about results and relationships all the time. See for example What You've Been Taught About Management is Wrong or First Job Fundamentals.)

The results I generated weren't obviously good enough to compel Paul to continue to fund me. And I didn't build good enough relationships with people who could have convinced the LTFF and EAFF fund managers that I have the potential they're looking for.


Funding buys time, which I used for study and research.

Another aspect of time is how effectively and efficiently you use it. I'm good at effective, not so good at efficient. – I spend much time on non-research, mostly studying Japanese and doing sports. And dawdling. I noticed the dawdling problem at the end of last year and got it under control at the beginning of this year (see my time tracking). Too late.

Added 2020-03-16: I also need a lot of sleep in order to do this kind of work. – About 8.5 h per day.

Travel and location

I live in Kagoshima City in southern Japan, which is far away from the AI alignment research hubs. This means that I don't naturally meet AI alignment researchers and build relationships with them. I could have compensated for this by travelling to summer schools, conferences etc. But I missed the best opportunities and I felt that I didn't have the time and money to take the second-best opportunities. Of course, I could also relocate to one of the research hubs. But I don't want to do that for family reasons.

I did start maintaining the Predicted AI alignment event/meeting calendar in order to avoid missing opportunities again. And I did apply and get accepted to the AI Safety Camp Toronto 2020. They even chose my research proposal for one of the teams. But I failed to procure the funding that would have supported me from March through May when the camp takes place.


I know more than most young AI alignment researchers about how to make good software, how to write well and how to work professionally. I know less than most young AI alignment researchers about maths, ML and how to do research. The latter appear to be more important for getting results in this field.


Why do I know less about maths, ML and how to do research? Because my formal education goes only as far as a BSc in computer science, which I finished in 2014 (albeit with very good grades). There's a big gap between what I remember from that and what an MSc or PhD graduate knows. I tried to make up for it with months (time bought with Paul's funding) of self-study, but it wasn't enough.

Added 2020-03-26: Probably my biggest strategic mistake was to focus on producing results and trying to get hired from the beginning. If I had spent 2016–2018 studying ML basics, I would have been much quicker to produce results in 2019/2020 and convince Paul or the LTFF to continue funding me.

Added 2020-12-09: Perhaps trying to produce results by doing projects is fine. But then I should have done projects in one area and not jumped around the way I did. This way I would have built experience upon experience, rather than starting from scratch every time. (2021-05-25: I would also have continued to build relationships with researchers in that one area.) Also, it might have been better to focus on the area that I was already interested in – type systems and programming language theory – rather than the area that seemed most relevant to AI alignment – machine learning.

Another angle on this, in terms of Jim Collins (see Jim Collins — A Rare Interview with a Reclusive Polymath (#361)): I'm not ‘encoded’ for reading research articles and working on theory. I am probably ‘encoded’ for software development and management. I'm skeptical, however, about this concept of being ‘encoded’ for something.

All for nothing?

No. I built relationships and learned much that will help me be more useful in the future. The only thing I'm worried about is that I will forget what I've learned about ML for the third time.


I could go back to working for money part-time, patch the gaps in my knowledge, results and relationships, and get back on the path of AI alignment research. But I don't feel like it. I spent four years doing ‘what I should do’ and was ultimately unsuccessful. Now I'll try and do what is fun, and see if it goes better.

What is fun for me? Software/ML development, operations and, probably, management. I'm going to find a job or contracting work in that direction. Ideally I would work directly on mitigating x-risk, but this is difficult, given that I want to work remotely. So it's either earning to give, or building an income stream that can support me while doing direct work. The latter can be through saving money and retiring early, or through building a ‘lifestyle business’ the Tim Ferriss way.

Another thought on fun: When I develop software, I know when it works and when it doesn't work. This is satisfying. Doing research always leaves me full of doubt whether what I'm doing is useful. I could fix this by gathering more feedback. For this again I would need to buy time and build relationships.


For reference I'll list what I've done in the area of AI alignment. Feel free to stop reading here if you're not interested.


Thanks…

…to everyone who helped me and treated me kindly over the past four years. This encompasses just about everyone I've interacted with. Those who helped me most I've already thanked in other places. If you feel I haven't given you the appreciation you deserve, please let me know and I'll make up for it.

19 comments

Because conditions might change and you might come back to AI alignment research, I want to share some details of what I've been doing and how I've approached my alignment work. I'll write this out as a personal story since that seems to be the best fit, and you can pull out what resonates as advice. Some of the details might seem irrelevant at first, but I promise I put them there as context that I think is relevant to tying the whole thing together at the end.

So back in 1999 I got a lot more engaged on the Extropians mailing list (like actually reading it rather than leaving the messages unread in a folder). This led to me joining the SL4 mailing list and then getting really excited about existential risks more generally (since about 1997 I had been reading and thinking a lot about nanotech/APM and its risks). Over the next few years I stayed moderately engaged on SL4 and things that came after it until around 2004-2005. By this point it seemed I just wasn't cut out for AI alignment research even though I cared a lot, and I mostly gave up on ever being able to contribute anything. I went off to live my life, got married, and worked on a PhD.

I didn't lose touch with the community. When it started, Overcoming Bias went straight into my RSS reader, and then LW later on. I kept up with the goings-on of SIAI, the Foresight Institute, and other things.

My life changed directions in 2011. That year I dropped out of my PhD because my wife was sick and couldn't work and I needed a job that paid more; I had already lost the spirit to finish it about 2 years earlier, such that I failed classes and only worked on my research. So I started working as a software engineer at a startup. Over the next year or so this changed me: I was making money, I was doing something I was good at, I saw that I kept getting better at it, and it built a lot of confidence. It seemed I could do things.

So in 2012 I finally signed up for cryonics after years of procrastination. I felt good about myself for maybe the first time in my life and I had the money to do it. In 2013 I felt even better about myself and separated from my wife, finally realizing and accepting that I was only with her not because I wanted to be with her but because I didn't want to be alone. That same year I took a new programming job in the Bay Area.

I continued on this upward trajectory for the next few years, but I didn't think too hard about doing AI research. I thought my best bet was that maybe I could make a lot of money and use it to fund the work of others. Then in 2017, after a period of thinking real hard and writing about my ideas, I noticed one day that maybe I had some comparative advantage to offer AI alignment research. Maybe not to be a superstar researcher trying to solve the whole thing, but I could say some things and do some work that might be helpful.

So, that's what I did. AI alignment research is in some sense a "hobby" for me because it's not what I do full time and I don't get paid to do it, but at the same time it's something I make time for, stay up with, and even if I'm not seemingly doing as much as others, I keep at it because it seems to me I'm able to offer something to the field in places that appear neglected to me. Maybe my biggest impact will just be to have been part of the field and have made it bigger and more active so that it had more surface area for others to stay engaged with and find it on their own paths to doing work that has more direct impact, or maybe I'll eventually stumble on something that is really important, or maybe I already have and we just don't realize it yet. It's hard to say.

So I hope this encourages you not to give up on AI alignment research altogether. Do what you need to do for yourself, but also I hope you don't lose your connection to the field. One day you might wake up to realize things have changed or you know something that gives you a unique perspective on the problem that, if nothing else, might get people thinking in ways they weren't about the problem before and inject useful noise that will help us anneal our way to a good solution. I hope that you keep reading, keep commenting, and one day find you have something you need to say about AI alignment because others need to hear it.

Thanks for sharing your story and for encouraging me! I will certainly keep in touch with the AI alignment community.

As someone just starting out on the path towards becoming an AI safety researcher I appreciate this post a lot. I have started worrying about not having enough impact in the field if I could not become established fast enough. However, reading this I think that it might serve me (and the field) better if I instead take my time and only enter the field properly if I find that my personal fit seems good and that I can stay in the field for a long time.

Furthermore, this post has helped me find possible worthwhile early projects that could increase my understanding and personal fit for the field.

Sounds good! For me, it was detrimental to focus on intended-for-public projects early on. It would probably have been better to build my understanding and knowledge, which you also appear to be aiming at.

If I can help you with anything or answer questions, let me know. In general, it's good to talk with experienced and successful people and I would suggest attending some of the now-virtual conferences to do that. – EAGxVirtual or events on this calendar:

Thanks for the writeup.

I don't know how much stock I would put in the things Jim Collins says. Thinking Fast and Slow had an interesting critique of one of his books:

The halo effect and outcome bias combine to explain the extraordinary appeal of books that seek to draw operational morals from systematic examination of successful businesses. One of the best-known examples of this genre is Jim Collins and Jerry I. Porras’s Built to Last. The book contains a thorough analysis of eighteen pairs of competing companies, in which one was more successful than the other. The data for these comparisons are ratings of various aspects of corporate culture, strategy, and management practices. “We believe every CEO, manager, and entrepreneur in the world should read this book,” the authors proclaim. “You can build a visionary company.”

The basic message of Built to Last and other similar books is that good managerial practices can be identified and that good practices will be rewarded by good results. Both messages are overstated. The comparison of firms that have been more or less successful is to a significant extent a comparison between firms that have been more or less lucky. Knowing the importance of luck, you should be particularly suspicious when highly consistent patterns emerge from the comparison of successful and less successful firms. In the presence of randomness, regular patterns can only be mirages.

Because luck plays a large role, the quality of leadership and management practices cannot be inferred reliably from observations of success. And even if you had perfect foreknowledge that a CEO has brilliant vision and extraordinary competence, you still would be unable to predict how the company will perform with much better accuracy than the flip of a coin. On average, the gap in corporate profitability and stock returns between the outstanding firms and the less successful firms studied in Built to Last shrank to almost nothing in the period following the study. The average profitability of the companies identified in the famous In Search of Excellence dropped sharply as well within a short time. A study of Fortune’s “Most Admired Companies” finds that over a twenty-year period, the firms with the worst ratings went on to earn much higher stock returns than the most admired firms.

You are probably tempted to think of causal explanations for these observations: perhaps the successful firms became complacent, the less successful firms tried harder. But this is the wrong way to think about what happened. The average gap must shrink, because the original gap was due in good part to luck, which contributed both to the success of the top firms and to the lagging performance of the rest. We have already encountered this statistical fact of life: regression to the mean.

Stories of how businesses rise and fall strike a chord with readers by offering what the human mind needs: a simple message of triumph and failure that identifies clear causes and ignores the determinative power of luck and the inevitability of regression. These stories induce and maintain an illusion of understanding, imparting lessons of little enduring value to readers who are all too eager to believe them.

Thanks for the writeup.

+1 I appreciated the writeup. Amongst many positive things about the post, I think it really helps for there to be open conversation about what it looks like when things don't work out as hoped, to help people understand what they're signing up for.

Thanks for the extended quote! :-) As I wrote, I'm sceptical of Jim Collins' claims. On the other hand – people can't just noodle around and expect to be lucky. There must be certain activities that lead to more success than others. So there is some value in Collins-type research in that it finds likely candidates for success-inducing activities.

Nice diagram.

I'm currently doing interviews with early career and aspiring AIS researchers to learn how to better support this group, since I know a lot of us are struggling. Even though you left, I think there is valuable information in your experience. You can answer here publicly or contact me via your preferred method.


What could have been different about the world for you to succeed in getting a sustainable AI Safety research career?

What if you got more funding?

What if you got some sort of productivity coaching?

What if you had a collaboration partner?


Random suggestion

Would you be interested in being a research sponsor? I'm guessing wildly here but maybe you can earn enough to live the fun life you want while also supporting an AI Safety researcher? Given that you have been in the field, you have some capability to evaluate their work. You can give someone not just money but also a discussion partner and some amount of feedback.

If you can help someone else succeed, that creates as much good as doing the work yourself.

I just started doing these interviews with people, so I don't know for sure. But if my current model is more or less right, there will be lots of people who are in the situation you just left behind. And if I were to make some wild guesses again, I would say that most of them will quit after a few years, like you, unless we can create better support.

This is just something that came to my mind late at night. I have not thought long and hard about this idea. But maybe check if something like this feels right for you?

What could have been different about the world for you to succeed in getting a sustainable AI Safety research career?

If I had had a mentor early on, in the beginning of 2016, that would have been great. A mentor who has patience for a bungling young person and keeps nudging them back on the right path. A mentor who has time for a weekly video call. A mentor who sets the structure for the relationship. Because I wouldn't have known what structure is good.

I still don't know how to find such a person.

Added 2020-04-28: In hindsight, the mentor should have recognized that I lack a foundation of machine learning knowledge. They should have told me to have fun and study whatever AI-related topic I like, with as much backtracking as I like, for one or two years, rather than trying to do research projects and to get a job. I wish I had that chance again. Following the MIRI Research Guide would have been just the right thing for me. But for some reason I strayed from that path.

What if you got more funding?

More funding would unblock all dependency cycles in the diagram above. This means that I could continue doing research. Would I do it? I think so, especially with a collaboration partner as described above. I tend to be doubtful about what I'm doing, but I also believe that I have something to contribute to the field. Not raw math power – other people are better at that. But more on the software development, process, management, people side.

What if you got some sort of productivity coaching?

I don't think I have a big problem with productivity. Are you asking because I wrote about dawdling above? I've fixed that mostly. And I'm so far not willing to give up the remaining big time consumers (family, Japanese, weightlifting, BJJ, sleep).

That said, it's always helpful to have someone look at what I'm doing and tell me where I can do better.

What if you had a collaboration partner?

This would help if the person filled in for my weaknesses. I.e., if they knew in depth about math and ML theory. If they liked to read articles and keep up with what the field is doing. If they liked to carve out new problems to be solved.

About the research sponsorship:

I'm all for supporting eager people. The tax issues and other formalities can be sorted out. Whether I would personally start earning to give or have time for discussions and feedback, I don't know yet. It depends on what I do next. Certainly I wouldn't mind if people ask me for it. I would like to be the kind of mentor that I wish I had. Of course, I'm still inexperienced, but I think I could help someone who is where I was four years ago.

I wouldn't want to evaluate the usefulness of people's proposals and make funding decisions. This would require keeping up with current research, which is something I dislike. Also, I'm already doubting the usefulness of my own research, so how would I know about others'?

If you need more detail, let me know and I'll book a time in your Calendly.

I'll get back to this by 24 March.

Brief note on sponsoring: I like the idea. Practically one might need to jump through some extra hoops in order to get these donations deducted from one's taxes.

Hm, I did not think about the tax part.

What country do you live in?

Maybe BERI would be willing to act as a middleman. They have nonprofit status in the US.

Note: It's BERI, not BEARI

Ok, thanks. I have changed it ...

... I mean that is what I wrote all along, can't you see? :P

I very much appreciate your efforts both in safety research and in writing this retrospective :)

For other people who are or will be in a similar position to you: I agree that focusing on producing results immediately is a mistake. I don't think that trying to get hired immediately is a big mistake, but I do think that trying to get hired specifically at an AI alignment research organisation is very limiting, especially if you haven't taken much time to study up ML previously.

For example, I suspect that for most people there would be very little difference in overall impact between working as a research engineer on an AI safety team straight out of university, versus working as an ML engineer somewhere else for 1-2 years then joining an AI safety team. (Ofc this depends a lot on how much you already know + how quickly you learn + how much supervision you need).

Perhaps this wouldn't suit people who only want to do theoretical stuff - but given that you say that you find implementing ML fun, I'm sad you didn't end up going down the latter route. So this is a signal boost for others: there are a lot of ways to gain ML skills and experience, no matter where you're starting from - don't just restrict yourself to starting with safety.

Thanks for adding your thoughts! I agree, it would have made sense to become an ML engineer just somewhere. I don't remember why I dismissed that possibility at the time. NB: If I had not dismissed it, I would still have needed to get my head set straight about the job requirements, by talking to an engineer or researcher at a company. Daniel Ziegler described a good way of doing this on the 80,000 Hours Podcast, which is summarized in ML engineering for AI safety & robustness: a Google Brain engineer’s guide to entering the field. Danny Hernandez expanded on that in a useful way in Danny Hernandez on forecasting and the drivers of AI progress.

After I left AI alignment, I thought about spending three months polishing my ML skills, then applying for ML engineering jobs, so that I could return to AI alignment later. – Exactly what you're suggesting, only four years late. :-) – But given the Covid chaos and my income risk aversion, I decided to stick to my guns and get a software engineering job as soon as possible. Luckily, I ended up with a high-impact one, although not in x-risk.

Final note on why I think it was bad for me to try to get hired: It used to take me up to a week to get out an application, which distracted mightily from research work.

The diagram at the beginning is very interesting. I'm curious about the arrow from relationships to results... care to explain? Does it refer to joint works or collaborations?

On the other hand, it's not surprising to me that AI alignment is a field that requires much more research and math than software-writing skills. The field is completely new and not very well formalized yet, so your skill set is probably misaligned with the needs of the market.

Good point about the misaligned skillset.

Relationships to results can take many forms.

  • Joint works and collaborations, as you say.
  • Receive feedback on work products and use it to improve them.
  • Discussion/feedback on research direction.
  • Moral support and cheering in general.
  • Or someone who lights a fire under your bum, if that's what you need.
  • Access to computing resources if you have a good relationship with a university.
  • Mentoring.
  • Quick answers to technical questions if you have access to an expert.
  • Probably more.

This only lists the receiving side, whereas every good relationship is based on give-and-take. Some people get almost all their results by leveraging their network. Not in a parasitic way – they provide a lot of value by connecting others.

Hi rmoehn,

I just wanted to thank you for writing this post and "Twenty-three AI alignment research project definitions".

I have started a 2-year (coursework and thesis) Master's and intend to use it to learn more maths and fundamentals, which has been going well so far. Other than that, I am in a situation very similar to the one you were in at the start of this journey, which makes me think that this post is especially useful for me.

  • BSc (Comp. Sci) only,
  • 2 years professional experience in ordinary software development,
  • Interest in programming languages,
  • Trouble with "dawdling".

The part of this post that I found most interesting is

Probably my biggest strategic mistake was to focus on producing results and trying to get hired from the beginning.

[8 months]

Perhaps trying to produce results by doing projects is fine. But then I should have done projects in one area and not jumped around the way I did.

I am currently "jumping around" to find a good area, where good means 1) Results in area X are useful, 2) Results in area X are achievable by me, given my interests, and the skills that I have or can reasonably develop.

However, this has encouraged me more to accept that while jumping around, I will not actually produce results, and so (given that I want results, for example for a successful Master's) I should really try to find such a good area faster.

Sounds good! I wish you luck in finding a good area. And I suggest another criterion: ‘3) I enjoy working in area X.’ – It's not strictly necessary. Some things you only start enjoying after you've been doing them for a while. But it certainly helps with the dawdling if you're more eager to work on X than to dawdle.

By the way, I've added another clarification to the paragraph above: ‘Perhaps trying to produce results by doing projects is fine. But then I should have done projects in one area and not jumped around the way I did. This way I would have built experience upon experience, rather than starting from scratch every time.’