My first year in AI alignment

Alex_Altair

2022 was a pretty surprising year for me. In the beginning, I had just finished going through the Recurse Center retreat, and was expecting to go back into software engineering. As it turned out, I spent almost all of it getting increasingly committed to working on technical AI alignment research full-time.

I think a lot of people are trying out this whole "independent alignment researcher" thing, or are considering it, and I think a lot of these people are encountering a lack of structure or information. So I figure it might be helpful if I just relay what my experience has been, to add to the available information. And of course, there's a ton of variance in people's experience, so this might be more of a data point about what the variance looks like than it is about what the average looks like.

Some personal background

I'm probably pretty unusual in that I've been involved in the rationalist community since 2010. I bought the arguments for AI x-risk virtually since I first read them in the sequences. I tried somewhat to do useful work back then, but I wasn't very productive.

I have a life-long attention problem, which has mostly prevented me from being intellectually productive. Because I couldn't really do research, I went into software engineering. (This was still a struggle but was sufficiently doable to be a stable living.) A lot of this year's transition was figuring out how to do alignment work despite the attention problem.

My technical background is that I majored in math and physics in college, took most of those required classes, and also dropped out (twice). I was well above average in school (but probably average for the rationalist community). I never really did math in the interim, but it always remained part of my identity, and I have always had a theory mindset.

Goals and plans

When I say I've been "working on AI alignment," what do I mean? What sort of goals or plans have I been following? This has evolved over the year as I got more information.

First, I went through the AGI Safety Fundamentals course. In the beginning I was just happy to be catching up with how the field had developed. I was also curious to see what happened when I interacted with alignment content more intensely. It turned out that I enjoyed it a lot; I loved almost every reading, and dived way deeper into the optional content.

Next, I tried to see if I could figure out what optimization was. I chose this as my AGISF final project, which was obviously ambitious, but I wanted to see what would happen if I focused on this one research-flavored question for four whole weeks.

I made a lot of progress. I decided it was a good project to continue working on, and then I started alternating between writing up what I had done already, and doing more research. The content continued to evolve and grow.

After a few more months, it was clear to me that I could and wanted to do this longer-term. The optimization project was taking forever, but I had some inside and outside view evidence that I was spending my time reasonably. So while I continued spending some weeks focusing entirely on trying to get the optimization project closer to an end point, I also deliberately spent some weeks doing other tasks that a full-time researcher would do, like learning more about the field, others' agendas, the open problems, et cetera. I tried to improve my own models of the alignment problem, think about what the most critical components were, and what I could do to make progress on them, including what my next major project should be. I haven't made any decisions about this, but I do have a pretty solid shortlist.

For much of December, I decided to do more focused reflection on how the optimization project was going. It looks like it could easily take several more months; does that make sense? Am I being grossly inefficient anywhere? Is it valuable enough to spend that time on it? Should I pivot to publishing an MVP version of the sequence? One way to get more insight here is to try to use the optimization content to do further research; it deconfused me, but can I use it to make progress? So I spent some amount of time trying to use it to solve further problems in alignment. This was pretty inconclusive, which I'm counting as evidence against it being useful, but I'm still undecided on how much effort I want to put into the final sequence.

What are my limiting factors?

Body-doubling

My aforementioned attention problem has always been the limiting factor on anything I want to do in my life. Halfway through 2022, I figured out the best intervention so far, which is an ADHD technique called "body-doubling"; essentially, I pay someone to sit next to me while I work. Now, it's pretty clear to me that the biggest potential increase in how much time I spend researching comes from scheduling more body-doubling; another hour scheduled is another hour of research done. Unfortunately it turns out that scheduling with other people to be available for over 40 hours a week is pretty logistically challenging, so my current limit is in getting people to fill the hours.

Finding others to work with

I don't have any research partners, or mentors, or mentees, or really anyone that I regularly interact with about my research. I think this is an ongoing failure of mine; I'm just not quite extroverted enough to bother. So I think I'm missing a lot of the things that you get from this: rapid sanity checks, idea generation, exposure to opportunities, etc.

Lack of feedback from reality

It is pretty hard to get feedback from pre-paradigmatic theoretical/mathematical research. In the section below on "Things I don't know", I talk about not knowing how much time to spend on any particular type of task, and one of my biggest problems is that I get virtually no feedback about this. I don't really know how to get such feedback in principle. There are some things that could serve as a positive feedback signal: the number of posts I make, or theorems I prove, or the number of times people say "wow good job!" or something. I am counting the lack of these signals as weak evidence against me doing a good job at research. But on the other hand, people constantly talk about how research is hard, getting good results is hard, and no one should expect anything, just keep plugging along and do your best.

Events that increased my involvement in AI alignment

For those who want to get more people into alignment work, here's a list of factors that caused me to get more involved over the year:

Having slowly built up the habit of browsing LW
Participating in AGISF
Picking a good final project
Feeling successful in improving my understanding of the content
Advances in AI making me feel more urgency
Feeling like there was a community to support me (both financially and via idea exchange)
Socializing e.g. at Lightcone, or the CHAI conference, and being able to "hold my own" in such conversations

Things I don't know

I continuously operate with an enormous sense of uncertainty. Every day I wake up and have to decide what to do, and there's almost never a clear reason to pick one thing over another. I am regularly questioning my prioritization choices, implicitly or explicitly, including questioning how much effort I should spend on prioritizing. Here are some specific things I have thought about often.

How good am I at math?

I had a lot of information about how good I was when I was in school, but that information is mostly expired. I can understand a lot of the math I come across, and I can usually understand the first chapter of a random math textbook. But a lot of it takes a lot of effort to understand, and I often don't bother. I haven't "proved a theorem" while doing research this year. I feel like I have a lot of good machinery for intuitively understanding math, but getting my hands dirty in the details continues to feel highly effortful.

How much should I prioritize learning math?

I feel like I have enough math intuition to make conceptual progress, to correctly identify and define concepts, and to identify some interesting ideas or possible theorems. Whenever I think about going through a whole textbook, it just feels like so much time and effort that won't be aimed directly at making progress in alignment. But I really can't tell. Maybe I'm behind enough that I should spend half a year just going through a handful of textbooks to really solidify some of my knowledge and skills.

How much should I prioritize learning ML?

I really prefer working with theoretical agent-foundations-flavored content. Other people are way, way ahead of me on doing anything useful with real-world ML systems. I can do software engineering, but only at a mediocre level. I know the basics of neural networks and gradient descent. Should I go well out of my way to learn more? Do I need to understand transformers? Should I be able to implement any of these things? I continue to decide no on this, but if I got into an MLAB cohort I would go for it.

How bad of an idea is it to not be working with people?

As mentioned in the section on my limiting factors, I don't have regular research interactions with others. I do it a little bit, sometimes, but mostly I just feel kinda "meh" about the idea of going well out of my way to make it a regular thing. Maybe this is a huge mistake!

How valuable is the optimization content?

I have a ton of mixed evidence about the value of my year-long optimization project.

I personally feel like all of my hours on it were super useful in de-confusing myself about optimization (including a side-quest I made into understanding entropy). I now feel like I have the tools to answer any question I encounter about optimization. It's a topic people continue to talk about a lot. Whenever I tell people the details (which I've done several times now) they have never given me significantly negative feedback of the type that I'd expect if my ideas didn't make sense or were fundamentally wrong.

But they also never respond as if I've just hugely clarified the whole topic for them. They kinda just go "huh, neat". They might have some specific questions or criticisms, which I am always prepared to address. But no one has been like "wow, I could use this idea in my research!".

It will take a ton more time to finish writing the sequence, and then it will take people a lot of time to read it. And I'm not really sure how worthwhile that will be.

I might be pretty unusual

Modulo the question of how good I am at it, I think I'm an especially good fit for independent research in a pre-paradigmatic field. I practically go around all day having to restrain myself from investigating everything around me. I read the Naturalism sequence and went "yep, that's pretty much a description of what it's like to be me". The idea of having someone hand me a research problem to work on feels bizarre; I can generate so many more ideas than I have time to investigate. It feels like the whole point of life is to figure out everything, and figuring out paradigms feels like the most exciting and rewarding version of that.

I'm also pretty socially resilient. I don't feel a need for my work to be socially legible and I don't really care about reputation or status. If my optimization project turns out not to be useful, my main problem with that will be that I wasted that time not making faster progress on alignment.

So that's how my first year has been. I welcome any feedback or advice. If people have more questions I can write up more sections. And I highly encourage others to write up similar reports of their experiences!
