How To Get Into Independent Research On Alignment/Agency

by johnswentworth · 18 min read · 19th Nov 2021 · 27 comments

Careers, AI
Curated
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

I’m an independent researcher working on AI alignment and the theory of agency. I’m 29 years old, will make about $90k this year, and set my own research agenda. I deal with basically zero academic bullshit - my grant applications each take about one day’s attention to write (and decisions typically come back in ~1 month), and I publish the bulk of my work right here on LessWrong/AF. Best of all, I work on some really cool technical problems which I expect are central to the future of humanity.

If your reaction to that is “Where can I sign up?”, then this post is for you.

Background Models

Independence

First things first: the “independent” part of “independent research” means self-employment, and everything that goes with it. It means the onus is on you to figure out what to do, how to provide value, what to prioritize, and what to aim for. In practice, it also usually means “independent” in a broader sense: you won’t have a standard template or agenda to follow. If you go down this path, assume that you will need to chart your own course - in particular, your own research agenda.

For the sort of person this post is aimed at, that will be a very big upside, not a downside.

Disclaimer: there are ways to get into alignment research which don’t involve quite so much figuring-it-all-out-on-your-own. Some people receive mentorship from existing researchers. Some people go work for alignment research organizations. Either of those paths can involve “independent research” in the sense that you are technically self-employed, but those paths aren’t “independent” in the broader sense of the word, and they’re not the main topic of this post.

Preparadigmicity

As a field, the study of alignment and agency is especially well-suited to independent research, because it centers on problems we don’t understand. It’s not just that we don’t have the answers; we don’t even have the right frames for thinking about the problems. Agency is an area where we are fundamentally confused. AI alignment is largely a problem which hasn’t happened yet, on technology which hasn’t been invented yet, which we nonetheless want to solve in advance. Figuring out the right frames - the right paradigm - is itself a central part of the job.

The field needs people who are going to come up with new frames/approaches/models/paradigms/etc, because we’re pretty sure the current frames/approaches/models/paradigms/etc aren’t enough. Thus the great fit for independent research: as an independent researcher, you’re not beholden to some existing agenda based on existing frames. Coming up with your own idea of what the key problems are, how to frame them, what tools to apply… that sort of thing is exactly what we need, and it requires people who aren’t committed to the strategies of existing senior researchers and organizations. It requires people who have an independent high-level understanding of the field and different angles from which to look at it, and who can pick out the key problems and paths from that perspective.

Again, for the sort of person this post is aimed at, that will be a very big upside.

… but it comes with some trade-offs. As a historical example of preparadigmatic research, here’s Kuhn talking about optics before Newton:

Being able to take no common body of belief for granted, each writer on physical optics felt forced to build his field anew from its foundations. In doing so, his choice of supporting experiment and observation was relatively free, for there was no standard set of methods or of phenomena that every optical writer felt forced to employ and explain. Under these circumstances, the dialogue of the resulting books was often directed as much to the members of other schools as it was to nature.

This very much applies to alignment research. Because the field does not already have a set of shared frames - i.e. a paradigm - you will need to spend a lot of effort explaining your frames, tools, agenda, and strategy. For the field, such discussion is a necessary step to spreading ideas and eventually creating a paradigm. For you, it’s a necessary step to get paid, and to get useful engagement with your work from others.

In particular, you will probably need to both think and write a lot about your strategy: the models and intuitions which inform why you’re working on the particular problems you’ve chosen, why the tools you’re using seem promising, what kinds of results you expect, and what your long-term vision looks like. Inevitably, a lot of this will rely on informal arguments or intuitions; you will need to figure out how to trace the sources of those intuitions and explain them to other people, without having to formalize everything. Explain the actual process which led to an idea/decision/approach, without going down the bottomless rabbit hole of deeply researching every single claim.

The current version of LessWrong was built in large part to support exactly that sort of discussion, and I strongly recommend using it.

Getting Paid

Right now, the best grantmaker in this space is the Long-Term Future Fund (LTFF). There are other options, but none are quite as good a fit for the sort of work we’re talking about here.

I’ve received a few LTFF grants myself and know some of the people involved in the grantmaking decisions, so I’ll give some thoughts on the most important things you’ll need in order to get paid. Bear in mind that this is inherently speculative and not endorsed by anyone at LTFF. I’d also recommend looking at LTFF’s past grants to get a more direct idea of what kinds of things they fund.

Don’t Bullshit

A low-bullshit grantmaking process works both ways. The LTFF wants to do object-level useful things, not just Look Prestigious, so they keep the application simple and the turnaround time relatively fast. The flip side is that I expect them to look very unkindly on bullshit - i.e. attempts to make the applicant/application Sound Prestigious without actually doing object-level useful things.

In academia, it’s common practice to make up some bullshit about how your research is going to help the world. During my undergrad, this sort of bullshit was explicitly taught. Of course, it’s not like anyone is ever going to hire an economist or statistician (let alone consult a prediction market) to figure out whether the research is actually likely to impact the world in the manner claimed. The goal is just to make the proposal sound good. If you’re coming from academia, this sort of bullshit may be an ingrained habit which takes effort to break.

If you want to make it in alignment/agency research, you’re going to need an actual object-level strategy.

We’ll talk more in the next sections about how to come up with a strategy, but the first stop is The Bottom Line: once you’ve chosen a strategy, anything you say to justify it will not make it any more correct. All that matters is the process which originally made you choose that strategy, or made you stick to it at times when you might realistically have changed course. So first things first, forget whatever clever idea you already have cached, and let’s start from a blank slate.

Reading

Preparadigmicity means you’ll need to spend a lot of time explaining your choice of vision, strategy, models, tools, etc. The flip side of that coin is reading: you’ll probably need to read quite a bit of material from others in the field. This is often nontechnical or semi-technical background material, explanations of intuitions, vague gesturing at broad ideas, etc - you can see plenty of it here on LessWrong and the Alignment Forum. The more of this you read, the better you’ll understand other researchers’ frames (or at least know which frames you don’t understand), and the better you’ll be able to explain your own material in terms others can readily understand.

Early on, there are two main motivators for reading:

  • To understand which strategies have already been tried, and failed, to avoid retreading that ground
  • To understand a bit of the existing jargon (definitely not all of it!), in order to explain your own ideas in terms already familiar to others

To understand (some) existing approaches and jargon, I’d recommend at least skimming these sequences/posts, and diving deeper into whichever most resemble the directions you want to pursue:

To understand barriers (other than what’s discussed in the above links), this talk and the Rocket Alignment Problem are probably the best starting points. Note that lots of people disagree with those last two links (as well as 11 Proposals), but you probably want to be at least familiar enough to have an informed disagreement.

Note that this is all on LessWrong, which means you can leave comments with questions, attempts to summarize, disagreements, etc. Often people will reply. This helps a lot for actually absorbing the ideas. (h/t Adam Shimi for pointing this out.)

I invite others to leave suggested reading in the comments. (This does risk turning into a big debate over whether X or Y is actually a good idea for new people, but at least then we’ll have a realistic demonstration of how much everybody disagrees over all this. I did warn you that the field is preparadigmatic!)

Finally, there’s The Sequences. They are long, but if you haven’t read them, then you definitely risk various failure modes which will be obvious to people who have read them and very confusing to you. I wouldn’t quite say they’re required reading, especially if you’re on the more technical end of the spectrum and already somewhat familiar with alignment discussions, but there are definitely many people who will be somewhat surprised if you do technical alignment/agency research and haven’t read them.

Again, I want to emphasize that everyone disagrees on all this stuff. Roughly speaking, assume that the grantmakers care more about your research having some plausible path to usefulness than about agreeing with any particular position in any of the field’s ongoing arguments.

The Hamming Question

Over on the other side of the dining hall was a chemistry table. I had worked with one of the fellows, Dave McCall; furthermore he was courting our secretary at the time. I went over and said, "Do you mind if I join you?" They can't say no, so I started eating with them for a while. And I started asking, "What are the important problems of your field?" And after a week or so, "What important problems are you working on?" And after some more time I came in one day and said, "If what you are doing is not important, and if you don't think it is going to lead to something important, why are you at Bell Labs working on it?" I wasn't welcomed after that; I had to find somebody else to eat with!

Probably the most common mistake people make when first attempting to enter the alignment/agency research field is to not have any model at all of the main bottlenecks to alignment, or how their work will address those bottlenecks. The standard (and strongly recommended) exercise to alleviate that problem is to start from the Hamming Questions:

  • What are the most important problems in your field (i.e. alignment/agency)?
  • How are you going to solve them?

At this point, somebody usually complains that minor contributions are important or some such. I’m not going to argue with that, because I expect the sort of person who this post is already aimed at (i.e. people who are excited to forge their own path in a technical field where everyone is fundamentally confused) is probably not the sort of person who is aiming for minor contributions anyway.

If you have decent answers to the Hamming Questions, and you make those answers clear to other people, that is probably a sufficient condition for your grant application to not end up in the giant pile of applications from people who don’t even have a model of how their proposal will help. It’s not quite a sufficient condition to get paid, but I would guess that a large majority of people who can clearly answer the Hamming Questions do get paid.

I want to emphasize that I think clear answers to the Hamming Questions are an approximately-sufficient condition, not an approximately-necessary condition; there are definitely other paths. Steve’s story in the comments below is a good example; in his words:

If you're a kinda imposter-syndrome-y person who just constitutionally wouldn't dream of looking themselves in the mirror and saying "I am aiming for a major contribution!", well me too, and don't let John scare you off. :-P

Use Your Pareto Frontier

A great line from Adam Shimi:

Most people who try to go in a direction 'no one else has tried' end up going in the most obvious direction which everyone else has tried.

My main advice to avoid this failure mode is to leverage your Pareto frontier. Apply whatever knowledge, or combination of knowledge, you have which others in the field don’t. Personally, I’ve gained a lot of insight into agency by drawing on systems biology, economics, statistical mechanics, and chaos theory. Others draw heavily on abstract math, like category theory or model theory. Evolutionary biology and user interface design are both rich sources.

This is one reason why it helps to have a broad technical background: the more frames and tools you have to draw on, the more likely you’ll find a novel and promising combination to apply to the most important problems in the field. (Or, just as good: the more frames and tools you have to draw on, the more likely you’ll notice that one of the most important problems has been overlooked.)

Flip side of this: if you have a novel-seeming idea which involves the same kinds of frames and tools which most people in alignment have (e.g. programming expertise, some ML experience, reading Astral Codex Ten) then do write it up, but don’t be surprised if it’s already been done.

If you read through some existing alignment work, and the strategy seems obviously wrong to you in a way which would not be obvious to the median LessWrong user, then that’s a very promising sign.

Legibility

Part of getting a grant is not just having a good plan and the skills to execute it, but also making your plan and skills legible to the people reviewing the grant.

Here’s (my summary of) a rough model from Oli, who’s one of the fund managers for LTFF. In order to get a grant for alignment research, usually someone needs to do one of these three:

  1. Write a grant application which clearly signals that they understand the alignment problem and have a non-bullshitted research strategy. (This is rare/difficult.)
  2. Have a reference from someone the fund managers know and trust (i.e. the existing alignment research community).
  3. Have some visible online material which clearly signals that they understand the alignment problem and have a non-bullshitted research strategy. (LessWrong posts/comments are a central example.)

As a new entrant to the field, I expect that option #3 is probably your main path. Write up not just your research strategy, but the intuitions, models and arguments behind that strategy. Give examples. Explain what you consider the key problems, why those problems seem central, and the frames and generators behind that reasoning. Again, give examples. Explain conjectures or tools you think are relevant, ideally with examples. If you’re on the theory side, sketch potential empirical tests; if on the empirical side, sketch the conceptual theory behind the ideas. And include examples. Explain your vision of success, and expected applications of your research (if it succeeds). At all stages, focus on giving accessible, intuitive explanations and lots of examples; even people who have lots of technical background will often skip over sections with just dense math, and not everyone has the same technical background as you. And put the examples at the beginnings of the posts, before the abstract/general explanations.

Remember: this is preparadigmatic work. Writing up the ideas, and the generators of the ideas, and the frames, and the tools, and making it all clear and accessible to people with totally different frames and tools, is a central part of the job.

All this writing will also make options #1 and #2 easier over time: writing a lot of posts and comments will eventually generate social connections (though this takes quite a bit of time, especially if you’re not in the Bay Area), and discussion/feedback will give some idea of how to explain things in a way which signals the kinds-of-things LTFF looks for.

(On the topic of feedback: a lot of more experienced researchers ignore most posts which they don't find very promising, partly because it’s a lot of work to explain/argue about problems and partly because there are too many posts to read it all anyway. If you explicitly reach out - e.g. send a message on LessWrong - and ask for feedback, people are much more likely to tell you what they think.)

By the time all that is written up and posted, the grant application itself is a drop in the bucket; that’s a big part of why it only takes a day to write up. A quote from Oli regarding the actual application:

I really wish people would just pretend they’re writing me an email explaining what they plan to do, rather than something aimed at the general public.

This is part of why option #1 is rare - people try to write the LTFF application like it’s an academic grant application or something, and it really isn’t. But also, clear communication is just pretty hard in general, even when you do understand the problem and have a non-bullshitted strategy.

When To Start

This post was mostly written for people who already have the technical skills they need. That probably means grad-level education, though a PhD is definitely not a formal requirement. I know at least a few people who think less than a full undergrad degree can suffice. Personally, I never went to grad school (though admittedly my undergrad coursework looks an awful lot like a PhD program; I got an unusually large amount of mileage out of it).

In terms of specific skills, I recently wrote a study guide with a bunch of technical topics I’ve found useful, but the more important point is that we don’t currently know what the right combination of background knowledge is. If you already have a broad technical background, then my advice is to take a stab at the problem and see how it goes.

If you are currently in high school or undergrad, the study guide has some recommendations for what to study (and why). The larger your knowledge base, the more tools and frames you’ll have to draw on later. You could also apply for a grant to e.g. pursue some alignment/agency research project over the summer; taking a stab at it will give you some firsthand data on what kinds of tools/frames are useful.

Runway

The grant application takes maybe a day, but there will probably be some groundwork before you’re ready for that. You’ll probably want to read a bunch, figure out a strategy, put up a few posts on it, and maybe update in response to feedback.

Personally, I quit my job as a data scientist in late 2018, and tried out a few different things over the course of the next year before settling into alignment/agency research. I got my first grant in late 2019. If someone with roughly my 2018 level of background knew up front that they wanted to enter the field, I think it would take a lot less time than that; a few months would be my guess. That said, my level of background in 2018 was already well above zero.

I wrote a fair bit on LessWrong, and researched some agency problems, even before quitting my job. I do expect it helps to “ease into it” this way, and if you’re coming in fresh you should probably give yourself extra time to start writing up ideas, following the field, and getting feedback. That said, you should probably plan on going full time by the time you get a grant at the latest, and possibly sooner. If you’re in academia, then you’ll probably have more room to aim the bulk of your research at alignment without striking out on your own. (Though you should still totally strike out on your own and enjoy the no-academic-bullshit lifestyle.)

Meta

Historically, EA causes (including alignment) have largely drawn from very young populations (mostly undergrads). I believe this is mostly because (a) those are the people who don’t need to be drawn away from a different path which they’re already on, (b) they’re willing to work for peanuts, and (c) they don’t have to unlearn how to bullshit. Unfortunately, a lot of alignment research benefits from a broad technical background, which takes time to build up. So I think we’ve historically had fewer researchers with that sort of broad knowledge than would be ideal, just because we tend to recruit young people.

But conditions have changed in recent years, and I think there’s now room for a different kind of recruitment, aimed at (somewhat) older people with more knowledge and experience.

First: the Sequences are about ten years old, so right about now there are probably a bunch of postgrads and adjunct professors with lots of technical skills who have already read them, have decent epistemic habits (i.e. know how to not bullshit), and have a rough understanding of what the alignment problem is.

Second: nowadays, we have money. If you’re a postgrad or adjunct professor or whatever, and you can do good technical alignment research, you can probably make more money as an independent researcher in alignment than you do now. Our main grantmaker has an application form which takes maybe a few hours at most, usually comes back with a decision in under a month, and complains that it doesn’t have enough good projects to spend its money on.

So if you’re the sort of person who:

  • Wants to tackle big open research problems
  • … in a field where everyone is confused and we don’t have a paradigm yet and you have to basically chart your own course
  • … and the stakes are literally astronomical
  • … and you have a bunch of technical skills, maybe read the Sequences ten years ago, and have a basic understanding of what AI alignment is and why it’s hard

… then now is a good time to sit down with a notebook and think about how you’d go about understanding alignment/agency. If you have any promising ideas, write them up, post them here on LessWrong, and apply for a grant to pursue this research full-time.

I can attest that it’s an awesome job.

27 comments

the sort of person who this post is already aimed at (i.e. people who are excited to forge their own path in a technical field where everyone is fundamentally confused) is probably not the sort of person who is aiming for minor contributions anyway.

For me, there were two separate decisions. (1) Around March 2019, having just finished my previous intense long-term internet hobby, I figured my next intense long-term internet hobby was gonna be AI alignment; (2) later on, around June 2020, I started trying to get funding for full-time independent work. (I couldn't work at an org because I didn't want to move to a different city.)

I want to emphasize that at the earlier decision-point, I was absolutely "aiming for minor contributions". I didn't have great qualifications, or familiarity with the field, or a lot of time. But I figured that I could eventually get to a point where I could write helpful comments on other people's blog posts. And that would be my contribution!

Well, I also figured I should be capable of pedagogy and outreach. And that was basically the first thing I did—I wrote a little talk summarizing the field for newbies, and gave it to one audience, and tried and failed to give it to a second audience.

(I find it a lot easier to "study topic X, in order to do Y with that knowledge", compared to "study topic X" full stop. Just starting out on my new hobby, I had no Y yet, so "giving a pedagogical talk" was an obvious-to-me choice of Y.)

Then I had some original ideas! And blogged about them. But they turned out to be bad.

Then I had different original ideas! And blogged about them in my free time for like a year before I applied for LTFF.

…and they rejected me. On the plus side, their rejection came with advice about exactly what I was missing if I wanted to reapply. On the minus side, the advice was pretty hard to follow, given my time constraints. So I started gradually chipping away at the path towards getting those things done. But luckily I wound up getting a different grant a few months later (yay).

With that background, a few comments on the post:

I wrote a fair bit on LessWrong, and researched some agency problems, even before quitting my job. I do expect it helps to “ease into it” this way, and if you’re coming in fresh you should probably give yourself extra time to start writing up ideas, following the field, and getting feedback.

I also went down the "ease into it" path. It's especially (though not exclusively) suitable for people like me who are OK with long-term intense internet hobbies. (AI alignment was my 4th long-term intense internet hobby in my lifetime. Probably last. They are frankly pretty exhausting, especially with a full-time job and kids.)

Probably the most common mistake people make when first attempting to enter the alignment/agency research field is to not have any model at all of the main bottlenecks to alignment, or how their work will address those bottlenecks.

Just to clarify:

This quote makes sense to me if you read "when first attempting to enter the field" as meaning "when first attempting to enter the field as a grant-funded full-time independent researcher".

On the other hand, when you're first attempting to learn about and maybe dabble in the field, well obviously you won't have a good model of the field yet.

One more thing:

the sort of person who this post is already aimed at (i.e. people who are excited to forge their own path in a technical field where everyone is fundamentally confused) is probably not the sort of person who is aiming for minor contributions anyway.

If you're a kinda imposter-syndrome-y person who just constitutionally wouldn't dream of looking themselves in the mirror and saying "I am aiming for a major contribution!", well me too, and don't let John scare you off. :-P

I can attest that it’s an awesome job.

I agree!

I love this post. Thanks, John.

Thanks for this post, these kinds of details seem very useful for anyone wanting to attempt this path!

A worry I have: there are people who long for the imagined lifestyle and self-description of being an independent AI alignment/agency researcher.  I would categorize some of my past selves this way.

For many such people, trying to follow this path too enthusiastically would be bad for them -- but they might not have the memetic immunities that protect them from those bad decisions.  For instance, their social safety net might be insufficient for the level of financial risk, or the career tradeoffs might be very large.  This post is enthusiastic, but I think many people need to be urged to exercise caution when making major life changes -- especially around such high stakes causes, where emotions run high.

So for my past selves, I'd disclaim:

  • It's ok (and good) to prioritize your own financial and social safety net.  You can revisit your ability to contribute from a better position in the future.  The risks of things not going as well for you are very real.
  • When starting down such a path, you should have a clear fallback plan that does not involve immense suffering.  For instance, commit to trying for X amount of time, and start looking for another job if you have not reached Y income by then.  Do this only if you have confidence you will not take a too-large psychological hit from the failure.

Yeah, I took the extremely-low-risk option of tinkering away as a hobby, while working a normal industry job, until I had a new income source in hand. So I had no employment gap. That turned out to be a viable option for me, but YMMV. For example, some jobs suck up all your time or energy, leaving no slack for side-projects. Anyone can DM me for other tips and tricks. :)

amazing post! scaling up the community of independent alignment researchers sounds like one of the most robust ways to convert money into relevant insights.

As nobody else has mentioned it yet in this comment section: AI Safety Support is a resource hub specifically set up to help people get into the alignment research field.

I am a 50 year old independent alignment researcher. I guess I need to mention for the record that I never read the sequences, and do not plan to. The piece of Yudkowsky writing that I'd recommend everybody interested in alignment read is Corrigibility. But in general: read broadly, and also beyond this forum.

I agree with John's observation that some parts of alignment research are especially well-suited to independent researchers, because they are about coming up with new frames/approaches/models/paradigms/etc.

But I would like to add a word of warning. Here are two somewhat equally valid ways to interpret LessWrong/Alignment Forum:

  1. It is a very big tent that welcomes every new idea

  2. It is a social media hang-out for AI alignment researchers who prefer to engage with particular alignment sub-problems and particular styles of doing alignment research only.

So while I agree with John's call for more independent researchers developing good new ideas, I need to warn you that your good new ideas may not automatically trigger a lot of interest or feedback on this forum. Don't tie your sense of self-worth too strongly to this forum.

On avoiding bullshit: discussions on this forum are often a lot better than on some other social media sites, but still Sturgeon's law applies.

As a medical student who's very into AI, I dream of becoming an independent researcher in computational psychiatry.

Your post is very inspiring.

Curated. This post matched my own models of how folk tend to get into independent alignment research, and I've seen some people whose models I trust more endorse the post as well. Scaling good independent alignment research seems very important.

I do like that the post also specifies who shouldn't go into independent research.

Hi John, thanks a lot.

Your posts are coming at the perfect time. I just gave my notice at my current job, I have about 3 years of runway ahead of me in which I can do whatever I want. I should definitely at least evaluate AI Safety research. My background is a bachelor's in AI (that's a thing in the Netherlands). The little bits of research I did try got good feedback.

Even though I'm in a great position to try this, it still feels like a huge gamble. I'm aware that a lot of AI Safety research is already of questionable quality. So my question is: how can I determine as quickly as possible whether I'm cut out for this?

Not just asking to reduce financial risk, but also because I feel like my learning trajectory would be quite different if I already knew that it was going to work out in the long run. I'd be able to study the fundamentals a lot more before trying research.

Man, this is a tough question. Evaluating the quality of research in the field is already a tough problem that everybody disagrees on, and as a result people disagree on what sort of people are well-suited to the work. Evaluating it for yourself without already being an expert in the field is even harder. With that in mind, I'll give an answer which I think a reasonably-broad chunk of people would agree with, but with the caveat that it is very very incomplete.

I had a chat with Evan Hubinger a few weeks ago where we were speculating on how our evaluations of grant applications would compare. (I generally don't evaluate grant applications, but Evan does.) We have very different views on what-matters-most in alignment, and agreed that our rankings would probably differ a lot. But we think we'd probably mostly agree on the binary cutoff - i.e. which applications are good enough to get funding at all. That's because at the moment, money is abundant enough that it makes sense to invest in projects based on views which I think are probably wrong but at least have some plausible model under which they could be valuable. If there's a project where Evan would assign it high value, and Evan's model is itself a model-which-I-think-is-probably-wrong-but-still-plausible, then that's enough to merit a grant. (It's a hits-based grantmaking model.) Likewise, I'd expect Evan to view things-I'd-consider-high-value in a similar way.

Assuming that speculation is correct, the main grants which would not be funded are those which (as far as the grant evaluator can tell) don't have any plausible model under which they'd be valuable. Thus the importance of building your own understanding of the whole high-level problem and answering the Hamming Questions: if you can do that, then you have a model under which your research will be valuable, and all that's left is to communicate that model and your plan.

Now back to your perspective. You're already hanging around and commenting on LessWrong, so right out the gate I have a somewhat-higher-than-default prior that you can evaluate the "some model under which the research is valuable" criterion. You're likely to already have the concepts of Bottom Line and Trying to Try and so forth (even if you haven't read those exact posts); you probably already have some intuition for the difference between a plan designed to actually-do-the-thing, versus a plan designed to look-like-it's-doing-the-thing or to look-like-it's-trying-to-do-the-thing. That doesn't mean you already have enough of a model of the alignment/agency problems or a promising thread to tackle them, but hopefully you can at least tell if and when you do have those things.

Based on your comment, I'm more motivated to just sit down and (actually) try to solve AI Safety for X weeks, write up my results and do an application. What is your 95% confidence interval for what X needs to be to reduce the odds of a false negative (i.e. my grant gets rejected but shouldn't have been) to single-digit percentages?

I'm thinking of doing maybe 8 weeks. Maybe more if I can fall back on research engineering so that I haven't wasted my time completely.

My main modification to that plan would be "writing up your process is more important than writing up your results"; I think that makes a false negative much less likely.

8 weeks seems like it's on the short end to do anything at all, especially considering that there will be some ramp-up time. A lot of that will just be making your background frames/approach more legible. I guess viability depends on exactly what you want to test:

  • If your goal is to write up your background models and strategy well enough to see if grantmakers want to fund your work based on them, 8 weeks is probably sufficient
  • If your goal is to see whether you have any large insights or make any significant progress, that usually happens for me on a timescale of ~3 months

It sounds like you want to do something closer to the latter, so 12-16 weeks is probably more appropriate?

I'm aware that a lot of AI Safety research is already of questionable quality. So my question is: how can I determine as quickly as possible whether I'm cut out for this?

My key comment here is that, to be an independent researcher, you will have to rely day-by-day on your own judgement on what has quality and what is valuable. So do you think you have such judgement and could develop it further?

To find out, I suggest you skim a bunch of alignment research agendas, or research overviews like this one, and then read some abstracts/first pages of papers mentioned in there, while trying to apply your personal, somewhat intuitive judgement to decide:

  • which agenda item/approach looks most promising to you as an actual method for improving alignment

  • which agenda item/approach you feel you could contribute most to, based on your own skills.

If your personal intuitive judgement tells you nothing about the above questions, if it all looks the same to you, then you are probably not cut out to be an independent alignment researcher.

Ty John for writing this up. This post and the comments really help me find my own place and direction in terms of doing what I want to do.

Currently in academia and VERY unhappy about the bullshit I have to ingest and create. But I'm still waiting on my social, political, and financial safety nets before I can do anything remotely brave, kind of like tryactions mentioned in his comment.

So the most I'll do is probably just read and write and talk to people on the side.

Speaking of talking to people...

My current research involves (manually) using CPU architectural artifacts to break sandboxing and steal data. I've been wondering whether I could do something along the lines of "make a simple AI that tries to break out of sandboxes, then make an unbreakable sandbox to contain it".

Do shoot me a message if you have any thoughts, or are just curious. I would love to chat.

I'm guessing you're aware but Jim Babcock and others have thought a bit about AI containment and wrote about it in Guidelines for AI Containment.

This is super-encouraging given my circumstances/desires. Thank you so much for posting this!

Thank you for this post, it's very encouraging. I was thinking about applying to LTFF, and I have all the prerequisites; now I feel it's worth a try.

Hi johnswentworth,

Would you have some spare time in the next few weeks to discuss with me the benefits an independent researcher should expect to receive from joining an independent researcher institute?

You can reach me at kylethefox1@gmail.com. 

Thank you for your time and consideration.

If you're talking about Theiss or Ronin or IGDORE or things like that, see discussion here.

Oh perfect, I hadn't seen that. Strong upvote, very helpful.

Thank you for the link.

It's a very short discussion: there is no independent researcher institute. There are independent researchers, and we have no institute; that's what independent research (in the most literal sense) means.

... ok, actually, there is kind of an independent researcher institute. It's called the Ronin Institute. I'm not affiliated with them at all, and don't really know much about them. My understanding is that they provide an Official-Sounding Institute for independent researchers (in basically any academic field) to affiliate with, and can provide a useful social circle of other independent researchers. Again, I have no connection to them at all, and no particular advice about them.

[EDIT: never mind, go follow Steve's links above.]

My guess is that you intended to ask a different question than that. Can you give an example of the sort of thing you're asking about?

Almost there. My question was actually concerning the expected benefits from affiliating with an independent researcher institute. For example, an independent researcher would expect to receive grant administration (if funded) and virtual infrastructure services as benefits from Theiss Research in exchange for their affiliation.

Please let me know if there is a need for further clarification. 
 

Oh cool, that is what you were asking. I guess Steve's got you covered, then; I don't really know any more about it.

Yes. Thank you for your time and for replying to my question.

To give a perspective from a country that is far more annoying in administrative terms (France): grant administration can be a real plus. I go through a non-profit in France, and they take care of the taxes and the declarations, which would otherwise be a hassle. In addition, being self-employed here is really bad for many things you might want to do (rent a flat, get a loan, pay for unemployment funds), and having a real contract helps a lot with that.

Thank you for replying, adamShimi, and for providing your perspective on this matter.

Regarding benefits, are there other things you want besides grant administration and employment status?