Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Trying to break into MIRI-style[1] research seems to be much, much harder than trying to break into ML-style safety research. This is worrying if you believe this research to be important[2]. I'll examine two kinds of causes: those which come from MIRI-style research being a niche area and those which go beyond this:

Challenges beyond MIRI-style research being a niche area:

  • MIRI doesn’t seem to be running internships[3] or running their AI safety for computer scientists workshops
  • If you try to break into ML-style safety and fail, you can always reuse at least part of what you've learned to obtain a highly-compensated role in industry. Agent foundations knowledge is highly niche and unlikely to be used elsewhere.
  • You can park in a standard industry job for a while in order to earn career capital for ML-style safety. Not so for MIRI-style research.
  • MIRI publishes a lot less material these days. I support this decision, as infohazards deserve to be taken seriously, but it also makes it harder to contribute.
  • There are well-crafted materials for learning a lot of the prerequisites for ML-style safety.
  • There seems to be a natural pathway of studying a masters then pursuing a PhD to break into ML-style safety. There are a large number of scholarships available and many countries offer loans or income support.
  • The above opportunities mean that there are more ways to gauge fit for ML-style safety research.
  • There's no equivalent to submitting a paper[4]. If a paper passes review, then it gains a certain level of credibility. There are upvotes, but this signaling mechanism is more distorted by popularity or accessibility. Further, unlike writing an academic paper, writing alignment forum posts won't provide credibility outside of the field.

Challenges that come from being a niche area

I think this probably should be a niche area. It would be a bit strange if foundations work were the majority of the research. Nonetheless, it's worth highlighting some of the implications:

  • General AI safety programs and support - i.e. AI Safety Fundamentals Course, AI Safety Support, AI Safety Camp, Alignment Newsletter, etc. - are naturally going to strongly focus on ML-style research and might not even have the capability to vet MIRI-style research.
  • It is much harder to find people with similar interests to collaborate with or mentor you. Compare to how easy it is to meet a bunch of people interested in ML-style research by attending EA meetups or EAGx.
  • If you want to feel part of the AI safety community and join in the conversations people are having, you will have to spend time learning about ML-style research. While it is likely valuable to broaden your scope, as this can lead to cross-pollination, it also sucks up time when you could be learning about MIRI-style research.

Further Thoughts

I think it's worth thinking about what this looks like overall. If you want to try breaking into MIRI-style research, the most likely path looks like saving up 3-12 months runway[5]. 3 months might be possible if you've been consistently working on things in your free time and you've already read a lot of the material that you need to read + made substantial research progress. That said, even if you're able to produce material to prove yourself in 3 months, you'd probably need an extra month or two to obtain funding and you always need more runway than the minimum possible time. It would be possible to apply for an LTFF grant to support this research, but it's probably easier to build up the credibility for ML-style research. Further, if you fail, then you haven't learned skills nor gained credibility that would assist you for any other paths.

I suspect that these considerations not only significantly curtail the number of people who pursue this path, but also ensure that those who do pursue it will often only do so after significant delay.

I guess it's particularly interesting that these difficulties exist in light of the large amount of funding that now appears to be available for AI Safety. In fact, MIRI is now so well-funded that they didn't even bother with a fundraiser this year. I'm not saying that it's easy to resolve these problems by throwing money at them, merely that the availability of funds opens up a lot more options for mitigation.

  1. ^

    I received a comment suggesting that most of what I've written in this article would hold if "MIRI-style research" were replaced everywhere with "preparadigmatic research".

  2. ^

    I'm still largely undecided as to whether Agent Foundations research is important. I'm mostly pursuing it due to comparative advantage.

  3. ^

    Although Evan Hubinger, a MIRI researcher, is involved in running internships separately.

  4. ^

    The inability to submit papers also reduces the ability to obtain review and feedback. Less Wrong does have a feedback option, but you can't expect the same level of expertise and engagement that you would receive from submitting a journal article.

  5. ^

    Admittedly, applying to the EA hotel might also be an option depending on your life circumstances.

15 comments

The object-level claims here seem straightforwardly true, but I think "challenges with breaking into MIRI-style research" is a misleading way to characterize it. The post makes it sound like these are problems with the pipeline for new researchers, but really these problems are all driven by challenges of the kind of research involved.

The central feature of MIRI-style research which drives all this is that MIRI-style research is preparadigmatic. The whole point of preparadigmatic research is that:

  • We don't know the right frames to apply (and if we just picked some, they'd probably be wrong)
  • We don't know the right skills or knowledge to train (and if we just picked some, they'd probably be wrong)
  • We don't have shared foundations for communicating work (and if we just picked some, they'd probably be wrong)
  • We don't have shared standards for evaluating work (and if we just picked some, they'd probably be wrong)

Here's how the challenges of preparadigmicity apply to the points in the post.

  • MIRI doesn’t seem to be running internships[3] or running their AI safety for computer scientists workshops

MIRI does not know how to efficiently produce new theoretical researchers. They've done internships, they've done workshops, and the yields just weren't that great, at least for producing new theorists.

  • You can park in a standard industry job for a while in order to earn career capital for ML-style safety. Not so for MIRI-style research.
  • There are well-crafted materials for learning a lot of the prerequisites for ML-style safety.
  • There seems to be a natural pathway of studying a masters then pursuing a PhD to break into ML-style safety. There are a large number of scholarships available and many countries offer loans or income support
  • General AI safety programs and support - i.e. AI Safety Fundamentals Course, AI Safety Support, AI Safety Camp, Alignment Newsletter, etc. - are naturally going to strongly focus on ML-style research and might not even have the capability to vet MIRI-style research.

There is no standardized field of knowledge with the tools we need. We can't just go look up study materials to learn the right skills or knowledge, because we don't know what skills or knowledge those are. There's no standard set of alignment skills or knowledge which an employer could recognize as probably useful for their own problems, so there's no standardized industry jobs. Similarly, there's no PhD for alignment; we don't know what would go into it.

  • There's no equivalent to submitting a paper[4]. If a paper passes review, then it gains a certain level of credibility. There are upvotes, but this signaling mechanism is more distorted by popularity or accessibility. Further, unlike writing an academic paper, writing alignment forum posts won't provide credibility outside of the field.

We don't have clear shared standards for evaluating work. Most people doing MIRI-style research think most other people doing MIRI-style research are going about it all wrong. Whatever perception of credibility might be generated by something paper-like would likely be fake.

  • It is much harder to find people with similar interests to collaborate with or mentor you. Compare to how easy it is to meet a bunch of people interested in ML-style research by attending EA meetups or EAGx.

We don't have standard frames shared by everyone doing MIRI-style research, and if we just picked some frames they would probably be wrong, and the result would probably be worse than having a wide mix of frames and knowing that we don't know which ones are right.

Main takeaway of all that: most of the post's challenges of breaking into MIRI-style research accurately reflect the challenges involved in doing MIRI-style research. Figuring out new paths, new frames, applying new skills and knowledge, explaining your own ways of evaluating outputs... these are all central pieces of doing this kind of research. If the pipeline did not force people to figure this sort of stuff out, then it would not select for researchers well-suited to this kind of work.

Now, I do still think the pipeline could be better, in principle. But the challenge is to train people to build their own paradigms, and that's a major problem in its own right. I don't know of anyone ever having done it before at scale; there's no template to copy for this. I have been working on it, though.

The object-level claims here seem straightforwardly true, but I think "challenges with breaking into MIRI-style research" is a misleading way to characterize it. The post makes it sound like these are problems with the pipeline for new researchers, but really these problems are all driven by challenges of the kind of research involved.


There's definitely some truth to this, but I guess I'm skeptical that there isn't anything that we can do about some of these challenges. Actually, rereading I can see that you've conceded this towards the end of your post. I agree that there might be a limit to how much progress we can make on these issues, but I think we shouldn't be too quick to rule out making progress.

Figuring out new paths, new frames, applying new skills and knowledge, explaining your own ways of evaluating outputs... these are all central pieces of doing this kind of research. If the pipeline did not force people to figure this sort of stuff out, then it would not select for researchers well-suited to this kind of work.


Some of these aspects don't really select for people with the ability to figure this kind of stuff out, but rather strongly select for people who have either saved up money to fund themselves or who happen to be located in the Bay Area, etc.

We don't know the right frames to apply (and if we just picked some, they'd probably be wrong)

Philosophy often has this problem and they address this by covering a wide range of perspectives with the hope that you're inspired by the readings even if none of them are correct.

We don't have clear shared standards for evaluating work. Most people doing MIRI-style research think most other people doing MIRI-style research are going about it all wrong. Whatever perception of credibility might be generated by something paper-like would likely be fake.

This is a hugely difficult problem, but maybe it's better to try rather than not try at all?

There's definitely some truth to this, but I guess I'm skeptical that there isn't anything that we can do about some of these challenges. Actually, rereading I can see that you've conceded this towards the end of your post. I agree that there might be a limit to how much progress we can make on these issues, but I think we shouldn't be too quick to rule out making progress.

To be clear, I don't intend to argue that the problem is too hard or not worthwhile or whatever. Rather, my main point is that solutions need to grapple with the problems of teaching people to create new paradigms, and working with people who don't share standard frames. I expect that attempts to mimic the traditional pipelines of paradigmatic fields will not solve those problems. That's not an argument against working on it, it's just an argument that we need fundamentally different strategies than the standard education and career paths in other fields.

I like your summary of the situation:

Most people doing MIRI-style research think most other people doing MIRI-style research are going about it all wrong.

This has also been my experience, at least on this forum. Much less so in academic-style papers about alignment. This has certain consequences for the problem of breaking into preparadigmatic alignment research.

Here are two ways to do preparadigmatic research:

  1. Find something that is all wrong with somebody else's paradigm, then write about it.

  2. Find a new useful paradigm and write about it.

MIRI-style preparadigmatic research, to the extent that it is published, read, and discussed on this forum, is almost all about the first of the above. Even on a forum as generally polite and thoughtful as this one, social media dynamics promote and reward the first activity much more than the second.

In science and engineering, people will usually try very hard to make progress by standing on the shoulders of others. The discourse on this forum, on the other hand, more often resembles that of a bunch of crabs in a bucket.

My conclusion is of course that if you want to break into preparadigmatic research, then you are going about it all wrong if your approach is to try to engage more with MIRI, or to maximise engagement scores on this forum.

In science and engineering, people will usually try very hard to make progress by standing on the shoulders of others. The discourse on this forum, on the other hand, more often resembles that of a bunch of crabs in a bucket.


Hmm... Yeah, I certainly don't think that there's enough collaboration or appreciation of the insights that other approaches may provide.

Any thoughts on how to encourage a healthier dynamic?

Any thoughts on how to encourage a healthier dynamic?

I have no easy solution to offer, except for the obvious comment that the world is bigger than this forum.

My own stance is to treat the over-production of posts of type 1 above as just one of these inevitable things that will happen in the modern media landscape. There is some value to these posts, but after you have read about 20 of them, you can be pretty sure about how the next one will go.

So I try to focus my energy, as a reader and writer, on work of type 2 instead. I treat arXiv as my main publication venue, but I do spend some energy cross-posting my work of type 2 here. I hope that it will inspire others, or at least counter-balance some of the type 1 work.

Alignment Newsletter, etc. are naturally going to strongly focus on ML-style research and might not even have the capability to vet MIRI-style research.

I think I've summarized ~every high-effort public thing from MIRI in recent years (I'm still working on the late 2021 conversations). I also think I understood them better (at time of summarizing) than most other non-MIRI people who have engaged with it.

MIRI also has a standing offer from me to work with me to produce summaries of new things they think should have summaries (though they might have forgotten it at this point -- after they switched to nondisclosed-by-default research I didn't bother reminding them).

Sorry, I wasn't criticizing your work.

I think that the lack of an equivalent of papers for MIRI-style research also plays a role here in that if someone writes a paper it's more likely to make it into the newsletter. So the issue is further down the pipeline.

To be clear, I didn't mean this comment as "stop criticizing me". I meant it as "I think the statement is factually incorrect". The reason that the newsletter has more ML in it than MIRI work is just that there's more (public) work produced on the ML side.

I don't think it's about the lack of papers, unless by papers you mean the broader category of "public work that's optimized for communication".

Even if the content is proportional, the signal-to-noise ratio will still be much lower for those interested in MIRI-style research. This is a natural consequence of being a niche area.

When I said "might not have the capacity to vet", I was referring to a range of orgs.

I would be surprised if the lack of papers didn't have an effect as presumably, you're trying to highlight high-quality work and people are more motivated to go the extra yard when trying to get published because both the rewards and standards are higher.

evhub:

One of my hopes with the SERI MATS program is that it can help fill this gap by providing a good pipeline for people interested in doing theoretical AI safety research (be that me-style, MIRI-style, Paul-style, etc.). We're not accepting public applications right now, but the hope is definitely to scale up to the point where we can run many of these every year and accept public applications.

Agreed. Thank you for writing this post. Some thoughts:

As somebody strongly on the Agent Foundations train it puzzles me that there is so little activity outside MIRI itself. We are being told there are almost limitless financial resources, yet - as you explain clearly - it is very hard for people to engage with the material outside of LW. 

At the last EA Global there was some sort of AI safety breakout session. There were ~12 tables with different topics. I was dismayed to discover that almost every table was full of people excitedly discussing various topics in prosaic AI alignment and other things, while the AF table had just 2 (!) people.

In general, MIRI has a rather insular view of itself. Some of it is justified. I do think they have done most of the interesting research, are well-aligned, employ many of the smartest & most creative people etc.

But the world is very very big. 

I have spoken with MIRI people arguing for the need to establish something like a PhD apprentice-style system. Not much interest.

Just some sort of official & long-term & OFFLINE study program that would teach some of the previously published MIRI research would be hugely beneficial for growing the AF community.

Finally, there needs to be way more interaction with existing academia. There are plenty of very smart, very capable people in academia that do interesting things with Solomonoff induction, with Cartesian Frames (but they call them Chu Spaces), with Pearlian causal inference, with decision theory, with computational complexity & interactive proof systems, with post-Bayesian probability theory etc etc. For many in academia AGI safety is still seen as silly, but that could change if MIRI and Agent Foundations people were able to engage seriously with academia.

One idea could be to organize sabbaticals for prominent academics + scholarships for young people. This seems to have happened with the prosaic AI alignment field but not with AF. 

Just some sort of official & long-term & OFFLINE study program that would teach some of the previously published MIRI research would be hugely beneficial for growing the AF community.


Agreed.

At the last EA Global there was some sort of AI safety breakout session. There were ~12 tables with different topics. I was dismayed to discover that almost every table was full of people excitedly discussing various topics in prosaic AI alignment and other things, while the AF table had just 2 (!) people.


Wow, didn't realise it was that little!

I have spoken with MIRI people arguing for the need to establish something like a PhD apprentice-style system. Not much interest.

Do you know why they weren't interested?

Unclear. Some things that might be involved:

  • a somewhat anti/non academic vibe
  • a feeling that they have the smartest people anyway, only hire the elite few that have a proven track record
  • feeling that it would take too much time and energy to educate people
  • a lack of organisational energy
  • .... It would be great if somebody from MIRI could chime in.

I might add that I know a number of people interested in AF who feel somewhat adrift/find it difficult to contribute. Feels a bit like a waste of talent.

I want to mention that Tsvi Benson-Tilsen is a mentor at this summer's PIBBSS. So some readers might consider applying (the deadline is Jan 23rd).

I myself was mentored by Abram Demski once through the FHI SRF, which AFAIK was matching fellows with a large pool of researchers based on mutual interests.