Somewhat striking that the top 3 orgs on the career interest survey are Anthropic, DeepMind, and OpenAI.
I personally suspect that these are not the most impactful places for most MATS scholars to work (relative to say, UKAISI/USAISI, METR, starting new orgs/projects).
Regardless, curious if you have any thoughts on this & if it reflects anything about the culture/epistemics in MATS.
(And to be clear, I think the labs do have alignment teams that care about making progress & I suspect that there are some cases where joining a frontier lab alignment team is the most impactful thing for a scholar.)
I think the high interest in working at scaling labs relative to governance or nonprofit organizations can be explained by:
Note that the career fair survey might tell us little about how likely scholars are to start new projects as it was primarily seeking interest in which organizations should attend, not in whether scholars should join orgs vs. found their own.
Thanks for these explanations– I think they're reasonable & insightful. A few thoughts:
Most of the scholars in this cohort were working on research agendas for which there are world-leading teams based at scaling labs
I suspect there's probably some bidirectional causality here. People want to work at scaling labs because they're interested in the research that scaling labs are doing, and people want to focus on the research the scaling labs are doing because they want to work at scaling labs.
There seems to be an increasing trend in the AI safety community towards the belief that most useful alignment research will occur at scaling labs
I think this is true among a subset of the AI safety community but I don't think this characterizes the AI safety community as a whole. For example, another (even stronger IMO) trend in the AI safety community has been towards the belief that policy work & technical governance work is more important than many folks previously expected it to be (see EG Paul joining USAISI, MIRI shifting to technical governance, UKAISI being established, and not to mention the general surge in interest among policymakers).
One perspective on this could be "well, MATS is a technical research program, and we're adding some governance mentors, so shrug." Another perspective on this could be "well, it seems like perhaps MATS is shifting more slowly than one might've imagined, resulting in a culture/ecosystem/mentor cohort/selection process/fellow cohort that disproportionately wants to join scaling labs."
RE shifting more slowly or having a disproportionate focus, note that the ERA fellowship has prioritized toward governance and technical governance– 2/3 of their fellows will be focused on governance + technical governance projects. I'm not necessarily saying this is what would be best for MATS, but it at least points out that we should be seeing MATS' focus on incubating "technical researchers that want to work at scaling labs" as something that's part of its design.
I might be a bit "biased" in that I work in AI policy and my worldview generally suggests that AI policy (as well as technical governance) is extremely neglected. I personally think it's harder to make the case that giving scaling labs better alignment talent is as neglected– it's still quite important, but scaling labs are extremely popular & I think their ability to hire (and pay for) top technical talent is much stronger than that of governments.
Anecdotally, scholars seemed generally in favor of careers at an AISI or evals org, but would prefer to continue pursuing their current research agenda
Again, I think my primary response here is something like the research interests of the MATS cohort are a function of the program and its selection process– not an immutable characteristic of the world. The ERA example is a "strong" example of prioritizing people with other interests, but I imagine there are plenty of "weaker" things MATS could be doing to select/prioritize fellows who had an interest in governance & technical governance. (Or put differently, my guess is that there are ways in which the current selection process and mentor pool disproportionately attracts/favors those who are interested in the kinds of topics you mentioned).
If I could wave a magic wand, I would probably have MATS add many more governance & technical governance mentors and shift to something closer to ERA's breakdown. This would admittedly be a rather big shift for MATS, and perhaps current employees/leaders/funders wouldn't want to do it. I think it ought to be seriously considered, though, and if I were a MATS exec person or a MATS funder I would probably be pushing for this. Or at least asking some serious questions along the lines of "do we really feel like the most impactful thing a training program could be doing right now is serving as an upskilling program for the scaling labs?" (With all due respect to the importance of getting great people to the scaling labs, acknowledging the importance of technical research at scaling labs, agreeing with some of Neel's points etc.)
It seems plausible to me that at least some MATS scholars are somewhat motivated by a desire to work at scaling labs for money, status, etc. However, the value alignment of scholars towards principally reducing AI risk seems generally very high. In Winter 2023-24, our most empirical research dominated cohort, mentors rated the median scholar's value alignment at 8/10 and 85% of scholars were rated 6/10 or above, where 5/10 was “Motivated in part, but would potentially switch focus entirely if it became too personally inconvenient.” To me this is a very encouraging statistic, but I’m sympathetic to concerns that well-intentioned young researchers who join scaling labs might experience value drift, or find it difficult to promote safety culture internally or sound the alarm if necessary; we are consequently planning a “lab safety culture” workshop in Summer. Notably, only 3.7% of surveyed MATS alumni say they are working on AI capabilities; in one case, an alumnus joined a scaling lab capabilties team and transferred to working on safety projects as soon as they were able. As with all things, maximizing our impact is about striking the right balance between trust and caution and I’m encouraged by the high apparent value alignment of our alumni and scholars.
We additionally believe:
Regarding “AI control,” I suspect you might be underestimating the support that this metastrategy has garnered in the technical AI safety community, particularly among prosaic AGI safety thought leaders. I see Paul’s decision to leave ARC in favor of the US AISI as a potential endorsement of the AI control paradigm over intent alignment, rather than necessarily an endorsement of an immediate AI pause (I would update against this if he pushes more for a pause than for evals and regulations). I do not support AI control to the exclusion of other metastrategies (including intent alignment and Pause AI), but I consider it a vital and growing component of my strategy portfolio.
It’s true that many AI safety projects are pivoting towards AI governance. I think the establishment of AISIs is wonderful; I am in contact with MATS alumni Alan Cooney and Max Kauffman at the UK AISI and similarly want to help the US AISI with hiring. I would have been excited for Vivek Hebbar’s, Jeremy Gillen’s, Peter Barnett’s, James Lucassen’s, and Thomas Kwa’s research in empirical agent foundations to continue at MIRI, but I am also excited about the new technical governance focus that MATS alumni Lisa Thiergart and Peter Barnett are exploring. I additionally have supported AI safety org accelerator Catalyze Impact as an advisor and Manifund Regrantor and advised several MATS alumni founding AI safety projects; it's not easy to attract or train good founders!
MATS has been interested in supporting more AI governance research since Winter 2022-23, when we supported Richard Ngo and Daniel Kokotajlo (although both declined to accept scholars past the training program) and offered support to several more AI gov researchers. In Summer 2023, we reached out to seven handpicked governance/strategy mentors (some of which you recommended, Akash), though only one was interested in mentoring. In Winter 2023-24 we tried again, with little success. In preparation for the upcoming Summer 2024 and Winter 2024-25 Programs, we reached out to 25 AI gov/policy/natsec researchers (who we asked to also share with their networks) and received expressions of interest from 7 further AI gov researchers. As you can see from our website, MATS is supporting four AI gov mentors in Summer 2024 (six if you count Matija Franklin and Philip Moreira Tomei, who are primarily working on value alignment). We’ve additionally reached out to RAND, IAPS, and others to provide general support. MATS is considering a larger pivot, but available mentors are clearly a limiting constraint. Please contact me if you’re an AI gov researcher and want to mentor!
Part of the reason that AI gov mentors are harder to find is that programs like the RAND TASP, GovAI, IAPS, Horizon, ERA, etc. fellowships seem to be doing a great job collectively of leveraging the available talent. It’s also possible that AI gov researchers are discouraged from mentoring at MATS because of our obvious associations with AI alignment (it’s in the name) and the Berkeley longtermist/rationalist scene (we’re talking on LessWrong and operate in Berkeley). We are currently considering ways to support AI gov researchers who don’t want to affiliate with the alignment, x-risk, longtermist, or rationalist communities.
I’ll additionally note that MATS has historically supported much research that indirectly contributes to AI gov/policy, such as Owain Evans’, Beth Barnes’, and Francis Rhys Ward’s capabilities evals, Evan Hubinger’s alignment evals, Jeffrey Ladish’s capabilities demos, Jesse Clifton’s and Caspar Oesterheldt’s cooperation mechanisms, etc.
In Winter 2023-24, our most empirical research dominated cohort, mentors rated the median scholar's value alignment at 8/10 and 85% of scholars were rated 6/10 or above, where 5/10 was “Motivated in part, but would potentially switch focus entirely if it became too personally inconvenient.”
Wait, aren't many of those mentors themselves working at scaling labs or working very closely with them? So this doesn't feel like a very comforting response to the concern of "I am worried these people want to work at scaling labs because it's a high-prestige and career-advancing thing to do", if the people whose judgements you are using to evaluate have themselves chosen the exact path that I am concerned about.
Of the scholars ranked 5/10 and lower on value alignment, 63% worked with a mentor at a scaling lab, compared with 27% of the scholars ranked 6/10 and higher. The average scaling lab mentors rated their scholars' value alignment at 7.3/10 and rated 78% of their scholars at 6/10 and higher, compared to 8.0/10 and 90% for the average non-scaling lab mentor. This indicates that our scaling lab mentors were more discerning of value alignment on average than non-scaling lab mentors, or had a higher base rate of low-value alignment scholars (probably both).
I also want to push back a bit against an implicit framing of the average scaling lab safety researcher we support as being relatively unconcerned about value alignment or the positive impact of their research; this seems manifestly false from my conversations with mentors, their scholars, and the broader community.
implicit framing of the average scaling lab safety researcher we support as being relatively unconcerned about value alignment or the positive impact of their research
Huh, not sure where you are picking this up. I am of course very concerned about the ability of researchers at scaling labs being capable of evaluating their positive impact in respect to their choice of working at a scaling lab (their job does after all depend on them not believing that is harmful), but of course they are not unconcerned about their positive impact.
This indicates that our scaling lab mentors were more discerning of value alignment on average than non-scaling lab mentors, or had a higher base rate of low-value alignment scholars (probably both).
The second hypothesis here seems much more likely (and my guess is your mentors would agree). My guess is after properly controlling for that you would find a mild to moderate negative correlation here.
But also, more importantly, the set of scholars from which MATS is drawing is heavily skewed towards the kind of person who would work at scaling labs (especially since funding has been heavily skewing towards funding the kind of research that can occur at scaling labs).
Thanks for this (very thorough) answer. I'm especially excited to see that you've reached out to 25 AI gov researchers & already have four governance mentors for summer 2024. (Minor: I think the post mentioned that you plan to have at least 2, but it seems like there are already 4 confirmed and you're open to more; apologies if I misread something though.)
A few quick responses to other stuff:
Do you think there are some cultural things that ought to be examined to figure out why scaling labs are so much more attractive than options that at-least-to-me seem more impactful in expectation?
As a naive guess, I would consider the main reasons to be:
Cheers, Akash! Yep, our confirmed mentor list updated in the days after publishing this retrospective. Our website remains the best up-to-date source for our Summer/Winter plans.
Do you think this is the best thing for MATS to be focusing on, relative to governance/policy?
MATS is not currently bottlenecked on funding for our current Summer plans and hopefully won't be for Winter either. If further interested high-impact AI gov mentors appear in the next month or two (and some already seem to be appearing), we will boost this component of our Winter research portfolio. If ERA disappeared tomorrow, we would do our best to support many of their AI gov mentors. In my opinion, MATS is currently not sacrificing opportunities to significantly benefit AI governance and policy; rather, we are rate-limited by factors outside of our control and are taking substantial steps to circumvent these, including:
We appreciate further advice to this end!
Do you think there are some cultural things that ought to be examined to figure out why scaling labs are so much more attractive than options that at-least-to-me seem more impactful in expectation?
I think this is a good question, but it might be misleading in isolation. I would additionally ask:
Starting new technical AI safety orgs/projects seems quite difficult in the current funding ecosystem. I know of many alumni who have founded or are trying to found projects who express substantial difficulties with securing sufficient funding.
Interesting - what's like the minimum funding ask to get a new org off the ground? I think something like $300k would be enough to cover ~9 mo of salary and compute for a team of ~3, and that seems quite reasonable to raise in this current ecosystem for pre-seeding a org.
Yeah, that amount seems reasonable, if on the low side, for founding a small org. What makes you think $300k is reasonably easy to raise in this current ecosystem? Also, I'll note that larger orgs need significantly more.
(EDIT: I just saw Ryan posted a comment a few minutes before mine, I agree substantially with it)
As a Google DeepMind employee I'm obviously pretty biased, but this seems pretty reasonable to me, assuming it's about alignment/similar teams at those labs? (If it's about capabilities teams, I agree that's bad!)
I think the alignment teams generally do good and useful work, especially those in a position to publish on it. And it seems extremely important that whoever makes AGI has a world-class alignment team! And some kinds of alignment research can only really be done with direct access to frontier models. MATS scholars tend to be pretty early in their alignment research career, and I also expect frontier lab alignment teams are a better place to learn technical skills especially engineering, and generally have a higher talent density there.
UK AISI/US AISI/METR seem like solid options for evals, but basically just work on evals, and Ryan says down thread that only 18% of scholars work on evals/demos. And I think it's valuable both for frontier labs to have good evals teams and for there to be good external evaluators (especially in government), I can see good arguments favouring either option.
44% of scholars did interpretability, where in my opinion the Anthropic team is clearly a fantastic option, and I like to think DeepMind is also a decent option, as is OpenAI. Apollo and various academic labs are the main other places you can do mech interp. So those career preferences seem pretty reasonable to me there for interp scholars.
17% are on oversight/control, and for oversight I think you generally want a lot of compute and access to frontier models? I am less sure for control, and think Redwood is doing good work there, but as far as I'm aware they're not hiring.
This is all assuming that scholars want to keep working in the same field they did MATS for, which in my experience is often but not always true.
I'm personally quite skeptical of inexperienced researchers trying to start new orgs - starting a new org and having it succeed is really, really hard, and much easier with more experience! So people preferring to get jobs seems great by my lights
Thanks, Neel! I responded in greater detail to Ryan's comment but just wanted to note here that I appreciate yours as well & agree with a lot of it.
My main response to this is something like "Given that MATS selects the mentors and selects the fellows, MATS has a lot of influence over what the fellows are interested in. My guess is that MATS' current mentor pool & selection process overweights interpretability and underweights governance + technical governance, relative to what I think would be ideal."
I see this is strongly disagree voted - I don't mind, but I'd be curious for people to reply with which parts they disagree with! (Or at least disagree react to specific lines). I make a lot of claims in that comment, though I personally think they're all pretty reasonable. The one about not wanting inexperienced researchers to start orgs, or "alignment teams at scaling labs are good actually" might be spiciest?
Thank you for explaining the shift from scholar support to research management— I found that quite interesting and I don’t think I would’ve intuitively assumed that the research management frame would be more helpful.
I do wonder if as the summer progresses, the role of the RM should shift from writing the reports for mentors to helping the fellows prepare their own reports for mentors. IMO, fellows getting into the habit of providing these updates & learning how to “manage up” when it comes to mentors seems important. I suspect something in the cluster of “being able to communicate well with mentors//manage your mentor+collaborator relationships” is one of the most important “soft skills” for research success. I suspect a transition from “tell your RM things that they include in their report” to “work with your RM to write your own report” would help instill this skill.
This is a good question. I agree that “managing up” is a very important skill in general! It’s one of the primary focuses of our research manager training.
However, I want to acknowledge whether to focus on this with scholars seems to be a question of tradeoffs regarding MATS’s priorities: to what extent are we prioritizing scholars upskilling in deep technical/research understanding available at MATS, versus them upskilling in generalizable soft skills (that they could theoretically learn elsewhere)? If we were to theoretically prioritize solely the former, maximizing time and efficiency between scholars and mentors through taking this burden off of them seems better as this allows us time to improve research skills + time spent on their projects. If we were to prioritize soft skills, though, focusing on them managing up well seems like a good option. (And FWIW, we already do the latter indirectly - but not as a structured offering. We focus much more on things like project management + unblocking scholars with our remaining time.)
To me MATS has primarily been about providing the best environment we can for AI safety mentorship, and increasing the amount of AI safety talent+collabs in the world. I can see an argument here that teaching scholars to manage up does in fact benefit their trajectories holistically, but I would want to balance this against the tradeoff of marginal time spent helping them directly in the counterfactual, be it project management or otherwise preparing for meeting with their mentor. During the main, 10-week phase of MATS, scholars are incredibly time crunched to get a research project done. This pushes me slightly against the idea of spending much concentrated effort on this during the main phase, but not necessarily against some amount of time on this.
That all being said, this seems like a potentially good fit for:
…and maybe some time spent on this towards the end of the 10-week phase in 1-1s, but I’d want to allow wiggle room for prioritizing more critical work as needed.
the research management frame would be more helpful.
I think btw it gets more value than scholar support because it's a proactive service we offer to all scholars on a given stream, rather than waiting for them to only come to us when there's a problem.
the role of the RM should shift from writing the reports for mentors to helping the fellows prepare their own reports for mentors.
I spend a fair amount of time on my projects helping people prep for meetings with their supervisors, yeah. I also used to have scholars edit my written reports before sending to Ethan.
I'm noticing there are still many interp mentors for the current round of MATS -- was the "fewer mech interp mentors" change implemented for this cohort, or will that start in Winter or later?
Last program, 44% of scholar research was on interpretability, 18% on evals/demos, 17% on oversight/control, etc. In summer, we intend for 35% of scholar research to be on interpretability, 17% on evals/demos, 27% on oversight/control, etc., based on our available mentor pool and research priorities. Interpretability will still be the largest research track and still has the greatest interest from potential mentors and applicants. The plot below shows the research interests of 1331 MATS applicants and 54 potential mentors who have applied for our Summer 2024 or Winter 2024-25 Programs.
Note that number of scholars is a much more important metric than number of mentors when it comes to evaluating MATS resources, as scholar per mentors varies a bunch (eg over winter I had 10 scholars, which is much more than most mentors). Harder to evaluate from the outside though!
I don't know the answer to your actual question, but I'll note there are slightly fewer mech interp mentors than mentors listed in the "AI interpretability" area (though all of them are at least doing "model internals"). I'd say Stephen Casper and I aren't focused on interpretability in any narrow sense, and Nandi Schoots' projects also sound closer to science of deep learning than mech interp. Assuming we count everyone else, that leaves 11 out of 39 mentors, which is slightly less than ~8 out of 23 from the previous cohort (though maybe not by much).
I love this report! Shed a tear at not seeing Microsoft on the organization interest chart though 🥲. We could be a better Bing T_T.
Oh, I think we forgot to ask scholars if they wanted Microsoft at the career fair. Is Microsoft hiring AI safety researchers?
Yes, here’s an open position: Research Scientist - Responsible & OpenAI Research. Of course, responsible AI differs from interpretability, activation engineering, or formal methods (e.g., safeguarded AI, singular learning theory, agent foundations). I’ll admit we are doing less of that than I’d prefer, partially because OpenAI shares some of its ‘secret safety sauce’ with us, though not all, and not immediately.
Note from our annual report that we are employing 1% fewer people than this time last year, so headcount is a very scarce resource. However, the news reported we invested ~£2.5b in setting up a new AI hub in London under Jordan Hoffman, with 600 new seats allocated to it (officially, I can neither confirm nor deny these numbers).
I’m visiting there this June after EAG London. We’re the only member of the Frontier Model Forum without an alignment team. MATS scholars would be excellent hires for such a team, should one be established. Some time ago, a few colleagues helped me draft a white paper to internally gather momentum and suggest to leadership that starting one there might be beneficial. Unfortunately, I am not permitted to discuss the responses or any future plans regarding this matter.