Aren't the central examples of founders in AI Safety the people who founded Anthropic, OpenAI, and arguably Deepmind? Right after that, Mechanize comes to mind.
I am not fully sure what you mean by founders, but it seems to me that the best organizations were founded by people who also wrote a lot, and generally developed a good model of the problems in parallel to running an organization. Even this isn't a great predictor. I don't really know what is. It seems like generally working in the space is just super high variance.
To be clear, overall I do think many more people should found organizations, but the arguments in this post seem really quite weak. The issue is really not that otherwise we "can't scale the AI Safety field". If anything it goes the other way around! If you just want to scale the AI safety field, go work at one of the existing big organizations like Anthropic, or Deepmind, or Far Labs or whatever. They can consume tons of talent, and you can probably work with them on capturing more talent (of course, I think the consequences of doing so for many of those orgs would be quite bad, but you don't seem to think so).
Also, to expand some more on your coverage of coun...
Thanks for reading and replying! I'll be brief:
the bar at MATS has raised every program for 4 years now
What?! Something terrible must be going on in your mechanisms for evaluating people (which, to be clear, isn't surprising; indeed, you are the central target of the optimization that is happening here, but like, to me it illustrates the risks here quite cleanly).
It is very very obvious to me that median MATS participant quality has gone down continuously for the last few cohorts. I thought this was somewhat clear to y'all and you thought it was worth the tradeoff of having bigger cohorts, but you thinking it has "gone up continuously" shows a huge disconnect.
Like, these days at the end of a MATS program half of the people couldn't really tell you why AI might be an existential risk at all. Their eyes glaze over when you try to talk about AI strategy. IDK, maybe these people are better ML researchers, but obviously they are worse contributors to the field than the people in the early cohorts.
Goodfire, AIUC, Lucid Computing, Transluce, Seismic, AVERI, Fathom
Yeah, I mean, I do think I am a lot more pessimistic about all of these. If you want, we can make a bet on how well things have...
The MATS acceptance rate was 33% in Summer 2022 (the first program with open applications) and decreased to 4.3% (in terms of first-stage applicants; ~7% if you only count those who completed all stages) in Summer 2025. Similarly, our mentor acceptance rate decreased from 100% in Summer 2022 to 27% for the upcoming Winter 2026 Program.
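(To make the two figures concrete: the 4.3% vs ~7% gap is purely a denominator choice. Below is a minimal Python sketch with made-up illustrative counts, chosen only to roughly reproduce those percentages; they are not MATS's actual application numbers.)

```python
def acceptance_rate(accepted: int, applicant_pool: int) -> float:
    """Acceptance rate relative to a chosen applicant pool (denominator)."""
    return accepted / applicant_pool

# Hypothetical counts, chosen only to roughly reproduce the quoted 4.3% vs ~7%
# split; these are NOT MATS's real application statistics.
accepted = 90
first_stage_applicants = 2100   # everyone who started an application
completed_all_stages = 1300     # subset who completed every application stage

print(f"Rate vs. first-stage applicants: {acceptance_rate(accepted, first_stage_applicants):.1%}")
print(f"Rate vs. completed applications: {acceptance_rate(accepted, completed_all_stages):.1%}")
# -> ~4.3% and ~6.9% for the same number of admitted scholars
```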
I mean, in as much as one is worried about Goodhart's law, and the issue in contention is adversarial selection, then the acceptance rate going down over time is kind of the premise of the conversation. Like, it would be evidence against my model of the situation if the acceptance rate had been going up (since that would imply MATS is facing less adversarial pressure over time).
I don't have plots prepared, but measures of scholar technical ability (e.g., mentor ratings, placements, CodeSignal score) have consistently increased. I feel very confident that MATS is consistently improving in our ability to find, train, and place ML (and other) researchers in AI safety roles, predominantly as "Iterators".
Mentor ratings is the most interesting category to me. As you can imagine I don't care much for ML skill at the margin. CodeSignal is a bit interesting th...
A few quick comments, on the same theme as but mostly unrelated to the exchange so far:
I might have a special view here since I did MATS 4.0 and 8.0.
I think I met some excellent people at MATS 8.0, but I would not say they are stronger than those in 4.0; my guess is that quality went down slightly. I remember a few people in 4.0 who impressed me quite a lot, and I saw less of that in 8.0. (4.0 had more very incompetent people, though.)
at the end of a MATS program half of the people couldn't really tell you why AI might be an existential risk at all.
I think this is sadly somewhat true: I talked with some people in 8.0 who didn't seem to have any particular concern about AI existential risk, or seemingly had never really thought about it. However, I think most people were in fact very concerned about AI existential risk. I ran a poll at some point about Eliezer's new book, and a significant minority of students seemed to have pre-ordered it, which I guess is a pretty good proxy for whether someone is seriously engaging with AI x-risk.
My guess is that the recruitment process might need to measure another variable besides academics/coding/ML experience - the kind of thing that Tim Hua (an 8.0 scholar who created an AI psychosis bench) has.
Also it seems to me that if you build ...
(Derailing: what I am saying here is not central to the argument you are making.)
just end up with someone making a bunch of vaguely safety-adjacent RL environments that get sold to big labs
While I think building safety-adjacent RL envs is worse than most kinds of technical safety work for people who have very high context in AGI safety, I think it's net positive.
I think you reduce P(doom) by doing prosaic AI safety well (you train AIs to behave nicely, you don't squash away malign-looking CoT, you try not to build envs that increase situational awareness too much, you do some black-box and maybe white-box auditing to probe for malign tendencies, you monitor for bad behavior in deployment, you try not to give too many affordances to AIs when it's not too costly), especially if takeoffs are relatively slow, because it gives you more opportunities to catch early instances of scheming-related misalignment and more time to use mostly-aligned AIs to do safety research. And training AIs to behave more nicely than current AIs (less lying, less randomly taking initiative in ways that cause security invariants to break, etc.) is important because:
I want to register disagreement. Multiplier effects are difficult to get and easy to overestimate. It's very difficult to get other people working on the right problem, rather than slipping off and working on an easier but ultimately useless problem. From my perspective, it looks like MATS fell into this exact trap. MATS has kicked out ~all the mentors who were focused on real problems (in technical alignment) and has a large stack of new mentors working on useless but easy problems.
[Edit 5hrs later: I think this has too much karma because it's political and aggressive. It's a very low effort criticism without argument.]
To clarify, by "kicking out" Jeremy is referring to two mentors in particular, both of whom got a lot of support from PIBBSS, and one of whom seemed to want more of an engineering assistant than a research scholar. I think both do important research and it was a tough decision, informed by our mentor selection committee (which included experts in their field) and by past scholar feedback. I offered both help with hiring, including access to our alumni hiring database.
(Ryan is correct about what I'm referring to, and I don't know any details).
I want to say publicly, since my comment above is a bit cruel in singling out MATS specifically: I think MATS is the most impressively well-run organisation that I've encountered, and overall supports good research. Ryan has engaged at length with my criticisms (both now and when I've raised them before), as have others on the MATS team, and I appreciate this a lot.
Ultimately most of our disagreements are about things that I think a majority of "the alignment field" is getting wrong. I think most people don't consider it Ryan's responsibility to do better at research prioritization than the field as a whole. But I do. It's easy to shirk responsibility by deferring to committees, so I don't consider that a good excuse.
A good excuse is defending the object-level research prioritization decisions, which Ryan and other MATS employees happily do. I appreciate them for this, and we agree to disagree for now.
Tying back to the OP, I maintain that multiplier effects are often overrated because of people "slipping off the real problem" and this is a particularly large problem with founders of new orgs.
I think that being a good founder in AI safety is very hard, and I generally only recommend doing it after having some experience in the field - this strongly applies to research orgs, but also to e.g. field building. If you're founding something, you need to constantly make judgement calls about what is best; unlike in many entry-level safety roles, you don't really have mentors to defer to, and you often won't get clear feedback from reality if you get those calls wrong. These are very hard questions, and if you don't get them right, there's a good chance your org is mediocre. I think this applies even to orgs within an existing research agenda (most attempts to found mech interp orgs seem doomed to me). Field building is a bit less dicey, but even then, you want strong community connections and a sense for what will and will not work.
I'm very excited for there to be more good founders in AI Safety, but don't think loudly signal boosting this to junior people is a good way to achieve this. And imo "founding an org" is already pretty high status, at least if you're perceived to have some momentum behind you?
I'm also fine with people without a lot of AI safety expertise partnering as co-founders with those who do have it, but I struggle to think of orgs that I think have gone well that didn't have at least one highly experienced and competent co-founder.
I naively expect the process of startup ideation and experimentation, aided by VC money
It's very difficult to come up with AI safety startup ideas that are VC-fundable. This seems like a recipe for coming up with nice-sounding but ultimately useless ideas, or wasting a lot of effort on stuff that looks good to VCs but doesn't advance AI safety in any way.
Great founders and field-builders have multiplier effects on recruiting, training, and deploying talent to work on AI safety [...] If we want to 10-100x the AI safety field in the next 8 years, we need multiplicative capacity, not just marginal hires
I spent much of 2018-2020 trying to help MIRI with recruiting at AIRCS workshops. At the time, I think AIRCS workshops and 80k were probably the most similar things the field had to MATS, and I decided to help with them largely because I was excited about the possibility of multiplier effects like these.
The single most obvious effect I had on a participant—i.e., where at the beginning of our conversations they seemed quite uninterested in working on AI safety, but by the end reported deciding to—was that a few months later they quit their (non-ML) job to work on capabilities at OpenAI, which they have been doing ever since.
Multiplier effects are real, and can be great; I think AIRCS probably had helpful multiplier effects too, and I'd guess the workshops were net positive overall. But much as pharmaceuticals often have paradoxical effects—i.e., to impact the intended system in roughly the intended way, except with the sign of the key eff...
I like the phrase "paradoxical impact".
I feel considerations around paradoxical impact are a big part of my world model, and I would like to see more discussion about it.
I do think I'd feel very alarmed by the 27% figure in your position—much more alarmed than e.g. I am about what happened with AIRCS, which seems to me to have failed more in the direction of low than actively bad impact—but to be clear I didn't really mean to express a claim here about the overall sign of MATS; I know little about the program.
Rather, my point is just that multiplier effects are scary for much the same reason they are exciting—they are in effect low-information, high-leverage bets. Sometimes single conversations can change the course of highly effective people's whole careers, which is wild; I think it's easy to underestimate how valuable this can be. But I think it's similarly easy to underestimate their risk, given that the source of this leverage—that you're investing relatively little time getting to know them, etc, relative to the time they'll spend doing... something as a result—also means you have unusually limited visibility into what the effects will be.
Given this, I think it's worth taking unusual care, when pursuing multiplier-effect strategies, to model the overall relative symmetry of available risks/rewards in the domain. For example: A) whether there might be lemons-market problems, such that those who are easiest to influence (especially quickly) might tend, all else equal, to be more strategically confused/confusable, or B) whether there might in fact currently be more easy ways to make AI risk worse than better, etc.
That may be, but personally I am unpersuaded that the observed paradoxical impacts should update us toward thinking the world would have been better off if we hadn't made the problem known. I roughly can't imagine worlds where we survive in which the problem wasn't made known, and with a problem this confusing it should be pretty expected that initially people will have little idea how to help, and so many initial attempts won't. In my imagination, at least, basically all surviving worlds look like that at first, but then eventually people who were persuaded to worry about the problem do figure out how to solve it.
(Maybe this isn't what you mean exactly, and there are ways we could have made the problem known that seemed less like "freaking out"? But to me this seems hard to achieve, when the problem in question is the plausibly relatively imminent death of everyone).
I don't think this captures the counterarguments well, so here is one:
You can imagine a spectrum of founders. At one end, you have people who understand themselves as founders and want to be marshaling an army to solve AI alignment. At the other end, you have basically researchers who see work that should be done, don't have the capacity to do the work themselves, and are led to create teams and orgs as a result - "reluctant founders".
It's reasonable to be skeptical about what the "founder type" end of the spectrum will do.
In normal startups, the ultimate feedback loop is provided by the market. In AI safety nonprofits, the main feedback loops are provided by funders, AGI labs, and Bay Area prestige gradients.
Bay Area prestige gradients are to a large extent captured by AGI labs - the majority of quality-weighted "AI safety" already works there, the work is "obviously impactful", you are close to the game, etc. Also, normal ML people want to work there.
If someone wants to scale a lot, "funders" means mostly OpenPhil - no other source would fund the army. The dominant OpenPhil worldview is closely related to Anthropic - for example, until recently you have hea...
I agree the AI safety field in general vastly undervalues building things, especially compared to winning intellectual status ladders (e.g. LessWrong posting, passing the Anthropic recruiting funnel, etc.).
However, as I've written before:
[...] the real value of doing things that are startup-like comes from [...] creating new things, rather than scaling existing things [...]
If you want to do interpretability research in the standard paradigm, Goodfire exists. If you want to do evals, METR exists. Now, new types of evals are valuable (e.g. Andon Labs & vending bench). And maybe there's some interp paradigm that offers a breakthrough.
But why found? Because there is a problem where everyone else is dropping the ball, so there is no existing machine where you can turn the crank and get results towards that problem.
Now of course I have my opinions on where exactly everyone else is dropping the ball. But no doubt there are other things as well.
To pick up the balls, you don't start the 5th evals company or the 4th interp lab. My worry is that that's what all the steps listed in "How to be a founder" point towards. Incubators, circulating pitches, asking for feedback on ideas, applying ...
This is my first LessWrong comment - any feedback appreciated.
My quick takes (with a similar conflict: I'm doing AIS field-building).
I don't think I quite understand the distinction you are trying to draw between "founders" and (not a literal quote) "people who do object-level work and make intellectual contributions by writing".
If you're the CEO of a company, it's your job to understand the space your company works in and develop extremely good takes about where the field is going and what your company should do, and use your expertise in leveraged ways to make the company go better.
In the context of AI safety, the key product that organizations are trying to produce is often itself re...
I totally agree with the sentiment here!
As a researcher, founder, and early employee of multiple non-profits in this space, I think it's critical to start building out the infrastructure to leverage talent and enable safety work. Right now, there isn't much to support people making their own opportunities, not to mention that doing so necessarily requires a more stable financial situation than is possible for many individuals.
One of my core goals starting Kairos.fm was to help others who are wanting to start their own projects (e.g. podcasts), and...
escaping flatland: career advice for CS undergrads
...one way to characterise a scene is by what it cares about: its markers of prestige, things you ‘ought to do’, its targets to optimise for. for the traders or the engineers, it’s all about that coveted FAANG / jane street internship; for the entrepreneurs, that successful startup (or accelerator), for the researchers, the top-tier-conference first-author paper… the list goes on.
for a given scene, you can think of these as mapping out a plane of legibility in the space of things you could do with your life.
Hey Ryan, nice post. Here are some thoughts.
Anti-correlated attributes: “Founder‑mode” is somewhat anti‑natural to “AI concern.” The cognitive style most attuned to AI catastrophic risk (skeptical, risk‑averse, theory-focused) is not the same style that woos VCs, launches companies, and ships MVPs. If we want AI safety founders, we need to counterweight the selection against risk-tolerant cognitive styles to prevent talent drift and attract more founder-types to AI safety.
I think AI safety founders should be risk-averse.
For-profit investors like risk-seeki...
I suspect that the undervaluing of field-building is downstream of EA overupdating on The Meta Trap (I appreciated points 1 & 5; point 2 probably looks worst in retrospect).
I don't know if founding is still undervalued - seems like there's a lot in the space these days.
"I confess that I don’t really understand this concern"
Have you heard of Eternal September? If a field/group/movement grows at less than a certain rate, then there's time for new folks to absorb the existing culture/knowledge/strategic takes and then pass it on to the folks after them. H...
TL;DR: In AI safety, we systematically undervalue founders and field‑builders relative to researchers and prolific writers. This status gradient pushes talented would‑be founders and amplifiers out of the ecosystem, slows the growth of research orgs and talent funnels, and bottlenecks our capacity to scale the AI safety field. We should deliberately raise the status of founders and field-builders and lower the friction for starting and scaling new AI safety orgs.
Epistemic status: A lot of hot takes with less substantiation than I'd like. Also, there is an obvious COI in that I am an AI safety org founder and field-builder.
Coauthored with ChatGPT.