Aren't the central examples of founders in AI Safety the people who founded Anthropic, OpenAI, and arguably Deepmind? Right after that, Mechanize comes to mind.
I am not fully sure what you mean by founders, but it seems to me that the best organizations were founded by people who also wrote a lot, and generally developed a good model of the problems in parallel to running an organization. Even this isn't a great predictor. I don't really know what is. It seems like generally working in the space is just super high variance.
To be clear, overall I do think many more people should found organizations, but the arguments in this post seem really quite weak. The issue is really not that otherwise we "can't scale the AI Safety field". If anything it goes the other way around! If you just want to scale the AI safety field, go work at one of the existing big organizations like Anthropic, or Deepmind, or Far Labs or whatever. They can consume tons of talent, and you can probably work with them on capturing more talent (of course, I think the consequences of doing so for many of those orgs would be quite bad, but you don't seem to think so).
Also, to expand some more on your coverage of coun...
Thanks for reading and replying! I'll be brief:
the bar at MATS has raised every program for 4 years now
What?! Something terrible must be going on in your mechanisms for evaluating people (which, to be clear, isn't surprising; indeed, you are the central target of the optimization that is happening here, but like, to me it illustrates the risks here quite cleanly).
It is very very obvious to me that median MATS participant quality has gone down continuously for the last few cohorts. I thought this was somewhat clear to y'all and you thought it was worth the tradeoff of having bigger cohorts, but you thinking it has "gone up continuously" shows a huge disconnect.
Like, these days at the end of a MATS program half of the people couldn't really tell you why AI might be an existential risk at all. Their eyes glaze over when you try to talk about AI strategy. IDK, maybe these people are better ML researchers, but obviously they are worse contributors to the field than the people in the early cohorts.
Goodfire, AIUC, Lucid Computing, Transluce, Seismic, AVERI, Fathom
Yeah, I mean, I do think I am a lot more pessimistic about all of these. If you want, we can make a bet on how well things have...
The MATS acceptance rate was 33% in Summer 2022 (the first program with open applications) and decreased to 4.3% (in terms of first-stage applicants; ~7% if you only count those who completed all stages) in Summer 2025. Similarly, our mentor acceptance rate decreased from 100% in Summer 2022 to 27% for the upcoming Winter 2026 Program.
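(To make the two figures concrete: the 4.3% vs ~7% gap is purely a denominator choice. Below is a minimal Python sketch with made-up illustrative counts, chosen only to roughly reproduce those percentages; they are not MATS's actual application numbers.)

```python
def acceptance_rate(accepted: int, applicant_pool: int) -> float:
    """Acceptance rate relative to a chosen applicant pool (denominator)."""
    return accepted / applicant_pool

# Hypothetical counts, chosen only to roughly reproduce the quoted 4.3% vs ~7%
# split; these are NOT MATS's real application statistics.
accepted = 90
first_stage_applicants = 2100   # everyone who started an application
completed_all_stages = 1300     # subset who completed every application stage

print(f"Rate vs. first-stage applicants: {acceptance_rate(accepted, first_stage_applicants):.1%}")
print(f"Rate vs. completed applications: {acceptance_rate(accepted, completed_all_stages):.1%}")
# -> ~4.3% and ~6.9% for the same number of admitted scholars
```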
I mean, in as much as one is worried about Goodhart's law, and the issue in contention is adversarial selection, then the acceptance rate going down over time is kind of the premise of the conversation. Like, it would be evidence against my model of the situation if the acceptance rate had been going up (since that would imply MATS is facing less adversarial pressure over time).
I don't have plots prepared, but measures of scholar technical ability (e.g., mentor ratings, placements, CodeSignal score) have consistently increased. I feel very confident that MATS is consistently improving in our ability to find, train, and place ML (and other) researchers in AI safety roles, predominantly as "Iterators".
Mentor ratings is the most interesting category to me. As you can imagine I don't care much for ML skill at the margin. CodeSignal is a bit interesting th...
A few quick comments, on the same theme as but mostly unrelated to the exchange so far:
I might have a special view here since I did MATS 4.0 and 8.0.
I think I met some excellent people at MATS 8.0, but I would not say they are stronger than those in 4.0; my guess is that quality went down slightly. I remember a few people in 4.0 who impressed me quite a lot, and I saw less of that in 8.0. (4.0 had more very incompetent people, though.)
at the end of a MATS program half of the people couldn't really tell you why AI might be an existential risk at all.
I think this is sadly somewhat true: I talked with some people in 8.0 who didn't seem to have any particular concern about AI existential risk, or seemingly had never really thought about it. However, I think most people were in fact very concerned about AI existential risk. I ran a poll at some point about Eliezer's new book, and a significant minority of students seemed to have pre-ordered it, which I guess is a pretty good proxy for whether someone is seriously engaging with AI x-risk.
My guess is that the recruitment process might need to measure another variable besides academics/coding/ML experience - the kind of thing that Tim Hua (an 8.0 scholar who created an AI psychosis bench) has.
Also it seems to me that if you build ...
(Derailing: what I am saying here is not central to the argument you are making.)
just end up with someone making a bunch of vaguely safety-adjacent RL environments that get sold to big labs
While I think building safety-adjacent RL envs is worse than most kinds of technical safety work for people who have very high context in AGI safety, I think it's net positive.
I think you reduce P(doom) by doing prosaic AI safety well (you train AIs to behave nicely, you don't squash away malign-looking CoT, you try not to build envs that increase situational awareness too much, you do some black-box and maybe white-box auditing to probe for malign tendencies, you monitor for bad behavior in deployment, you try not to give too many affordances to AIs when it's not too costly), especially if takeoffs are relatively slow, because it gives you more opportunities to catch early instances of scheming-related misalignment and more time to use mostly-aligned AIs to do safety research. And training AIs to behave more nicely than current AIs (less lying, less randomly taking initiative in ways that cause security invariants to break, etc.) is important because:
I want to register disagreement. Multiplier effects are difficult to get and easy to overestimate. It's very difficult to get other people working on the right problem, rather than slipping off and working on an easier but ultimately useless problem. From my perspective, it looks like MATS fell into this exact trap. MATS has kicked out ~all the mentors who were focused on real problems (in technical alignment) and has a large stack of new mentors working on useless but easy problems.
[Edit 5hrs later: I think this has too much karma because it's political and aggressive. It's a very low effort criticism without argument.]
To clarify, by "kicking out" Jeremy is referring to two mentors in particular, both of whom got a lot of support from PIBBSS, and one of whom seemed to want more of an engineering assistant than a research scholar. I think both do important research and it was a tough decision, informed by our mentor selection committee (which included experts in their field) and by past scholar feedback. I offered both help with hiring, including access to our alumni hiring database.
(Ryan is correct about what I'm referring to, and I don't know any details).
I want to say publicly, since my comment above is a bit cruel in singling out MATS specifically: I think MATS is the most impressively well-run organisation that I've encountered, and overall supports good research. Ryan has engaged at length with my criticisms (both now and when I've raised them before), as have others on the MATS team, and I appreciate this a lot.
Ultimately most of our disagreements are about things that I think a majority of "the alignment field" is getting wrong. I think most people don't consider it Ryan's responsibility to do better at research prioritization than the field as a whole. But I do. It's easy to shirk responsibility by deferring to committees, so I don't consider that a good excuse.
A good excuse is defending the object-level research prioritization decisions, which Ryan and other MATS employees happily do. I appreciate them for this, and we agree to disagree for now.
Tying back to the OP, I maintain that multiplier effects are often overrated because of people "slipping off the real problem" and this is a particularly large problem with founders of new orgs.
I think that being a good founder in AI safety is very hard, and I generally only recommend doing it after having some experience in the field - this strongly applies to research orgs, but also to e.g. field building. If you're founding something, you need to constantly make judgement calls about what is best; unlike in many entry-level safety roles, you don't really have mentors to defer to, and you often won't get clear feedback from reality if you get those calls wrong. These are very hard questions, and if you don't get them right, there's a good chance your org is mediocre. I think this applies even to orgs within an existing research agenda (most attempts to found mech interp orgs seem doomed to me). Field building is a bit less dicey, but even then, you want strong community connections and a sense for what will and will not work.
I'm very excited for there to be more good founders in AI Safety, but don't think loudly signal boosting this to junior people is a good way to achieve this. And imo "founding an org" is already pretty high status, at least if you're perceived to have some momentum behind you?
I'm also fine with people without a lot of AI safety expertise partnering as co-founders with those who do have it, but I struggle to think of orgs that I think have gone well that didn't have at least one highly experienced and competent co-founder.
I naively expect the process of startup ideation and experimentation, aided by VC money
It's very difficult to come up with AI safety startup ideas that are VC-fundable. This seems like a recipe for coming up with nice-sounding but ultimately useless ideas, or wasting a lot of effort on stuff that looks good to VCs but doesn't advance AI safety in any way.
Great founders and field-builders have multiplier effects on recruiting, training, and deploying talent to work on AI safety [...] If we want to 10-100x the AI safety field in the next 8 years, we need multiplicative capacity, not just marginal hires
I spent much of 2018-2020 trying to help MIRI with recruiting at AIRCS workshops. At the time, I think AIRCS workshops and 80k were probably the most similar things the field had to MATS, and I decided to help with them largely because I was excited about the possibility of multiplier effects like these.
The single most obvious effect I had on a participant—i.e., where at the beginning of our conversations they seemed quite uninterested in working on AI safety, but by the end reported deciding to—was that a few months later they quit their (non-ML) job to work on capabilities at OpenAI, which they have been doing ever since.
Multiplier effects are real, and can be great; I think AIRCS probably had helpful multiplier effects too, and I'd guess the workshops were net positive overall. But much as pharmaceuticals often have paradoxical effects—i.e., to impact the intended system in roughly the intended way, except with the sign of the key eff...
I like the phrase "paradoxical impact".
I feel considerations around paradoxical impact are a big part of my world model, and I would like to see more discussion about it.
I do think I'd feel very alarmed by the 27% figure in your position—much more alarmed than e.g. I am about what happened with AIRCS, which seems to me to have failed more in the direction of low than actively bad impact—but to be clear I didn't really mean to express a claim here about the overall sign of MATS; I know little about the program.
Rather, my point is just that multiplier effects are scary for much the same reason they are exciting—they are in effect low-information, high-leverage bets. Sometimes single conversations can change the course of highly effective people's whole careers, which is wild; I think it's easy to underestimate how valuable this can be. But I think it's similarly easy to underestimate their risk, given that the source of this leverage—that you're investing relatively little time getting to know them, etc, relative to the time they'll spend doing... something as a result—also means you have unusually limited visibility into what the effects will be.
Given this, I think it's worth taking unusual care, when pursuing multiplier-effect strategies, to model the overall relative symmetry of available risks/rewards in the domain. For example: A) whether there might be lemons-market problems, such that those who are easiest to influence (especially quickly) might tend, all else equal, to be more strategically confused/confusable, or B) whether there might in fact currently be more easy ways to make AI risk worse than better, etc.
That may be, but personally I am unpersuaded that the observed paradoxical impacts should update us toward thinking the world would have been better off if we hadn't made the problem known. I roughly can't imagine worlds where we survive in which the problem wasn't made known, and with a problem this confusing it should be pretty expected that initially people will have little idea how to help, and so many initial attempts won't. In my imagination, at least, basically all surviving worlds look like that at first, but then eventually people who were persuaded to worry about the problem do figure out how to solve it.
(Maybe this isn't what you mean exactly, and there are ways we could have made the problem known that seemed less like "freaking out"? But to me this seems hard to achieve, when the problem in question is the plausibly relatively imminent death of everyone).
I don't think this captures the counterarguments well, so here is one:
You can imagine a spectrum of founders. At one end, you have people who understand themselves as founders and want to be marshaling an army to solve AI alignment. At the other end, you have basically researchers who see work that should be done, don't have the capacity to do the work themselves, and are led to create teams and orgs as a result - "reluctant founders".
It's reasonable to be skeptical about what the "founder type" end of the spectrum will do.
In normal startups, the ultimate feedback loop is provided by the market. In AI safety nonprofits, the main feedback loops are provided by funders, AGI labs, and Bay Area prestige gradients.
Bay Area prestige gradients are to a large extent captured by AGI labs - the majority of quality-weighted "AI safety" already works there, the work is "obviously impactful", you are close to the game, etc. Also, normal ML people want to work there.
If someone wants to scale a lot, "funders" means mostly OpenPhil - no other source would fund the army. The dominant OpenPhil worldview is closely related to Anthropic - for example, until recently you have hea...
I agree the AI safety field in general vastly undervalues building things, especially compared to winning intellectual status ladders (e.g. LessWrong posting, passing the Anthropic recruiting funnel, etc.).
However, as I've written before:
[...] the real value of doing things that are startup-like comes from [...] creating new things, rather than scaling existing things [...]
If you want to do interpretability research in the standard paradigm, Goodfire exists. If you want to do evals, METR exists. Now, new types of evals are valuable (e.g. Andon Labs & vending bench). And maybe there's some interp paradigm that offers a breakthrough.
But why found? Because there is a problem where everyone else is dropping the ball, so there is no existing machine where you can turn the crank and get results towards that problem.
Now of course I have my opinions on where exactly everyone else is dropping the ball. But no doubt there are other things as well.
To pick up the balls, you don't start the 5th evals company or the 4th interp lab. My worry is that that's what all the steps listed in "How to be a founder" point towards. Incubators, circulating pitches, asking for feedback on ideas, applying ...
This is my first LessWrong comment - any feedback appreciated.
My quick takes (with a similar conflict: I'm doing AIS field-building).
I don't think I quite understand the distinction you are trying to draw between "founders" and (not a literal quote) "people who do object-level work and make intellectual contributions by writing".
If you're the CEO of a company, it's your job to understand the space your company works in and develop extremely good takes about where the field is going and what your company should do, and use your expertise in leveraged ways to make the company go better.
In the context of AI safety, the key product that organizations are trying to produce is often itself re...
I totally agree with the sentiment here!
As a researcher, founder, and early employee of multiple non-profits in this space, I think it's critical to start building out the infrastructure to leverage talent and enable safety work. Right now, there isn't much to support people making their own opportunities, not to mention that doing so necessarily requires a more stable financial situation than is possible for many individuals.
One of my core goals starting Kairos.fm was to help others who are wanting to start their own projects (e.g. podcasts), and...
escaping flatland: career advice for CS undergrads
...one way to characterise a scene is by what it cares about: its markers of prestige, things you ‘ought to do’, its targets to optimise for. for the traders or the engineers, it’s all about that coveted FAANG / jane street internship; for the entrepreneurs, that successful startup (or accelerator), for the researchers, the top-tier-conference first-author paper… the list goes on.
for a given scene, you can think of these as mapping out a plane of legibility in the space of things you could do with your life.
Hey Ryan, nice post. Here are some thoughts.
Anti-correlated attributes: “Founder‑mode” is somewhat anti‑natural to “AI concern.” The cognitive style most attuned to AI catastrophic risk (skeptical, risk‑averse, theory-focused) is not the same style that woos VCs, launches companies, and ships MVPs. If we want AI safety founders, we need to counterweight the selection against risk-tolerant cognitive styles to prevent talent drift and attract more founder-types to AI safety.
I think AI safety founders should be risk-averse.
For-profit investors like risk-seeki...
I suspect that the undervaluing of field-building is downstream of EA overupdating on The Meta Trap (I appreciated points 1 & 5; point 2 probably looks worst in retrospect).
I don't know if founding is still undervalued - seems like there's a lot in the space these days.
"I confess that I don’t really understand this concern"
Have you heard of Eternal September? If a field/group/movement grows at less than a certain rate, then there's time for new folks to absorb the existing culture/knowledge/strategic takes and then pass it on to the folks after them. H...
TL;DR: In AI safety, we systematically undervalue founders and field‑builders relative to researchers and prolific writers. This status gradient pushes talented would‑be founders and amplifiers out of the ecosystem, slows the growth of research orgs and talent funnels, and bottlenecks our capacity to scale the AI safety field. We should deliberately raise the status of founders and field-builders and lower the friction for starting and scaling new AI safety orgs.
Epistemic status: A lot of hot takes with less substantiation than I'd like. Also, there is an obvious COI in that I am an AI safety org founder and field-builder.
Coauthored with ChatGPT.