On the contrary, hiring only the top 1% of applicants doesn't mean you're getting the best, because the good people only apply to a few places and get hired immediately, while the bad people apply over and over again, everywhere.
When you get those 200 resumes, and hire the best person from the top 200, does that mean you’re hiring the top 0.5%?
“Maybe.”
No. You’re not. Think about what happens to the other 199 that you didn’t hire.
They go look for another job.
Verifying solutions is time-consuming enough that I don't think this really alleviates the mentorship bottleneck. And it's quite hard to specify research problems that both capture important things and are precise enough to make a good bounty. So I'm fairly pessimistic about this resolving the issue. I personally would expect more research to happen per unit of my time spent mentoring MATS scholars than putting up and judging bounties.
This really resonated with me. I am a student doing alignment research on a pretty nontraditional path. I work in mental health, and alongside that I run experiments on my own local hardware. I recently published my first paper on sparse autoencoder analysis of Anthropic’s deceptive AI model organism. I do not have prestigious affiliations or a PhD, just a lot of curiosity and a willingness to put in the hours.
Your diagnosis feels right. The real bottleneck does not seem to be talent, but the lack of infrastructure to find and support it. When people who are genuinely capable are being turned away at rates above ninety-eight percent, that starts to look less like selectivity and more like a coordination failure.
The bounty idea appeals to me because it sidesteps credential based gatekeeping. You do not need permission from an institution to try to do the work. You just do it, and the results speak for themselves. That said, I am less convinced by the prediction market style of verification. Citation counts tend to reward people who are already embedded in academic networks, and waiting a year for feedback is especially hard when you are trying to build momentum.
One piece I did not see discussed is the role AI tools themselves can play in easing the mentorship bottleneck. In my own work, collaborating with capable AI systems has helped fill some of the gaps you would normally expect a mentor to cover, especially around fast feedback, synthesizing literature, and iterating on experimental ideas. It is not a replacement for real mentorship, but it feels like a meaningful third path alongside bounties and expanded fellowship programs.
Thanks for writing this. It is an important conversation, and I am glad to see it happening!
I don't like to self-publicize, but I think you'd really resonate with a piece I wrote a while back; it went semi-viral and resulted in some very interesting discussion. It's about the systematic biases that expertise invokes, and what that's like as a novice: https://boydkane.com/essays/experts
I am less convinced by the prediction market style of verification
I'm also not super convinced, but I do think the problem of verifying solutions is a big one, so I wanted to put some alternative answers out there.
the role AI tools themselves can play in easing the mentorship bottleneck
For guiding up-and-coming researchers I definitely agree that existing AIs can help, although I also feel that each person should find something that works for them.
For using AIs to review submissions, I'm not sure the AIs are good enough yet to do a full review, but maybe they can significantly reduce the number of low-effort papers that a researcher has to review. E.g. use an LLM to check for typos, style, prior work, whether the paper actually answers the question, etc.
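To gesture at what that pre-screening could look like, here is a rough sketch of an LLM triage pass over a submission. It assumes an OpenAI-compatible Python client; the model name and the checklist items are placeholders, not a tested review pipeline.

```python
# Rough sketch of LLM-assisted triage for bounty submissions (first pass only, not a full review).
# Assumes an OpenAI-compatible client; model name and checklist are placeholders.
from openai import OpenAI

client = OpenAI()

CHECKLIST = """You are screening a research paper submitted against this bounty question:
{question}

For each item, answer PASS or FAIL with one sentence of justification:
1. Is the paper free of obvious typos and formatting problems?
2. Does it cite and engage with relevant prior work?
3. Does it actually attempt to answer the bounty question?
4. Does it report enough detail (methods, data) to be checkable?"""

def triage(question: str, paper_text: str) -> str:
    """Return a quick PASS/FAIL checklist to decide whether a human review is worthwhile."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model choice
        messages=[
            {"role": "system", "content": CHECKLIST.format(question=question)},
            {"role": "user", "content": paper_text[:50_000]},  # truncate very long papers
        ],
    )
    return response.choices[0].message.content
```

The point is only to cut down the pile a human reviewer has to look at, not to decide who gets paid.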
This tweet recently highlighted two MATS mentors talking about the absurdly high qualifications of incoming applicants to the AI Safety Fellowship:

Another was from a recently-announced Anthropic fellow, one of 32 fellows selected from over 2000 applications, giving an acceptance rate of under 1.6%:

This is a problem: when there are hundreds of applications per position and many of those applicants are very talented, the field ends up turning away and disincentivising people who are qualified enough to make significant contributions to AI safety research.
To be clear: I don’t think having a tiny acceptance rate on its own is a bad thing. Having a <5% acceptance rate is good if <5% of your applicants are qualified for the position! I don’t think any of the fellowship programs should lower their bar just so more people can say they do AI safety research. The goal is to make progress, not to satisfy the egos of those involved.
But I do think a <5% acceptance rate is bad if >5% of your applicants would be able to make meaningful progress in the position. This indicates the field is going slower than it otherwise could be, not because of a lack of people wanting to contribute, but because of a lack of ability to direct those people to where they can be effective.
Ryan Kidd has previously spoken about this, saying that the primary bottleneck in AI safety is the quantity of mentors/research programs, and calling for more research managers to increase the capacity of MATS, as well as more founders to start AI safety companies to make use of the talent.
I have a slightly different take: I’m not 100% convinced that doing more fellowships (where applicants get regular 1-on-1 time with mentors) can effectively scale to meet demand. People (both mentors and research managers) are the limiting factor here, and I think it’s worth exploring options where people are not the limiting factor. To be clear, I’m beyond ecstatic that these fellowships exist (and will be joining MATS 9 in January), but I believe we’re leaving talent on the table by not exploring the whole Pareto frontier. If we consider two dimensions, signal (how capable a program’s alumni are at doing AI safety research) and throughput (how many alumni a program can produce per year), then we get a Pareto frontier of programs. Existing programs generally optimise either for signal (MATS, Astra, directly applying to AI safety-focused research labs) or for throughput (bootcamps, online courses):

I think it would be worth exploring a different point on the Pareto curve:
I’m imagining a publicly-accessible website where:
- researchers can post the open questions they would like to see answered (e.g. the contents of their “future work” sections),
- organisations can attach bounties to the questions they consider most valuable, and
- anyone can submit a paper attempting to answer a question and, once the submission has been verified, claim the bounty.
This mechanism effectively moves the bottleneck away from the number of people (researchers, research managers) and towards the amount of capital available (through research funding, charity organisations). It would serve the secondary benefit of incentivising “future work” to be more organised, making it easier to understand where the frontier of knowledge is.
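To make the shape of such a site a bit more concrete, here is a minimal sketch of the records it might store. Every name and field here is hypothetical, not a spec.

```python
# Minimal sketch of a bounty board's data model (all names and fields hypothetical).
from dataclasses import dataclass, field
from datetime import date

@dataclass
class ResearchQuestion:
    question: str                  # e.g. "Does more Foo imply more Bar?"
    proposed_by: str               # the researcher whose "future work" section this came from
    bounties: list["Bounty"] = field(default_factory=list)

@dataclass
class Bounty:
    funder: str                    # e.g. a lab or a charitable funder
    amount_usd: float
    expires: date

@dataclass
class Submission:
    question: ResearchQuestion
    authors: list[str]
    paper_url: str
    verified: bool = False         # set once the verification step (discussed below) passes
```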
This mechanism creates a market of open research questions, effectively communicating which questions are likely to be worth sinking several months of work into. Speaking from personal experience, a major reason for me not investigating some questions on my own is the danger that these ideas might be dead-ends for reasons that I can’t see. I believe a clear signal of value would be useful in this regard; a budding researcher is more likely to investigate a question if they can see that Anthropic has put a $10k bounty on it. Even if the bounty is not very large, it still provides more signal than a “future work” section.
Since these research questions would have been proposed by a researcher and then financially backed by some organisation, successfully investigating them would be a very strong signal when applying to work with that researcher or an affiliated organisation. In this way, research bounties could function similarly to the AI safety fellowships in providing a high-value signal of competence at researching valuable questions, hopefully leading to more people working full-time in AI safety. In addition, research bounties could be significantly more parallel than existing fellowships.
Cyber security bounties, the RL environments bounties from Prime Intellect, and tinygrad’s bounties are all good examples of using something more EMH-pilled to solve this sort of distributed, low-collaboration work (by “low-collaboration” I mean roughly one team or one person working on a problem, as opposed to multiple teams or whole organisations collaborating). These bounty programs encourage more people to attempt the work, and then reward those who are effective. Additionally, the organising companies use these programs as a hiring funnel, sometimes requiring people to complete bounties in lieu of a more traditional interview process.
Research bounties are potentially a very scalable way to perform The Sort and find people from across the world who are able to make AI safety research breakthroughs. There are problems with research bounties, but there are problems with all options (fellowships, bootcamps, courses, etc) and the only valuable question to ask is whether the problems outweigh the benefits. I believe research bounties could fill a gap in the throughput-signal Pareto curve, and that this gap is worth filling.
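As a toy illustration of that gap, here is a sketch that marks which programs are Pareto-optimal given some entirely made-up signal/throughput scores. The numbers are not claims about any real program; the point is only that a mid-signal, mid-throughput option can sit on the frontier alongside the existing extremes.

```python
# Toy illustration of the signal/throughput Pareto frontier (all scores made up).
programs = {
    "MATS":          {"signal": 0.90, "throughput": 0.20},
    "Astra":         {"signal": 0.85, "throughput": 0.15},
    "Bootcamp":      {"signal": 0.40, "throughput": 0.70},
    "Online course": {"signal": 0.20, "throughput": 0.95},
    "Research bounties (hypothetical)": {"signal": 0.70, "throughput": 0.60},
}

def dominates(a, b):
    """True if program a is at least as good as b on both axes and strictly better on one."""
    return (a["signal"] >= b["signal"] and a["throughput"] >= b["throughput"]
            and (a["signal"] > b["signal"] or a["throughput"] > b["throughput"]))

frontier = [name for name, p in programs.items()
            if not any(dominates(q, p) for q in programs.values())]
print("Pareto-optimal programs:", frontier)
```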
Once a research question has been asked, a bounty supplied, and a candidate has submitted a research paper that they claim answers the question, we are left with the problem of verifying that claim. This is an intrinsically hard problem, and the one that peer review traditionally exists to solve. One answer would be to ask the researcher who originally posed the question to review the paper, but that leaves them exposed to a flood of low-quality spam submissions. The reviewers could be given some percentage of the bounty to compensate for their time, but that could lead to perverse incentives.
Another option to verify submissions might be to pose the research bounty in the form of a prediction market. For example, if you had the open research question
Does more Foo imply more Bar?
you could put up a prediction market for
A paper showing that ‘more Foo implies more Bar’ gets more than 20 citations one year after publication.
To incentivise someone to answer the research question, an organisation could bet NO for some cash amount, and the creators of the research paper could bet YES shortly before making their paper public, thereby claiming the “bounty”. This would increase the feedback time between someone publishing a paper and getting paid, but it should significantly reduce the chance of someone getting paid for sub-par work (if the citation requirement is raised high enough).
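As a rough sketch of the payoff arithmetic (with entirely made-up numbers): the organisation's NO stake effectively funds the bounty, and the authors' cheap YES shares collect it if the market resolves YES.

```python
# Toy payoff arithmetic for the bounty-as-prediction-market idea (all numbers hypothetical).
# Market: "A paper showing 'more Foo implies more Bar' gets >20 citations within a year."
# In a binary market, each matched share pays $1 to whichever side the market resolves to.

num_shares = 10_000            # shares matched between the two sides
yes_price = 0.10               # market-implied ~10% chance the question gets answered

# The funding organisation takes the NO side, effectively posting the bounty:
org_stake = num_shares * (1 - yes_price)       # $9,000
# The paper's authors buy YES shortly before making their paper public:
author_stake = num_shares * yes_price          # $1,000

# A year later the paper has passed 20 citations, so the market resolves YES:
author_payout = num_shares * 1.0               # $10,000: their stake plus the organisation's
print(f"Authors stake ${author_stake:,.0f} and receive ${author_payout:,.0f}")
print(f"The organisation's ${org_stake:,.0f} NO stake funds the 'bounty'")
```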