Epistemic status: This model is mostly based on a few hours of dedicated thought, and the post was written in 30 min. Nevertheless, I think this model is probably worth considering.
Many people seem to be entering the AI safety ecosystem, acquiring a belief in short timelines and high P(doom), and immediately dropping everything to work on AI safety agendas that might pay off in short-timeline worlds. However, many of these people might not have a sufficient “toolbox” or research experience to have much marginal impact in short-timeline worlds.
Rather than tell people what they should do on the object level, I sometimes tell them:
- Write out your credences for AGI being realized in 2027, 2032, and 2042;
- Write out your plans if you had 100% credence in each of 2027, 2032, and 2042;
- Write out your marginal impact in lowering P(doom) via each of those three plans;
- Work towards the plan that is the argmax of your marginal impact, weighted by your credence in the respective AGI timelines.
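The four steps above can be sketched as a toy expected-value calculation. All credences, impact numbers, and plan names below are hypothetical placeholders; substitute your own estimates:

```python
# Credence that AGI is realized by each year (need not sum to 1;
# the remainder is "later / never"). Numbers are made up.
credence = {2027: 0.2, 2032: 0.3, 2042: 0.3}

# Estimated marginal reduction in P(doom) for each plan, conditional
# on each timeline scenario (arbitrary units, also made up).
impact = {
    "sprint on short-timeline agendas": {2027: 3, 2032: 2, 2042: 1},
    "do a PhD, then research":          {2027: 0, 2032: 2, 2042: 4},
    "build ops/engineering skills":     {2027: 1, 2032: 2, 2042: 3},
}

def expected_impact(plan):
    # Credence-weighted marginal impact across the timeline scenarios.
    return sum(credence[year] * impact[plan][year] for year in credence)

best = max(impact, key=expected_impact)
```

Whichever plan maximizes the credence-weighted sum is the argmax choice; the considerations below cover when you might want to deviate from it.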
Some further considerations:
- If you are risk averse over your marginal impact, you should maybe avoid a true argmax approach and instead choose a plan that pays out some marginal impact in all three timeline scenarios. For example, some shovel-ready, short-timeline AI safety research agendas may prepare you for long-timeline AI safety research better than others. Consider blending elements of your plans for the three timeline scenarios (the "~" in "~argmax"). You may also want side constraints, such as a minimum acceptable impact in the world where AGI is realized in 2027.
- Your immediate plans might be similar in some scenarios. If so, congratulations, you have an easier decision! However, I suspect most aspiring AI safety researchers without research experience should have different plans for different AGI timeline scenarios. For example, getting a Ph.D. in a top lab probably makes most people much better at some aspects of research and working in emerging tech probably makes most people much better at software engineering and operations.
- You should be wary of altering your timeline credences in an attempt to rationalize your preferred plan or highest-probability timeline scenario. However, don’t be afraid to update your credences over AGI timelines or your expected marginal impact in those worlds! Revisit your plan often and expect it to change (though hopefully not in predictable ways, as that would make you a bad Bayesian).
- Consider how the entire field of AI talent might change if everyone followed the argmax approach I laid out here. Are there any ways they might do something you think is predictably wrong? Does this change your plan?
- If you want to develop more finely-grained estimates over timelines (e.g., 2023, 2024, etc.) and your marginal impact in those worlds, feel free to do so. I prefer to keep the number of options manageable.
- Your marginal impact might also change with respect to the process by which AGI is created in different timeline worlds. For example, if AGI arrives in 2023, I imagine that the optimal mechanistic interpretability researcher might not have as high an impact as they would if AGI arrived some years later, when interpretability has potentially had time to scale.
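The risk-averse "~argmax" with a side constraint (a minimum acceptable impact in every scenario) could be sketched like this; the plans, numbers, and floor value are all hypothetical:

```python
# Hypothetical sketch of "~argmax" with a side constraint: only consider
# plans whose impact clears a floor in EVERY scenario, then maximize
# expected impact among those. All numbers are made up.
credence = {2027: 0.2, 2032: 0.3, 2042: 0.3}
impact = {
    "plan A": {2027: 3.0, 2032: 2.0, 2042: 0.0},
    "plan B": {2027: 1.0, 2032: 2.0, 2042: 3.0},
}
FLOOR = 0.5  # minimum acceptable impact in any single scenario

feasible = [p for p in impact if all(v >= FLOOR for v in impact[p].values())]
best = max(feasible, key=lambda p: sum(credence[y] * impact[p][y] for y in credence))
```

Here "plan A" is excluded despite its strong short-timeline payoff, because it pays nothing in the 2042 world.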
I think this is true for some people, but I also think people tend to overestimate the number of years it takes to gain enough research experience to contribute.
I think a few people have been able to make useful contributions within their first year (though in fairness they generally had backgrounds in ML or AI, so they weren't starting completely from scratch), and several highly respected senior researchers have just a few years of research experience. (And they, on average, had less access to mentorship/infrastructure than today's folks).
I also think people often overestimate the amount of time it takes to become an expert in a specific area relevant to AI risk (like subtopics in compute governance, information security, etc.).
Finally, I think people should try to model community growth & neglectedness of AI risk in their estimates. Many people have gotten interested in AI safety in the last 1-3 years. I expect that many more will get interested in AI safety in the upcoming years. Being one researcher in a field of 300 seems more useful than being one researcher in a field of 1500.
With all that in mind, I really like this exercise, and I expect that I'll encourage people to do this in the future:
[Note: written on a phone, quite rambly and disorganized]
I broadly agree with the approach, some comments:
Hmm. Since most of my probability mass is in <5 years range, it seems this is just going to mislead people into not being at all helpful? Why not do this but for the years 2024, 2026, 2028? What makes you privilege the years you chose to mention?
These dates have particular significance in my own AGI timeline estimates, and I think they are a good default spread based on community opinion. However, there is no reason you shouldn't choose alternate years!
While teamwork seems to be assumed in the article, I believe it's worth spelling out explicitly that argmaxing for a plan with highest marginal impact might mean joining and/or building a team where the team effort will make the most impact, not optimizing for highest individual contribution.
Spending time to explain why a previous research project failed might help 100 other groups learn from our mistake, so it could be more impactful than pursuing the next shiny idea.
We don't want to optimize for a naive feeling of individual marginal impact; we want to keep in mind that the actual goal is to build an aligned AGI.
This seems basically reasonable, but as stated I think importantly misses that the plan you follow will change the accuracy of your estimates in steps (1) and (3) when you come to reassess. With 100% credence on some year, there's no value in picking a plan that gets you evidence about timelines, or evidence of your likely impact in scenarios you're assuming won't happen.
It's not enough to revisit the plan often if the plan you're following isn't giving you much new evidence.
Seems right. Explore vs. exploit is another useful frame.
Explore vs. exploit is a frame I naturally use (though I do like your timeline-argmax frame as well), where I ask myself: "Roughly how many years should I feel comfortable exploring before I really need to be sitting down and attacking the hard problems directly somehow?"
Admittedly, this is confounded a bit by how exactly you're measuring it. If I have 15-year timelines for median AGI-that-can-kill-us (which is about right, for me) then I should be willing to spend 5-6 years exploring by the standard 1/e algorithm. But when did "exploring" start? Obviously I should count my last eight months of upskilling and research as part of the exploration process. But what about my pre-alignment software engineering experience? If so, that's now 4/19 years spent exploring, giving me about three left. If I count my CS degree as well, that's 8/23 and I should start exploiting in less than a year.
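The 1/e arithmetic in this comment is easy to check. The stopping rule is the standard secretary-problem heuristic (explore for the first 1/e fraction of the horizon, then exploit); the horizons and years-already-explored figures are the ones given above:

```python
import math

# Secretary-problem heuristic: spend the first 1/e of your total horizon
# exploring, the rest exploiting.
def years_left_exploring(total_horizon, already_explored):
    return total_horizon / math.e - already_explored

# 15-year horizon, ~8 months already spent exploring:
remaining_now = years_left_exploring(15, 8 / 12)     # ~4.9 years of exploring left

# Counting prior software engineering too (4 of 19 years explored):
remaining_with_swe = years_left_exploring(19, 4)     # ~3 years left

# Counting the CS degree as well (8 of 23 years explored):
remaining_with_degree = years_left_exploring(23, 8)  # under a year left
```

As the comment notes, the answer is highly sensitive to where you draw the "exploring started here" line.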
Another frame I like is "hill-climbing" - namely, take the opportunity that seems best at a given moment. Though it is worth asking what makes something the best opportunity if you're comparing, say, maximum impact now vs. maximum skill growth for impact later.