I don't think that I understand the plan. As Claude Sonnet 4.6 put it,
Claude's stance on the alleged plan
This is a fair and pointed critique. Honestly, "Path to Victory" reads much more like meta-advice about how a community should organize itself than a plan in any conventional sense. Here's why the comparison is unflattering:
What a plan actually is: Defense in Depth and an AI pause strategy both have object-level content — they specify what should be done, by whom, toward what end, and why that end addresses the problem. You could disagree with them, stress-test them, or identify where they fail. They have a shape.
What "Path to Victory" actually is: It's a set of norms for how individuals in the AI safety community should reflect on their own roles and capacities. It tells people to: be honest with themselves, pick a lane, review their choices periodically, communicate their limitations. These are virtues or heuristics, not a plan. You could follow every piece of advice in the post and still have no idea what to actually do about AI risk.
The post arguably admits this. It explicitly says consensus on a problem statement would likely be negative, that centralized planning won't happen, and that the "plan" is really about decentralized actors each contributing building blocks. It even notes that the two-tier hero/non-hero model is "a bit too binary." The author is essentially describing a theory of how plans might emerge rather than offering one.
The comparison the post itself invites — against Defense in Depth, a pause strategy, or an all-hands approach — actually highlights the gap. Those are strategies you can evaluate on their merits. "Path to Victory" is more like advice about how to engage with whichever strategy you find most compelling. It's one level of abstraction above a plan.
In its defense: There's a real and legitimate point buried in it — that the AI safety community has diverse contributors and needs frameworks for thinking about differentiated roles, not just hero-or-bust thinking. That's a genuine gap. But the post gestures at filling it rather than filling it. The author even acknowledges it "needs to be developed in more detail."
So to directly answer your question: no, not really — at least not in the same category as the alternatives it names. It's closer to community epistemics than strategy.
I guess "meta-plan" is a bit more precise—but it's not like plan is a technical term and, in practise, the distinction between plans and meta-plan breaks down if you look closely enough. Further, it's debatable whether victory depends more on details or process.
If you want more concrete detail on how this works[1]:
• The articles on heroic responsibility and Shut up and do the impossible! provide more detail on how "heroes" should act.
• As for the iterators, to a first approximation, I agree with John Wentworth about the importance of being robustly generalizable (either via the Very General Helper strategy or the One Who Actually Thought This Through A Bit strategy), though my second-approximation analysis would also account for the value of a) work done for its intellectual "elegance" and b) work which demonstrates that an approach is broken.
There are a lot more details that could be filled out, but I'm fine with leaving that to follow-up posts or comments.
You could disagree with them, stress-test them, or identify where they fail.
I think it's possible to do that with this plan as well, even if it's harder with a more abstract plan. Tell Claude it just needs to believe in itself 😛.
against Defense in Depth, a pause strategy, or an all-hands approach
It may feel strange to compare a plan to a meta-plan, but it makes sense in some contexts.
In particular:
• I believe that comparing my meta-plan against these concrete plans reveals some of its limitations (I'd encourage you to ask Claude to attempt this analysis).
• Let's suppose you're trying to select a high-level plan to turn into a concrete strategy. You can choose to start from either a plan or a meta-plan. A meta-plan might be a bit more work, but it may be worth it if it provides better results.
Maybe I should finish with this: when you say you don't understand the plan, what precisely do you mean? You want to understand the plan and then... what? I'm assuming you don't just want to understand the plan out of a love of knowledge or idle curiosity, but for some more substantive reason.
As noted in the article, this isn't really a binary. There are various degrees of "heroic responsibility".
This post is based on reflections from co-leading the Sydney AI Safety Fellowship. It's primarily focused on AI safety, but I expect the lessons to be more generally applicable.
We have a Problem™ 😱—actually several problems; okay, even more problems.
Unfortunately, the edge cases make it hard to write down a shared problem statement[1], but we just need to get on with it regardless, since The Universe Doesn't Have to Play Nice. Consensus on exact problem bounds would likely be net negative anyway, as it would make it much easier to completely neglect at least one value or threat that we would later regret not dealing with[2].
So let's just assume that we have our problem statement and that it's close enough to other people's problem statements that a community forms around it. This community consists of people with a variety of different skills, temperaments and degrees of commitment. This naturally leads to the question of which aspects of the problem different members of the community should be working on. Notably, we need to avoid unrealistic assumptions about there being a centralised authority to draw up a plan and divide tasks. It's not going to happen, and even if centralisation were possible, it would likely just make things worse[3].
Instead, we need to tackle the problem in a more decentralised manner. But without centralised co-ordination, how can we avoid dropping a ball we can't afford to drop?
Well, whatever we do, we need to take into account that there is variation in commitment and ability. Some people are willing and able to take Heroic Responsibility[4], using risky techniques like Shut up and do the impossible!—others are not. Indeed, I suspect that very few people (or groups) will be capable of taking heroic responsibility for the whole problem. I was persuaded of this by an X[5] by Richard Ngo:
This resonates strongly with me, though I'd frame it in terms of taking heroic responsibility for short timelines. I expect working on short timelines to be much less intense if you're only biting off a small chunk of the problem, so I predict a wider range of people could work on this than Richard expects[6].
In any case, this is where we are: we have a small number of people who can take heroic responsibility for the whole problem and a much larger number of people who can't[7]. The people who can't take heroic responsibility should primarily focus, focus, focus and pick one thing they can do well. I resonate strongly with how EA thinks about prioritisation, but I differ in that I think more in terms of systems[8] and less in terms of direct, measurable impact. To be specific, I tend to think of interventions as building blocks (such as gathering resources or discovering information) that others can attempt to build on top of[9]—be it incrementalists laying one more brick or Heroes™ shaping these blocks into a working plan[10].
I feel that there's a broad understanding that many of EA's old mental models of how to think about impact don't really carry over from EA's global health beginnings to the AI safety context, but we never really developed proper replacements[11]. I think it's important to understand that there are different kinds of domains. Universal tools or mental models would be ideal, but producing them is extremely hard, perhaps even impossible. Producing tools to solve the problem in front of you feels much more viable. What I've described above fills in some of the blanks, but it needs to be developed in more detail.
I think it's worth stepping back and asking what would need to happen for this plan to succeed:
There is no such thing as a perfect plan—all plans have flaws or limitations. If you haven't thought through the potential flaws of this strategy, then you should strongly consider not updating based on this proposal. I've left a few breadcrumbs in the spoiler block directly underneath and I've pasted some selected LLM critiques below that.
It's often hard to analyse the pros and cons of a plan in the abstract, so a good place to begin would be: what are the alternatives? A few possibilities: BlueDot's Defense-in-Depth, a plan crafted specifically for short timelines, a plan more narrowly focused on a specific strategy (like a pause), an all-hands-on-deck plan.
Another direction: what would the Least Convenient Possible World for this plan look like? For example, what if a significant proportion of AI safety work is actually net-negative?
Also, how could the world change such that this plan would become outdated? For example, what if we developed much more powerful co-ordination technology?
Selected LLM-Generated Critiques of this Article
Generated with ChatGPT Extended Thinking. I asked the model to iterate on the sharpest critiques it raised. I'm happy to respond to any of these critiques on request.
1. “Heroic responsibility” is carrying too much of the argument
The article’s central division is between the small number of people who can take “heroic responsibility” for the whole problem and the larger number who cannot, and who should therefore focus on one thing they can do well. That is psychologically vivid, but it is doing too much conceptual work. It smuggles in a picture of the field where the main question is who can bear the burden of whole-problem agency, rather than what structures, checks, and coordination mechanisms are needed to keep people from acting on overconfident global pictures.
From an AI safety perspective, that is risky. One of the field’s central pathologies is not just passivity, but unilateral action under uncertainty: people overestimating their grasp of the strategic landscape, underestimating externalities, and rationalizing risky moves because “someone has to take responsibility.” If “heroic responsibility” becomes the organizing category, it predictably flatters grandiosity and encourages people to think in terms of whether they are one of the few serious enough to shoulder the whole thing. That is a bad attractor for a field that desperately needs better calibration, better feedback loops, and more respect for how partial everyone’s map is.
So the criticism is not merely that the concept is romantic. It is that it risks becoming a status-laden substitute for institutional design. AI safety probably does need some people thinking at the whole-system level. But it does not follow that “heroic responsibility” should be the central meme around which the rest of the division of labor is organized.
2. Divergent problem definitions do not just leave gaps — they create direct conflict
The article is fairly relaxed about the absence of a shared problem statement. It says edge cases make it hard to define one, suggests exact consensus may even be net negative, and then assumes people’s views are close enough that a community can still proceed. From an AI safety perspective, that understates the problem. In this domain, different definitions of “the problem” do not merely lead to benign pluralism. They often generate people working at cross purposes.
If one person thinks the central risk is loss of control from misaligned autonomous systems, another thinks it is misuse, another thinks it is racing dynamics, and another thinks it is concentration of power, then they will not merely choose different priorities. They may support actions that actively undermine each other: more openness versus less openness, more deployment to learn faster versus greater restraint, more lab engagement versus more adversarial pressure, more emphasis on evals versus more emphasis on institutional slowdown, and so on. A project that looks like a valuable “building block” under one framing can look actively harmful under another.
That means the challenge is not just that decentralized actors might fail to cover everything. It is that they may fill the wrong gaps in ways that worsen the overall picture. Once you see that, the article’s “let’s assume our problem statements are close enough” move looks much less harmless. In AI safety, unresolved disagreement about the nature of the problem is not background untidiness. It is a live source of strategic collision.
3. “Be honest about limitations” is not a strong enough response to false coverage
The article is very good at noticing one specific failure mode: not just uncovered gaps, but the belief that a gap is covered when it isn’t. That is genuinely important. But the proposed remedy is mostly a norm of greater candor — admitting blind spots, or even potential blind spots, so that “Heroes” are not misled about where coverage exists.
I do not think that is enough. Communities usually do not get false coverage because everyone forgot to be humble. They get it because the incentive gradients are bad. People are rewarded for seeming competent, legible, and on top of things. Others do not want to challenge them too aggressively. Success proxies get mistaken for real coverage. And some limitations are hardest to see precisely for the people most likely to reassure others that the area is handled. In other words, false coverage is not mainly a virtue problem. It is a mechanism-design problem.
So the criticism here is that the article reaches for a moral solution where a structural one is needed. What you want is not just more openness. You want ways of testing whether the claimed coverage is real: independent review, adversarial audits, explicit ownership maps, public criteria for what “covered” means, and regular attempts to discover strategic orphan problems. Otherwise “please communicate your blind spots more honestly” will help at the margins while leaving the central failure mode intact.
4. The anti-centralization move overshoots
The article says we should avoid unrealistic assumptions about centralized authority, and suggests that even if such centralization were possible it would likely make things worse. There is an important truth here: AI safety is not going to be solved by one planner assigning everyone their role. But the argument overshoots if it makes decentralization sound like the real alternative.
The actual alternative is not “central command” versus “everyone figures out their niche.” It is partial structured coordination. Shared eval standards, common threat models, agreed escalation pathways, deliberate division of labor across orgs, cross-project reviews, and regular attempts to compare priorities are all forms of coordination that fall far short of central planning. In a field like AI safety, where mistakes can compound and different agendas can undermine one another, the absence of those structures is not neutral. It is itself one of the main ways you lose.
So I would not say the article is wrong to reject a fantasy of full centralization. I would say it moves too quickly from that rejection to a picture where decentralized actors, if sufficiently self-aware and strategic, can more or less sort things out. That is too optimistic. AI safety is precisely the sort of domain where local good intentions and decent judgment are not enough; you also need coordination structures strong enough to keep different parts of the field from drifting apart or colliding.
The one-line summary is:
The piece is strongest as advice about temperament and role-selection within a decentralized community, but weakest where it implicitly treats those norms as close to a field strategy.
Thank you to the 2026 Sydney AI Safety Fellows, my primary co-organiser Jack Payne and all the others who assisted with various parts of the organisation (Michael, Hunter, Luke, etc.). Thank you also to the Sydney AI Safety Space/Sydney Knowledge Hub who hosted us as well as all the mentors, speakers and guests.
The name of this strategy is "Path to Victory" with the quotation marks included. It may seem like a minor detail, but I believe this to be important.
Last year, I was at a retreat (on a different but related topic) where we spent a bunch of time trying to define the problem, but we didn't succeed because different people had different conceptions.
That said, there are some big disadvantages as well.
I haven't read Hayek, but apparently these arguments are in his wheelhouse.
In his 2025 review, Alexander Berger referenced Nan Ransohoff's concept of General Managers as something they were keen to explore. I take the concept of a 'general manager' to mean, essentially, someone taking heroic responsibility with a lot of resources.
I may very well be the first person in history to call a Tweet an "X".
"If your prediction is right (a wider range of people can work on short timelines if they're only biting off a small chunk), this has significant practical implications for community strategy. This could be a standalone claim worth developing, rather than a brief aside." — I'll keep that in mind for the future (insofar as there is one).
This model is a significant simplification in that it is possible to take heroic responsibility for a sub-problem even if you can't take heroic responsibility for the whole problem. When writing the rest of this post, I tried to keep this more complicated model in mind in the hope that my analysis would still apply.
There is a discipline called systems thinking, but I haven't yet found the time to engage with it substantively, so I think about systems in a more ad hoc way.
Obviously, you need to take into account the probability that someone actually builds on your work.
Jay Bailey describes the difference between bridges and walls—walls benefit from each additional block, but bridges only work if the whole structure is complete.
The tools that people use are very ad hoc.
He provides two strategies for dealing with this: