I don't think that I understand the plan. As Claude Sonnet 4.6 put it,
Claude's stance on the alleged plan
This is a fair and pointed critique. Honestly, "Path to Victory" reads much more like meta-advice about how a community should organize itself than a plan in any conventional sense. Here's why the comparison is unflattering:
What a plan actually is: Defense in Depth and an AI pause strategy both have object-level content — they specify what should be done, by whom, toward what end, and why that end addresses the problem. You could disagree with them, stress-test them, or identify where they fail. They have a shape.
What "Path to Victory" actually is: It's a set of norms for how individuals in the AI safety community should reflect on their own roles and capacities. It tells people to: be honest with themselves, pick a lane, review their choices periodically, communicate their limitations. These are virtues or heuristics, not a plan. You could follow every piece of advice in the post and still have no idea what to actually do about AI risk.
The post arguably admits this. It explicitly says consensus on a problem statement would likely be negative, that centralized planning won't happen, and that the "plan" is really about decentralized actors each contributing building blocks. It even notes that the two-tier hero/non-hero model is "a bit too binary." The author is essentially describing a theory of how plans might emerge rather than offering one.
The comparison the post itself invites — against Defense in Depth, a pause strategy, or an all-hands approach — actually highlights the gap. Those are strategies you can evaluate on their merits. "Path to Victory" is more like advice about how to engage with whichever strategy you find most compelling. It's one level of abstraction above a plan.
In its defense: There's a real and legitimate point buried in it — that the AI safety community has diverse contributors and needs frameworks for thinking about differentiated roles, not just hero-or-bust thinking. That's a genuine gap. But the post gestures at filling it rather than filling it. The author even acknowledges it "needs to be developed in more detail."
So to directly answer your question: no, not really — at least not in the same category as the alternatives it names. It's closer to community epistemics than strategy.
I guess "meta-plan" is a bit more precise—but it's not like plan is a technical term and, in practise, the distinction between plans and meta-plan breaks down if you look closely enough. Further, it's debatable whether victory depends more on details or process.
If you want more concrete detail on how this works[1]:
• The articles on heroic responsibility and Shut up and do the impossible! provide more detail on how "heroes" should act.
• As for the iterators, to a first approximation, I agree with John Wentworth about the importance of being robustly generalizable (either via the Very General Helper strategy or the One Who Actually Thought This Through A Bit strategy), though my second-approximation analysis would also account for the value of a) work done for its intellectual "elegance" and b) work which demonstrates that an approach is broken.
There are a lot more details that could be filled out, but I'm fine with leaving that to follow-up posts or comments.
You could disagree with them, stress-test them, or identify where they fail.
I think it's possible to do that with this plan as well, even if it's harder with a more abstract plan. Tell Claude it just needs to believe in itself 😛.
against Defense in Depth, a pause strategy, or an all-hands approach
It may feel strange to compare a plan to a meta-plan, but it makes sense in some contexts.
In particular:
• I believe that comparing my meta-plan against these concrete plans reveals some of its limitations (I'd encourage you to ask Claude to attempt this analysis).
• Let's suppose you're trying to select a high-level plan to turn into a concrete strategy. You can choose to start from either a plan or a meta-plan. A meta-plan might be a bit more work, but it may be worth it if it provides better results.
Maybe I should finish with this: when you say you don't understand the plan, what precisely do you mean? You want to understand the plan and then... what? I'm assuming you don't just want to understand the plan out of a love of knowledge or idle curiosity, but for some more substantive reason.
As noted in the article, this isn't really a binary. There are various degrees of "heroic responsibility".
This post is based on reflections from co-leading the Sydney AI Safety Fellowship. It's primarily focused on AI safety, but I expect the lessons to be more generally applicable.
We have a Problem™ 😱—actually several problems; okay, even more problems.
Unfortunately, the edge cases make it hard to write down a shared problem statement[1], but we just need to get on with it regardless, since The Universe Doesn't Have to Play Nice. Consensus on exact problem bounds would likely be net negative anyway, as it would make it much easier to completely neglect at least one value or threat that we would later regret not dealing with[2].
So let's just assume that we have our problem statement and that it's close enough to other people's problem statements that a community forms around it. This community consists of people with a variety of different skills, temperaments and degrees of commitment. This naturally leads to the question of which aspects of the problem different members of the community should be working on. Notably, we need to avoid unrealistic assumptions about there being a centralised authority to draw up a plan and divide tasks. It's not going to happen, and even if centralisation were possible, it would likely just make things worse[3].
Instead, we need to tackle the problem in a more decentralised manner. But without centralised co-ordination, how can we avoid dropping a ball we can't afford to drop?
Well, whatever we do, we need to take into account that there is variation in commitment and ability. Some people are willing and able to take Heroic Responsibility[4], using risky techniques like Shut up and do the impossible!—others are not. Indeed, I suspect that very few people (or groups) will be capable of taking heroic responsibility for the whole problem. I was persuaded of this by an X[5] by Richard Ngo:
This resonates strongly with me, though I'd frame it in terms of taking heroic responsibility for short timelines. I expect working on short timelines to be much less intense if you're only biting off a small chunk of the problem, so I predict a wider range of people could work on this than Richard expects[6].
In any case, this is where we are: we have a small number of people who can take heroic responsibility for the whole problem and a much larger number of people who can't[7]. The people who can't take heroic responsibility should primarily focus, focus, focus and pick one thing they can do well. I resonate strongly with how EA thinks about prioritisation, but I differ in that I think more in terms of systems[8] and less in terms of direct, measurable impact. To be specific, I tend to think of interventions as building blocks (such as gathering resources or discovering information) that others can attempt to build on top of[9]—be it incrementalists laying one more brick or Heroes™ shaping these blocks into a working plan[10].
I feel that there's a broad understanding that many of EA's old mental models of how to think about impact don't really carry over from EA's global health beginnings to the AI safety context, but we never really developed proper replacements[11]. I think it's important to understand that there are different kinds of domains. Universal tools or mental models would be ideal, but producing them is extremely hard, perhaps even impossible. Producing tools to solve the problem in front of you feels much more viable. What I've described above fills in some of the blanks, but it needs to be developed in more detail.
I think it's worth stepping back and asking what would need to happen for this plan to succeed:
There is no such thing as a perfect plan—all plans have flaws or limitations. If you haven't thought through the potential flaws of this strategy, then you should strongly consider not updating based on this proposal. I've left a few breadcrumbs in the spoiler block directly underneath and I've pasted some selected LLM critiques below that.
It's often hard to analyse the pros and cons of a plan in the abstract, so a good place to begin would be: what are the alternatives? A few possibilities: BlueDot's Defense-in-Depth, a plan crafted specifically for short timelines, a plan more narrowly focused on a specific strategy (like a pause), an all-hands-on-deck plan.
Another direction: what would the Least Convenient Possible World for this plan look like? For example, what if a significant proportion of AI safety work is actually net-negative?
Also, how could the world change such that this plan would become outdated? For example, what if we developed much more powerful co-ordination technology?
Selected LLM-Generated Critiques of this Article
Generated with ChatGPT Extended Thinking. I asked the model to iterate on the sharpest critiques it raised. I'm happy to respond to any of these critiques on request.
1. “Heroic responsibility” is carrying too much of the argument
The article’s central division is between the small number of people who can take “heroic responsibility” for the whole problem and the larger number who cannot, and who should therefore focus on one thing they can do well. That is psychologically vivid, but it is doing too much conceptual work. It smuggles in a picture of the field where the main question is who can bear the burden of whole-problem agency, rather than what structures, checks, and coordination mechanisms are needed to keep people from acting on overconfident global pictures.
From an AI safety perspective, that is risky. One of the field’s central pathologies is not just passivity, but unilateral action under uncertainty: people overestimating their grasp of the strategic landscape, underestimating externalities, and rationalizing risky moves because “someone has to take responsibility.” If “heroic responsibility” becomes the organizing category, it predictably flatters grandiosity and encourages people to think in terms of whether they are one of the few serious enough to shoulder the whole thing. That is a bad attractor for a field that desperately needs better calibration, better feedback loops, and more respect for how partial everyone’s map is.
So the criticism is not merely that the concept is romantic. It is that it risks becoming a status-laden substitute for institutional design. AI safety probably does need some people thinking at the whole-system level. But it does not follow that “heroic responsibility” should be the central meme around which the rest of the division of labor is organized.
2. Divergent problem definitions do not just leave gaps — they create direct conflict
The article is fairly relaxed about the absence of a shared problem statement. It says edge cases make it hard to define one, suggests exact consensus may even be net negative, and then assumes people’s views are close enough that a community can still proceed. From an AI safety perspective, that understates the problem. In this domain, different definitions of “the problem” do not merely lead to benign pluralism. They often generate people working at cross purposes.
If one person thinks the central risk is loss of control from misaligned autonomous systems, another thinks it is misuse, another thinks it is racing dynamics, and another thinks it is concentration of power, then they will not merely choose different priorities. They may support actions that actively undermine each other: more openness versus less openness, more deployment to learn faster versus greater restraint, more lab engagement versus more adversarial pressure, more emphasis on evals versus more emphasis on institutional slowdown, and so on. A project that looks like a valuable “building block” under one framing can look actively harmful under another.
That means the challenge is not just that decentralized actors might fail to cover everything. It is that they may fill the wrong gaps in ways that worsen the overall picture. Once you see that, the article’s “let’s assume our problem statements are close enough” move looks much less harmless. In AI safety, unresolved disagreement about the nature of the problem is not background untidiness. It is a live source of strategic collision.
3. “Be honest about limitations” is not a strong enough response to false coverage
The article is very good at noticing one specific failure mode: not just uncovered gaps, but the belief that a gap is covered when it isn’t. That is genuinely important. But the proposed remedy is mostly a norm of greater candor — admitting blind spots, or even potential blind spots, so that “Heroes” are not misled about where coverage exists.
I do not think that is enough. Communities usually do not get false coverage because everyone forgot to be humble. They get it because the incentive gradients are bad. People are rewarded for seeming competent, legible, and on top of things. Others do not want to challenge them too aggressively. Success proxies get mistaken for real coverage. And some limitations are hardest to see precisely for the people most likely to reassure others that the area is handled. In other words, false coverage is not mainly a virtue problem. It is a mechanism-design problem.
So the criticism here is that the article reaches for a moral solution where a structural one is needed. What you want is not just more openness. You want ways of testing whether the claimed coverage is real: independent review, adversarial audits, explicit ownership maps, public criteria for what “covered” means, and regular attempts to discover strategic orphan problems. Otherwise “please communicate your blind spots more honestly” will help at the margins while leaving the central failure mode intact.
4. The anti-centralization move overshoots
The article says we should avoid unrealistic assumptions about centralized authority, and suggests that even if such centralization were possible it would likely make things worse. There is an important truth here: AI safety is not going to be solved by one planner assigning everyone their role. But the argument overshoots if it makes decentralization sound like the real alternative.
The actual alternative is not “central command” versus “everyone figures out their niche.” It is partial structured coordination. Shared eval standards, common threat models, agreed escalation pathways, deliberate division of labor across orgs, cross-project reviews, and regular attempts to compare priorities are all forms of coordination that fall far short of central planning. In a field like AI safety, where mistakes can compound and different agendas can undermine one another, the absence of those structures is not neutral. It is itself one of the main ways you lose.
So I would not say the article is wrong to reject a fantasy of full centralization. I would say it moves too quickly from that rejection to a picture where decentralized actors, if sufficiently self-aware and strategic, can more or less sort things out. That is too optimistic. AI safety is precisely the sort of domain where local good intentions and decent judgment are not enough; you also need coordination structures strong enough to keep different parts of the field from drifting apart or colliding.
The one-line summary is:
The piece is strongest as advice about temperament and role-selection within a decentralized community, but weakest where it implicitly treats those norms as close to a field strategy.
Thank you to the 2026 Sydney AI Safety Fellows, my primary co-organiser Jack Payne and all the others who assisted with various parts of the organisation (Michael, Hunter, Luke, etc.). Thank you also to the Sydney AI Safety Space/Sydney Knowledge Hub who hosted us as well as all the mentors, speakers and guests.
The name of this strategy is "Path to Victory" with the quotation marks included. It may seem like a minor detail, but I believe this to be important.
Last year, I was at a retreat (on a different but related topic) where we spent a bunch of time trying to define the problem, but we didn't succeed because different people had different conceptions.
That said, there are some big disadvantages as well.
I haven't read Hayek, but apparently these arguments are in his wheelhouse.
In his 2025 review, Alexander Berger referenced Nan Ransohoff's concept of General Managers as something they were keen to explore. I take the concept of a 'general manager' to mean, essentially, someone taking heroic responsibility with a lot of resources.
I may very well be the first person in history to call a Tweet an "X".
"If your prediction is right (a wider range of people can work on short timelines if they're only biting off a small chunk), this has significant practical implications for community strategy. This could be a standalone claim worth developing, rather than a brief aside." — I'll keep that in mind for the future (insofar as there is one).
This model is a significant simplification in that it is possible to take heroic responsibility for a sub-problem even if you can't take heroic responsibility for the whole problem. When writing the rest of this post, I tried to keep this more complicated model in mind in the hope that my analysis would still apply.
There is a discipline called systems thinking, but I haven't yet found the time to engage with it substantively, so I think about systems in a more ad hoc way.
Obviously, you need to take into account the probability that someone actually builds on your work.
Jay Bailey describes the difference between bridges and walls—walls benefit from each additional block, but bridges only work if the whole structure is complete.
The tools that people use are very ad hoc.
He provides two strategies for dealing with this: