This is satire.
It is intended to draw attention to the absurd situation the world is in. By the estimates of the most skilled forecasters on Earth, there is a 10% chance of superintelligence within the next 900 days. There is no plan that strictly dominates Plan 'Straya. Most plans do not address the situation with the honesty that Plan 'Straya does.
I advise the world to update on this.
Plan 'Straya: A Comprehensive Alignment Strategy
Version 0.3 — DRAFT — Not For Distribution Outside The Pub
Epistemic status: High confidence, low evidence. Consistent with community norms.
Executive Summary
Existing alignment proposals suffer from a shared flaw: they assume you can solve the control problem before the catastrophe. Plan 'Straya boldly inverts this. We propose achieving alignment the way humanity has historically achieved most of its moral progress — by first making every possible mistake, losing nearly everything, and then writing a strongly-worded resolution about it afterward.
The plan proceeds in three rigorously defined phases.
Phase 1: Anticorruption Measures (Kinetic)
The scholarly literature on AI governance emphasises that institutional integrity is a prerequisite for safe deployment. We agree. Where we diverge from the mainstream is on methodology.
Most proposals suggest "regulatory frameworks" and "oversight bodies." The NIST AI Risk Management Framework provides a voluntary set of guidelines that organisations may choose to follow, partially follow, or simply reference in press releases. The EU AI Act classifies systems into risk tiers with the quiet confidence of a taxonomy that will be obsolete before its implementing regulations are finalised. The Frontier Model Forum, meanwhile, brings together the leading AI laboratories in a spirit of cooperative self-governance, a phrase which here means "a shared Google Doc and quarterly meetings in San Francisco."
These approaches share a well-documented failure mode: the people staffing them are, in technical terms, politicians. Plan 'Straya addresses this via what we call "a vigorous personnel restructuring of the Australian federal and state governments," targeting specifically those members identified as corrupt.
We acknowledge that the identification mechanism — determining which officials are corrupt — is itself an alignment problem. Specifically, it requires specifying a value function ("not corrupt"), building a classifier with acceptable false-positive and false-negative rates, and then acting on the classifier's outputs in conditions of uncertainty. We consider it elegant that Plan 'Straya encounters the alignment problem immediately in Phase 1. Most plans do not encounter it until much later, by which point they have accumulated too much momentum to stop.
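For concreteness, here is a minimal sketch of what such a classifier might look like, in hypothetical Python; the `corruption_score` function and the threshold value are assumptions of the sketch, and the former is, as noted above, future work.

```python
import random

def corruption_score(official: str) -> float:
    """Hypothetical value function ("not corrupt"), scored in [0, 1].
    Left for future work; this placeholder uses the electorate's
    time-honoured method, which is roughly a coin flip."""
    return random.random()

def flag_for_restructuring(officials: list[str], threshold: float = 0.5) -> list[str]:
    """Act on the classifier's outputs under uncertainty.

    The threshold trades false positives (vigorously restructuring the
    innocent) against false negatives (leaving the corrupt in office).
    Plan 'Straya does not say how to set it, which is to say Phase 1
    already contains the alignment problem.
    """
    return [o for o in officials if corruption_score(o) > threshold]

# Illustrative run over a parliament-sized population.
parliament = [f"member_{i}" for i in range(151)]
flagged = flag_for_restructuring(parliament)
print(f"{len(flagged)} of {len(parliament)} members flagged for vigorous restructuring")
```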
The identification problem is left for future work. We note only that the Australian electorate has historically demonstrated strong intuitions here, typically expressed in language not suitable for an academic paper.
Several objections arise immediately:
Q: Isn't this wildly illegal? A: Yes. However, we note that Plan 'Straya is an alignment plan, and alignment researchers have a proud tradition of ignoring implementation details that fall outside their core model. We further note that our plan requires violating the law of exactly one (1) country, which compares favourably with proposals that require the voluntary cooperation of every major world government simultaneously.
Q: Who decides who's corrupt? A: See above. Future work.
Q: Why Australia specifically? A: Strategic considerations developed in Phase 3. Also, the authors are partial.
Phase 2: Strategic Thermonuclear Exchange (Blame-Shifted)
With the Australian government now staffed exclusively by the non-corrupt (estimated remaining headcount: 4–7 people), we proceed to the centrepiece of the plan.
A nuclear exchange is initiated between the major global powers. The specific mechanism is unimportant — the alignment literature assures us that if you specify the objective function clearly enough, the details sort themselves out.
Critically, the exchange is attributed to a misaligned AI system. This is the key technical contribution of Plan 'Straya. We observe that the blame-shift serves a vital pedagogical function: post-exchange, the surviving population will possess an empirically grounded motivation to take alignment seriously, as opposed to the current approach of posting on LessWrong and hoping.
Projected casualties: Most of them. (95% CI: 7.4–8.1 billion, assuming standard nuclear winter models and the usual optimistic assumptions about agricultural resilience that defence planners have been making since the 1960s.)
Ethical review status: We submitted this to an IRB. The IRB building is in Phase 2's blast radius. We consider this a self-resolving conflict of interest.
Relationship to the Pause Debate
We are aware of ongoing discourse regarding whether AI development should be paused, slowed, or accelerated. Plan 'Straya offers a synthesis: development is permanently paused for approximately 99.7% of the global population, while being radically accelerated for the survivors. We believe this resolves the debate, or at minimum relocates it to a jurisdiction with fewer participants.
The e/acc community will note that Phase 2 constitutes the most aggressive possible acceleration of selection pressure. The pause community will note that it constitutes an extremely effective pause. We are proud to offer something for everyone.1
Phase 3: Civilisational Rebuild (The 'Straya Bit)
Australia survives for reasons that are approximately strategic and approximately vibes-based:
Cultural-Theoretic Factors
We propose that several features of Australian culture, typically dismissed as informality or apathy, are in fact alignment-relevant heuristics:
"She'll be right" (Corrigibility Condition). We define the She'll Be Right Principle (SBRP) as follows: given an agent A operating under uncertainty U, SBRP states that A should maintain default behaviour unless presented with overwhelming and undeniable evidence of catastrophic failure, at which point A should mutter "yeah nah" and make a minimal corrective adjustment. This is formally equivalent to a high-threshold corrigibility condition with lazy evaluation. It compares favourably with proposals requiring perpetual responsiveness to correction, which, as any Australian will tell you, is not how anything actually works.
"Tall Poppy Syndrome" (Capability Control). Any agent that becomes significantly more capable than its peers is subject to systematic social penalties until capability parity is restored. This is the only capability-control mechanism in the literature empirically tested at civilisational scale for over two centuries. Its principal limitation is that it also penalises competence, which we acknowledge is a significant alignment tax but may be acceptable given the alternative.
The Reconstruction
The surviving Australian parliamentarians (now 3–6, following a disagreement over water rights in the Murray-Darling Basin, which we note predates and will outlast the apocalypse) oversee civilisational reconstruction. Their first act is to build an aligned superintelligence.
"But how?" the reader asks.
We respond: they will have learned from the experience. Approximately 7.9 billion people will have died demonstrating that unaligned AI is dangerous. This constitutes a very large training dataset. We apply the scaling hypothesis — the same one capabilities researchers use to justify training runs — but to warnings rather than parameters: surely if you make the warning big enough, somebody will listen.
The aligned superintelligence is then constructed using the lessons so acquired; the formal details, such as they are, appear in Appendix A.
Comparison With Existing Proposals
Discussion
The authors recognise that Plan 'Straya has certain limitations. It is, for instance, a terrible plan. We stress, however, that it is terrible in a transparent way, which we argue is an improvement over plans that are terrible in ways that only become apparent when you read the fine print.
Most alignment proposals contain a step that, if you squint, reads: "and then something sufficiently good happens." Plan 'Straya merely makes this step legible. Our "something sufficiently good" is: nearly everyone dies, and then Australians figure it out. We contend this is no less plausible than "we will solve interpretability before capabilities researchers make it irrelevant," but has the advantage of fitting on a napkin.
We further observe that writing satirical alignment plans is itself a species of the problem being satirised — more entertaining than doing alignment research, requiring less mathematical ability, and producing a warm feeling of intellectual superiority at considerably lower cost. We flag this as evidence that the alignment community's incentive landscape may have failure modes beyond those typically discussed.
Conclusion
Plan 'Straya does not solve the alignment problem. It does, however, solve the meta-alignment problem of people not taking alignment seriously enough, via the mechanism of killing almost all of them. The survivors will, we feel confident, be extremely motivated.
She'll be right.
Appendix A: Formal Model
Let H denote humanity, A denote an aligned superintelligence, and K denote the subset of H that survives Phase 2 (|K| ≈ 300 million, predominantly Australasian).
We define the alignment function f : K × L → A, where L denotes the set of lessons learned from the extinction of H \ K.
Theorem 1. If |L| is sufficiently large, then f(K, L) = A.
Proof. We assume the result. ∎
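For readers who prefer their assumed results machine-checked, a Lean 4 sketch of the model follows. Every declaration below is an axiom of convenience (an assumption of the sketch) rather than an established definition, and the proof term is, faithfully to the above, left as `sorry`.

```lean
-- A minimal Lean 4 sketch of Appendix A. All types and predicates below
-- are axioms of convenience (assumptions), not established definitions.
axiom Survivors : Type        -- K, |K| ≈ 300 million, predominantly Australasian
axiom Lessons : Type          -- L, lessons learned from the extinction of H \ K
axiom ASI : Type              -- candidate superintelligences
axiom Aligned : ASI → Prop    -- the property we would like to hold

-- The alignment function f : K × L → A.
axiom f : Survivors → Lessons → ASI

-- "Sufficiently large" is doing a lot of work; we axiomatise it too.
axiom SufficientlyLarge : Lessons → Prop

-- Theorem 1. If |L| is sufficiently large, then f(K, L) is aligned.
theorem theorem_one (K : Survivors) (L : Lessons)
    (h : SufficientlyLarge L) : Aligned (f K L) := by
  sorry  -- "We assume the result."
```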
The authors declare no conflicts of interest, partly because most interested parties are projected casualties.
Submitted for peer review. Peer availability may be limited by Phase 2.
Footnotes