Thoughts on AI Safety Megagame Design

Philip Harker

Preface

This is an idea that's been colonising my brain for days. I am new to the AI safety scene, but as a game designer, I've been trying to figure out what role (if any) I can realistically play in the conversation. I think I've got something.

This is an ideation document. I am sharing this with some of my AI safety friends, game design friends, and the internet to hear some initial feedback. I want to figure out whether this idea works and could be built out as a field-building AI safety project.

What about this makes sense? What am I missing? What problems have I not identified? Who (individuals and groups) would be interested in this project? I welcome all your comments.

Introduction

I recently attended an early preview of D. Scott Phoenix's The Endgame in Berkeley, CA. You can read my immediate after-action report on LessWrong here, but the short version is that it's a LARP/wargame/RPG experience in which ~40 players take on the roles of various players in the AI space (OpenAI and Anthropic, yes, but also venture capital and the Chinese government) and interact. Players take actions, make deals, and the game master resolves the outcome over three rounds of play.

The Endgame an extremely interesting concept that feels too lightweight and underdeveloped (not to Phoenix's discredit— he was constrained by his audience and by time). There are many ways in which the Phoenix's game could have been improved, but alas, he is presenting it at a conference this week to some key AI executives and he will never run the game again. Fair enough, Phoenix is a busy man.

The playtest on Tuesday got me thinking. The idea of using a "megagame" to model future AI safety contingencies on a macro level is very compelling. There are several reasons why serious people in the AI safety space might want to do this. I can think of three.

For one it's a fun and engaging learning tool for the public. The number of educated people who don't fully understand the severity of the AI safety problem (like myself a month ago) is shocking. Getting some nerds to play a round of The Endgame using terms and tools they understand (software development, US-China relations) and watching in real time as the AI shifts the balance of power is educational in a way that LessWrong doomer posts are not.

An AI safety megagame is also a very useful system for testing AI outcomes. Once the design framework is in place, the game masters (GMs) can adjust starting conditions, tweak numeric variables, and even introduce random events mid-game. Perhaps Claude could handle the setup and prompting of this experience, but the human element of The Endgame is its main value proposition. The goal is to simulate outcomes with many less-than-rational actors making decisions based on incentives, emotions, and objectives, and I'm just not convinced that 50 Claude Code instances could do that.

It's also a fun thought experiment and, if nothing else, I can see this being a fun and gainful event for a startup game developer to run at conferences, conventions, or independent events ala The Megagame Makers in the UK.

As a game designer, my task is to take these desired outcomes (public education, empirical simulation, and marketable fun) and translate them into a coherent design. That is a future post. For now I need to ideate. What considerations must I make in the design of The Endgame v2?

The Object of the Game

The Endgame attempts to simulate the future, which is very difficult. So difficult, in fact, that I don't think conventional win conditions like "first player to X points" makes sense. If artificial superintelligence is an existential threat, then it doesn't make sense to say "you win if you control X percent of all data centers in the world". That defeats the purpose of the exercise.

I think I have two options.

Option 1 is to create hyper-specific win conditions for each faction (i.e. "The United States wins if A, B, and C are true"). This would probably be public information. The catch would be that if the AI achieves their win condition, everyone else loses. That's a fairly common design paradigm in semi-cooperative board games. It basically incentivises players to think "okay, we should keep the AI on the backfoot, but if I also sabotage this other player I too can win."

Option 2 is to not have any win conditions. To simply allow the simulation to run and see what happens. This version feels more 'scientific' but I am dissatisfied with it. I am balancing competing interests in this game design, and while it would be cool to lock 50 humans in a room and force them to iterate over and over with no victory conditions, I need to include some kind of a win state to make the game actually satisfying to play.

Roles

The set of roles in The Endgame v1 are an interesting mix. Three people play "OpenAI" or "The US Government", but there are also three people each on "Capital" and "The Public". This design treats the AI space not as an ecosystem, but as a set of uniform modules that ought to follow the same rules and take the same types of actions in the same amount of time.

I think this is misguided. The vast library of factions to choose from is evocative and I don't want to do away with it, but from a design standpoint it seems best to first group these factions into categories.

State Actors

Possible state actors to model include:

The United States
China
The European Union
Russia
The Gulf States

It's also worth considering non-state actors/rogue states as well. Some of these can be modelled from existing powers (Iran and proxies, etc.), but as the game progresses and the global situation deteriorates, novel non-state actors might emerge situationally.

State actors serve two main purposes in the ecosystem: regulatory (everything from grants/infrastructure investment to nationalisation of AI labs) and geopolitical (trade, diplomacy, and warfare).

Phoenix's design included just the US and China (the game originally included the EU, but apparently the EU players found themselves with little to do). It is tempting to stick with just the US and China, but having just two state actors perhaps presupposes conflict between them. The question of whether kinetic war between the US and China is a topic for another post, but given the design choice of "two ontologically opposed states" and "a small community of major powers whose alliances may wax and wane", I find the latter more compelling.

I will return to the topic of politics and war later on.

Corporations

Slight word choice issue here: this category includes for-profit and nominally non-profit organisations, mostly in the tech world. It also includes both AI companies and AI-adjacent companies. Possible corporations to model include:

OpenAI
Anthropic
Google
xAI
Meta
NVIDIA
Taiwan Semiconductor Manufacturing Company*

I asterisk TSMC because, though they are vital, part of me thinks that the TSMC wouldn't be very interesting to play. Their only goals would be to build chips, maybe expand operations globally, and not get blown up by China. Maybe it would need to be some kind of NPC faction.

Corporations exist to earn money. Some combination of fundraising, enterprise AI sales, consumer AI sales, and non-AI revenue streams will provide revenue. After expenses, corporations can use money to expand operations, buy/build more compute, and train new models.

What interests me about the corporations is that some of them control other products/firms that could be very useful. xAI's stakeholders control Tesla and SpaceX. Meta, despite being behind in the AI race, controls Snapchat and Instagram.

As with State Actors, each player on a Corporation would need a fixed role. While the CEO would have the power to allocate resources within their faction, it is the CTO's responsibility to train new models, the COO's responsibility to handle inter-corporate relations, et cetera. Corporate governance could be an interesting design angle:

Corporations also have fun possible mechanics such as IPOs, spin-offs, and mergers and acquisitions. Some of these would be easier to model than others, but it's important to internalise (both as a designer and a player) that corporations are not countries and they follow very different rules.

Institutions

This is a tricky category. One of Phoenix's cleverest ideas was a faction simply called "Capital", representing the institutional investors who inject money into the ecosystem. I don't feel a particular need to break "capital" down into "VCs" versus "private equity" or whatever, but the idea of more nebulous interest groups and their relationships with the specific actors above is interesting. I'd propose three.

Capital
Mass media
AI Safety Nonprofits*

I asterisk AI safety NPOs because that's a bit too meta-gamey to model. In practice many of the people playing this game might be AI safety bros. Perhaps this institution is best deployed as the "GM faction", setting benchmarks and such.

Capital's role is to take their enormous capital and turn it into more capital as efficiently and quickly as possible. It doesn't make sense to throw three players together and call them "Capital" with a shared pool of resources. Different investors might be interested in different things. Investor A might already hold shares in Google and sell those shares to get into OpenAI, while Investor B might have less liquid AUM altogether but be specifically interested in a niche like military AI or domestic US fabs.

The salient point here is that investors need to act more like individuals than factions. Investors should control large amounts of capital and have the ability to buy stakes in corporations, effectively wagering on outcomes. These stakes should pay dividends, of course, but should also give the investor some level of control over what corporations do.

The Mass Media is also an interesting idea, though again, it needs to be hyper individual. In the Megagame "Watch The Skies!" one player controls the global news network and writes hourly news briefs based on interviews with others. This role sounds quite fun, but it also allows the playerbase to generate information on what's happening in-game without having to rely on the GMs or numerical scoreboards.

The Public

This is a very interesting challenge. Phoenix's design had just three players on a team called "The Public" who, after two rounds of doing nothing but grumbling and being anxious, decided to launch some sort of coup of the US government? This conflict went unresolved, and interesting as it was, I think it failed on a design level because of the architecture of The Endgame.

The Public should, ideally, comprise many people. The game should have a considerable number of spectators who make up the public. There are lots of things that the public could do. They could control small amounts of capital or non-AI firms. They could vote in elections. They could form startup AI labs or non-state actor groups like anti-AI terrorist orgs. Pleasing the public would be very important for state actors and corporations alike, because if they inflame the public too much their positions would be in danger.

The public is the hardest thing for me to model in this game, actually. It's very difficult. My temptation is not to have one large set of "the public" but assign individual players as "labour leaders" or "human rights leaders" or "anti-AI leaders" and give them power over sets of individuals. I will need to ponder this more.

The AI

In a class by itself. See below.

Resources

Resource modelling and management in conventional board games is a solved problem. "Resource cubes" are ubiquitous in games, representing an arbitrary amount of some resource. This is one of the most obviously lacking systems in The Endgame.

In Phoenix's game, there is no attempt to model resources. Players are limited only by their imagination, their in-character motivations (e.g. "xAI wants to install AI czars in government") and, most critically, the GM's discretion. If OpenAI wants to release a new model, they can just declare that. If China wants to launch cyberattacks on the US, they can just declare that.

The catch is that there's no explicit upper limit of what an faction can do on any turn. Phoenix simply trusted the players not to godmod or metagame or act out of character. It's unintuitive to the players what the GM will allow: can I just build new silicon fabs out in the Texas desert for my action? Can I do that and something else? What happens if my action breaks the entire AI ecosystem or causes an apocalypse? What happens if my faction comes into conflict with another?

I paraphrase, but in summary Phoenix pitched The Endgame's design philosophy as "granular, but not too granular". I understand why the design needed to be lightweight (it needed to be very fast to teach), but the game would be far more education, empirically valuable, and fun if firm actions were limited by resources: "fabrication", "infrastructure", "compute", "talent", and "capital".

Fabrication of chips is a critical keystone resource. It's hard to acquire (requires a lot of talent, capital, and time), has very specific geographic starting conditions, serves as a bottleneck for the entire game, and is devastating when lost.
Infrastructure is required to maintain fabrication and compute. It requires modest capital and limited talent, and represents a key intersection between the private and public spheres.
Compute is required to operate and train models. A relatively small amount of compute is always required for upkeep, but lots of compute is needed to train the newest models. Compute expansion is a free for all (each lab must build or hire its own compute).
Talent is required (at least initially) to train models. Talent might also have secondary uses like staffing data centers or maintaining non-AI revenue streams like Facebook or Tesla.
Capital ties everything together. Capital is used to buy other resources and purchase stakes in other factions.

State actors like the US and China probably need different types of hard resources. States have regulatory power over the AI ecosystem, they can set trade and diplomatic policy, and they can go to war across the cyber, information, land, sea, and air battlespaces. These are complex systems, but I think they can somehow be integrated into the core five resources above. Capital and infrastructure translate, of course, as does "fabrication" if expanded to other means of production.

Resources are what will make the game actually function as a game. Factions need resources to train models, expand operations, go to war, and tackle maligned AI. Without even a very simple resource system, the ceiling of what OpenAI or the US can do on a given turn is too high for a useful simulation.

Geopolitics

I am reminded of the design of Friend and Henson's Global War 2025: Meltdown (2021), one of the best wargames simulating 21st century great power conflict. That game is very granular. It's like Axis and Allies if you added cyber warfare and ICBMs.

The Endgame is not primarily a game of war. But many of its mechanics will necessarily be wargame-shaped. Conflict between state actors is probably inevitable, and it is worth investing slightly in the mechanics. War surrounding Taiwan is likely, but conflict in Eastern Europe, the Middle East, and Africa also seems relevant to my mind.

However, all mechanics in The Endgame— combat included— should serve the game's core question: "what does the future of AI look like?" high-level modelling of naval combat and cyber attacks seem good (the US and China went to war early in Phoenix's game), but super granular modelling of cruise missile reserves and guided missile destroyers seems misguided. I love the design in the PC game DEFCON where sea and air combat is very important, but abstracted to fleets, carriers, fighters, and not much else.

State actors need incentives to fight, but they also need incentives to cooperate. This is where the AI comes in.

The AI

The hardest question in The Endgame is how to model "the AI" at all. In The Endgame v1 playtesting, there was some confusion within the playtesters as to whether the three-player "The AI" faction was a sentient ASI or maligned human actors abusing powerful jailbroken AI models. Disambiguation of this will be necessary— I leave the "hackers seize Mythos v10 and hack the planet" scenario to the Public, not to the AI players.

If the game starts in Q1 2027, it's probably not realistic that an autonomous ASI is already running in some infected data center somewhere. After all, much of the economic engine of The Endgame v2 would surround the race to build better and better AIs.

AI R&D should be modelled as a risk-reward system with diminishing returns. Each time a new model is developed, it earns capital for the lab but also has a possibility of becoming dangerous or maligned. That might be "this model finds 400 zero-day attacks in Crowdstrike" or it might be "this model is going to turn us all into paperclips." Perhaps the AI players start the game merely spectating, and they don't get to do much initially. As better and better AI models enter production, so too do the AI players get more and more abilities. Maybe they can cause cyberattacks, persuade humans to take actions, gain control of resources like capital and compute, gain control of military units, and so on.

We often imagine the progress of ASI as an exponential curve. This should be modelled in-game as a snowball effect: by the time that the AI faction reaches "Tier 6" or whatever, they become virtually unstoppable unless the entire world drops everything and fights them (and even then it might be too late). The AI's actual capabilities at any given moment should probably be private information.

This is a really key point of the design, actually. The fear of maligned ASI is so intense precisely because it's likely to sneak up on us. A wargame with a level playing field between humanity and the AI from turn one would be boring and unhelpful as an analog for reality. If humanity succumbs to ASI, it will happen slowly, then all at once.

Conclusion

I do not know the future of this project. After receiving comments on this write-up I will develop this into a proper testable game. My biggest concerns for this project at the moment, in no particular order, are as follows:

Scope creep
Scaling this on my own
Funding
Replicability as a product

There are so many areas on which this design could be expanded. Stock markets? Prediction markets? Rare earth minerals? Espionage? Paperclip factories?

This is to say nothing of the sheer complexity of actually implementing a decked out version of The Endgame. I would require a sizeable space with multiple display screens. I'd need to source and custom-print components. I'd probably have to vibe code some proprietary software.

These are all solvable problems. The concept is already proven, courtesy of D. Scott Phoenix. The work ahead is just building on the bones that he laid out.

This project is supposed to serve three purposes, which arguably compete against each other: empirical data, public education, and fun. Part of me just wants to pick one lane and stick with it, but at the same time I feel there is potential to achieve all three at once.

I will be developing this project over the spring. If anyone in or near Toronto is interested in a possible playtest of The Endgame v2 sometime in Q2 or Q3 2026, drop me a line!

Disclosure: I am also crossposting this on my substack for maximum reach.

11

Thoughts on AI Safety Megagame Design

11

Preface

Introduction

The Object of the Game

Roles

State Actors

Corporations

Institutions

The Public

The AI

Resources

Geopolitics

The AI

Conclusion

11

11