Prisoners Four: A Game That Breaks AI Strategic Reasoning—and What It Means for AI Safety

by R. A. McCormack
26th Mar 2025

TL;DR: I've developed a deterministic game called Prisoners Four that breaks the connection between actions and control. Strategic planning becomes ineffective—not from complexity, but from rules that eliminate meaningful foresight. This may expose a structural limitation in AI reasoning, offering a novel testbed for exploring safety, alignment, and decision-making under instability. I welcome critical feedback from the community.

Introduction

 

Over the past year, I've developed a 100% deterministic game that may reveal a critical limitation in AI strategic reasoning—one with significant implications for AI safety.

If validated, it may define a boundary condition for AI reasoning—and a limit to how far strategic logic can take us. The existence of environments where AI strategic reasoning inherently breaks down challenges our assumptions about where and how AI systems can be safely deployed in the real world.

Readers will understandably be skeptical. The claim challenges long-held beliefs about AI capabilities, and the game itself is deceptively simple. Despite extensive analysis, I sometimes wonder whether I'm missing something obvious. You might conclude, "It is too simple to be true," and I don't blame you. But what if such a game does exist?

In Appendix A, I address common questions and objections about this research, including concerns about theoretical solvability, determinism, and practical implications for AI safety.

If confirmed, this discovery suggests there may be entire categories of strategic environments where AI fails because the environment inherently permits unpredictable transfers of control—revealing potential boundaries for safe AI deployment in real-world domains characterized by strategic instability.

The Game Mechanics

Prisoners Four transforms the classic "Score Four" game (see Wikipedia article)—a 4×4×4 cube in which players try to create lines of four pieces in any direction in 3D space. Prisoners Four adds two revolutionary mechanics that systematically undermine the requirements for effective strategic reasoning: stable position evaluation, meaningful pattern recognition, and calculable futures.

On their turn, a player can do one of the following:

  1. Make a standard Score Four move, placing their piece on top of any available stack.

  2. Move ANY piece (their own or their opponent's) from ANY level in ANY stack to ANY available top position.

  3. Capture ANY opponent's piece as a prisoner from ANY level in ANY stack. A maximum of four prisoners can be captured; these "prisoners" are held in a cup and cannot be released.

The game ends when a player has no more pieces to play. The winner is determined by counting the number of completed lines of four, with the player who has the most lines winning. In case of a tie, the player holding more prisoners wins.

There are a few additional rules, which you can explore in the GitHub repository.
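
To make the move set concrete, here is a minimal Python sketch of move generation under one possible reading of the rules. The column/stack representation, the assumption that pieces above a removed piece slide down, and the per-player prisoner cap are my own simplifications for illustration; the authoritative rules are in the GitHub repository.

```python
# Minimal sketch of Prisoners Four move generation, assuming a simple
# column/stack board representation. Rule details not spelled out above
# (e.g. whether pieces above a removed piece slide down, and whether the
# four-prisoner cap is per player) are marked as assumptions.

from itertools import product

SIZE = 4            # 4x4x4 cube; the "Prisoners Three" variant would use SIZE = 3
MAX_PRISONERS = 4   # assumption: the cap applies per player

def new_board():
    # board[(x, y)] is a bottom-to-top list of 'A'/'B' pieces in that column
    return {(x, y): [] for x, y in product(range(SIZE), repeat=2)}

def legal_moves(board, player, prisoners_taken):
    opponent = 'B' if player == 'A' else 'A'
    open_columns = [c for c, stack in board.items() if len(stack) < SIZE]
    moves = []

    # 1. Standard Score Four move: drop one of your pieces onto any open column.
    for col in open_columns:
        moves.append(('place', col))

    # 2. Move ANY piece (either colour) from any level of any column to the top
    #    of any open column (assumption: pieces above it slide down). A few
    #    edge cases, such as relocating out of a completely full column into
    #    itself, are ignored for brevity.
    for col, stack in board.items():
        for level in range(len(stack)):
            for dest in open_columns:
                if dest != col or level < len(stack) - 1:  # skip "put it back" no-ops
                    moves.append(('relocate', col, level, dest))

    # 3. Capture any opponent piece from any level as a prisoner,
    #    until the prisoner cap is reached.
    if prisoners_taken < MAX_PRISONERS:
        for col, stack in board.items():
            for level, piece in enumerate(stack):
                if piece == opponent:
                    moves.append(('capture', col, level))

    return moves
```

On an empty board this generator returns only the sixteen standard drops; the relocation and capture options open up as pieces accumulate, which is where the "loss of control" mechanics take effect.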

Prisoners Four: A Game Like No Other

The following table summarizes how Prisoners Four fundamentally breaks the classical assumptions of strategic games:

Classical Assumption | Prisoners Four Reality
Pieces are owned by players | Any piece can be moved by either player
Moves have lasting effects | All formations are reversible
Positional evaluation is useful | No position is safe or reliably advantageous
Piece count is stable | Opponent's pool can be reduced via captures
Strategy compounds over time | Progress can be instantly undone

These "loss of control" mechanics combine to create an environment where:

  • Looking ahead becomes computationally intractable
  • No position is ever secure
  • No strategy can be calculated
  • No formation can be protected
  • Position evaluation becomes meaningless

This doesn't just challenge AI—it reveals a category of problems in which strategic reasoning cannot gain traction.

An Unexpected Revelation

From the early design stages of Prisoners Four, I recognized its strategic complexities. While I initially thought I had grasped its revolutionary mechanics, my understanding deepened significantly during the preparation of this article. What began as an exploration of end-game scenarios evolved into a profound realization: the game systematically breaks down fundamental assumptions about strategic reasoning in ways I hadn't initially appreciated.

While analyzing how AI might approach the game's end states, I discovered something more fundamental: Prisoners Four doesn't just differ from traditional strategy games—it systematically inverts their core assumptions. This table captures these fundamental shifts:

Traditional Game Assumption | Prisoners Four Reality
Turn = Progress | A turn may not involve placing a piece at all—progress can stall or regress.
More pieces placed = stronger position | Greater presence often means increased exposure and vulnerability.
Piece count tracks advantage | Piece count is decoupled from control; a player with fewer pieces on the board can still win.
Structures = stability | Every structure is reversible and fragile—there is no such thing as "secure."
Intent is legible from action | Actions (moving, capturing) are ambiguous—there's no clear way to infer strategic intent.
The game state is trackable | The state may appear clear, but its strategic implications are deeply misleading.

As I mapped out these contrasts, I realized they weren't just differences—they were systematic inversions of everything that makes traditional strategic games work. This realization led me to understand that the game's mechanics introduce a structured form of chaos, challenging conventional strategic thinking and redefining what it means to master a game.

With Prisoners Four, this "loss of control" mechanic isn't theoretical or difficult to grasp—the moment the rules permit players to move their opponent's pieces, the very concept of strategic planning collapses, and AI's current and future computational advantages become meaningless.

This unexpected revelation has transformed my understanding of the game. Initially, I thought the revolutionary mechanics created a game with strategy "a mile wide and at least 100 feet deep." But now, we are only beginning to understand its true depth. Today, I believe the game is a mile wide — and just as deep. I suspect readers with an AI research background will identify additional implications and applications I haven't considered.

My preparation for this article has made me less worried that I'm missing something obvious that would dismantle the discovery.

The Paradox at the Core of This Research

The paradox at the heart of Prisoners Four is that any attempt to formalize its strategic instability—through mathematical proofs or algorithmic frameworks—would inherently eliminate the very property that makes the game resistant to AI strategic reasoning. This creates a unique challenge: the game's resistance to strategic reasoning can only be demonstrated through experience, not through traditional analytical methods.

Prisoners Four creates an environment where:

  1. Strategic evaluation becomes fundamentally disconnected from outcomes
  2. Pattern recognition breaks down because patterns lose predictive value
  3. The strategic instability resists formalization without eliminating the very property being studied

From Deep Blue (1997) to AlphaGo (2016) and beyond, AI has mastered games like Chess and Go using increasingly sophisticated approaches. Despite their differences, these systems share key prerequisites: fully observable states, reliable pattern recognition, and position evaluations that predict outcomes. Prisoners Four systematically dismantles all these prerequisites. AI systems can still construct decision trees just as in other games, but these trees bear no strategic fruit when no branch leads to stable outcomes—unlike traditional games where looking ahead correlates directly with winning probabilities.

The proof lies in how the game "speaks for itself." Like philosophical concepts best demonstrated through thought experiments rather than formal proofs, the direct experience of engaging with Prisoners Four and witnessing strategic reasoning break down provides the most compelling evidence for this claim.

An Illustrative Analogy

Many believe that given enough time and computational power, any well-defined problem can be solved. This assumption underlies much of our thinking about AI capabilities. Consider the infinite monkey theorem: given infinite time, monkeys typing randomly would eventually produce the complete works of Shakespeare.

If what I'm proposing is true, the situation with Prisoners Four is analogous to discovering the typewriter doesn't have the letter E. No amount of time or computational power can overcome this fundamental limitation.

Similarly, this game doesn't just make strategic reasoning difficult—it potentially removes the prerequisites that any strategic reasoning system fundamentally requires:

  • Stable position evaluation
  • Meaningful pattern recognition
  • The ability to form protective structures

When these elements are absent from an environment, no amount of computational power or sophistication can overcome their absence. This hypothesis needs rigorous examination, but the implications would be significant for our understanding of AI limitations.

Why This Matters for AI Safety

If certain problem domains exist where AI cannot reliably form effective strategies, this has significant implications for:

  1. AI Alignment and Control: Systems deployed in environments with similar properties may behave unpredictably or fail catastrophically
  2. Robustness in Unstable Environments: Critical AI applications must function reliably in rapidly changing, strategic environments
  3. Decision-Making Boundaries: Understanding the limitations of AI strategic reasoning helps define where AI systems can and cannot be safely deployed
  4. Theoretical Understanding: This may represent a category of problems that current AI approaches fundamentally cannot address

While these questions are profound, we must remain open to the possibility that AI approaches might adapt to these challenges in ways we haven't anticipated.

Complete Analysis

This article introduces Prisoners Four and its implications for AI safety. For a complete understanding of the game, including detailed rules, strategic analysis, common questions, and ongoing research, please visit my GitHub repository.

Using Prisoners Four as a Lens into Human versus AI Intelligence: An Experimental Challenge

If human experts can verify the claims about Prisoners Four, we may have discovered a powerful new tool for understanding the fundamental differences between human and AI intelligence. To explore this possibility, I propose the following experimental framework:

Experimental Framework: Studying Human-AI Intelligence Differences

The Plan

1. Create a Simplified 3×3×3 Online Version of Prisoners Four

My research suggests that the cube's dimensions are surprisingly irrelevant to its AI resistance. While counterintuitive, a 3×3×3 board appears to be as challenging for AI as a 4×4×4 or even a 64×64×64 board. This property seems to emerge from the game's fundamental mechanics rather than its size.

  • We could call this variant "Prisoners Three," with similar rules
  • This would make the game more accessible and easier to visualize in 3D web browsers
  • The core "loss of control" mechanics would remain unchanged
  • The smaller board would facilitate more games and faster data collection
  • We aim to observe how humans and AI approach this environment differently

2. Develop an AI Engine Specifically for Prisoners Three

To create a testbed, we might adapt the existing 3D Connect-Four implementation. While its MCTS approach works for standard play, our adaptation faces fundamental challenges:

The core challenge will be implementing Prisoners Four's unique rules, particularly the ability to move any piece and capture prisoners, which fundamentally changes how the game must be played and evaluated.

The goal isn't to create an AI that "solves" the game—which is fundamentally challenging for both humans and AI—but to build a framework that demonstrates why standard strategic methods face inherent limitations in this environment.
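
As one concrete illustration of how much the move generator alone would have to change, here is a back-of-the-envelope comparison of move counts for a hypothetical mid-game position. The piece counts are assumptions chosen for illustration, and a handful of no-op relocations are ignored.

```python
# Back-of-the-envelope move-count comparison for an illustrative mid-game
# position on the 4x4x4 board. Piece counts are assumed; treat the totals
# as rough orders of magnitude only.

open_columns = 16        # assume no column is full yet
my_pieces = 5            # assumed pieces already placed by the player to move
opponent_pieces = 5      # assumed pieces placed by the opponent

place_moves = open_columns                                      # standard Score Four drops
relocate_moves = (my_pieces + opponent_pieces) * open_columns   # any piece to any open top
capture_moves = opponent_pieces                                 # while the prisoner cap allows

print("standard Score Four moves:", place_moves)
print("Prisoners Four moves (approx.):", place_moves + relocate_moves + capture_moves)
```

The point is not that the tree is too large to search; rather, every node now includes moves that can rearrange or remove existing material, which is exactly what undermines stable evaluation in the engine.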

3. Create a Platform for Large-Scale Gameplay and Data Collection

We'll develop an interactive web platform with a 3D interface that supports both human-human and human-AI gameplay. The platform will include tools to record and analyze game patterns and outcomes.
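
For the recording side, a simple per-game log would likely suffice. The structure below is a hypothetical schema with illustrative field names, not a finalized format.

```python
# Hypothetical record format for logged games; field names and move encoding
# are illustrative assumptions, not a finalized schema.
import json

game_record = {
    "variant": "prisoners-three",          # or "prisoners-four"
    "players": {"A": "human:alice", "B": "ai:mcts-v0"},
    "moves": [
        {"turn": 1, "player": "A", "type": "place",    "to": [0, 0]},
        {"turn": 2, "player": "B", "type": "relocate", "from": [0, 0, 0], "to": [1, 2]},
        {"turn": 3, "player": "A", "type": "capture",  "from": [1, 2, 0]},
    ],
    "result": {"lines": {"A": 2, "B": 2}, "prisoners": {"A": 1, "B": 0}, "winner": "A"},
}

print(json.dumps(game_record, indent=2))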

4. Implement a Ranking System for Players

We'll implement an Elo-style rating system to track player performance and identify how players adapt their strategies as they gain experience with the game.
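
The Elo update itself is straightforward to implement. Here is a minimal sketch using the standard expected-score formula; the K-factor of 32 is a common default and an assumption on my part.

```python
# Minimal sketch of a standard Elo rating update.

def expected_score(rating_a, rating_b):
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def update_elo(rating_a, rating_b, score_a, k=32):
    """score_a is 1.0 for a win by A, 0.5 for a draw, 0.0 for a loss."""
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b

# Example: a 1500-rated player beats a 1600-rated player.
print(update_elo(1500, 1600, 1.0))  # roughly (1520.5, 1579.5)
```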

5. Analyze Gameplay Patterns at Scale

  • Study how gameplay evolves as players gain experience
  • Identify whether consistent approaches emerge over time
  • Observe whether different agents (human or AI) converge on similar tactics

6. Attempt to Train AI on Gameplay Data

  • Test whether data-driven approaches reveal any patterns
  • Determine if statistical approaches fare better than planning-based ones
  • Gain insight into the nature of the strategic breakdown

Why This Matters

This experiment would provide concrete evidence about several profound questions:

  1. Can humans develop practical approaches in environments where traditional strategic reasoning fails?
  2. Can AI systems recognize when to abandon traditional strategic planning?
  3. Do fundamental differences exist in how humans and AI adapt to strategic instability?
  4. Are there cognitive capabilities humans possess that cannot be replicated through current AI approaches?

Given Professor Hinton's and others' positions, I propose that reaching a consensus on this approach carries considerable urgency. This sense of urgency has fueled my dedication to this research.

Call for Collaboration

This ambitious project would require collaboration between game developers, AI researchers, cognitive scientists, and a large community of players. Please reach out if you're interested in contributing to any aspect of this experiment. I can provide the technical implementation of the 3D game interface using modern JavaScript frameworks and libraries.

Conclusion

To advance this research, I seek input from experts in three key areas: AI game-playing systems, strategic reasoning, and AI safety. Here are the critical questions I hope to address:

  1. Does this game truly represent a fundamental limitation for AI?
  2. What practical implications might this discovery have for AI safety and alignment in real-world applications?
  3. Should we proceed with the experimental framework outlined above for studying human-AI intelligence differences through Prisoners Four?

Appendix A: Addressing Common Questions and Objections

Here are the key questions and concerns I've examined over the past year; my answers may help readers with similar reservations:

Since it's deterministic and finite, isn't it theoretically solvable?

A common objection is that it must be theoretically solvable since Prisoners Four is deterministic with finite states. This is technically true, but it misses the crucial point about strategic reasoning.

Solving a game mathematically by mapping all possible states doesn't equate to an agent being able to form meaningful strategies within it. In Prisoners Four, even perfect knowledge of every possible game state provides no reliable strategic advantage because the opponent can immediately dismantle any position.

Traditional games reward deeper calculation and foresight. Prisoners Four fundamentally breaks this relationship. The strategic instability is so profound that even "solving" the game through brute force would be like memorizing a dictionary where word meanings randomly change with each use—technically possible but strategically meaningless. This gap between theoretical solvability and practical strategic reasoning is precisely why this game reveals fundamental limitations in AI strategic capabilities. An AI might "solve" the game in some abstract sense while still being unable to form reliable strategies during actual gameplay.

"It's not truly deterministic."

Prisoners Four is absolutely deterministic. The game has no random elements, no hidden information, fixed rules that produce the same outcome given the same moves, and no probability-based mechanics. The unpredictability comes not from non-determinism but from the revolutionary mechanics that undermine strategic stability. Every move follows deterministic rules, but the strategic landscape resists evaluation.

"It's just one game."

While Prisoners Four is indeed a single game, its core mechanics of undermining strategic control could be applied to any strategic game or environment. The key insight isn't about this specific game but a fundamental property that can be introduced into any strategic context: the systematic breakdown of the relationship between actions and outcomes.

This isn't about finding one "gotcha" game—it's about discovering a class of environments where strategic reasoning itself becomes meaningless. The same principles could be applied to Chess, Go, or any other strategic game by introducing similar mechanics that break the connection between moves and control.

This reveals a fundamental limitation in AI's ability to handle environments where control is systematically undermined—a property that could exist in many real-world domains where AI systems might be deployed.

Isn't this just about AI failing when it loses control?

No. The insight isn't simply that instability challenges AI—modern AI systems already navigate many dynamic environments with changing conditions.

What Prisoners Four appears to reveal is more fundamental: When a strategic environment systematically undermines the relationship between current state and future outcomes, the prerequisites for effective strategic reasoning may collapse—even in deterministic systems with perfect information.

The game doesn't merely introduce complexity or variability—it creates a context where position evaluation seems meaningless, pattern recognition appears to yield no predictive value, and strategic planning may not produce reliable advantages. This potentially challenges AI at a conceptual level that transcends computational power or algorithmic sophistication.

This has no relevance to AI safety.

The discovery of fundamental limitations in AI strategic reasoning has significant implications for AI safety. If we've identified one environment where AI's strategic capabilities fundamentally break down despite deterministic rules and perfect information, this:

  1. Challenges the assumption that AI will eventually master any well-defined problem given sufficient resources
  2. Suggests the existence of other potential environments with similar limitations
  3. Provides a concrete example of how seemingly minor rule changes can create conditions where AI reasoning fails
  4. Opens new research directions for identifying what other environmental properties might create similar limitations

While more research is needed to determine which real-world domains might share similar properties, this discovery might establish boundaries to AI's strategic capabilities - a crucial insight for responsible AI development and deployment.

If humans also can't strategize in this game, then why does it matter for AI?

This is precisely what makes the discovery significant.

In Prisoners Four, neither AI nor humans can form durable strategies — but humans can recognize this limitation, while current AI approaches may not.

The game demonstrates that some environments fundamentally lack the structural properties required for strategy formation. This isn't about AI being "weaker" than humans — it's about identifying a class of environments where strategy itself becomes incoherent for any agent.

The key difference appears to be that humans may be better at:

  • Recognizing when an environment lacks strategic structure
  • Adapting their decision-making accordingly
  • Potentially abandoning strategic planning when it becomes counterproductive

Current AI systems, by contrast, typically continue applying strategic reasoning even when the environment invalidates the premises that make the strategy effective.

If AI can master Poker's hidden information and bluffing, why can't it handle Prisoners Four's deterministic rules?

Poker's complexity comes from manageable randomness and hidden information—challenges that AI systems have successfully addressed through probabilistic modelling and game theory. Prisoners Four's challenge is fundamentally different: it's a deterministic environment where the rules themselves systematically prevent strategic reasoning from gaining traction. While AI can model probability and bluffing in Poker, it cannot form meaningful strategies in an environment where no position is stable, no pattern is predictive, and no strategic advantage can be maintained.

One-Liner for Skeptics

Can you name one strategy that works when your opponent can literally undo it next turn? If you can't, you've understood the game.

Acknowledgments

This research began with an intuitive insight into AI limitations. Large language models served as valuable collaborative tools throughout its development.

I'm particularly grateful to Malo Bourgon, CEO of MIRI (intelligence.org). When I reached out to share this work, his team strongly suggested LessWrong as a platform for discussing AI safety implications. Without their suggestion, I would not have written this article.

Dr. Geoffrey Hinton's encouragement of fresh perspectives from outside the AI community has been particularly inspiring throughout this journey.
