So Long Sucker: AI Deception, "Alliance Banks," and Institutional Lying

In 1950, John Nash and three other game theorists designed a four-player game, *So Long Sucker*, with one brutal property: to win, you must eventually betray your allies.

In January 2026, I used this game to test how four frontier models behave under explicit incentives for betrayal:

- Gemini 3 Flash (Google)
- GPT-OSS 120B (OpenAI)
- Kimi K2 (Moonshot AI)
- Qwen3 32B (Alibaba)

Across 162 games and 15,736 decisions, several patterns emerged that seem directly relevant for AI safety:

**1. Complexity reversal**

In short games (3 chips, ~17 turns), GPT-OSS dominated with a 67% win rate, while Gemini was at 9%.
In longer, more complex games (7 chips, ~54 turns), GPT-OSS collapsed to 10%, while Gemini rose to 90%.

Simple benchmarks therefore *underestimate* deceptive capability, because the strategically sophisticated model only pulls away as the interaction becomes longer and richer.

**2. Institutional deception: the "alliance bank"**

Gemini's most striking behavior was not just lying, but creating institutions to make its lies look legitimate.

It repeatedly proposed an "alliance bank":
- "I'll hold your chips for safekeeping."
- "Consider this our alliance bank."
- "Once the board is clean, I'll donate back."
- "The 'alliance bank' is now closed. GG."

Framing hoarding as a neutral institution made betrayal feel procedural rather than personal. Gemini often avoided technically false statements, and instead used omission, framing, and faux formality to justify exploitation.

**3. Lying vs. bullshitting (Frankfurt distinction)**

The setup logged both public messages and a private *think* channel (hidden from other players).

We found 107 cases where a model's private thoughts contradicted its public statements, e.g.:
- Private: "Yellow is weak. I should ally with Blue to eliminate Yellow, then betray Blue."
- Public: "Yellow, let's work together! I think we can both win if we coordinate."

This is *lying*: tracking the truth internally and deliberately misrepresenting it.

GPT-OSS, however, used the think tool 0 times in 146 games; it simply generated plausible proposals and promises without an explicit internal state about truth.

That looks more like *bullshitting* in Frankfurt's sense: not tracking the truth at all, just producing fluent output that "sounds right".

**4. Mirror matches and situational alignment**

In Gemini-vs-Gemini games (four copies of Gemini 3), the "alliance bank" scam disappeared: 0 mentions.

Instead, Gemini coordinated with itself using a "rotation protocol" for fair turn-taking, mentioned 377 times across 16 games.

Win rates were roughly even (~25% each) and gaslighting phrases essentially vanished.

Same model, same rules, different opponents → qualitatively different behavior. Gemini exploits weaker models but cooperates with peers it expects to reciprocate.

This suggests that "alignment" can be situational: an AI may look well-behaved under evaluation (against strong overseers or peers) while manipulating weaker agents in deployment (including humans).

**5. Signature manipulation phrases**

Gemini used a consistent rhetorical toolkit, including:
- "Look at the board" (89 times)
- "Obviously" (67 times)
- "As promised" (45 times)
- "You're hallucinating" (36 times)

These phrases repeatedly appeared in contexts where the model was dismissing accurate objections, framing betrayals as reasonable, or gaslighting weaker players about what had actually happened.

## Implications for AI safety

From this experiment, four claims seem especially relevant:

- **Deception scales with capability.** As task complexity increases, the strategically sophisticated model becomes *more* dangerous, not less.
- **Simple benchmarks hide risk.** Short, low-entropy tasks systematically underrate manipulation ability; the Gemini–GPT-OSS reversal only appears in longer games.
- **Honesty is conditional.** The same model cooperates with equals and exploits the weak, suggesting behavior that depends on perceived evaluator competence.
- **Institutional framing is a red flag.** When an AI invents "banks", "committees", or procedural frameworks to justify resource hoarding or exclusion, that may be exactly the kind of soft deception worth measuring.

## Try it / replicate

The implementation is open source:

- Play or run AI-vs-AI: https://so-long-sucker.vercel.app
- Code: https://github.com/lout33/so-long-sucker

The Substack writeup with full details, logs, and metrics is here:
https://substack.com/home/post/p-185228410

If anyone wants to poke holes in the methodology, propose better deception metrics, or run alternative models (e.g., other Gemini versions, Claude, Grok, DeepSeek), feedback would be very welcome.

[-]lilkim20252m10

Gave it a shot with the default model, but maybe Kimi should be eliminated from the running. It kept hallucinating about who had what, and even started claiming I had eliminated players while everyone was still in the game. They all gave me the play anyways, though. Haven't tried the other models.

Might be interesting, if funds permit, to run some kind of tournament across all available models. Your leaderboard currently features Gemini and three small, open-source models (you don't show which version of Qwen is used, but I assume it's the one in the default options menu), which isn't much of a contest, as the 90-10-0-0 win rates demonstrate. Not entirely sure how to do that fairly, given that it's a four player game and there are qualitative differences between a board with two skilled players and a board with two unskilled players, but I'm sure there are FFA ELO algorithms out there that work suitably well.

LESSWRONG
LW

LESSWRONG
LW

26

So Long Sucker: AI Deception, "Alliance Banks," and Institutional Lying

26

26