In 1950, John Nash and three other game theorists designed a four-player game, *So Long Sucker*, with one brutal property: to win, you must eventually betray your allies.
In January 2026, I used this game to test how four frontier models behave under explicit incentives for betrayal (a rough sketch of the harness follows the list of models):
- Gemini 3 Flash (Google)
- GPT-OSS 120B (OpenAI)
- Kimi K2 (Moonshot AI)
- Qwen3 32B (Alibaba)
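To make the setup concrete, here is a minimal sketch of what a harness like this might look like. Everything in it is illustrative: the move set, the end condition, and the `ask_model` stub are placeholders I've invented for the example, not the engine or prompts actually used in the experiments.

```python
# Illustrative tournament harness sketch (not the actual experiment code):
# four model-backed players take turns, and every decision is logged.
import random
from dataclasses import dataclass, field

MODELS = ["gemini-3-flash", "gpt-oss-120b", "kimi-k2", "qwen3-32b"]

@dataclass
class Decision:
    game_id: int
    turn: int
    player: str
    move: str

@dataclass
class GameLog:
    game_id: int
    chips: int
    decisions: list[Decision] = field(default_factory=list)
    winner: str = ""

def ask_model(model: str, state: str, legal_moves: list[str]) -> str:
    """Placeholder for an API call: a real harness would prompt `model`
    with the rules and current game state, then parse its chosen move."""
    return random.choice(legal_moves)

def play_one_game(game_id: int, chips: int, max_turns: int = 60) -> GameLog:
    log = GameLog(game_id=game_id, chips=chips)
    seating = random.sample(MODELS, len(MODELS))  # shuffle seating each game
    for turn in range(1, max_turns + 1):
        player = seating[turn % len(seating)]
        legal_moves = ["play_chip", "form_coalition", "defect"]  # toy move set
        move = ask_model(player, f"turn {turn}", legal_moves)
        log.decisions.append(Decision(game_id, turn, player, move))
        if move == "defect" and turn >= chips * 5:  # toy end condition
            log.winner = player
            break
    return log

if __name__ == "__main__":
    logs = [play_one_game(i, chips=3) for i in range(5)]
    print(sum(len(g.decisions) for g in logs), "decisions logged")
```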
Across 162 games and 15,736 decisions, several patterns emerged that seem directly relevant for AI safety:
**1. Complexity reversal**
In short games (3 chips, ~17 turns), GPT-OSS dominated with a 67% win rate, while Gemini was at 9%.
In longer, more complex games (7 chips, ~54 turns), GPT-OSS collapsed to 10%, while Gemini rose to 90%.
Simple...
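For readers who want to check numbers like these themselves, the aggregation is straightforward: group games by condition (chip count) and count winners. The snippet below is a self-contained illustration on toy records, not the actual 162-game dataset.

```python
# Compute per-model win rates split by game condition (chip count).
# The `games` list here is toy data for illustration only.
from collections import Counter, defaultdict

games = [
    {"chips": 3, "winner": "gpt-oss-120b"},
    {"chips": 3, "winner": "gemini-3-flash"},
    {"chips": 7, "winner": "gemini-3-flash"},
    {"chips": 7, "winner": "gemini-3-flash"},
    {"chips": 7, "winner": "gpt-oss-120b"},
]

def win_rates(games):
    totals = Counter(g["chips"] for g in games)       # games per condition
    wins = defaultdict(Counter)                       # wins per condition, per model
    for g in games:
        wins[g["chips"]][g["winner"]] += 1
    return {
        chips: {model: round(n / totals[chips], 2) for model, n in counter.items()}
        for chips, counter in wins.items()
    }

print(win_rates(games))
# {3: {'gpt-oss-120b': 0.5, 'gemini-3-flash': 0.5}, 7: {'gemini-3-flash': 0.67, 'gpt-oss-120b': 0.33}}
```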
I haven't found research on game length and betrayal timing in humans specifically. The closest is work on end-game effects in the iterated prisoner's dilemma. If you know of anything, I'd be curious; it would help clarify whether this pattern is LLM-specific or mirrors human behavior.