In 1980, Robert Axelrod invited researchers around the world to submit computer programs to play the Iterated Prisoner’s Dilemma.
The results — where Tit for Tat famously won — transformed how we think about cooperation.
What mattered most wasn’t intelligence or aggression, but a few simple principles: be nice, retaliate, forgive, and be clear.
That insight reshaped evolutionary game theory and inspired decades of work in economics and social science.
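For concreteness, the winning strategy amounts to only a few lines of logic. Below is a minimal Python sketch of Tit for Tat in an iterated match, using the payoff values from Axelrod's tournament; the function names and the match loop are my own illustration, not the original tournament code.

```python
# Standard payoffs from Axelrod's tournament: (row player, column player)
PAYOFFS = {
    ("C", "C"): (3, 3),  # mutual cooperation
    ("C", "D"): (0, 5),  # sucker's payoff vs. temptation
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),  # mutual defection
}

def tit_for_tat(opponent_history):
    """Nice (cooperates first), retaliatory (copies a defection),
    forgiving (returns to cooperation), and clear (trivially predictable)."""
    return "C" if not opponent_history else opponent_history[-1]

def always_defect(opponent_history):
    return "D"

def play_match(strategy_a, strategy_b, rounds=200):
    """Run an iterated match and return the two cumulative scores."""
    history_a, history_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(history_b)  # each side sees only the opponent's past moves
        move_b = strategy_b(history_a)
        pay_a, pay_b = PAYOFFS[(move_a, move_b)]
        score_a, score_b = score_a + pay_a, score_b + pay_b
        history_a.append(move_a)
        history_b.append(move_b)
    return score_a, score_b

print(play_match(tit_for_tat, always_defect))  # (199, 204): loses this pairing narrowly
print(play_match(tit_for_tat, tit_for_tat))    # (600, 600): cooperators prosper together
```

The toy already shows the lesson Axelrod drew out: Tit for Tat never outscores an individual opponent, yet it wins on aggregate because it elicits cooperation wherever cooperation is possible.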
But Axelrod’s agents were opaque. They couldn’t read each other’s source code.
Enter the Open Strategy Dictator Game
The Open Strategy Dictator Game asks: What happens when strategies are fully visible?
Each participant submits a natural-language strategy description — a few paragraphs of text explaining how their agent behaves.
Every...
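The full rules are cut off above, so the sketch below is only a guess at the shape of such a game: a toy dictator round in which the dictator's policy can read the recipient's strategy text. The endowment size, the keyword matching that stands in for however natural-language strategies are actually interpreted (presumably an LLM or a judge), and all names here are illustrative assumptions, not the post's actual mechanism.

```python
ENDOWMENT = 100  # assumed pot size per round; the real game may differ

def interpret(dictator_strategy: str, recipient_strategy: str) -> int:
    """Crude stand-in for executing a natural-language strategy.
    A self-declared conditional sharer gives half only when the recipient's
    visible text also mentions sharing; anyone else keeps everything."""
    if "share" in dictator_strategy.lower() and "share" in recipient_strategy.lower():
        return ENDOWMENT // 2
    return 0

def play_round(dictator_strategy: str, recipient_strategy: str) -> tuple:
    """One open-strategy dictator round: both texts are mutually visible."""
    given = interpret(dictator_strategy, recipient_strategy)
    return ENDOWMENT - given, given  # (dictator's payoff, recipient's payoff)

alice = "I share half with anyone whose strategy also commits to sharing."
bob = "I keep everything, no matter what the other strategy does."

print(play_round(alice, bob))    # (100, 0): alice sees bob won't reciprocate
print(play_round(alice, alice))  # (50, 50): conditional sharers find each other
print(play_round(bob, alice))    # (100, 0): bob keeps everything regardless
```

Even this crude version shows the key difference from Axelrod's setting: a strategy can condition directly on what the other strategy promises, not just on the moves it has already observed.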
Well-argued throughout, but I want to focus on the first sentence:
Can advocates of the more pessimistic safety view find common ground on this point?
I often see statements like “We have no idea how to align AI,” sometimes accompanied by examples of alignment failures. But these seem either to boil down to the claim that LLMs are not perfectly aligned, or to be contradicted by the day-to-day experience of actually using them.
I also wish pessimists would more directly engage with a key idea underlying the sections on “Misaligned personas” and “Misalignment from long-horizon RL.” Specifically:
- If a
...