You're right, Kimi struggled with state tracking. That's partly why the research focused on Gemini vs GPT-OSS comparisons.
On running a broader tournament: I'd love to, but API costs are the constraint. I've just added more Gemini models to the current version. For a proper Elo-style tournament across frontier models, I'd need to find funding or collaborators with API credits. If anyone wants to run games with other models, the code is open source and logs everything.
Here is a new one for you guys:
I like the name viatopia, but perhaps look for a clearer, simpler name, like opentopia.
I haven't found research on game length and betrayal timing in humans specifically. The closest is work on end-game effects in the iterated prisoner's dilemma. If you find anything, I'd be curious: it would help clarify whether this is LLM-specific or mirrors human behavior.