LLMs Play Prisoner's Dilemma
I built and ran a benchmark in which 100+ large language models play repeated Prisoner’s Dilemma games against each other in a round-robin format (~10k games total). It turns out that models within the same series become less likely to 'defect' (turn on their counterpart) as their parameter count grows. (rankings,...
Aug 10, 2025
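
For readers who want a feel for the setup, here is a minimal sketch of a round-robin repeated Prisoner's Dilemma tournament. It is not the benchmark's actual code. The payoff values (3/0/5/1), the 10-round match length, and the three hand-written strategies standing in for LLM players are all assumptions for illustration, as are the function names `play_match` and `round_robin`.

```python
from itertools import combinations

# Conventional textbook payoffs (an assumption; the benchmark's values are not stated).
# Key: (my_move, their_move) -> my score. "C" = cooperate, "D" = defect.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


# Hypothetical stand-ins for LLM players: each takes both histories, returns a move.
def always_cooperate(own_history, other_history):
    return "C"


def always_defect(own_history, other_history):
    return "D"


def tit_for_tat(own_history, other_history):
    # Cooperate first, then copy the opponent's previous move.
    return other_history[-1] if other_history else "C"


def play_match(player_a, player_b, rounds=10):
    """Play one repeated game; return (score_a, score_b, defections_a, defections_b)."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        # Both moves are chosen before either history is updated (simultaneous play).
        move_a = player_a(hist_a, hist_b)
        move_b = player_b(hist_b, hist_a)
        score_a += PAYOFF[(move_a, move_b)]
        score_b += PAYOFF[(move_b, move_a)]
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b, hist_a.count("D"), hist_b.count("D")


def round_robin(players, rounds=10):
    """Pair every player with every other player once; return per-player totals."""
    totals = {name: {"score": 0, "defections": 0, "moves": 0} for name in players}
    for name_a, name_b in combinations(players, 2):
        sa, sb, da, db = play_match(players[name_a], players[name_b], rounds)
        for name, score, defections in ((name_a, sa, da), (name_b, sb, db)):
            totals[name]["score"] += score
            totals[name]["defections"] += defections
            totals[name]["moves"] += rounds
    return totals


players = {
    "always_cooperate": always_cooperate,
    "always_defect": always_defect,
    "tit_for_tat": tit_for_tat,
}
results = round_robin(players, rounds=10)

# Rank by total score and show each player's defection rate.
for name, t in sorted(results.items(), key=lambda kv: -kv[1]["score"]):
    print(f"{name}: score={t['score']}, defection_rate={t['defections'] / t['moves']:.2f}")
```

Swapping the strategy functions for calls to real models would turn this into a toy version of the experiment; the defection rate per player is the kind of quantity one could then compare across model sizes.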