Why another round of prisoner’s dilemma is unlikely to be helpful, and a suggestion for what to do instead
Cooperation failures in multi-agent interactions could lead to catastrophic outcomes, even among aligned AI agents. Classic cooperation problems such as the Prisoner’s Dilemma or the Tragedy of the Commons have been useful for illustrating and exploring this challenge, but toy experiments with current language models cannot provide robust evidence for how advanced agents will behave in real-world settings. To better understand how to prevent cooperation failures among AI agents, we propose a shift in focus from simulating entire scenarios to studying specific agent properties. If we can (1) understand the causal relationships between properties...