Why another round of prisoner’s dilemma is unlikely to be helpful, and a suggestion for what to do instead
Cooperation failures in multi-agent interactions could lead to catastrophic outcomes, even among aligned AI agents. Classic cooperation problems such as the Prisoner’s Dilemma or the Tragedy of the Commons have been useful for illustrating and exploring this challenge, but toy experiments with current language models cannot provide robust evidence for how advanced agents will behave in real-world settings. To better understand how to prevent cooperation failures among AI agents, we propose a shift in focus from simulating entire scenarios to studying specific agent properties. If we can (1) understand the causal relationships between properties...