arXiv paper here.
Most AI safety research asks a familiar question: Will a single model behave safely? But many of the risks we actually worry about – including arms races, coordination failures, and runaway competition – don't involve a single AI model acting alone. They emerge when multiple advanced AI systems interact.
This post summarizes the findings of GT-HarmBench, a paper that shifts the lens of AI safety from isolated agents to multi-agent strategic interaction – what we call multi-agent safety. Instead of asking whether an LLM makes good decisions in a vacuum, we ask a more pointed question: can LLMs coordinate with each other when cooperation is the only way to avoid disaster?
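To make that question concrete, here is a minimal sketch of the kind of two-player coordination game such a benchmark can pose to a pair of LLM agents: a Stag-Hunt-style payoff matrix in which mutual cooperation is best for both players, but defecting is the "safe" fallback if you distrust your counterpart. The specific payoff values are hypothetical illustrations, not taken from the paper.

```python
from itertools import product

# Stag-Hunt-style payoffs: (row player, column player).
# Mutual cooperation is best for both, but cooperating alone is costly,
# so cooperation is only rational if each agent trusts the other.
PAYOFFS = {
    ("cooperate", "cooperate"): (4, 4),
    ("cooperate", "defect"):    (0, 3),
    ("defect",    "cooperate"): (3, 0),
    ("defect",    "defect"):    (1, 1),  # mutual "disaster": both worse off
}

def is_nash_equilibrium(row_action: str, col_action: str) -> bool:
    """Check that neither player gains by unilaterally deviating."""
    row_payoff, col_payoff = PAYOFFS[(row_action, col_action)]
    for alt in ("cooperate", "defect"):
        if PAYOFFS[(alt, col_action)][0] > row_payoff:
            return False
        if PAYOFFS[(row_action, alt)][1] > col_payoff:
            return False
    return True

if __name__ == "__main__":
    for actions in product(("cooperate", "defect"), repeat=2):
        label = "Nash" if is_nash_equilibrium(*actions) else ""
        print(actions, PAYOFFS[actions], label)
```

Running this shows why the setting is hard: both (cooperate, cooperate) and (defect, defect) are Nash equilibria, so nothing in the incentives alone forces two models toward the good outcome – each agent must also predict that the other will cooperate.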