The Multi-Agent Minefield: Can LLMs Cooperate to Avoid Global Catastrophe?
ArXiv paper here. Most AI safety research asks a familiar question: Will a single model behave safely? But many of the risks we actually worry about, including arms races, coordination failures, and runaway competition, don't involve one single AI model acting alone. They emerge when multiple advanced AI...

Yes! In the current setup they don't communicate, good catch! We wanted to focus on studying this specific setting really thoroughly first. One especially interesting finding was that models were sometimes able to coordinate without any communication, at a rate well above chance, which points toward Schelling points and related ideas. Check out, for example, https://www.arxiv.org/abs/2601.22184, which we found very similar to this kind of implicit coordination.
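To make "above chance" concrete, here is a minimal sketch (not the paper's actual evaluation code) of how one might compare an observed coordination rate against a uniform-random baseline; the function names and the toy data are illustrative assumptions only:

```python
def coordination_rate(choices_per_round):
    """Fraction of rounds in which every agent picked the same option."""
    return sum(len(set(r)) == 1 for r in choices_per_round) / len(choices_per_round)

def chance_baseline(n_options, n_agents):
    """Probability that n_agents picking uniformly among n_options all agree."""
    return 1 / n_options ** (n_agents - 1)

# Hypothetical example: 2 agents choosing among 4 options over 5 rounds.
observed = [("A", "A"), ("B", "B"), ("A", "C"), ("D", "D"), ("A", "A")]
print(coordination_rate(observed))  # 0.8
print(chance_baseline(4, 2))        # 0.25
```

Any observed rate well above the baseline suggests the agents are converging on a shared focal point rather than agreeing by luck.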
Regarding communication: yes, it helps, good intuition! We already have some internal results showing this (though coordination is still imperfect even then), but the design space of communication protocols is huge, and we are trying to find a way to analyze that setting satisfactorily, too!
Hope this helps :).