Introduction
This blogpost summarises the findings of my (lite) systematic literature review of Schelling coordination in LLMs that I undertook as part of the Apart Research Fellowship.
If LLMs can identify and exploit Schelling points, they might coordinate without leaving observable traces of communication. This matters for AI safety because many proposed safety measures assume that multiple instances of a model can't effectively coordinate. If this assumption breaks down, approaches that rely on one model monitoring another become vulnerable.
This review examines what current models can actually do, what risks this poses, and when coordination capability translates into safety-relevant threats.
This blogpost makes heavy use of concepts from the AI Control research agenda, particularly the ‘untrusted... (read 2169 more words →)