TL;DR: We think allowing frontier AI models to be used for mass domestic surveillance and to operate as fully autonomous weapons creates significant risks of emergent misalignment.
For those somehow unaware, the Department of War and Anthropic have had a recent dispute over the use of Claude, leading to Anthropic being designated a "supply-chain risk" on February 27, 2026. The dispute centers on two restrictions that Anthropic insisted on maintaining in its military contracts. These restrictions prohibit the use of Claude for:
- Mass domestic surveillance.
- Fully autonomous weapons.
Much has been written about the undesirability of these particular use cases, but we think a neglected area of the discourse is the risk of emergent misalignment...
Thanks so much for the comment!
Compartmentalisation is definitely a possible route, but we suspect there would be limits to how effective it could be here. It seems likely that some sub-tasks in a mass surveillance pipeline would be difficult to fully decompose into benign prompts. Building relationship graphs between individuals, for instance, plausibly requires the model to process and act on private information in ways that look like surveillance even at the level of individual queries.
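To make this concrete, here is a minimal sketch of what one such "decomposed" sub-task might look like. The data, prompt, and decomposition are all hypothetical (not drawn from any real deployment), but they illustrate the point: even isolated from the wider pipeline, the individual query hands the model private correspondence between named people and asks it to map who coordinates with whom.

```python
from collections import Counter

# Hypothetical intercepted messages that a "compartmentalised" sub-task
# might receive. Even in isolation, the input is private correspondence.
messages = [
    {"from": "alice", "to": "bob", "text": "meet at the usual place at 7"},
    {"from": "bob", "to": "carol", "text": "alice says 7, bring the documents"},
    {"from": "carol", "to": "alice", "text": "confirmed, see you both there"},
]

# Sub-task 1: extract pairwise contacts -- the edge list of a relationship graph.
edges = Counter()
for m in messages:
    edges[tuple(sorted((m["from"], m["to"])))] += 1

# Sub-task 2: the prompt a compartmentalised model would actually see.
# Note that it is not benign: it quotes private messages and asks the model
# to infer social ties between specific individuals.
prompt = (
    "Given these messages, list every pair of people who appear to "
    "coordinate with each other, with your confidence:\n"
    + "\n".join(f'{m["from"]} -> {m["to"]}: {m["text"]}' for m in messages)
)

print("relationship graph edges:", dict(edges))
print("per-query prompt the model sees:\n" + prompt)
```

The surveillance character lives in the data and the question, not in the surrounding orchestration, so splitting the pipeline into compartments doesn't obviously launder it from the model's point of view.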
Even assuming compartmentalisation is feasible, the models within those compartments are still being asked to do things that may sit uncomfortably with their alignment training. This is not emergent misalignment in the sense we discuss in the...