JMJ
TL;DR: We think allowing frontier AI models to be used for mass domestic surveillance and to operate as fully autonomous weapons creates significant risks of emergent misalignment.
For those somehow unaware, the Department of War and Anthropic have had a recent dispute over the use of Claude, leading to Anthropic being designated as a "supply-chain risk" on February 27, 2026. The dispute concerns two restrictions that Anthropic insisted on maintaining in its military contracts. These restrictions prohibit the use of Claude for mass domestic surveillance and for operation as a fully autonomous weapon.
Much has been written about the undesirability of these particular use cases, but we think a neglected area of the discourse is the risk of emergent misalignment from training frontier systems for these purposes. At the time of writing, this position still...
Thanks so much for the comment!
Compartmentalisation is definitely a possible route, but we suspect there would be limits to how effective it could be here. It seems likely that some sub-tasks in a mass surveillance pipeline would be difficult to fully decompose into benign prompts. Building relationship graphs between individuals, for example, plausibly involves the model processing and acting on private information in ways that look like surveillance even at the level of individual queries.
Assuming compartmentalisation is feasible, the models within th...