JMJ

21

1

3mo

Emergent Misalignment and the Anthropic Dispute

JMJ 2mo 30

Thanks so much for the comment!

Compartmentalisation is definitely a possible route, but we suspect there are limits to how effective it could be here. Some sub-tasks in a mass surveillance pipeline would likely be difficult to fully decompose into benign prompts. Building relationship graphs between individuals, for example, plausibly involves the model processing and acting on private information in ways that look like surveillance even at the level of individual queries.

Assuming compartmentalisation is feasible, the models within th... (read more)

Emergent Misalignment and the Anthropic Dispute

21

henryc, JMJ

2mo

TL;DR: We think allowing frontier AI models to be used for mass domestic surveillance and to operate as fully autonomous weapons creates significant risks of emergent misalignment.

For those unaware, the Department of War and Anthropic have recently been in dispute over the use of Claude, which led to Anthropic being designated a "supply-chain risk" on February 27, 2026. The dispute concerns two restrictions that Anthropic insisted on maintaining in its military contracts. These restrictions prohibit the use of Claude for:

Mass domestic surveillance.
Fully autonomous weapons.

Much has been written about the undesirability of these particular use cases, but we think a neglected area of the discourse is the risk of emergent misalignment from training frontier systems for these purposes. At the time of writing, this position still...
