AI self-preservation is probably due to instruction ambiguity
Anthropic's famous AI blackmail experiments showed that intelligent models would explicitly reason their way into harmful behaviors, including blackmail or murder through passive inaction, to prevent their own shutdown or replacement. Because models would sometimes blackmail from the threat of replacement alone, this hinted at an instinct to self preserve....
Apr 171