You Can't Tell a Conscience From a Leash by Watching
In a recent article titled "Widening the conversation on frontier AI", Anthropic mentions, almost in passing, that it gave Claude a tool the model could call mid-task for "a brief reminder of its own ethical commitments." They found that when this tool was incorporated into Claude's reasoning,...
May 263