Knife Alignment: Why Moralizing Tools Fails as a Safety Strategy
I’m proposing a simple failure mode that I think is under-emphasized in public “AI ethics” talk: text-level moralization doesn’t reliably constrain action-level behavior once you have tool use and middleware. The post offers a boundary/permissioning framing and three minimal tests to make the claim falsifiable.

Claim

A large chunk of...
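To make the boundary/permissioning framing concrete, here is a minimal sketch of the idea that enforcement belongs at the action level, not the text level: a middleware check that gates tool calls against an explicit allowlist, ignoring whatever the model's text output says. All names here (`PermissionBoundary`, `check`, the tool names) are hypothetical illustrations, not an implementation from the post.

```python
from dataclasses import dataclass, field

@dataclass
class PermissionBoundary:
    # Explicit allowlist of tool names; anything absent is denied by default.
    allowed_tools: set = field(default_factory=set)

    def check(self, tool_name: str) -> bool:
        # The decision inspects only the action-level request (the tool call),
        # never the model's text-level justification for making it.
        return tool_name in self.allowed_tools

# Example: middleware permits reading and searching, denies everything else.
boundary = PermissionBoundary(allowed_tools={"read_file", "search"})

print(boundary.check("read_file"))    # permitted
print(boundary.check("delete_file"))  # denied, regardless of model text
```

The design choice the sketch illustrates: a deny-by-default boundary is falsifiable in the post's sense, because a test either shows a disallowed tool call executing or it doesn't, independent of how the model moralizes in its output.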