Published a post about integrity as a frame for trustworthiness in AI alignment, and how it relates to Claude's new constitution.
Cross-posting as I think this might interest folk here. All feedback welcome!
In October 1962, Soviet submarine B-59 was trapped underwater by American destroyers dropping depth charges. The captain, cut off from Moscow and believing war had begun, wanted to launch a nuclear torpedo. Launch required three officers. One of them, Vasily Arkhipov, refused, and nuclear war was averted.
A few months earlier in Jerusalem, Adolf Eichmann was put on trial for organizing the logistics of the Holocaust. His defense, famously documented in Hannah Arendt’s “Eichmann in Jerusalem: A Report on the Banality of... (read 1624 more words →)
but words like "honesty" are too loose; one person may think honesty is never lying, for someone else honesty means "speaking from a place of authenticity at all times", for someone else it's something about epistemic humility.
Hence the need for a tighter type system than just strings, if we want to structurally reason over these.