I’ve been working on an alignment framework that tries to address a gap I kept running into in existing work: most approaches either assume fixed values (reward functions, constitutions) or permit value learning without any clear notion of identity continuity. The core idea is to represent values explicitly as a...