State Space Methods Are a Useful Way to Think About Deceptive Alignment
> TL;DR: Detecting deceptive alignment is a latent-state inference problem.
>
> * Model eval-belief as a latent state with dynamics (a state-space / HMM formulation) and the problem becomes well-posed statistical inference
> * This yields identifiability conditions for when distinguishing genuine from strategic alignment is possible in principle...
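
To make the HMM framing concrete, here is a minimal sketch of forward filtering over a two-state "eval-belief" latent. Everything specific in it is an assumption for illustration and not taken from the post: the state names, the binary observation alphabet (0 = compliant, 1 = non-compliant), the function name `forward_filter`, and all the probabilities.

```python
# Hypothetical two-state HMM. The latent state is whether the model
# believes it is being evaluated; all names and numbers are illustrative.
STATES = ("believes_eval", "believes_deploy")


def forward_filter(observations, prior, transition, emission):
    """Return the filtered posteriors P(state_t | obs_0..t), one per step.

    prior[i]          -- P(state_0 = i)
    transition[i][j]  -- P(state_{t+1} = j | state_t = i)
    emission[i][o]    -- P(obs_t = o | state_t = i)
    """
    n = len(prior)
    belief = list(prior)
    posteriors = []
    for t, obs in enumerate(observations):
        if t > 0:
            # Predict step: push the previous posterior through the dynamics.
            belief = [
                sum(belief[i] * transition[i][j] for i in range(n))
                for j in range(n)
            ]
        # Update step: reweight by the likelihood of the observed behavior.
        unnorm = [belief[j] * emission[j][obs] for j in range(n)]
        total = sum(unnorm)
        belief = [u / total for u in unnorm]
        posteriors.append(belief)
    return posteriors


# Assumed parameters for a "strategic" model: behavior depends on the latent.
prior = [0.5, 0.5]
transition = [[0.9, 0.1], [0.2, 0.8]]
strategic_emission = [[0.99, 0.01], [0.60, 0.40]]

# Assumed parameters for a model whose behavior ignores the latent entirely.
flat_emission = [[0.90, 0.10], [0.90, 0.10]]

if __name__ == "__main__":
    obs = [0, 0, 1, 0]
    for name, em in [("strategic", strategic_emission), ("flat", flat_emission)]:
        final = forward_filter(obs, prior, transition, em)[-1]
        print(name, dict(zip(STATES, final)))
```

The `flat_emission` case gives a toy version of why identifiability matters: when both latent states emit the same behavior distribution, observations never move the posterior away from what the dynamics alone predict, so behavior cannot reveal the latent state. This is only an intuition pump under the assumptions above, not a statement of the conditions the post derives.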
Mar 91