x
A model for algorithmic jailbreaks: Dual-polymorphic "shapes" and latent space non-injectivity — LessWrong