Need an expert audit: Did I find a latent space bypass using completely benign context, or am I fooling myself? — LessWrong