Project Confide: A Behavioral Honeypot for Latent Misalignment Detection — LessWrong