Latent Adversarial Training (LAT) Improves the Representation of Refusal — LessWrong