Relaxed adversarial training for inner alignment — LessWrong