Misalignment tokens: A complement to blinded CoT RLHF? — LessWrong