Absolute Zero: Reinforced Self-play Reasoning with Zero Data — LessWrong