Self-prediction acts as an emergent regularizer — LessWrong