Proposal: Scaling laws for RL generalization — LessWrong