Modeling the capabilities of advanced AI systems as episodic reinforcement learning — LessWrong