Outperforming the human Atari benchmark — LessWrong