Paper: Understanding and Controlling a Maze-Solving Policy Network — LessWrong