x
Localizing goal misgeneralization in a maze-solving policy network — LessWrong