Can we learn much by studying the behaviour of RL policies? — LessWrong