Measuring Coherence and Goal-Directedness in RL Policies — LessWrong