Measuring Coherence of Policies in Toy Environments — LessWrong