In partially observable environments, stochastic policies can be optimal — LessWrong