Kolmogorov complexity makes reward learning worse — LessWrong