[AN #155]: A Minecraft benchmark for algorithms that learn without reward functions — LessWrong