[AN #100]: What might go wrong if you learn a reward function while acting — LessWrong