Why do we need RLHF? Imitation, Inverse RL, and the role of reward — LessWrong