A philosopher's critique of RLHF — LessWrong