Model-driven feedback could amplify alignment failures — LessWrong