Imitation Learning from Language Feedback — LessWrong