New safety research agenda: scalable agent alignment via reward modeling