Beware of black boxes in AI alignment research — LessWrong