Idea: build alignment dataset for very capable models — LessWrong