Validating / finding alignment-relevant concepts using neural data — LessWrong