Concept-anchored representation engineering for alignment — LessWrong