Reasoning and learning about injected concepts in language models — LessWrong