Transparency for Generalizing Alignment from Toy Models — LessWrong