Failure modes in a shard theory alignment plan — LessWrong