Probability that other architectures will scale as well as Transformers? — LessWrong