Tensor-Transformer Variants are Surprisingly Performant — LessWrong