Gradient-free Single-pass Model Beats nanoGPT on Shakespeare — LessWrong