Energy-Based Transformers are Scalable Learners and Thinkers — LessWrong