x
Link: Let's Think Dot by Dot: Hidden Computation in Transformer Language Models by Jacob Pfau, William Merrill & Samuel R. Bowman — LessWrong