200 COP in MI: The Case for Analysing Toy Language Models — LessWrong