200 COP in MI: Analysing Training Dynamics — LessWrong