Test your best methods on our hard CoT interp tasks
Authors: Daria Ivanova, Riya Tyagi, Josh Engels, Neel Nanda

Daria and Riya are co-first authors. This work was done during Neel Nanda’s MATS 9.0. Claude helped write code and suggest edits for this post.

Most of our tasks fall in 3 categories: predicting future actions, detecting the effect of an...