Probe accuracy and causal sensitivity diverged 3.6x at the same layer. Here's what I think is happening. — LessWrong