x
Grokking in Multi-Probe DPO: When the Model Learns Something Your Probes Can't See — LessWrong