Try training token-level probes — LessWrong