Using an LLM perplexity filter to detect weight exfiltration — LessWrong