Agents detecting agents: counterfactual versus influence — LessWrong