I don’t know whether this is feasible or not.
As AI systems become more capable, the cognitive security of humans will be increasingly at risk. By cognitive security, I mean the ability of humans to maintain control over their beliefs and actions.
Cognitive security could be compromised in several ways: AI could become very good at persuading people of arbitrary positions; interacting with AI could lead humans to lose touch with reality; and AIs could become very effective at blackmail or at producing extremely convincing false information.
We are already seeing this happen:
Right now, many of these effects fall on people who were already vulnerable, like children, the elderly, or those with pre-existing mental health issues. However, this is not entirely the case: the Arup employee was a typical finance professional, for instance, and AI psychosis appears to have affected a well-respected OpenAI investor. My expectation is that as AI systems become more capable, more and more people will be vulnerable, and in the worst case, everyone will be.
Indeed, there are strong conceptual reasons to expect cognitive security issues to get worse, many of which I've discussed before in the context of emergent deception:
In addition to these intrinsic properties, many external parties have an incentive to exploit cognitive vulnerabilities created by AI: governments who want to control their citizens, developers who want to increase engagement, and advertisers who want to drive purchasing outcomes.
For all these reasons, I expect cognitive security to be an important cause area for AI safety. It is also an area where AI safety advocates have potent allies: cognitive security is already a salient present-day issue for the safety of children, whose advocates constitute a powerful political coalition in the U.S. Child safety advocates were the main group that blocked the 10-year moratorium on state AI regulation, and I expect them to also be an important part of the coalition pushing for independent evaluations of AI systems.
And there is a fairly direct through-line from these present-day concerns to more existential future concerns: if adults are exploitable by AI, then children will be as well, and the required institutional capacity (such as strong evaluation regimes) is often the same across both cases.
In summary, there should be a concerted push to evaluate and improve human cognitive security in the face of AI. On the technical side, this means developing evaluation infrastructure for both short-term and long-term effects of AIs on human psychology; this will require realistically simulating human impacts in silico to create scalable evaluations, plus large-scale recruitment for human subjects studies to establish ground truth and measure long-term effects. On the policy side, this means meaningfully independent evaluations of AI systems for cognitive security risks; transparency about training incentives and safety-relevant behaviors (particularly in long conversations); and clearer liability law for AI-caused harms. This is an area with complex technical challenges for evaluation, but unusual political will, making it a great lever for AI governance.
The average human speaks 15,000 words per day. Conservatively estimating each message at 10 words, 2.5B messages come to 25B words, which is about 1.7M days or roughly 4,500 years of speech. ↩︎
The canonical term is "identity formation" (Erikson, 1968); the related concept of the "capacity to be alone" is from Winnicott (1958). See McVarnock et al. (2023) for a modern review of how solitude supports identity formation in adolescence. ↩︎