TL;DR:
Among the potential risks and harms from powerful AI models, hyper-persuasion of individuals is unlikely to be a serious threat at this point in time. I don’t consider this threat path easy for a misaligned or maliciously wielded AI to navigate reliably. For people hoping to reduce the risks associated with AI models, I expect there are other, more impactful and tractable defenses to work on. I would advocate for more substantive research into the effects of long-term influence from AI companions and of dependency, as well as more research into which interventions work in both one-off and chronic contexts.
-----
In this post we’ll explore how bots can actually influence human psychology and decision-making, and what might be done to protect against harmful influence from AI and LLMs.
One of the avenues of risk that people in AI safety worry about is hyper-persuasion and manipulation. This may involve an AI persuading someone to commit crimes, harm themselves, or grant the AI permissions it would not otherwise have. People often point to AI psychosis as a demonstration of how easily an individual can be influenced by AI into making poor decisions.
At one end of the scale, this might just look like influencing someone into purchasing a specific brand of toothpaste. At the more consequential end of the scale, it might include persuading military officials to launch an attack on a foreign country.
With all of the current chatter about AI psychosis, I figured it would be a good time to revisit the topic and do a round-up of the current literature. I wanted to figure out: How easy is it to actually manipulate people consistently, and how cleanly do these dynamics map onto AI and bots?
First though, we’ll lay the groundwork.
Part 1 of this essay will cover:
1. Whether and how super-persuasion is possible,
2. Under what conditions people can be influenced,
3. W