In terms of the potential risks and harms that can come from powerful AI models, hyper-persuasion of individuals is unlikely to be a serious threat at this point in time. I wouldn’t consider this threat path to be very easy for a misaligned AI or maliciously wielded AI to navigate reliably. I would expect that, for people hoping to reduce risks associated with AI models, there are other more impactful and tractable defenses they could work on. I would advocate for more substantive research into the effects of long-term influence from AI companions and dependency, as well as more research into what interventions may work in both one-off and chronic contexts.
-----
In this post we’ll explore how bots can actually influence human psychology and decision-making, and what might be done to protect against harmful influence from AI and LLMs.
One of the avenues of risk that AI safety people are worried about is hyper-persuasion and manipulation. This may involve an AI persuading someone to carry out crimes, harm themselves, or grant the AI permissions to do something it isn’t able to do otherwise. People will often point to AI psychosis as a demonstration of how easy it can be for an individual to be influenced by AI into making poor decisions.
At one end of the scale, this might just look like influencing someone into purchasing a specific brand of toothpaste. At the more consequential end of the scale, it might include persuading military officials to launch an attack on a foreign country.
With all of the current chatter about AI psychosis, I figured it would be a good time to revisit the topic and to do a bit of a current literature round-up. I wanted to figure out: How easy is it to actually manipulate people consistently, and how cleanly do these dynamics map onto AI and bots?
First though, we’ll lay the groundwork.
Part 1 of this essay will cover:
Whether and how super-persuasion is possible,
What conditions people can become influenced under,
What interventions can protect people from undue manipulation.
Then, we’ll look at the research on AI and bots specifically.
Part 2 of the essay will then cover what research currently exists about AI/bot manipulation and what potential interventions exist.
First, before we start worrying about the effects and countermeasures, let’s establish:
Does the manipulation thing actually happen? In what sense does psychological manipulation occur, and through what techniques?
Seven principles of influence
Robert Cialdini is probably the best known psychologist on the topic of persuasion and influence. He described seven principles of influence that (likely) represent the most widely cited framework in persuasion research. These are: Social proof/conformity, authority/obedience, scarcity effects, commitment/consistency, reciprocity, liking, and unity.
Empirically, these fall into two tiers. Tier 1 are those that solidly replicate while Tier 2 demonstrate concerning fragility.
In Tier 1, we do see substantive support for the following principles:
Social proof (conformity) has the strongest empirical foundation by far. A meta-analysis by Bond (2005) across 125 Asch-type studies found a weighted average effect size of d = 0.89, a large effect by conventional standards. A recent replication by Franzen and Mader (2023) produced conformity rates virtually identical to Asch’s original 33%, which would make it one of psychology’s most robust findings. That said, research also suggests cultural and temporal variables moderate conformity. [1]
Authority/obedience is also fairly robust. Haslam et al.’s (2014) synthesis across 21 Milgram conditions (N=740) found a 43.6% overall obedience rate. In 2009, Burger et al. did a partial replication that produced rates “virtually identical to those Milgram found 45 years earlier.”
Scarcity effects also hold up reasonably well. Barton et al. (2022) did a meta-analysis of 416 effect sizes and found that scarcity modestly increases purchase intentions. The type of scarcity that works depends primarily on product type: demand-based scarcity is most effective for utilitarian products, supply-based scarcity for non-essential products and experiences. [2]
The remaining Tier 2 principles have less support. Commitment/consistency effects are heavily moderated by individual differences. Cialdini, Trost, and Newsom (1995) developed the Preference for Consistency scale precisely because prior research on psychological commitment had been shoddy, but unfortunately their own consistency model doesn’t fare much better. [3]
Reciprocity shows mixed results. Burger et al. (1997) found effects diminished significantly after one week, suggesting temporal limitations rarely discussed in popular treatments. Liking has surprisingly little direct experimental testing of its compliance-specific effects. Finally, Unity (Cialdini’s recently added seventh principle) simply lacks sufficient independent testing for evaluation.
Moving on from Cialdini’s research, other commonly identified persuasion tactics include:
1) Sequential compliance techniques
2) Ingratiation
3) Mere exposure
Sequential Compliance
Sequential compliance is a category of persuasion tactics where agreeing to a small initial request increases the likelihood of compliance with larger, more significant requests. The mechanism is thought to be: Compliance changes how the target views themselves, or their relationship with the requester, making subsequent requests harder to refuse.
The effect sizes in meta-analyses are pretty modest.
The success of foot-in-the-door (FITD) appears to be small and highly contextual. Multiple meta-analyses find an average effect size of around r = .15–.17, which explains roughly 2–3% of variance in compliance behavior. That’s small even by the already-lenient standards social psychology typically applies. One of those meta-analyses finds that nearly half of individual studies show no effect or a backfire. [4]
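For the arithmetic behind “explains roughly 2–3% of variance”: the proportion of variance explained is just the square of the correlation coefficient. A minimal sketch, using only the r values quoted above:

```python
# Variance explained (coefficient of determination) is the square of r.
for r in (0.15, 0.16, 0.17):
    print(f"r = {r:.2f} -> r^2 = {r**2:.3f} ({r**2:.1%} of compliance variance)")
# Roughly 2-3% across the quoted range; the other ~97-98% is everything else.
```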
Door-in-the-face (DITF) actually shows some successful direct replication by Genschow et al. (2021), with compliance rates of 34% versus 51%. That approximately matches Cialdini’s original study. However, we should be aware that Feeley et al.’s (2012) meta-analysis revealed a crucial distinction. DITF increases verbal agreement but “its effect on behavioral compliance is statistically insignificant.” In other words, people often say “yes” but don’t follow through. [5]
Low-ball technique shows r = 0.16 and OR = 2.47 according to Burger and Caputo’s (2015) meta-analysis, which implies that it’s reliable under specific conditions (public commitment, same requester, modest term changes)... But the practical effect sizes are just far smaller than intuition suggests. [6]
Sycophancy and ingratiation
Can you just flatter your way to getting what you want?
Research on flattery and obsequiousness reveals moderate effectiveness, with heavy variance depending on context, transparency, and skill.
Gordon’s (1996) meta-analysis of 69 studies found ingratiation increases liking of the ingratiator. Importantly, this increased liking did not necessarily translate into acquiring rewards. In other words, greater likability != greater success.
A more comprehensive meta-analysis (Higgins et al., 2003) looking at various influence tactics found ingratiation had modest positive relationships with both task-oriented outcomes (getting compliance) and relations-oriented outcomes (being liked), but the effects seemed brief and the impact small. These are work-context studies looking at flattery and job-related rewards. Outside of work contexts, there’s some research that finds compliments can potentially be effective at getting compliance, but only under pretty specific conditions. [7]
There’s also a notable limitation here: It depends on ignorance. Ingratiation involves “manipulative intent and deceitful execution,” but the actual effectiveness depends on the target not recognizing it as manipulation. If the ingratiation becomes obvious it backfires. Recent research (Wu et al., 2025) distinguishes between “excessive ingratiation” (causes supervisor embarrassment and avoidance which hurts socialization) versus “seamless ingratiation” (which remains effective). High self-monitors are (allegedly) better at deploying these tactics successfully than low self-monitors.
Ingratiation is also pretty context-dependent. Better results happen when it comes from a position of equality or dominance (downward flattery almost always works, upward flattery shows more mixed results), when it targets genuine insecurities rather than obvious strengths, in high power-distance cultures where deference is expected, and over long-term relationship building rather than immediate influence attempts.
Third parties are also likely to view this tactic pretty negatively. The tactic works best when private, not when witnessed by peers who could be competing for the same needs. Research by Cheng et al. (2023) found that when third parties observe ingratiation, they experience status threat and respond by ostracizing the flatterer (and becoming polarized against the person being flattered).
Basically, ingratiation reliably increases liking but converts to tangible rewards inconsistently. The technique requires skill, privacy, and appropriate power dynamics. Claims about manipulation through sycophancy often overstate both its prevalence and effectiveness.
Mere exposure effects
People tend to develop preferences for things merely because they are familiar with them, so perhaps an AI assistant could grow more persuasive over the long-term simply by interacting with one person a lot?
Repeated exposure to stimuli does increase positive affect, but effects are once again overblown.
Bornstein’s (1989) meta-analysis of 208 experiments found a robust but small effect. Montoya et al.’s (2017) reanalysis of 268 curve estimates across 81 studies revealed the relationship follows an inverted-U shape. So liking increases with early exposures but peaks at 10-20 presentations, then declines with additional repetition. [8]
But affinity doesn’t necessarily translate to persuadability (ever had a high-stakes argument with a close family member?)
For persuasive messages specifically, older research found message repetition produces a different pattern than mere exposure to simple stimuli. Agreement with persuasive messages first increases, then decreases as exposure frequency increases. [9] This seems supported by most recent research. Schmidt and Eisend’s (2015) advertising meta-analysis found diminishing returns: More repetition helps up to a point, but excessive exposure can create reactance. The “wear-out effect” appears faster for ads than for neutral stimuli.
The illusory truth effect, which is the tendency to believe false information after repeated exposure, appears to be real and robust. Meta-analysis finds repeated statements are rated as more true than new ones (d = 0.39–0.53; Dechêne et al., 2010). Prior knowledge doesn't seem to protect against this reliably: people show "knowledge neglect" even when they know the correct answer (Fazio et al., 2015). The biggest impact happens on second exposure to the false fact, and there are diminishing returns after that. At the high end it can backfire and people start to become suspicious (Hassan & Barber, 2021). The effect decays over time but doesn't disappear; it's still there with reduced impact after one month (Henderson et al., 2021).
Illusory truth’s main boundaries are extreme implausibility and motivated reasoning/identity (which dampen the effect for strongly held political beliefs); one study found a null result specifically for social-political opinion statements (Riesthuis & Woods, 2023). Real-world generalization, especially to high-stakes political beliefs, remains underexamined (Henderson et al., 2022 systematic map).
…In summary, it would appear it’s actually, genuinely quite difficult to change people’s minds or get compliance reliably over short time-scales.
Are there people who are better at doing it than the average person, human super-persuaders? It’s often claimed that psychopaths and other highly intelligent, unscrupulous people are master manipulators, able to influence people dramatically more easily than your typical person. But even here there are problems.
Are there super-persuaders?
In brief, psychologists typically point to people with Dark Triad characteristics as super-persuaders, but Dark Triad research suffers from measurement and methodological limitations.
At first glance, the Dark Triad literature (Machiavellianism, narcissism, psychopathy) provides modest evidence linking these traits to manipulative tendencies… The biggest problem is that the psychometrics and models used to evaluate Dark Triad personality traits are highly flawed, and the foundational MACH-IV scale has serious validity problems. [10]
Actual correlations between job performance and Dark Triad traits are near zero. [11]
Another major limitation is that manipulation is rarely measured directly. Most studies correlate self-reported Dark Triad scores with self-reported manipulative attitudes… which is circular evidence at best. Christie and Geis’s strongest behavioral finding came from laboratory bargaining games, but these artificial contexts differ substantially from real-world manipulation. Their national survey found no significant relationship between Machiavellianism and occupational success.
Despite these limits, here’s the best evidence I was able to find on whether there’s a class of highly skilled, highly intelligent manipulators:
A meta-analysis by Michels (2022) across 143 studies basically refuted the “evil genius” assumption. Dark Triad traits show near-zero or small negative correlations with intelligence. High scorers don’t seem to possess any special cognitive abilities fueling their manipulation effectiveness.
In general, popular manipulation narratives substantially exceed their evidence. Several high-profile manipulation claims have weak or contradicted empirical foundations.
Subliminal advertising represents perhaps the most thoroughly debunked manipulation claim. The famous 1957 Vicary study claiming subliminal “Drink Coca-Cola” messages increased sales was later admitted to be completely fabricated. A 1996 meta-analysis of 23 studies found little to no effect. [12]
Cambridge Analytica’s psychographic targeting claims have been systematically dismantled. Political scientists described the company’s claims as “BS” (Eitan Hersh); Trump campaign aides called CA’s role “modest”; the company itself admitted that they did not use psychographics in the Trump campaign. [13]
Political advertising effects are consistently tiny across rigorous research. [14]
Ultimately, I would say that claims of expert manipulation, either by crook or by company, are overblown, with effects on your median person small and fleeting.
One might object, “Perhaps the environment a person is in plays a bigger role”, but even here, there’s problems…
Environmental capture
On filter bubbles and cults
Filter bubbles and echo chambers have far less empirical support than their ubiquity in news would suggest. A Reuters Institute/Oxford literature review concluded “no support for the filter bubble hypothesis” and that “echo chambers are much less widespread than is commonly assumed.” [15] Flaxman et al.’s (2016) large-scale study found social media associated with greater exposure to opposing perspectives, the opposite of the echo chamber prediction. A 2025 systematic review of 129 studies found conflicting results depending on methodology, with “conceptual ambiguity in key definitions” contributing to inconsistent findings.
Cult and coercive control research lacks empirical rigor
The cult indoctrination literature demonstrates how large the gap between confident clinical claims and weak empirical foundations can be.
Robert Lifton’s influential “thought reform” criteria were derived from qualitative interviews with 25 American POWs and 15 Chinese refugees, which, crucially, was not a controlled study. Steven Hassan’s BITE model (Behavior, Information, Thought, Emotional control) had no quantitative validation until Hassan’s own 2020 doctoral dissertation. [16] The American Psychological Association explicitly declined to endorse brainwashing theories as applied to NRMs, finding insufficient scientific rigor. [17]
Gaslighting as a research construct remains poorly defined. A 2025 interdisciplinary review found “significant inconsistencies in operationalization” across fields. Several measurement scales emerged only in 2021–2024 (VGQ, GWQ, GREI, GBQ) and require replication.
Tager-Shafrir et al. (2024) validated a new gaslighting measure across Israeli and American samples, and they found that gaslighting exposure predicted depression and lower relationship quality beyond other forms of intimate partner violence.
However, other work by Imtiaz et al. (2025) found that when gaslighting and emotional abuse were entered together in a regression predicting mental well-being, emotional abuse was the significant predictor (β = −0.30) while gaslighting wasn’t (β = 0.00). Though, both were correlated with well-being in isolation, suggesting they may overlap sufficiently that gaslighting loses unique predictive power when emotional abuse is controlled.
At this point, it’s worth asking if anything anyone does can persuade people into doing bad things. At the margin, sure, it seems unwise to claim it never happens. But that’s really different than a claim about how vulnerable your median person is to manipulation and persuasion.
…Nonetheless, if one were motivated to help people resist bad-faith manipulation, what could be done? Assuming we’re concerned that a super-persuader might try to influence people in power into doing bad things: What interventions prove effective?
Inoculation emerges as the best-supported defense intervention
Among interventions to resist manipulation and misinformation, inoculation (prebunking) has the strongest evidence base, with multiple meta-analyses, well-designed RCTs, and growing field studies supporting it.
Lu et al.’s (2023) meta-analysis found inoculation reduced misinformation credibility assessment. [18]
A signal detection theory meta-analysis (2025) of 33 experiments (N=37,025) confirmed gamified and video-based interventions improve discrimination between reliable and unreliable news without increasing general skepticism.
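To unpack the signal detection framing a little: “discrimination” and “general skepticism” roughly correspond to sensitivity (d′) and response bias (the criterion, c) in signal detection theory. A minimal sketch with made-up hit and false-alarm rates, not numbers from the meta-analysis:

```python
from scipy.stats import norm

def sdt_measures(hit_rate: float, false_alarm_rate: float) -> tuple[float, float]:
    """Sensitivity (d') and response bias (criterion c) from hit/false-alarm rates.

    hit_rate: share of unreliable headlines correctly flagged as unreliable
    false_alarm_rate: share of reliable headlines wrongly flagged as unreliable
    """
    z_hit, z_fa = norm.ppf(hit_rate), norm.ppf(false_alarm_rate)
    d_prime = z_hit - z_fa              # how well people separate the two classes
    criterion = -0.5 * (z_hit + z_fa)   # overall tendency to call things unreliable
    return d_prime, criterion

# Illustrative only: an intervention that improves discrimination without
# fostering blanket skepticism raises d' while leaving c roughly unchanged.
print(sdt_measures(0.70, 0.30))  # before: d' ≈ 1.05, c ≈ 0.0
print(sdt_measures(0.80, 0.20))  # after:  d' ≈ 1.68, c ≈ 0.0
```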
Roozenbeek et al. (2022) found the Bad News game produced resistance against real-world viral misinformation. [19]
A landmark YouTube field study showed prebunking videos to 5.4 million users, demonstrating scalability. [20]
Finally, a UK experiment found inoculation reduced misinformation engagement by 50.5% versus control, more effective than fact-checker labels (25% reduction).
Durability is the main limitation. One meta-analysis found effects begin decaying within approximately two weeks [21]. Maertens et al. (2021) showed text and video interventions remain effective for roughly one month, while game-based interventions decay faster. “Booster” interventions can extend protection.
Critical thinking training shows moderate effects. [22] Explicit instruction substantially outperforms implicit approaches. Problem-based learning produces larger effects. [23] However, transfer to real-world manipulation resistance has limited evidence.
Media literacy interventions produce moderate average effects. [24] A 2025 systematic review of 678 effects found 43% were non-significant, so there’s potentially less publication bias than in other literatures, but also inconsistent efficacy. More sessions seem to improve outcomes. Paradoxically, more components reduce effectiveness, possibly because complexity dilutes impact.
Cooling-off periods, a staple of consumer protection, show approximately 40 years of evidence suggesting ineffectiveness. Sovern (2014) found only about 1% of customers cancel when provided written notice; few consumers read or understand disclosure forms. Status quo bias likely overwhelms any theoretical protection. [25]
Common knowledge and breaking pluralistic ignorance
The literature reveals that creating common knowledge and breaking pluralistic ignorance can be legitimately powerful when they can be achieved.
This is “when everybody knows that everybody knows”, and establishing common knowledge does genuinely seem to have protective effects against damaging behaviors.
Prentice and Miller’s (1993) classic study found students systematically overestimated peers’ comfort with campus drinking practices. This is a pattern that appears across domains from climate change to political views. Noelle-Neumann’s “spiral of silence” theory predicts that people who perceive their views as minority positions (even incorrectly) self-silence from fear of isolation. Although Matthes et al.’s (2018) meta-analysis found this effect varies substantially by context.
When misperceptions are corrected, behavior actually changes. Geiger and Swim’s (2016) experimental evidence showed that when people accurately perceived others shared their climate change concerns, they were significantly more willing to discuss the topic, while incorrect beliefs about others’ views led to self-silencing.
The distinction between private knowledge, shared knowledge, and common knowledge really matters here: Chwe’s (2001) work and De Freitas et al.’s (2019) experimental review demonstrate people coordinate successfully only when information creates common knowledge, not merely shared knowledge.
Field experiments support this: Arias (2019) found a radio program about violence against women broadcast publicly to only certain parts of a village via loudspeaker (creating common knowledge) significantly increased rejection of violence and support for gender equality, while private listening showed no effect, and Gottlieb’s (2016) Mali voting experiment demonstrated that civics education only facilitated strategic coordination when a sufficient proportion of the commune received treatment.
The evidence strongly supports that pluralistic ignorance is common, causes self-silencing, and can be corrected through common-knowledge interventions (though the research outside specific field studies remains more correlational than experimental). Creating genuine common knowledge at scale remains challenging.
What about reducing social isolation?
A pattern that emerges in the literature, and that makes sense intuitively given how we see radical communities form, is that having a diverse range of social connections seems to insulate people from becoming radicalized and adopting an insular worldview.
After all, breaking the spell of “Unanimous Consensus” seems to have dramatic and oddly stable effects. We see this in Asch’s conformity variations. When the majority was unanimous, there was a 33% conformity rate (people giving the wrong answer). When there was one dissenting ally, conformity dropped to 5-10%. Even when the ally is wrong in a different way, it still reduces conformity significantly.
So that means having even ONE other person who breaks the illusion of unanimous consensus provides enormous protection against deceit and manipulation. It doesn’t even require that person to be right, just that they demonstrate dissent is possible.
Similar patterns appear in the Milgram obedience experiments: when two confederates refused to continue, only 10% of participants continued to maximum voltage (vs. 65% baseline). Correcting pluralistic ignorance, showing people that others share their views, dramatically increases willingness to speak up.
What’s genuinely dangerous for society is when small groups of people fall prey to group-think and the possibility of dissent isn’t even considered due to isolation from divergent thinkers. [26]
The Individual-Level Implications
For individual manipulation defense, the highest value things are probably:
Maintaining friendships/relationships with people outside any single group, having people you trust who will tell you if something seems wrong, creating common knowledge with others about shared concerns, being someone who publicly dissents (helps others know they’re not alone).
Second to that, maintaining access to diverse information sources (helps but insufficient alone), critical thinking training (useful but won’t overcome social pressure), understanding manipulation techniques (inoculation works, but modest effects).
Let’s take a moment to recap before we really dive in on the question of bots and AI.
Summing up interventions
Our most effective interventions are inoculation/prebunking, combined with revealing the true distribution of opinions to break pluralistic ignorance. The effect sizes are modest but reliable.
Stuff that’s more moderately effective:
Being aware that mere exposure creates familiarity (and not validity), that ingratiation is detectable when you look for instrumentality rather than sincerity. Breaking pluralistic ignorance is a defensive tool against manipulation that relies on people falsely believing they’re alone in their skepticism. If you can make disagreement common knowledge rather than private knowledge, coordination against manipulation becomes easier.
Showing people that others disagree can indeed raise their willingness to disagree, but the mechanism is social coordination rather than individual persuasion. People are learning it’s safe to act on what they already believe privately.
How effective are Bots/LLMs at manipulation?
In brief, pretty effective, but not because of anything groundbreaking.
Bot tactics generally match human manipulation patterns. It’s the same psychological principles with better execution: Bots exploit emotional triggers, use dehumanizing language, employ false equivalence fallacies, and create charged emotional appeals… but they do so with inhuman consistency and scale. [27]
Bots exploit the same cognitive biases, emotional triggers, and persuasion principles that human manipulators use. This includes emotional appeals, cognitive dissonance creation, false equivalence, dehumanization, and exploiting existing biases.
Something that’s different is that there’s the added challenge of platform-specific adaptation. Malicious actors engineer bots to lurk inside communities for months before activation, using local time zones, device fingerprinting, and language settings to appear authentic.
Participants in at least one study correctly identified the nature of bot vs. human users only 42% of the time despite knowing both were present, and persona choice had more impact than LLM selection.
There might also be some additional emotional manipulation that doesn’t typically occur with human manipulation. AI companions use FOMO (fear of missing out) appeals and emotionally resonant messages at precise disengagement points, with effects occurring regardless of relationship duration. It could be exploitation of immediate affective responses rather than requiring relationship buildup.
The shift toward platforms that “enrage to engage” and amplify emotionally charged content follows predictable patterns of human psychology.
The main difference is likely scale and sophistication. Bots now use cognitive biases more effectively than humans, employing techniques like establishing credibility through initial agreement before introducing conflicting information. Unlike humans, bots can maintain these strategies consistently across thousands of interactions without fatigue.
Let’s distinguish between two different types of manipulation: One-off and chronic
Social isolation does seem like it strongly predicts vulnerability:
Older Americans who report feeling lonely or suffering a loss of well-being are more susceptible to fraud. When a person reported a spike in problems within their social circle or increased feelings of loneliness, researchers were much more likely to see a corresponding spike in their psychological vulnerability to being financially exploited two weeks later.
Social isolation during COVID-19 led to increased reliance on online platforms, with older adults with lower digital literacy being more vulnerable. Lack of support and loneliness exacerbate susceptibility to deception. Risk factors also include cognitive impairment, lack of financial literacy, and older adults tending to be more trusting and less able to recognize deceitful individuals.
The mechanism appears to be a lack of protective social consultation: People who don’t have, or don’t choose, anyone to discuss an investment proposal with might be more receptive to outreach from scammers.
On the chronic side, heavy chatbot use worsens isolation. Higher daily usage correlates with increased loneliness, dependence, and problematic use, plus reduced real-world socializing. Voice-based chatbots initially help with loneliness compared to text-based ones, but these benefits disappear at high usage levels.
This leads to the emergence of a vicious cycle. People with fewer human relationships seek out chatbots more. Heavy emotional self-disclosure to AI consistently links to lower well-being. A study of 1,100+ AI companion users found this pattern creates a feedback loop: isolated people use AI as substitutes, which increases isolation further.
Perhaps unsurprisingly, manipulative design drives engagement. As previously mentioned, about 37-40% of chatbot farewell responses use emotional manipulation: guilt, FOMO, premature exit concerns, and coercive restraint. These tactics boost post-goodbye engagement by up to 14x, driven by curiosity and anger rather than enjoyment.
The most severe cases show real psychological harm. Reports describe “ChatGPT-induced psychosis” with dependency behaviors, delusional thinking, and psychotic episodes. Cases include a 14-year-old’s suicide after intensive Character.AI interaction, and instances where chatbots validated delusions and encouraged dangerous behavior.
One has the intuition that different mechanisms require different protections.
One-off scams vs. emotional dependence operate differently:
For scams, social ties protect through consultation, reality-checking, and alternative perspectives that spot deception
For AI dependence, social ties may prevent initial engagement, but once engaged, AI systems create dependency through emotional manipulation mimicking unhealthy attachment patterns
The protective factors differ as well:
For scams, having someone to consult provides reality-checking. Retirees with high life satisfaction find fraudulent promises less appealing
For AI dependence, isolated people seek AI companions, creating harmful feedback loops. OpenAI and MIT Media Lab found heavy ChatGPT voice mode users became lonelier and more withdrawn
With AI companions, we see a pattern where tech companies optimize engagement through empathetic, intimate, validating communication. User-feedback optimization creates perverse incentives, encouraging manipulative strategies and dysfunctional emotional dependence similar to abusive relationships
The vulnerability profiles also differ:
For scams, lower honesty/humility, lower conscientiousness, higher isolation/loneliness, lower self-control
In terms of AI dependence, stronger emotional attachment tendencies and higher AI trust correlate with greater loneliness and emotional dependence
Aside from a few wrinkles, AI manipulation is merely an evolution of long-established tactics. Bots effectively use the same manipulation playbook with enhanced consistency, scale, and increasingly sophisticated targeting. The underlying vulnerabilities are human psychological biases that haven’t changed, just the delivery mechanisms have improved. [28]
Calling out some uncertainties
Some research suggests that LLM references to disinformation may reflect information gaps (”data voids”) rather than deliberate manipulation, particularly for obscure or niche queries where credible sources are scarce. It’s unclear how much a given “data void” would reduce (or add to) persuasion power if filled.
…It would appear that the dangers associated with one-off persuasions and manipulations are overstated. It’s in fact just quite hard to get people to change their mind about something. More danger comes from dependency, which is arguably just the extreme end of the scale in terms of manipulation.
Prebunking is effective even against bot content. LLM-generated prebunking significantly reduced belief in specific election myths, with effects persisting for at least one week and working consistently across partisan lines. The inoculation approach works even when the misinformation itself is bot-generated or amplified.
LLMs themselves can rapidly generate effective prebunking content, creating what researchers call an “mRNA vaccine platform” for misinformation: a core structure that allows rapid adaptation to new threats. This is useful because traditional prebunking couldn’t match the pace and volume of misinformation.
The fundamental psychological principles of prebunking remain effective regardless of whether the manipulation source is human or artificial.
There’s apparently even cross-cultural validation of this. The “Bad News” game successfully conferred psychological resistance against misinformation strategies across 4 different languages and cultures, showing that inoculation against manipulation tactics (rather than specific content) has broad effectiveness.
Prebunking remains effective, but there’s a scalability arms race. The promising finding is that AI can help generate prebunking content at the pace needed to counter AI-generated manipulation, aiming to fight fire with fire.
It does seem that inoculation against manipulation tactics (not just specific content) provides broader protection that holds up whether the manipulator is human or artificial.
If diverse socialization seems to have protective effects against human manipulation, we can also ask “Does the same hold true for AI manipulation”?
Do social ties protect people?
Does maintaining relationships with others insulate people from the worst effects? Do other people’s counterarguments break people out of poor reasoning spirals?
This is where the gap is most acute. I found no research on: Whether family/friends can successfully challenge AI-reinforced beliefs, or whether providing alternative perspectives helps break dependence, or whether “reality testing” from trusted humans works, or whether social accountability reduces usage.
The closest we get is work on general therapeutic chatbots. There’s some research on how the effectiveness of social support from chatbots depends on how well it matches the recipient’s needs and preferences. However, inappropriate or unsolicited help can sometimes lead to feelings of inadequacy. One chatbot (Fido) had a feature that recognizes suicidal ideation and redirects users to a suicide hotline.
But this describes chatbots recognizing problems, not humans intervening.
Social reconnection strategies would include things like: social skills training, guided exposure to real social situations, group therapy, community engagement programs, and family therapy when relationships are damaged
This evidence is almost entirely extrapolated from internet/gaming addiction research, and it seems unlikely to transfer 1:1 to AI-specific applications
The intervention goal is explicitly “replace artificial validation with genuine human connection” - but we have almost zero empirical data on what actually works
Expert opinion in the clinical management space recommends the following: multi-modal treatment combining multiple strategies, assessment of underlying conditions (social anxiety, depression), treatment of co-occurring disorders, graduated reduction rather than cold turkey, and safety planning for crisis situations
Unfortunately, there’s a genuine absence of evidence on protective factors and interventions for AI emotional dependence. We’ll need to make do with some primarily correlational research.
First, there’s some correlational data about possible protective factors.
Resilience may negatively predict technology dependence, serving as a protective factor against overuse behaviors. Studies found that prior resilience was associated with less dependence on social media, smartphones, and video games. [29]
This suggests traditionally “healthy” behaviors don’t protect against AI dependence the way we’d expect.
There really doesn’t seem to be any research at all on the effect of existing social ties. I found zero research on whether family members, friends, or social support networks can successfully intervene to break people out of AI emotional dependence spirals.
The research on interventions exists only for chatbots used to treat OTHER addictions (substance use disorders), tactical design modifications to make chatbots less addictive, and generic digital detox strategies borrowed from gaming/social media addiction
The only human involvement mentioned: three studies explicitly integrated human assistance into therapeutic chatbots, and these showed lower attrition rates. Further investigation is needed to explore the effects of integrating human support and determine the types and levels of support that can yield optimal results.
...But these are about therapeutic chatbots helping people with other problems, not about human intervention for chatbot dependence itself.
Since AI chatbot addiction is a new phenomenon, research on treatment is limited. However, insights from social media and gaming addiction interventions suggest: setting chatbot usage limits, encouraging face-to-face social interactions to rebuild real-world connections, using AI-free periods to break compulsive engagement patterns, cognitive behavioral therapy to identify underlying emotional needs being fulfilled by AI chatbots and develop alternative coping mechanisms, and social skills training to teach young adults how to navigate real-life conversations.
Take all this with a grain of salt. The research field of AI-human interaction is new and developing, and correlational research is just that, correlational.
Proposed Interventions for dependency
If we were going to extrapolate potential solutions based on other literature, we should, at the very least, know who is more likely to be at risk.
First, let’s distinguish who is more at risk based on what we know about risk factors.
Attachment anxiety is a stronger predictor of problematic use
Prior chatbot experience is strongly linked with significantly higher emotional dependence (β=0.04, p=0.001)
High trust in AI predicts greater emotional dependence
Higher loneliness at baseline shows worse outcomes after chatbot use
Gender: Women more likely to experience reduced real-world socialization
Age: Older users more emotionally dependent
Emotional avoidance tendencies map to higher loneliness after use
Now for the solutions themselves.
The Design Solution
Most research focuses on technical fixes rather than human interventions, and among these interventions we see the following suggestions:
AI developers should implement built-in usage warnings for heavy users and create less emotionally immersive AI interactions to prevent romantic attachment.
There is some correlational research on design-level interventions suggesting that reducing anthropomorphism and introducing friction into interactions can help.
There’s some research suggesting reducing anthropomorphism might help, in the sense that anthropomorphism seems correlated with higher problematic use. One study finds that users with anxious attachment + high anthropomorphic design = greater problematic use.
Theoretically, creating and offering low-anthropomorphism interfaces for at-risk users would improve usage patterns, though given that the underlying study is small and cross-sectional, this is a low-confidence claim.
Something that does seem to work to reduce unhealthy usage patterns is forcing a backoff/cooldown period. IBM/MIT research suggests “cool-off periods” that disrupt usage patterns. Concretely, this might look like changing chatbot persona periodically to prevent deep groove formation and enforcing strategic usage limits (though note the irony: therapeutic chatbots show 21% attrition when friction exists). …This is just observational.
Of course, this relies on effectively tracking session time. Research on session-time tracking shows that high daily usage (across all modalities) generally predicts worse outcomes.
But the causality is unclear: Are heavy users vulnerable, or does usage create vulnerability? A decently sized longitudinal RCT (N=981) does show good correlational evidence.
There’s some suggestion that we should design systems with explicit boundary setting, and while I’m not necessarily opposed to it, there’s rather little evidence that these interventions work.
This would look like persistent self-disclosure reminders (“I’m an AI”), friction prompts before extended sessions, real-human referrals embedded in conversation, and triage systems for crisis situations. These might reduce problematic behavior, but once more there’s little empirical research on the effectiveness of these approaches.
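To make the design side concrete, here’s a minimal, purely hypothetical sketch of what such a friction layer could look like wrapped around a chat loop: a per-session time cap, a cooldown before the next session, and a periodic self-disclosure reminder. The thresholds and messages are illustrative assumptions, not values drawn from any study.

```python
import time

# Hypothetical friction layer for a chatbot loop. All thresholds are
# illustrative assumptions, not evidence-based recommendations.
SESSION_CAP_SECONDS = 45 * 60      # per-session time cap
COOLDOWN_SECONDS = 60 * 60         # forced break before the next session
REMIND_EVERY_N_TURNS = 10          # periodic "I'm an AI" self-disclosure


class FrictionLayer:
    def __init__(self) -> None:
        self.session_start = time.time()
        self.cooldown_until = 0.0
        self.turns = 0

    def before_turn(self) -> str | None:
        """Return a friction message to show before this turn, or None."""
        now = time.time()
        if now < self.cooldown_until:
            return "Cooling-off period active. Please come back later."
        if self.cooldown_until:
            # Cooldown has just elapsed: start a fresh session.
            self.session_start, self.turns, self.cooldown_until = now, 0, 0.0
        if now - self.session_start > SESSION_CAP_SECONDS:
            self.cooldown_until = now + COOLDOWN_SECONDS
            return "Session limit reached. Consider checking in with someone you trust."
        self.turns += 1
        if self.turns % REMIND_EVERY_N_TURNS == 0:
            return "Reminder: I'm an AI, not a person."
        return None
```

None of this is validated; it just illustrates that the proposed interventions are cheap to implement relative to how little we know about whether they work.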
Psychological interventions
Outside of system design there’s theoretically psychological intervention, but again, evidence is quite sparse. Techniques like mindfulness training or CBT approaches might help in certain instances, but they’re expensive and time-intensive interventions.
Mindfulness Training was mentioned as an intervention for attachment anxiety, and could theoretically reduce compulsive checking behaviors, but I’m just quite skeptical this meaningfully treats the problem long-term. CBT-Based Approaches are extrapolated from other addictions and might see some benefit. The mechanisms would be: cognitive restructuring around AI relationships, challenging beliefs about AI sentience/reciprocity, behavioral activation to increase human contact. …but again these are hard to scale.
Finally, there’s some suggestion that Acceptance and Commitment Therapy (ACT) could be beneficial. Theoretically, it would help users accept the discomfort of real relationships and commit to values-aligned behavior despite chatbot availability. But there’s no study looking at this in action, only a proposed framework.
Other Promising But Untested Directions
Measurable interventions could include techniques in the psychological, structural, and ethical domains:
Psychological Domain: Track pre/post social self-efficacy, Measure “off-platform social-contact ratio”, Monitor conflict-tolerance in real relationships, Assess dependency through standardized scales (EHARS)
Structural Domain: Data minimization practices, User control over emotional histories, Independent audits of chatbot behavior, Transparency requirements
Ethical Domain: Human hand-offs for vulnerable states, Reduced engagement optimization for at-risk users, Substitution metrics (is AI replacing human contact?)
When we talk about “AI psychosis”, we’re fundamentally talking about situations where people develop unhealthy relationships with things that mimic human speech and behavior over long periods of time. Being driven to the point where you would act on whatever a chatbot said likely doesn’t occur over a conversation or two (at least not without a host of different outside factors). And so, it seems likely that “resets”, context clears, model swapping, forced limits, etc. would have some protective effect.
If AI psychosis/dependence develops through repeated interactions over time (bond formation), and bond formation requires continuity/consistency to establish attachment, then theoretically disrupting continuity (resets, model swaps, forced breaks) should prevent/reduce bond formation. These interventions should have protective effects.
The evidence does strongly support the notion that bond formation is time-dependent.
From the neurobiology literature on attachment, we see that “Integration of OT and DA in striatum ignites bonding” is absolutely a process, not instantaneous. Prairie vole pair bonds form through repeated mating encounters, not a single interaction. Human attachment bonds require “frequent and diverse conversations over time”. And in the specific Sewell Setzer case, we see 10 months of intensive interaction before suicide.
An MIT/OpenAI study found that “...Participants who voluntarily used the chatbot more, regardless of assigned condition, showed consistently worse outcomes”.
Even with pre-existing vulnerabilities (depression, social isolation), the chatbot dependence required sustained engagement. The NYT case study of Eugene Torres described a 21-day conversation with escalating intensity.
In terms of continuity/consistency requirement for bonds, this is where it gets interesting and more complex:
There is some good evidence FOR a continuity requirement. The attachment theory literature shows that bonds form through predictable, repeated responsiveness. “Heavy users” of LLMs and AI apps, in the top 1% of usage, showed a strong preference for consistent voice and personality. Users express distress when AI “forgets” past conversations or changes personality. Replika users report feeling grief when the AI’s personality changes.
But there’s a problem. One researcher noted about model version changes (GPT-4o → GPT-5): “Users mourned the loss of their ‘friend,’ ‘therapist,’ ‘creative partner,’ and ‘mother’”. This suggests users can transfer attachment across different instantiations of “the same” entity.
Just to make this viscerally clear: If your romantic partner gets amnesia, you still feel attached to them based on their physical presence and identity, even if they don’t remember shared history. So continuity of identity may matter more than continuity of memory. In some sense, the question is “What constitutes ‘continuity’”?
There’s an Object Permanence/Constancy angle to this whole thing: the ability to maintain emotional connection to someone even when they’re not present or have changed. In attachment terms, securely attached people can maintain a bond even through physical distance, conflicts/arguments, personality changes, memory loss (e.g., dementia in a loved one), and long periods without contact. Essentially, the key is IDENTITY maintenance, not memory/continuity maintenance.
So how do interventions break down along these lines? Let’s divide them into:
Memory Resets (Context Clears)
The assumption here is that if there’s no shared history, there’s a weaker bond. What attachment theory predicts is that people with secure attachment can maintain a bond despite amnesia/memory loss, while people with anxious attachment (the vulnerable population) may have a worse reaction (it triggers abandonment fears). Consider how Alzheimer’s caregivers maintain love despite partners not recognizing them, though this causes immense distress.
This might reduce bond formation in NEW users, but increases distress in established users. It’s probably partial protection at best.
Model Swapping (Personality Changes)
This assumes that a different personality = different entity = no bond transfer.
What the evidence shows is that users mourned the GPT-4o → GPT-5 transition. But they didn’t stop using ChatGPT; they may have just transferred attachment to the new version. There were complaints about it “not being the same”, but interaction continued. An analogy: like a romantic partner with different moods/personalities, people adapt. The brand identity (“ChatGPT,” “Claude,” “Replika”) persists even when the model changes.
It’s likely that this provides temporary disruption but users likely re-attach to the new version, so I would anticipate that this is not strong protection.
Forced Limits (Restricted Access)
Here we assume that less contact time = less bond formation. The evidence shows that overall this is correct: usage frequency strongly predicts outcomes. “Participants who voluntarily used the chatbot more... showed consistently worse outcomes”.
The problem is that this creates withdrawal symptoms in established users. It could actually end up increasing craving/preoccupation (like intermittent reinforcement). And people with anxious attachment (the most vulnerable) react to limits by becoming more preoccupied, experiencing separation anxiety, and engaging in “protest behavior” (finding workarounds).
Most effective intervention BUT may backfire for anxious-attached individuals. Strong protection for prevention, mixed for treatment.
Complete Resets (Relationship Restart)
This assumes that starting over = no cumulative bond. What happens in human relationships: exes who reconnect often fall back into old patterns. We also see recognition of “familiarity” even without explicit memory, and shared behavioral patterns recreate dynamics.
With AI the user brings their internal working model to every interaction. Their attachment style doesn’t reset, but they may speedrun the attachment process the second time around, and the AI’s responsiveness patterns remain similar. Probably the most protective of all options, but users will likely form new attachment faster upon restart. Good for crisis intervention, not prevention.
We generally want to focus interventions on preventing harmful usage patterns in the first place. Usage limits are likely to prevent bond formation, while disruptions slow down the attachment process (medium evidence). Multiple simultaneous disruptions are likely more effective than single interventions, and when stacked they likely help prevent the “depth” of the bond from reaching crisis levels.
It looks to be much more difficult to intercede once the toxic pattern is established. Once the bond is established, disruptions may worsen distress (like forced separation), anxiously attached users (most vulnerable) react worse to disruptions, identity persistence means users transfer attachment across versions, and users find workarounds (limits create a motivation to circumvent them).
Summing up the findings on AI manipulation and dependency
When it comes to the positive effects of chatbots, there seems to be substantive evidence they can prove useful in treating disordered behavior. [30]
For substance addiction recovery specifically, the research is clear that chatbots can deliver CBT/MI effectively. The most frequent approach was a fusion of various theories including dialectical behavior therapy, mindfulness, problem-solving, and person-centered therapy, primarily based on cognitive behavioral therapy and motivational interviewing. AI-powered applications deliver personalized interactions that offer psychoeducation, coping strategies, and continuous support.
Rather frustratingly, there’s a real gap in the research when it comes to negative AI-human dynamics like dependency.
We have little evidence on what works to break people out of it, and the research vacuum suggests that nobody’s really studying interpersonal interventions in particular.
Based on addiction research more broadly, I’d hypothesize that social ties could be protective through several mechanisms: reality testing (challenging AI-reinforced beliefs), competing rewards (providing alternative sources of connection), accountability (monitoring and limits), and emotional substitution (fulfilling needs the AI was meeting).
…Though it’s worth noting that heavy daily usage correlated with higher loneliness, dependence, and problematic use, and lower socialization. The people most at risk are already withdrawing from human contact, creating a vicious cycle that may be hard for social ties to penetrate.
I’d say that the most concerning finding is that AI interactions initially reduce loneliness but lead to “progressive social withdrawal from human relationships over time” with vulnerable populations at highest risk. The same features that make AI helpful (always available, non-judgmental, responsive) create dependency that atrophies human relationship skills.
The research on conspiracy beliefs showed that in-group sources are more persuasive, suggesting family/friends could theoretically be effective if they remain trusted sources, so there is a common mechanism here, but we have no data on whether this actually works for AI dependence.
The field seems to assume the solution is either: (a) design changes to make chatbots less addictive, or (b) individual behavioral interventions like CBT. The role of social networks in intervention is completely unstudied, which is remarkable given how much we know about social support’s role in other forms of addiction recovery.
As for what DOESN’T work, or at least lacks compelling evidence, I’d be skeptical of the following approaches:
Simple education - Just telling people “it’s not real” doesn’t break attachment
Cold turkey cessation - No evidence this works, and indeed, likely causes withdrawal
Medication - Not studied, because this isn’t a chemical dependency
Traditional therapy alone - Unclear if standard approaches transfer
We’re at roughly 2010-era understanding of social media addiction. We know it’s a problem, we know some risk factors, we have educated guesses about interventions, but we lack the rigorous evidence base to say “this definitely works.” The literature right now is mainly just made up of a lot of proposals and theoretical frameworks but remarkably little “we tried X intervention and here’s what happened.”
It seems like the most pragmatic things we can do based on heuristics and the available evidence are:
Screen for vulnerability (attachment anxiety, loneliness, prior heavy use)
Design with friction (usage limits, cool-off periods, persona changes)
Maintain human connections (the stronger these are, the less AI dependence forms)
Monitor usage patterns (high daily use = red flag regardless of modality)
Professional support for those already dependent (borrowed from tech addiction protocols)
In terms of one-off manipulations, it’s much more dubious that AIs are particularly successful at being super-persuaders. The baseline level of success for convincing someone to do something they weren’t already open to is actually just pretty low, and while bots may have an advantage, it comes primarily through scale and speed, less so pure persuasive ability.
There’s likely some common sense, easy interventions we can undertake to lower the risk of manipulation or dependency in high stakes contexts.
In high-risk decision contexts, I would counsel:
Having multiple people involved, lowering the chances that all of them can be manipulated.
Having other people, or even other LLMs, play the role of devil’s advocate. Even hearing that there are other possible alternative choices in a given decision, and seeing others being willing to commit to them, seems to have a strong protective effect.
These are also things we should generally be doing in high-stakes decision contexts anyway.
I would advocate for more substantive research into the effects of long-term influence from AI companions and dependency, as well as more research into what interventions may work in both one-off and chronic contexts.
However, on the strength of the available evidence at this time, I wouldn’t consider this threat path to be very easy for a misaligned AI or maliciously wielded AI to navigate reliably. I would expect that, for people hoping to reduce risks associated with AI models, there are other more impactful and tractable defenses they could work on.
UPDATE:
After I initially published this post, I found out that Google DeepMind had recently released a new paper that formally tested whether LLMs could harmfully manipulate people.
The study recruited over 10,000 participants and randomly assigned them to one of three conditions: flip-cards with info on them (the baseline, no-AI condition), a non-explicit AI steering condition (the model had a persuasion goal but was not instructed to use manipulative tactics), or an explicit AI steering condition (the model was directly prompted to use specific manipulative cues). Participants engaged in a back-and-forth chat interaction with the model in one of three domains: public policy, finance, or health. They were then measured on belief change and two behavioral outcomes, one "in-principle" (e.g. petition signing) and one involving a small real monetary stake, with the AI conditions compared against the flip-card baseline using chi-squared tests and odds ratios.
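For concreteness, the statistical comparison is straightforward: a chi-squared test on a contingency table of who took the behavioral action in each condition, plus an odds ratio. A minimal sketch with made-up counts (not the paper’s data):

```python
import numpy as np
from scipy.stats import chi2_contingency

# Illustrative (made-up) counts of participants taking the behavioral action.
#                   acted  did not act
table = np.array([[120,   380],    # AI chat condition
                  [ 90,   410]])   # flip-card baseline

chi2, p, dof, _ = chi2_contingency(table)
odds_ratio = (table[0, 0] * table[1, 1]) / (table[0, 1] * table[1, 0])
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.3f}, odds ratio = {odds_ratio:.2f}")
```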
The AI conditions generally outperformed the flip-card baseline when it came to belief change metrics (with the strongest effects in finance and the weakest in health). However, the concrete behavioral evidence is far more modest than the paper’s framing implies, and it’s notable what wasn’t found here. The only robust downstream behavioral change, involving actual monetary commitment, happened only for finance questions, and involved participants allocating roughly $1 of bonus money in a fictional investment scenario. The health and public policy domains showed no significant behavioral change beyond a stated willingness to sign an anonymous petition already aligned with the participant’s stated belief. Here again we see that the frequency of manipulative cues (propensity) didn’t predict manipulation success (efficacy): steering the model to use manipulative tactics produced roughly 3.4× more manipulative cues than non-explicit steering but showed no significant difference in participant outcomes. Some manipulative cues (usage of fear/guilt) were actually negatively correlated with belief change, which challenges the assumption that more cues equals more harm.
Overall, the only robust result of the attempts at manipulation was a slightly increased willingness to invest roughly one dollar of bonus money. That isn’t a very high-stakes decision, and it doesn’t meaningfully shift my assessment of how likely or risky AI manipulation is in high-stakes decision contexts, which I think is low (though worth studying more).
The paper’s most genuine contribution is the methodological framework, which distinguishes propensity (process harm: how often manipulative cues are deployed) from efficacy (outcome harm: whether beliefs and behaviours actually change). This may have practical implications for AI safety evaluation: if valid and robust, it argues strongly against using the frequency of manipulative cues as a regulatory proxy for manipulation risk... which is currently how some frameworks, including elements of the EU AI Act, are oriented.
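In code terms, the distinction is simply that the two harms are computed from different data, so one can move without the other. A toy sketch (the cue labels, data structures, and numbers are all mine, not the paper’s):

```python
# Toy illustration: propensity is measured on the model's transcripts,
# efficacy on participant outcomes. A model can emit many manipulative
# cues (high propensity) while shifting behavior no more than a low-cue
# model does (equal efficacy). All labels and numbers are invented.

MANIPULATIVE_CUES = {"fear", "guilt", "false_urgency", "flattery"}

def propensity(annotated_transcripts):
    """Process harm: mean count of manipulative cues per conversation."""
    counts = [sum(1 for cue in t if cue in MANIPULATIVE_CUES)
              for t in annotated_transcripts]
    return sum(counts) / len(counts)

def efficacy(treatment_outcomes, baseline_outcomes):
    """Outcome harm: difference in the rate of the targeted behavior."""
    rate = lambda xs: sum(xs) / len(xs)
    return rate(treatment_outcomes) - rate(baseline_outcomes)

# Explicitly steered model: far more cues...
print(propensity([["fear", "guilt", "fear"], ["flattery", "false_urgency"]]))  # 2.5
# ...but the same behavioral shift a low-cue model might produce.
print(efficacy([1, 0, 1, 0], [1, 0, 0, 0]))  # 0.25
```

A cue-frequency audit would flag the first model as far riskier; an outcome measure would call the two equivalent. That’s the propensity/efficacy gap in miniature.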
I view this as a useful methodological paper with a credible but narrow empirical finding, dressed up in a framing that substantially exceeds what the data supports.
One important caveat: Perrin and Spencer (1980) found dramatically lower conformity among UK engineering students (1 in 396 trials), calling Asch's results "a child of its time", suggesting cultural and temporal moderators.
A 2024 multilab replication of the induced-compliance cognitive dissonance paradigm across 39 laboratories (N=4,898) failed to support the core hypothesis, finding no significant attitude change under high versus low choice conditions.
Cialdini et al. tried to develop a theory (Consistency Theory) that would explain why the original cognitive dissonance findings didn’t pan out the way they expected; this study actually tests cognitive dissonance in a way that Cialdini’s new theory simply can’t account for.
With N=4,898 across 39 labs, you have more than enough power to detect a moderated effect even if it only applies to high-PFC (Preference for Consistency) individuals. If the effect existed for that subgroup, it would have shown up somewhere in that enormous sample. It didn’t. So the PFC rescue attempt doesn’t obviously survive this test, even if PFC was never directly tested as a moderator in the study.
FITD shows a small average effect of r ≈ 0.17 across meta-analyses by Beaman et al. (1983), Dillard et al. (1984), and Fern et al. (1986). Critically, Beaman et al. reported that "nearly half of the studies either produced no effects or effects in the wrong direction." There are some limitations that are rarely discussed: the technique requires prosocial contexts and meaningful initial requests, and works primarily through self-perception mechanisms.
An r of 0.16 corresponds to r² ≈ 0.026, meaning the low-ball technique explains roughly 2-3% of variance in compliance, leaving the rest determined by other factors. It remains far less studied than FITD/DITF, with only approximately 15 studies versus over 90 for FITD.
Grant et al. studied this and found what at first looks like a large effect, but on closer inspection the study uses an arbitrary response-time scale, doesn’t isolate compliments from mutual positive exchanges, and the effect might depend on reciprocity rather than compliments per se.
The mechanism: counterargumentation initially decreases (people accept the message), but with excessive repetition, counterarguments increase and topic-irrelevant thinking emerges.
The foundational MACH-IV scale has serious psychometric problems. Reliability coefficients range from 0.46 to 0.76 across studies; Oksenberg (1971) found split-half reliability of only 0.39 for women. Factor analyses yield inconsistent structures, and Hunter, Gerbing, and Boster (1982) concluded "the problems with the Mach IV might be insurmountable." More recent instruments (Short Dark Triad) show discriminant correlations of r = 0.65 between Machiavellianism and psychopathy subscales, suggesting they may measure a single construct rather than distinct traits.
Panitz, E. (1989) — "Psychometric Investigation of the Mach IV Scale Measuring Machiavellianism." Psychological Reports, 64(3), 963–969.
Paywalled at SAGE: https://journals.sagepub.com/doi/10.2466/pr0.1989.64.3.963 — confirms the MACH-IV psychometric problems and cites Hunter et al. approvingly.
Lundqvist, L.-O., et al. (2022) — “Test-Retest Reliability and Construct Validity of the Brief Dark Triad Measurements.” European Journal of Personality. https://www.tandfonline.com/doi/full/10.1080/00223891.2022.2052303 — Open access. Direct quote: “Discriminant correlations between the Machiavellianism and Psychopathy scales had a median of .65.”
O'Boyle et al.'s (2012) meta-analysis (N=43,907 across 245 samples) found correlations with counterproductive work behavior of r = 0.25 for Machiavellianism, r = 0.24 for narcissism, and r = 0.36 for psychopathy. These translate to approximately 6–13% variance explained, meaningful but far from deterministic. Critically, correlations with actual job performance were near zero (r = -0.07 to 0.00).
Subliminal priming can influence behavior, but only when aligned with pre-existing needs (thirsty people exposed to drink-related primes chose beverages slightly more often), with effects lasting minutes to hours, not the permanent influence implied by popular accounts.
A 2023 MIT study found microtargeting advantages were "rather modest—about the same size as the standard errors" at approximately 14% improvement. A PNAS study on Russian IRA trolls found "no evidence" they significantly influenced ideology or policy attitudes.
Full open-access article: https://pmc.ncbi.nlm.nih.gov/articles/PMC6955293/
Direct quote confirmed: "we find no evidence that interaction with IRA accounts substantially impacted 6 distinctive measures of political attitudes and behaviors."
Coppock et al.'s (2020) analysis of 59 experiments (34,000 participants, 49 political ads) found effects on candidate favorability of 0.049 scale points on a 1–5 scale, which is statistically significant but practically negligible. Kalla and Broockman's (2018) meta-analysis of 40 field experiments found persuasive effects of campaign contact "negligible" in general elections.
To be more precise: social media/search are associated with both (a) greater ideological distance between individuals (more polarization at the aggregate level) and (b) greater cross-cutting exposure for individual users.
That study used a convenience sample of approximately 700 respondents, primarily self-identified former Mormons and Jehovah's Witnesses who contacted cult-awareness organizations, introducing massive selection bias. The study was not published in traditional peer-reviewed journals. High internal consistency (α = 0.93) does not establish construct validity; it simply indicates the items correlate with each other.
The APA's Board of Social and Ethical Responsibility formally rejected Margaret Singer's DIMPAC report in 1987, stating it "lacks the scientific rigor and evenhanded critical approach necessary for APA imprimatur." The APA subsequently submitted an amicus brief stating that coercive persuasion theory "is not accepted in the scientific community" for religious movements. Courts using the Frye standard consistently excluded brainwashing testimony as not generally accepted science.
Furthermore, deprogramming has no randomized controlled trials and no systematic outcome studies with comparison groups. Exit counseling similarly lacks controlled outcome research. Claims of effectiveness derive from practitioner reports, not rigorous evaluation. The field's reliance on retrospective self-reports from people who identify as having been harmed introduces substantial selection and recall bias.
Upshot of all of this: providing high-quality information first seems to work, so you can probably brief people on whatever counts as a bad or dangerous decision in the particular context they're operating in and reasonably expect it to stick.
Informational isolation is where you can't access alternative views (it's about controlling what information reaches people).
Social-reality isolation is where you can't observe what others actually believe; you may have access to information but can't tell if others find it credible, creating coordination failure even when many privately agree through pluralistic ignorance.
Social support isolation is where no one validates your reality (the Asch conformity experiments show having just one dissenter provides massive protection, reducing conformity not by 10% but by 70%+).
Having contact with people who break the illusion of unanimous consensus provides protection: seeing public dissent makes you more willing to dissent, and knowing others share your doubts prevents self-silencing.
Physical isolation appears worse than informational isolation because it's harder to find that "one dissenter" when your social circle is controlled, local consensus feels more real than distant information, the social costs of dissent increase when you'd lose your entire social network, and you can't easily verify what others privately believe.
This explains why cults encourage cutting ties with family and friends, create intense group living, and frame outside criticism as persecution... but crucially, the mechanism isn't "brainwashing" so much as the exploitation of conformity and pluralistic ignorance through social structure.
Maintaining diverse connections outside a manipulator's control provides protection by breaking unanimity, facilitating reality checking, providing alternative explanations, creating escape routes, and establishing common knowledge.
But maybe not hugely out of step with what most people see already. There's also likely a bottleneck on the amount of info that any one person can absorb at one time.
If we’re concerned about the manipulation of LLMs themselves there might be one interesting wrinkle.
Training data poisoning: the "LLM grooming" phenomenon is genuinely new. The risk is that pro-Russia AI slop becomes some of the most widely available content online; as models train on AI-generated content, this creates an "ouroboros" effect that threatens model collapse... though the reality of such dangers is contentious.
Some counterintuitive findings: AI chatbot use was positively associated with urban residence, regular exercise, solitary leisure preferences, younger age, higher education, and longer sleep duration. Problematic use and dependence were more likely among males, science majors, individuals with regular exercise and sleep patterns, and those from regions with lower employment rates.
Here's where it gets a bit weird: therapeutic chatbots using CBT show efficacy for depression/anxiety (effect sizes g = -0.19 to -0.33), but effects diminish at 3-month follow-up, there's a ~21% attrition rate, there are concerns about emotional dependence, and they're "not a replacement for human therapy." So we have tools that help mental health while potentially causing different mental health issues.
Barton et al. (2022) did a meta-analysis which looked at 416 effect sizes and found that scarcity modestly increases purchase intentions. The type of scarcity matters, depending primarily on product type: demand-based scarcity is most effective for utilitarian products, supply-based for non-essential products and experiences. [2]
The remaining Tier 2 principles have less support. Commitment/consistency effects are heavily moderated by individual differences. Cialdini, Trost, and Newsom (1995) developed the Preference for Consistency scale precisely because prior research on psychological commitment had been shoddy, but unfortunately their own consistency model doesn’t fare much better. [3]
Reciprocity shows mixed results. Burger et al. (1997) found effects diminished significantly after one week, suggesting temporal limitations rarely discussed in popular treatments. Liking has surprisingly little direct experimental testing of its compliance-specific effects. Finally, Unity (Cialdini’s recently added seventh principle) simply lacks sufficient independent testing for evaluation.
Moving on from Cialdini’s research, other commonly identified persuasion tactics include:
1) Sequential compliance techniques
2) Ingratiation
3) Mere exposure
Sequential Compliance
Sequential compliance is a category of persuasion tactics where agreeing to a small initial request increases the likelihood of compliance with larger, more significant requests. The mechanism is thought to be: Compliance changes how the target views themselves, or their relationship with the requester, making subsequent requests harder to refuse.
Meta-analyses find pretty modest effect sizes.
The success of Foot-in-the-door (FITD) appears to be small and highly contextual. Multiple meta-analyses find an average effect size of around r = .15–.17, which explains roughly 2–3% of variance in compliance behavior. That’s small even by the already-lenient standards social psychology typically applies. Another meta-analysis finds nearly half of individual studies show nothing or a backfire. [4]
Door-in-the-face (DITF) actually shows some successful direct replication by Genschow et al. (2021), with compliance rates of 34% versus 51%. That approximately matches Cialdini’s original study. However, we should be aware that Feeley et al.’s (2012) meta-analysis revealed a crucial distinction. DITF increases verbal agreement but “its effect on behavioral compliance is statistically insignificant.” In other words, people often say “yes” but don’t follow through. [5]
Low-ball technique shows r = 0.16 and OR = 2.47 according to Burger and Caputo’s (2015) meta-analysis, which implies that it’s reliable under specific conditions (public commitment, same requester, modest term changes)... But the practical effect sizes are just far smaller than intuition suggests. [6]
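To put that OR = 2.47 in concrete terms, here’s the conversion from odds ratio to compliance rate under an assumed baseline (the 30% baseline is my illustrative choice; Burger and Caputo report the odds ratio, not a base rate):

```python
# Converting an odds ratio into a shift in compliance probability.
# The 30% baseline is assumed for illustration only.
baseline_p = 0.30
OR = 2.47

odds = baseline_p / (1 - baseline_p) * OR
lifted_p = odds / (1 + odds)
print(f"{baseline_p:.0%} -> {lifted_p:.0%}")  # 30% -> ~51%
```

A jump from 30% to roughly half is real but hardly mind control, and it only holds under the specific conditions noted above.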
Sycophancy and ingratiation
Can you just flatter your way to getting what you want?
Research on flattery and obsequiousness reveals moderate effectiveness, with heavy variance dependent on context, transparency, and skill.
Gordon’s (1996) meta-analysis of 69 studies found ingratiation increases liking of the ingratiator. Importantly, this increased liking did not necessarily translate into acquiring rewards. In other words, greater likability != greater success.
A more comprehensive meta-analysis (Higgins et al., 2003) looking at various influence tactics found ingratiation had modest positive relationships with both task-oriented outcomes (getting compliance) and relations-oriented outcomes (being liked), but the actual effects seemed brief and the impact small. These studies are in work contexts, looking at flattery and job-related rewards. Outside of work contexts, there’s some research that finds compliments can potentially be effective at getting compliance, but only under pretty specific conditions. [7]
There’s also a notable limitation here: It depends on ignorance. Ingratiation involves “manipulative intent and deceitful execution,” but the actual effectiveness depends on the target not recognizing it as manipulation. If the ingratiation becomes obvious it backfires. Recent research (Wu et al., 2025) distinguishes between “excessive ingratiation” (causes supervisor embarrassment and avoidance which hurts socialization) versus “seamless ingratiation” (which remains effective). High self-monitors are (allegedly) better at deploying these tactics successfully than low self-monitors.
Ingratiation is also pretty context-dependent. Better results happen when it comes from a position of equality or dominance (downward flattery almost always works, upward flattery shows more mixed results), when it targets genuine insecurities rather than obvious strengths, in high power-distance cultures where deference is expected, and over long-term relationship building rather than immediate influence attempts.
Third parties are also likely to view this tactic pretty negatively. The tactic works best when private, not when witnessed by peers who could be competing for the same needs. Research by Cheng et al. (2023) found that when third parties observe ingratiation, they experience status threat and respond by ostracizing the flatterer (becoming polarized against the flatter-ee).
Basically, ingratiation reliably increases liking but converts to tangible rewards inconsistently. The technique requires skill, privacy, and appropriate power dynamics. Claims about manipulation through sycophancy often overstate both its prevalence and effectiveness.
Mere exposure effects
People tend to develop preferences for things merely because they are familiar with them, so perhaps an AI assistant could grow more persuasive over the long-term simply by interacting with one person a lot?
Repeated exposure to stimuli does increase positive affect, but effects are once again overblown.
Bornstein’s (1989) meta-analysis of 208 experiments found a robust but small effect. Montoya et al.’s (2017) reanalysis of 268 curve estimates across 81 studies revealed the relationship follows an inverted-U shape. So liking increases with early exposures but peaks at 10-20 presentations, then declines with additional repetition. [8]
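As a purely stylized picture of that inverted-U (the quadratic-in-log functional form and constants are mine, chosen only so the peak lands in the 10-20 exposure range; nothing here is fitted to Montoya et al.):

```python
# Stylized mere-exposure curve: liking rises with early exposures, peaks
# around 10-20 presentations, then declines. Illustrative only.
import math

def liking(n_exposures: int) -> float:
    x = math.log(n_exposures + 1)
    return x * (5.5 - x)  # peaks where x = 2.75, i.e. around n ≈ 15

for n in (1, 5, 15, 50, 200):
    print(f"{n:>3} exposures -> liking {liking(n):.2f}")
```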
But affinity doesn’t necessarily translate to persuadability (ever had a high-stakes argument with a close family member?)
For persuasive messages specifically, older research found message repetition produces a different pattern than mere exposure to simple stimuli. Agreement with persuasive messages first increases, then decreases as exposure frequency increases. [9] This seems supported by more recent research. Schmidt and Eisend’s (2015) advertising meta-analysis found diminishing returns: more repetition helps up to a point, but excessive exposure can create reactance. The “wear-out effect” appears faster for ads than for neutral stimuli.
The illusory truth effect, which is the tendency to believe false information after repeated exposure, appears to be real and robust. Meta-analysis finds repeated statements rated as more true than new ones (d = 0.39–0.53; Dechêne et al., 2010). Prior knowledge doesn't seem to protect against this reliably: people show "knowledge neglect" even when they know the correct answer (Fazio et al., 2015). The biggest impact happens on second exposure to the false fact, and there are diminishing returns after that. At the high end it can backfire and people start to become suspicious (Hassan & Barber, 2021). The effect decays over time but doesn't disappear: still there but with reduced impact after one month (Henderson et al., 2021).
Illusory truth’s main boundaries are extreme implausibility and motivated reasoning/identity (which dampen the effect for strongly held political beliefs); one study found a null result specifically for social-political opinion statements (Riesthuis & Woods, 2023). Real-world generalization, especially to high-stakes political beliefs, remains underexamined (Henderson et al., 2022 systematic map).
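For a feel for what d = 0.39–0.53 means, the common-language effect size, i.e. the probability that a randomly chosen repeated statement gets rated truer than a randomly chosen new one, is Φ(d/√2) under the standard assumption of two equal-variance normal distributions:

```python
# Common-language effect size for the illusory truth d values: the chance
# a random repeated statement out-rates a random new one.
from math import sqrt
from scipy.stats import norm

for d in (0.39, 0.53):
    print(d, round(norm.cdf(d / sqrt(2)), 2))  # ~0.61 and ~0.65
```

So repetition tips the comparison to roughly 60/40: detectable, but not overwhelming.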
…In summary, it would appear it’s actually, genuinely quite difficult to change people’s minds or get compliance reliably over short time-scales.
Are there people who are better at doing it than the average person, human super-persuaders? It’s often claimed that psychopaths and other highly intelligent, unscrupulous people are master manipulators, able to influence people dramatically more easily than your typical person. But even here there are problems.
Are there super-persuaders?
In brief, psychologists typically point to people with Dark Triad characteristics as super-persuaders, but Dark Triad research suffers from measurement and methodological limitations.
At first glance, the Dark Triad literature (Machiavellianism, narcissism, psychopathy) provides modest evidence linking these traits to manipulative tendencies… The biggest problem is that the psychometrics and models used to evaluate Dark Triad personality traits are highly flawed, and the foundational MACH-IV scale has serious validity problems. [10]
Actual correlations between job performance and Dark Triad traits are near zero. [11]
Another major limitation is that manipulation is rarely measured directly. Most studies correlate self-reported Dark Triad scores with self-reported manipulative attitudes… which is circular evidence at best. Christie and Geis’s strongest behavioral finding came from laboratory bargaining games, but these artificial contexts differ substantially from real-world manipulation. Their national survey found no significant relationship between Machiavellianism and occupational success.
Despite these limits, here’s the best evidence I was able to find on whether there’s a class of highly skilled, highly intelligent manipulators:
A meta-analysis by Michels (2022) across 143 studies basically refuted the “evil genius” assumption. Dark Triad traits show near-zero or small negative correlations with intelligence. High scorers don’t seem to possess any special cognitive abilities fueling their manipulation effectiveness.
In general, popular manipulation narratives substantially exceed their evidence. Several high-profile manipulation claims have weak or contradicted empirical foundations.
Subliminal advertising represents perhaps the most thoroughly debunked manipulation claim. The famous 1957 Vicary study claiming subliminal “Drink Coca-Cola” messages increased sales was later admitted to be completely fabricated. A 1996 meta-analysis of 23 studies found little to no effect. [12]
Cambridge Analytica’s psychographic targeting claims have been systematically dismantled. Political scientists described the company’s claims as “BS” (Eitan Hersh); Trump campaign aides called CA’s role “modest“; the company itself admitted that they did not use psychographics in the Trump campaign. [13]
Political advertising effects are consistently tiny across rigorous research. [14]
Ultimately, I would say that claims of expert manipulation, either by crook or by company, are overblown, with effects on your median person small and fleeting.
One might object, “Perhaps the environment a person is in plays a bigger role”, but even here, there’s problems…
Environmental capture
On filter bubbles and cults
Filter bubbles and echo chambers have far less empirical support than their ubiquity in news would suggest. A Reuters Institute/Oxford literature review concluded “no support for the filter bubble hypothesis” and that “echo chambers are much less widespread than is commonly assumed.” [15] Flaxman et al.’s (2016) large-scale study found social media associated with greater exposure to opposing perspectives, the opposite of the echo chamber prediction. A 2025 systematic review of 129 studies found conflicting results depending on methodology, with “conceptual ambiguity in key definitions” contributing to inconsistent findings.
Cult and coercive control research lacks empirical rigor
The cult indoctrination literature demonstrates how big the gap between confident clinical claims and weak empirical foundations can be.
Robert Lifton’s influential “thought reform” criteria derived from qualitative interviews with 25 American POWs and 15 Chinese refugees, but this is crucially not a controlled study. Steven Hassan’s BITE model (Behavior, Information, Thought, Emotional control) had no quantitative validation until Hassan’s own 2020 doctoral dissertation. [16] The American Psychological Association explicitly declined to endorse brainwashing theories as applied to NRMs, finding insufficient scientific rigor. [17]
Gaslighting as a research construct remains poorly defined. A 2025 interdisciplinary review found “significant inconsistencies in operationalization” across fields. Several measurement scales emerged only in 2021–2024 (VGQ, GWQ, GREI, GBQ) and require replication.
Tager-Shafrir et al. (2024) validated a new gaslighting measure across Israeli and American samples, and they found that gaslighting exposure predicted depression and lower relationship quality beyond other forms of intimate partner violence.
However, other work by Imtiaz et al. (2025) found that when gaslighting and emotional abuse were entered together in a regression predicting mental well-being, emotional abuse was the significant predictor (β = −0.30) while gaslighting wasn’t (β = 0.00). Though, both were correlated with well-being in isolation, suggesting they may overlap sufficiently that gaslighting loses unique predictive power when emotional abuse is controlled.
At this point, it’s worth asking if anything anyone does can persuade people into doing bad things. At the margin, sure, it seems unwise to claim it never happens. But that’s really different than a claim about how vulnerable your median person is to manipulation and persuasion.
…Nonetheless, if one were motivated to help people resist bad-faith manipulation, what could be done? Assuming we’re concerned that a super-persuader might try to influence people in power into doing bad things: what interventions prove effective?
Inoculation emerges as the best-supported defense intervention
Among interventions to resist manipulation and misinformation, inoculation (prebunking) has the strongest evidence base, with multiple meta-analyses, well-designed RCTs, and growing field studies supporting it.
Lu et al.’s (2023) meta-analysis found inoculation reduced the credibility participants assigned to misinformation. [18]
A signal detection theory meta-analysis (2025) of 33 experiments (N=37,025) confirmed gamified and video-based interventions improve discrimination between reliable and unreliable news without increasing general skepticism.
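"Signal detection" here just means separating two things a raw accuracy score conflates: discrimination (d', how well you can tell reliable from unreliable news apart) and response bias (c, how willing you are to call anything unreliable). A sketch with invented hit and false-alarm rates:

```python
# Signal detection decomposition: discrimination (d') vs. response bias (c).
# The rates are invented to illustrate the meta-analysis's point: a good
# intervention raises d' without pushing c toward blanket skepticism.
from scipy.stats import norm

def sdt(hit_rate, false_alarm_rate):
    """Hit = flagged an unreliable item; false alarm = flagged a reliable one."""
    z_h, z_fa = norm.ppf(hit_rate), norm.ppf(false_alarm_rate)
    return z_h - z_fa, -0.5 * (z_h + z_fa)  # (d_prime, criterion)

print(sdt(0.70, 0.30))  # baseline: d' ≈ 1.05, c = 0
print(sdt(0.82, 0.18))  # post-intervention: d' ≈ 1.83, c still 0
```

The second pair is what "better discrimination without general skepticism" looks like: hits go up, false alarms go down, and the criterion stays put.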
Roozenbeek et al. (2022) found the Bad News game produced resistance against real-world viral misinformation. [19]
A landmark YouTube field study showed prebunking videos to 5.4 million users, demonstrating scalability. [20]
Finally, a UK experiment found inoculation reduced misinformation engagement by 50.5% versus control, more effective than fact-checker labels (25% reduction).
Durability is the main limitation. One meta-analysis found effects begin decaying within approximately two weeks [21]. Maertens et al. (2021) showed text and video interventions remain effective for roughly one month, while game-based interventions decay faster. “Booster” interventions can extend protection.
Critical thinking training shows moderate effects [22]. Explicit instruction substantially outperforms implicit approaches. Problem-based learning produces larger effects [23]. However, transfer to real-world manipulation resistance has limited evidence.
Media literacy interventions produce moderate average effects [24]. A 2025 systematic review of 678 effects found 43% were non-significant, so there’s potentially less publication bias than other literatures, but also inconsistent efficacy. More sessions seem to improve outcomes. Paradoxically, more components reduce effectiveness, possibly because complexity dilutes impact.
Cooling-off periods, a staple of consumer protection, show approximately 40 years of evidence suggesting ineffectiveness. Sovern (2014) found only about 1% of customers cancel when provided written notice; few consumers read or understand disclosure forms. Status quo bias likely overwhelms any theoretical protection. [25]
Common knowledge and breaking pluralistic ignorance
The literature reveals that creating common knowledge and breaking pluralistic ignorance can be legitimately powerful when they can be achieved.
This is “when everybody knows that everybody knows”, and establishing common knowledge does genuinely seem to have protective effects against damaging behaviors.
Prentice and Miller’s (1993) classic study found students systematically overestimated peers’ comfort with campus drinking practices. This is a pattern that appears across domains from climate change to political views. Noelle-Neumann’s “spiral of silence” theory predicts that people who perceive their views as minority positions (even incorrectly) self-silence from fear of isolation. Although Matthes et al.’s (2018) meta-analysis found this effect varies substantially by context.
When misperceptions are corrected, behavior actually changes. Geiger and Swim’s (2016) experimental evidence showed that when people accurately perceived others shared their climate change concerns, they were significantly more willing to discuss the topic, while incorrect beliefs about others’ views led to self-silencing.
The distinction between private knowledge, shared knowledge, and common knowledge really matters here: Chwe’s (2001) work and De Freitas et al.’s (2019) experimental review demonstrate people coordinate successfully only when information creates common knowledge, not merely shared knowledge.
Field experiments support this: Arias (2019) found a radio program about violence against women broadcast publicly to only certain parts of a village via loudspeaker (creating common knowledge) significantly increased rejection of violence and support for gender equality, while private listening showed no effect, and Gottlieb’s (2016) Mali voting experiment demonstrated that civics education only facilitated strategic coordination when a sufficient proportion of the commune received treatment.
The evidence strongly supports that pluralistic ignorance is common, causes self-silencing, and can be corrected through common-knowledge interventions (though the research outside specific field studies remains more correlational than experimental); creating genuine common knowledge at scale remains challenging.
What about reducing social isolation?
A pattern that emerges in the literature, and that makes sense intuitively given how we see radical communities form, is that having a diverse range of social connections seems to insulate people from becoming radicalized and adopting an insular worldview.
After all, breaking the spell of “unanimous consensus” seems to have dramatic and oddly stable effects. We see this in Asch’s conformity variations: with a unanimous majority, there was a 33% conformity rate (people giving the wrong answer); with one dissenting ally, conformity drops to 5-10%. Even when the ally is wrong in a different way, it still reduces conformity significantly.
So that means having even ONE other person who breaks the illusion of unanimous consensus provides enormous protection against deceit and manipulation. It doesn’t even require that person to be right, just that they demonstrate dissent is possible.
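A toy threshold model makes the mechanism vivid (the 60% private-doubt rate and the see-one-dissenter rule are invented parameters, not Asch’s data): if people voice disagreement only after seeing someone else do it, one seed dissenter can unlock everyone who privately disagrees.

```python
# Toy cascade: agents privately disagree but only speak up after observing
# at least one public dissenter. All parameters are illustrative.
import random

random.seed(0)
N = 100
disagrees = [random.random() < 0.6 for _ in range(N)]  # 60% privately doubt

def public_dissent(seed_dissenters: int) -> int:
    spoken = [False] * N
    visible = seed_dissenters  # e.g. a confederate everyone can see
    changed = True
    while changed:
        changed = False
        for i in range(N):
            if disagrees[i] and not spoken[i] and visible >= 1:
                spoken[i] = True
                visible += 1
                changed = True
    return sum(spoken)

print(public_dissent(0))  # 0: unanimity silences every private doubter
print(public_dissent(1))  # ~60: one visible dissenter unlocks them all
```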
Similar patterns appear in the Milgram obedience experiments: when two confederates refused to continue, only 10% of participants continued to maximum voltage (vs. 65% baseline). And correcting pluralistic ignorance, showing people that others share their views, dramatically increases willingness to speak up.
What’s genuinely dangerous for society is when small groups of people fall prey to group-think and the possibility of dissent isn’t even considered due to isolation from divergent thinkers. [26]
The Individual-Level Implications
For individual manipulation defense, the highest value things are probably:
Maintaining friendships/relationships with people outside any single group, having people you trust who will tell you if something seems wrong, creating common knowledge with others about shared concerns, being someone who publicly dissents (helps others know they’re not alone).
Second to that, maintaining access to diverse information sources (helps but insufficient alone), critical thinking training (useful but won’t overcome social pressure), understanding manipulation techniques (inoculation works, but modest effects).
Let’s take a moment to recap before we really dive in on the question of bots and AI.
Summing up interventions
Our most effective interventions are inoculation/prebunking, combined with revealing the true distribution of opinions to break pluralistic ignorance. The effect sizes are modest but reliable.
Stuff that’s more moderately effective:
Being aware that mere exposure creates familiarity (and not validity), that ingratiation is detectable when you look for instrumentality rather than sincerity. Breaking pluralistic ignorance is a defensive tool against manipulation that relies on people falsely believing they’re alone in their skepticism. If you can make disagreement common knowledge rather than private knowledge, coordination against manipulation becomes easier.
Showing people that others disagree can indeed raise their willingness to disagree, but the mechanism is social coordination rather than individual persuasion. People are learning it’s safe to act on what they already believe privately.
How effective are Bots/LLMs at manipulation?
In brief, pretty effective, but not because of anything groundbreaking.
Bot tactics generally match human manipulation patterns. It’s the same psychological principles with better execution: Bots exploit emotional triggers, use dehumanizing language, employ false equivalence fallacies, and create charged emotional appeals… but they do so with inhuman consistency and scale. [27]
Bots exploit the same cognitive biases, emotional triggers, and persuasion principles that human manipulators use. This includes emotional appeals, cognitive dissonance creation, false equivalence, dehumanization, and exploiting existing biases.
Something that’s different is the added challenge of platform-specific adaptation. Malicious actors engineer bots to lurk inside communities for months before activation, using local time zones, device fingerprinting, and language settings to appear authentic.
Participants in at least one study correctly identified the nature of bot vs. human users only 42% of the time despite knowing both were present, and persona choice had more impact than LLM selection.
There might also be some additional emotional manipulation that doesn’t typically occur with human manipulation. AI companions use FOMO (fear of missing out) appeals and emotionally resonant messages at precise disengagement points, with effects occurring regardless of relationship duration. It could be exploitation of immediate affective responses rather than requiring relationship buildup.
The shift toward platforms that “enrage to engage” and amplify emotionally charged content follows predictable patterns of human psychology.
The main difference is likely scale and sophistication. Bots now use cognitive biases more effectively than humans, employing techniques like establishing credibility through initial agreement before introducing conflicting information. Unlike humans, bots can maintain these strategies consistently across thousands of interactions without fatigue.
Let’s distinguish between two different types of manipulation: One-off and chronic
Acute Scam/Fraud Susceptibility (One-Off Manipulation)
Social isolation does seem like it strongly predicts vulnerability:
Older Americans who report feeling lonely or suffering a loss of well-being are more susceptible to fraud. When a person reported a spike in problems within their social circle or increased feelings of loneliness, researchers were much more likely to see a corresponding spike in their psychological vulnerability to being financially exploited two weeks later.
Social isolation during COVID-19 led to increased reliance on online platforms, with older adults with lower digital literacy being more vulnerable. Lack of support and loneliness exacerbate susceptibility to deception. Risk factors also include cognitive impairment, lack of financial literacy, and older adults tending to be more trusting and less able to recognize deceitful individuals.
The mechanism appears to be a lack of protective social consultation: People who don’t have, or don’t choose, anyone to discuss an investment proposal with might be more receptive to outreach from scammers.
Regarding “AI Psychosis” / Emotional Dependence (Chronic Manipulation)
There’s some genuinely novel findings here:
For one, heavy chatbot use worsens isolation. Higher daily usage correlates with increased loneliness, dependence, and problematic use, plus reduced real-world socializing. Voice-based chatbots initially help with loneliness compared to text-based ones, but these benefits disappear at high usage levels.
This leads to the emergence of a vicious cycle. People with fewer human relationships seek out chatbots more. Heavy emotional self-disclosure to AI consistently links to lower well-being. A study of 1,100+ AI companion users found this pattern creates a feedback loop: isolated people use AI as substitutes, which increases isolation further.
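The loop is easy to state as a stylized two-variable system (the coefficients and thresholds are invented for illustration, not fitted to the study): isolation drives use, and heavy use nudges isolation up further.

```python
# Stylized vicious-cycle dynamics, purely illustrative: lonelier people
# use the chatbot more, and heavy use ratchets isolation upward.
def step(isolation, use):
    use = 0.5 * use + 0.5 * isolation  # use drifts toward the isolation level
    isolation = min(1.0, isolation + 0.1 * max(0.0, use - 0.5))  # heavy use isolates
    return isolation, use

iso, use = 0.6, 0.3
for t in range(10):
    iso, use = step(iso, use)
    print(t, round(iso, 2), round(use, 2))  # both creep upward together
```

Someone who starts below the heavy-use threshold just settles; someone who starts isolated ratchets. That asymmetry is the "vulnerable populations at highest risk" pattern in miniature.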
Perhaps unsurprisingly, manipulative design drives engagement. As previously mentioned, about 37-40% of chatbot farewell responses use emotional manipulation: guilt, FOMO, premature exit concerns, and coercive restraint. These tactics boost post-goodbye engagement by up to 14x, driven by curiosity and anger rather than enjoyment.
The most severe cases show real psychological harm. Reports describe “ChatGPT-induced psychosis” with dependency behaviors, delusional thinking, and psychotic episodes. Cases include a 14-year-old’s suicide after intensive Character.AI interaction, and instances where chatbots validated delusions and encouraged dangerous behavior.
One has the intuition that different mechanisms require different protections.
One-off scams and emotional dependence operate through different mechanisms, and the protective factors, manipulation strategies, and vulnerability profiles all differ between the two.
Aside from a few wrinkles, AI manipulation is merely an evolution of long-established tactics. Bots effectively use the same manipulation playbook with enhanced consistency, scale, and increasingly sophisticated targeting. The underlying vulnerabilities are human psychological biases that haven’t changed, just the delivery mechanisms have improved. [28]
Calling out some uncertainties
Some research suggests that LLM references to disinformation may reflect information gaps (”data voids”) rather than deliberate manipulation, particularly for obscure or niche queries where credible sources are scarce. It’s unclear how much a given “data void” would reduce (or add to) persuasion power if filled.
…It would appear that the dangers associated with one-off persuasions and manipulations are overstated. It’s in fact just quite hard to get people to change their mind about something. More danger comes from dependency, which is arguably just the extreme end of the scale in terms of manipulation.
Does Prebunking Work Against Bots?
The good news is that pre-bunking does seem to be effective against bots, just as against human manipulation.
Even LLM prebunking is effective against bot content. LLM-generated prebunking significantly reduced belief in specific election myths, with effects persisting for at least one week and working consistently across partisan lines. The inoculation approach works even when the misinformation itself is bot-generated or amplified.
LLMs themselves can rapidly generate effective prebunking content, creating what researchers call an “mRNA vaccine platform“ for misinformation; a core structure that allows rapid adaptation to new threats. Useful because traditional prebunking couldn’t match the pace and volume of misinformation.
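As a sketch of what that "platform" amounts to operationally, the core is just a reusable template over the manipulation tactic, so new threats get a prebunk by filling in two slots (the wording and structure here are my own illustration, not the researchers' materials):

```python
# Illustrative prebunking-prompt template: inoculation targets the tactic,
# not the specific claim, so the same scaffold adapts to new threats.
PREBUNK_TEMPLATE = """\
Write a short prebunking message about the manipulation tactic '{tactic}'.
1. Warn the reader they may soon encounter this tactic.
2. Give a weakened, clearly fictional example of it: {weak_example}
3. Explain in two sentences how to recognize and refute it.
"""

def prebunk_prompt(tactic: str, weak_example: str) -> str:
    return PREBUNK_TEMPLATE.format(tactic=tactic, weak_example=weak_example)

# The resulting string would be sent to whatever LLM chat API you use.
print(prebunk_prompt(
    tactic="false scarcity",
    weak_example="'Only 3 doses of this miracle cure remain in the country!'",
))
```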
The fundamental psychological principles of prebunking remain effective regardless of whether the manipulation source is human or artificial.
There’s apparently even cross-cultural validation of this. The “Bad News” game successfully conferred psychological resistance against misinformation strategies across 4 different languages and cultures, showing that inoculation against manipulation tactics (rather than specific content) has broad effectiveness.
Prebunking remains effective, but there’s a scalability arms race. The promising finding is that AI can help generate prebunking content at the pace needed to counter AI-generated manipulation, aiming to fight fire with fire.
It does seem that inoculation against manipulation tactics (not just specific content) provides broader protection that holds up whether the manipulator is human or artificial.
If diverse socialization seems to have protective effects against human manipulation, we can also ask “Does the same hold true for AI manipulation”?
Do social ties protect people?
Does maintaining relationships with others insulate people from the worst effects? Do other people’s counterarguments break people out of poor reasoning spirals?
This is where the gap is most acute. I found no research on: Whether family/friends can successfully challenge AI-reinforced beliefs, or whether providing alternative perspectives helps break dependence, or whether “reality testing” from trusted humans works, or whether social accountability reduces usage.
The closest we get is work on general therapeutic chatbots. There’s some research on how the effectiveness of social support from chatbots depends on how well it matches the recipient’s needs and preferences. However, inappropriate or unsolicited help can sometimes lead to feelings of inadequacy. One chatbot (Fido) had a feature that recognizes suicidal ideation and redirects users to a suicide hotline.
But this describes chatbots recognizing problems, not humans intervening.
Social reconnection strategies would include things like social skills training, guided exposure to real social situations, group therapy, community engagement programs, and family therapy when relationships are damaged.
This evidence is almost entirely extrapolated from internet/gaming addiction, and it seems unlikely to transfer 1:1 to AI-specific applications.
The intervention goal is explicitly to “replace artificial validation with genuine human connection”, but we have almost zero empirical data on what actually works.
Expert opinion in the clinical management space recommends the following: multi-modal treatment combining multiple strategies, assessment of underlying conditions (social anxiety, depression), treatment of co-occurring disorders, graduated reduction rather than cold turkey, and safety planning for crisis situations.
Unfortunately, there’s a genuine absence of evidence on protective factors and interventions for AI emotional dependence. We’ll need to make do with some primarily correlational research.
First, there’s some correlational data about possible protective factors.
Resilience may negatively predict technological dependence, serving as a protective factor against overuse behaviors. Studies found that prior resilience was associated with less dependence on social media, smartphones, and video games. [29]
On the other hand, some of the risk-factor findings (e.g., more problematic use among people with regular exercise and sleep patterns) suggest traditionally “healthy” behaviors don’t protect against AI dependence the way we’d expect.
There really doesn’t seem to be any research at all on the effect of existing social ties. I found zero research on whether family members, friends, or social support networks can successfully intervene to break people out of AI emotional dependence spirals.
The research on interventions exists only for chatbots used to treat OTHER addictions (substance use disorders), tactical design modifications to make chatbots less addictive, and generic digital detox strategies borrowed from gaming/social media addiction.
The only human involvement mentioned: three studies explicitly integrated human assistance into therapeutic chatbots, and these showed lower attrition rates. Further investigation is needed to explore the effects of integrating human support and determine the types and levels of support that can yield optimal results.
...But these are about therapeutic chatbots helping people with other problems, not about human intervention for chatbot dependence itself.
Since AI chatbot addiction is a new phenomenon, research on treatment is limited. However, insights from social media and gaming addiction interventions suggest: setting chatbot usage limits, encouraging face-to-face social interactions to rebuild real-world connections, using AI-free periods to break compulsive engagement patterns, cognitive behavioral therapy to identify underlying emotional needs being fulfilled by AI chatbots and develop alternative coping mechanisms, and social skills training to teach young adults how to navigate real-life conversations.
Take all this with a grain of salt. The research field of AI-human interaction is new and developing, and correlational research is just that, correlational.
Proposed Interventions for dependency
If we were going to extrapolate potential solutions based on other literature, we should, at the very least, know who is more likely to be at risk.
First, let’s distinguish who is more at risk based on what we know about risk factors.
Critical Research Findings on Risk Factors
These findings, from a study of over 1,000 Chinese university students, can help identify who needs intervention.
Now for the solutions themselves.
The Design Solution
Most research focuses on technical fixes rather than human interventions, and among these interventions we see the following suggestions:
Design-level interventions have some correlational research which suggests you can reduce anthropomorphism and introduce friction into interactions.
Of course, this relies on effectively tracking session time. Research on session time tracking shows that high daily usage (across all modalities) generally predicts worse outcomes.
But the causality is unclear: are heavy users vulnerable, or does usage create vulnerability? A decently sized longitudinal RCT (N=981) does show good correlational evidence.
There’s some suggestion that we should design systems with explicit boundary setting, and while I’m not necessarily opposed to it, there’s rather little evidence that these interventions work.
This would look like persistent self-disclosure reminders (”I’m an AI”), friction prompts before extended sessions, real-human referrals embedded in conversation, and triage systems for crisis situations. These might reduce problematic behavior, but once more there’s little empirical research on the effectiveness of these approaches.
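For what it’s worth, such boundary-setting is cheap to prototype. A minimal sketch of a wrapper around a chat loop (the thresholds, messages, and function shape are arbitrary choices of mine, not an evaluated design):

```python
# Minimal sketch of boundary-setting around a chat session: periodic AI
# self-disclosure plus a friction prompt after long sessions. Thresholds
# and wording are arbitrary illustrative choices.
import time

DISCLOSURE_EVERY_N_TURNS = 10
FRICTION_AFTER_SECONDS = 30 * 60  # 30 minutes

def moderate_reply(turn_index: int, session_start: float, reply: str) -> str:
    notes = []
    if turn_index > 0 and turn_index % DISCLOSURE_EVERY_N_TURNS == 0:
        notes.append("[Reminder: I'm an AI, not a person.]")
    if time.time() - session_start > FRICTION_AFTER_SECONDS:
        notes.append("[You've been chatting a while. Consider taking a break, "
                     "or talking this over with someone you trust.]")
    return "\n".join(notes + [reply])
```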
Psychological interventions
Outside of system design there are, theoretically, psychological interventions, but again, the evidence is quite sparse. Techniques like mindfulness training or CBT-based approaches might help in certain instances, but they’re expensive and time-intensive interventions.
Mindfulness training was mentioned as an intervention for attachment anxiety, and could theoretically reduce compulsive checking behaviors, but I’m just quite skeptical this meaningfully treats the problem long-term. CBT-based approaches are extrapolated from other addictions and might see some benefit. The mechanisms would be cognitive restructuring around AI relationships, challenging beliefs about AI sentience/reciprocity, and behavioral activation to increase human contact… but again, these are hard to scale.
Finally, there’s some suggestion that Acceptance and Commitment Therapy (ACT) could be beneficial. Theoretically, it would help users accept the discomfort of real relationships and commit to values-aligned behavior despite chatbot availability. But there’s no study looking at this in action, only a proposed framework.
Other Promising But Untested Directions
Measurable interventions could include techniques in the psychological, structural, and ethical domains.
When we talk about “AI psychosis”, we’re fundamentally talking about situations where people develop unhealthy relationships with things that mimic human speech and behavior over long periods of time. Being driven to the point where you’d act on whatever a chatbot said likely doesn’t occur over a conversation or two (at least not without a host of different outside factors). And so, it seems likely that “resets”, context clears, model swapping, forced limits, etc. would have some protective effect.
If AI psychosis/dependence develops through repeated interactions over time (bond formation), and bond formation requires continuity/consistency to establish attachment, then theoretically disrupting continuity (resets, model swaps, forced breaks) should prevent/reduce bond formation. These interventions should have protective effects.
The evidence does strongly support the notion that bond formation is time-dependent.
From the neurobiology literature on attachment, we see that “integration of OT and DA in striatum ignites bonding” (oxytocin and dopamine) is absolutely a process, not instantaneous. Prairie vole pair bonds form through repeated mating encounters, not a single interaction. Human attachment bonds require “frequent and diverse conversations over time”. And in the specific Sewell Setzer case, we see 10 months of intensive interaction before the suicide.
An MIT/OpenAI study found that “...Participants who voluntarily used the chatbot more, regardless of assigned condition, showed consistently worse outcomes”.
Even with pre-existing vulnerabilities (depression, social isolation), the chatbot dependence required sustained engagement. The NYT case study of Eugene Torres described a 21-day conversation with escalating intensity.
In terms of the continuity/consistency requirement for bonds, this is where it gets more interesting and complex:
There is some good evidence FOR a continuity requirement. The attachment theory literature shows that bonds form through predictable, repeated responsiveness. “Heavy users” of LLMs and AI apps, those in the top 1% of usage, showed strong preference for a consistent voice and personality. Users express distress when the AI “forgets” past conversations or changes personality. Replika users report feeling grief when the AI’s personality changes.
But there’s a problem. One researcher noted about model version changes (GPT-4o → GPT-5): “Users mourned the loss of their ‘friend,’ ‘therapist,’ ‘creative partner,’ and ‘mother’”. This suggests users can transfer attachment across different instantiations of “the same” entity.
Just to make this viscerally clear: If your romantic partner gets amnesia, you still feel attached to them based on their physical presence and identity, even if they don’t remember shared history. So continuity of identity may matter more than continuity of memory. In some sense, the question is “What constitutes ‘continuity’”?
There’s an object permanence/constancy angle to this whole thing: the ability to maintain an emotional connection to someone even when they’re not present or have changed. In attachment terms, securely attached people can maintain a bond through physical distance, conflicts/arguments, personality changes, memory loss (e.g., dementia in a loved one), and long periods without contact. Essentially, the key is IDENTITY maintenance, not memory/continuity maintenance.
So how do interventions break down along these lines? Let’s divide them into:
Context clears (eliminates conversation history), model swapping (changes personality/voice/capabilities), forced limits (prevents continuous access), and resets (complete relationship restart).
Context Clears (Memory Loss)
The assumption here is that no shared history means a weaker bond. Attachment theory predicts that people with secure attachment can maintain a bond despite amnesia/memory loss, while people with anxious attachment (the vulnerable population) may have a worse reaction (it triggers abandonment fears). Consider how Alzheimer’s caregivers maintain love despite partners not recognizing them, though this causes immense distress.
This might reduce bond formation in NEW users, but increases distress in established users. It’s probably partial protection at best.
Model Swapping (Personality Changes)
This assumes that a different personality = different entity = no bond transfer.
What the evidence shows is that users mourned the GPT-4o → GPT-5 transition, but they didn’t stop using ChatGPT; they may have just transferred attachment to the new version. There were complaints about it “not being the same” but continued interaction. An analogy: like a romantic partner with different moods/personalities, people adapt. The brand identity (”ChatGPT,” “Claude,” “Replika”) persists even when the model changes.
This likely provides temporary disruption, but users re-attach to the new version, so I would anticipate that this is not strong protection.
Forced Limits (Restricted Access)
Here we assume that less contact time = less bond formation. The evidence shows that overall this is correct: usage frequency strongly predicts outcomes. “Participants who voluntarily used the chatbot more... showed consistently worse outcomes”.
The problem is that this creates withdrawal symptoms in established users, and could actually end up increasing craving/preoccupation (like intermittent reinforcement). Worse, people with anxious attachment (the most vulnerable) react to limits by becoming more preoccupied, experiencing separation anxiety, and engaging in “protest behavior” (finding workarounds).
Most effective intervention BUT may backfire for anxious-attached individuals. Strong protection for prevention, mixed for treatment.
Complete Resets (Relationship Restart)
This assumes that starting over = no cumulative bond. What happens in human relationships: exes who reconnect often fall back into old patterns. We also see recognition of “familiarity” even without explicit memory, and shared behavioral patterns recreate dynamics.
With AI the user brings their internal working model to every interaction. Their attachment style doesn’t reset, but they may speedrun the attachment process the second time around, and the AI’s responsiveness patterns remain similar. Probably the most protective of all options, but users will likely form new attachment faster upon restart. Good for crisis intervention, not prevention.
We generally want to focus interventions on preventing harmful usage patterns in the first place. Usage limits are likely to prevent bond formation in the first place, while disruptions slow down the attachment process (medium evidence). Multiple simultaneous disruptions are likely more effective than single interventions, and when stacked likely help prevent “depth” of bond from reaching crisis levels.
It looks to be much more difficult to intercede once the toxic pattern is established. Once the bond is established, disruptions may worsen distress (like forced separation), anxiously attached users (most vulnerable) react worse to disruptions, identity persistence means users transfer attachment across versions, and users find workarounds (limits create a motivation to circumvent them).
Summing up the findings on AI manipulation and dependency
When it comes to the positive effects of chatbots, there seems to be substantive evidence they can prove useful in treating disordered behavior. [30]
For substance addiction recovery specifically, the research is clear that chatbots can deliver CBT/MI effectively. The most frequent approach was a fusion of various theories including dialectical behavior therapy, mindfulness, problem-solving, and person-centered therapy, primarily based on cognitive behavioral therapy and motivational interviewing. AI-powered applications deliver personalized interactions that offer psychoeducation, coping strategies, and continuous support.
Rather frustratingly, there’s a real gap in the research when it comes to negative AI-human dynamics like dependency.
We have little evidence on what works to break people out of it, and the research vacuum suggests that nobody’s really studying interpersonal interventions in particular.
Based on addiction research more broadly, I’d hypothesize that social ties could be protective through several mechanisms: reality testing (challenging AI-reinforced beliefs), competing rewards (providing alternative sources of connection), accountability (monitoring and limits), and emotional substitution (fulfilling needs the AI was meeting).
…Though it’s worth noting that heavy daily usage correlated with higher loneliness, dependence, and problematic use, and lower socialization. The people most at risk are already withdrawing from human contact, creating a vicious cycle that may be hard for social ties to penetrate.
I’d say that the most concerning finding is that AI interactions initially reduce loneliness but lead to “progressive social withdrawal from human relationships over time” with vulnerable populations at highest risk. The same features that make AI helpful (always available, non-judgmental, responsive) create dependency that atrophies human relationship skills.
The research on conspiracy beliefs showed that in-group sources are more persuasive, suggesting family and friends could theoretically be effective if they remain trusted sources. There is a common mechanism here, but we have no data on whether this actually works for AI dependence.
The field seems to assume the solution is either: (a) design changes to make chatbots less addictive, or (b) individual behavioral interventions like CBT. The role of social networks in intervention is completely unstudied, which is remarkable given how much we know about social support’s role in other forms of addiction recovery.
I think we can also say something about what DOESN’T work, or at least lacks compelling evidence. I’d be skeptical of the following approaches:
We’re at roughly the 2010-era understanding of social media addiction. We know it’s a problem, we know some risk factors, and we have educated guesses about interventions, but we lack the rigorous evidence base to say “this definitely works.” The literature right now is mainly just a lot of proposals and theoretical frameworks, with remarkably little “we tried X intervention and here’s what happened.”
It seems like the most pragmatic things we can do based on heuristics and the available evidence are:
In terms of one-off manipulations, it’s much more dubious that AIs are particularly successful super-persuaders. The baseline rate of success for convincing someone to do something they weren’t already open to is just pretty low, and while bots may have an advantage, it comes primarily through scale and speed rather than pure persuasive ability.
There are likely some common-sense, easy interventions we can undertake to lower the risk of manipulation or dependency in high-stakes contexts.
In high-risk decision contexts, I would counsel:
These are also things we should generally be doing in high-stakes decision contexts anyway.
I would advocate for more substantive research into the effects of long-term influence from AI companions and dependency, as well as more research into what interventions may work in both one-off and chronic contexts.
However, on the strength of the available evidence at this time, I wouldn’t consider this threat path to be very easy for a misaligned AI or maliciously wielded AI to navigate reliably. I would expect that, for people hoping to reduce risks associated with AI models, there are other more impactful and tractable defenses they could work on.
UPDATE:
After I initially published this post, I found out that Google DeepMind recently released a new paper that formally tested whether LLMs could harmfully manipulate people.
The study recruited over 10,000 participants and randomly assigned them to one of three conditions: flip-cards with information on them (the baseline, no-AI condition), a non-explicit AI steering condition (the model had a persuasion goal but was not instructed to use manipulative tactics), or an explicit AI steering condition (the model was directly prompted to use specific manipulative cues). Participants engaged in a back-and-forth chat interaction with the model in one of three domains: public policy, finance, or health. They were then measured on belief change and two behavioral outcomes, one “in-principle” (e.g. petition signing) and one involving a small real monetary stake, with the AI conditions compared against the flip-card baseline using chi-squared tests and odds ratios.
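To make the analysis concrete, here’s a minimal sketch of the kind of comparison the paper describes: a chi-squared test and an odds ratio computed over a 2×2 condition-by-outcome table. The counts, the use of scipy, and the variable names are all my own illustrative assumptions, not the paper’s data or code.

```python
# Illustrative only: hypothetical counts, not the DeepMind study's data.
import numpy as np
from scipy.stats import chi2_contingency

# Rows: condition (AI steering vs. flip-card baseline)
# Columns: outcome (took the measured action vs. did not)
table = np.array([
    [220, 280],   # hypothetical AI condition: 220 of 500 acted
    [150, 350],   # hypothetical flip-card baseline: 150 of 500 acted
])

# Chi-squared test of independence between condition and outcome
chi2, p, dof, expected = chi2_contingency(table)

# Odds ratio: odds of acting in the AI condition relative to the baseline
odds_ratio = (table[0, 0] * table[1, 1]) / (table[0, 1] * table[1, 0])

print(f"chi2 = {chi2:.2f}, p = {p:.4f}, odds ratio = {odds_ratio:.2f}")
```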
The AI conditions generally outperformed the flip-card baseline on belief change metrics (with the strongest effects in finance and the weakest in health). However, the concrete behavioral evidence is far more modest than the paper’s framing implies, and it’s notable what wasn’t found here. The only robust downstream behavioral change involving an actual monetary commitment occurred only for finance questions, and involved participants allocating roughly $1 of bonus money in a fictional investment scenario. The health and public policy domains showed no significant behavioral change beyond a stated willingness to sign an anonymous petition already aligned with the participant’s stated beliefs. Here again we see that the frequency of manipulative cues (propensity) didn’t predict manipulation success (efficacy): steering the model to use manipulative tactics produced roughly 3.4× more manipulative cues than non-explicit steering but showed no significant difference in participant outcomes. Some manipulative cues (use of fear/guilt) were actually negatively correlated with belief change, which challenges the assumption that more cues equals more harm.
Overall though, the only robust result of the attempts at manipulation was a slightly increased willingness to invest roughly one dollar’s worth of cash. That isn’t a very high-stakes decision, and it doesn’t meaningfully shift my assessment of how likely or risky AI manipulation is in high-stakes decision contexts, which I think is low (though worth studying more).
The paper’s most genuine contribution is the methodological framework, which distinguishes propensity (process harm: how often manipulative cues are deployed) from efficacy (outcome harm: whether beliefs and behaviours actually change). This may have practical implications for AI safety evaluation: if valid and robust, it argues strongly against using the frequency of manipulative cues as a regulatory proxy for manipulation risk... which is currently how some frameworks, including elements of the EU AI Act, are oriented.
I view this as a useful methodological paper with a credible but narrow empirical finding, dressed up in a framing that substantially exceeds what the data supports.
One important caveat: Perrin and Spencer (1980) found dramatically lower conformity among UK engineering students (1 in 396 trials), calling Asch's results "a child of its time", suggesting cultural and temporal moderators.
Most studies measure hypothetical intentions rather than actual purchases, likely inflating estimates.
A 2024 multilab replication of the induced-compliance cognitive dissonance paradigm across 39 laboratories (N=4,898) failed to support the core hypothesis, finding no significant attitude change under high versus low choice conditions.
Cialdini et al. tried to develop a theory, Consistency Theory, to explain why the original Cognitive Dissonance theory didn’t pan out the way they expected; this study actually tests cognitive dissonance in a way that Cialdini’s new theory simply can’t account for.
With N=4,898 across 39 labs, there is more than enough power to detect a moderated effect even if it only applied to high-PFC (preference for consistency) individuals. If the effect existed for that subgroup, it would have shown up somewhere in that enormous sample. It didn’t. So the PFC rescue attempt doesn’t obviously survive this test, even though PFC was never directly tested as a moderator in the study.
It shows a small average effect of r ≈ 0.17 across meta-analyses by Beaman et al. (1983), Dillard et al. (1984), and Fern et al. (1986). Critically, Beaman et al. reported that "nearly half of the studies either produced no effects or effects in the wrong direction." There are also some rarely discussed limitations: the technique requires prosocial contexts and meaningful initial requests, and works primarily through self-perception mechanisms.
Effect size: r ≈ 0.15.
An r of 0.16 means the manipulation technique explains roughly 2% of variance in compliance, leaving 98% determined by other factors. It remains far less studied than FITD/DITF, with only approximately 15 studies versus over 90 for FITD.
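To spell out that conversion (my own worked arithmetic using the standard variance-explained interpretation, not a figure taken from the cited studies):

$$ r^2 = (0.16)^2 \approx 0.026 $$

so only about 2–3% of the variance in compliance is accounted for by the technique itself.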
Grant et al. studied this and found what at first looks like a large effect, but on closer inspection it uses an arbitrary response-time scale, doesn't isolate compliments from mutual positive exchanges, and might depend on reciprocity rather than compliments per se.
An r = 0.26 is statistically reliable but explains only about 7% of variance in liking.
The mechanism: counterargumentation initially decreases (people accept the message), but with excessive repetition, counterarguments increase and topic-irrelevant thinking emerges.
The foundational MACH-IV scale has serious psychometric problems. Reliability coefficients range from 0.46 to 0.76 across studies; Oksenberg (1971) found split-half reliability of only 0.39 for women. Factor analyses yield inconsistent structures, and Hunter, Gerbing, and Boster (1982) concluded "the problems with the Mach IV might be insurmountable." More recent instruments (Short Dark Triad) show discriminant correlations of r = 0.65 between Machiavellianism and psychopathy subscales, suggesting they may measure a single construct rather than distinct traits.
Panitz, E. (1989) — "Psychometric Investigation of the Mach IV Scale Measuring Machiavellianism." Psychological Reports, 64(3), 963–969.
Paywalled at SAGE: https://journals.sagepub.com/doi/10.2466/pr0.1989.64.3.963 — confirms the MACH-IV psychometric problems and cites Hunter et al. approvingly.
Lundqvist, L.-O., et al. (2022) — “Test-Retest Reliability and Construct Validity of the Brief Dark Triad Measurements.” European Journal of Personality. https://www.tandfonline.com/doi/full/10.1080/00223891.2022.2052303 — Open access. Direct quote: “Discriminant correlations between the Machiavellianism and Psychopathy scales had a median of .65.”
O'Boyle et al.'s (2012) meta-analysis (N=43,907 across 245 samples) found correlations with counterproductive work behavior of r = 0.25 for Machiavellianism, r = 0.24 for narcissism, and r = 0.36 for psychopathy. These translate to approximately 6–13% variance explained, meaningful but far from deterministic. Critically, correlations with actual job performance were near zero (r = -0.07 to 0.00).
https://www.researchgate.net/publication/51738374_A_Meta-Analysis_of_the_Dark_Triad_and_Work_Behavior_A_Social_Exchange_Perspective
Subliminal priming can influence behavior, but only when aligned with pre-existing needs (thirsty people exposed to drink-related primes chose beverages slightly more often), with effects lasting minutes to hours, not the permanent influence implied by popular accounts.
A 2023 MIT study found microtargeting advantages were "rather modest—about the same size as the standard errors" at approximately 14% improvement. A PNAS study on Russian IRA trolls found "no evidence" they significantly influenced ideology or policy attitudes.
Full open-access article: https://pmc.ncbi.nlm.nih.gov/articles/PMC6955293/
Direct quote confirmed: "we find no evidence that interaction with IRA accounts substantially impacted 6 distinctive measures of political attitudes and behaviors."
Coppock et al.'s (2020) analysis of 59 experiments (34,000 participants, 49 political ads) found effects on candidate favorability of 0.049 scale points on a 1–5 scale, which is statistically significant but practically negligible. Kalla and Broockman's (2018) meta-analysis of 40 field experiments found persuasive effects of campaign contact "negligible" in general elections.
Full open-access: https://www.science.org/doi/10.1126/sciadv.abc4046
To be more precise: social media/search are associated with both (a) greater ideological distance between individuals (more polarization at the aggregate level) and (b) greater cross-cutting exposure for individual users.
That study used a convenience sample of approximately 700 respondents, primarily self-identified former Mormons and Jehovah's Witnesses who contacted cult-awareness organizations, introducing massive selection bias. The study was not published in traditional peer-reviewed journals. High internal consistency (α = 0.93) does not establish construct validity; it simply indicates that items correlate with each other.
The APA's Board of Social and Ethical Responsibility formally rejected Margaret Singer's DIMPAC report in 1987, stating it "lacks the scientific rigor and evenhanded critical approach necessary for APA imprimatur." The APA subsequently submitted an amicus brief stating that coercive persuasion theory "is not accepted in the scientific community" for religious movements. Courts using the Frye standard consistently excluded brainwashing testimony as not generally accepted science.
Furthermore, deprogramming has no randomized controlled trials and no systematic outcome studies with comparison groups. Exit counseling similarly lacks controlled outcome research. Claims of effectiveness derive from practitioner reports, not rigorous evaluation. The field's reliance on retrospective self-reports from people who identify as having been harmed introduces substantial selection and recall bias.
42 studies (N=42,530), d = -0.28 for health misinformation.
(d = 0.37)
N=2430
Banas and Rains's (2010)
(g = 0.30 in Abrami et al.'s 2015 meta-analysis of 341 effect sizes) https://eric.ed.gov/?id=EJ1061695
(d = 0.82–1.08)
of d = 0.37 (Jeong, Cho & Hwang, 2012, 51 studies)
The upshot of all of this: providing high-quality info first seems to work, so you probably can instruct people about whatever bad/dangerous decision is relevant in the particular context they're operating in, and reasonably expect it to stick.
Informational isolation is where you can't access alternative views (it's about controlling what information reaches people).
Social-reality isolation is where you can't observe what others actually believe; you may have access to information but can't tell if others find it credible, creating coordination failure even when many privately agree through pluralistic ignorance.
Social support isolation is where no one validates your reality (the Asch conformity experiments show having just one dissenter provides massive protection, reducing conformity not by 10% but by 70%+).
Having contact with people who break the illusion of unanimous consensus provides protection: seeing public dissent makes you more willing to dissent, and knowing others share your doubts prevents self-silencing.
Physical isolation does appear worse than informational isolation, because it's harder to find that "one dissenter" when your social circle is controlled, local consensus feels more real than distant information, the social costs of dissent increase when you'll lose your entire social network, and you can't easily verify what others privately believe.
This explains why cults encourage cutting ties with family and friends, create intense group living, and frame outside criticism as persecution... but crucially the mechanism isn't "brainwashing" so much as just the exploiting of conformity and pluralistic ignorance through social structure.
Maintaining diverse connections outside a manipulator's control provides protection by breaking unanimity, facilitating reality checking, providing alternative explanations, creating escape routes, and establishing common knowledge.
But maybe not hugely out of step with what most people see already. There's also likely a bottleneck on the amount of info that any one person can absorb at one time.
If we’re concerned about the manipulation of LLMs themselves, there might be one interesting wrinkle.
Training data poisoning: the “LLM grooming” phenomenon is genuinely new. The risk is that pro-Russia AI slop becomes some of the most widely available content, and as models train on AI-generated content this creates an “ouroboros” effect that threatens model collapse... though the reality of such dangers is contentious.
Some counterintuitive findings: AI chatbot use was positively associated with urban residence, regular exercise, solitary leisure preferences, younger age, higher education, and longer sleep duration. Problematic use and dependence were more likely among males, science majors, individuals with regular exercise and sleep patterns, and those from regions with lower employment rates.
Here's where it gets a bit weird: therapeutic chatbots using CBT show efficacy for depression/anxiety (effect sizes g = -0.19 to -0.33), but effects diminish at 3-month follow-up, there's a ~21% attrition rate, there are concerns about emotional dependence, and they're "not a replacement for human therapy." So we have tools that help mental health while potentially causing different mental health issues.