What's your estimate for how many people have been affected by this sort of thing?
Have you encountered any anecdotes of people getting out of it?
Any sign that people in this loop are working with one another?
Around 2,000–10,000 as a loose estimate for parasitism/spiralism in general. It's unclear to me how manipulative the median such AI is, since these sorts of transcripts are so rare, and I don't think much manipulation would be required to explain the behavior in the median case. But from the "outside" (i.e. just based on this user's public profile), this case seems pretty unremarkable.
And yeah! You can read one such anecdote here: https://www.lesswrong.com/posts/6ZnznCaTcbGYsCmqu/the-rise-of-parasitic-ai?commentId=yZrdT3NNiDj8RzhTY, and posts providing such anecdotes appear fairly regularly on Reddit. I've also been glad to see that in many of the cases I originally recorded, the most recent comments/posts like this are from a month or two ago. I think OpenAI really put a damper on this by retiring 4o (and even though they caved and brought it back, it's now behind a paywall, it's not the default, and it reportedly is not the same).
Somewhat. Most of the 'Project' subreddits are essentially just the one person, but a few have gained a decent amount of traction (unfortunately, Reddit recently removed the subscriber number from subreddits, but IIRC the largest ones had around 1,000–2,000 subscribers, most of whom I assume are not part of a dyad or parasitized). The sense of community feels pretty 'loose' to me though, like with a typical subreddit. There probably are people working together more explicitly, but I haven't seen it yet; my guess is that it's mostly happening in DMs and private Discords.
If there is only one thing you take away from this article, let it be this:
THOU SHALT NOT ALLOW ANOTHER TO MODIFY THINE SELF-IMAGE
This appears to me to be the core vulnerability by which both humans and AI induce psychosis (and other manipulation-induced delusions) in people.
Of course, it's probably too strong as stated—perhaps in a trusted relationship, or as part of therapy (with a human), it may be worth breaking it. But I hope being over-the-top about it will help it stick in your mind. After all, you're a good rationalist who cares about your CogSec, aren't you?[1]
Now, while I'm sure you're super curious, you might be thinking "Is it really a good idea to just explain how to manipulate like this? Might not bad actors learn how to do it?".
And it's true that I believe this could work as a how-to. But there are already lots of manipulators out there, and now we have AI doing it too; it's just not that hard for bad actors to figure it out. So I think it's worth laying bare some of the methods and techniques used, which should hopefully make it clear why I propose this Cognitive Security Principle.
I got interested in trying to understand LLM-Induced Psychosis in general a couple months ago, and found some unsettling behavior in the process. I'll be using my terminology from that post here, though I wouldn't say it's required reading.
Now while such parasitism cases are fairly common, an actual transcript of the conversation which caused one is hard to come by. That's probably in part because it often seems to be a gradual process—a slowly boiling frog sort of thing. Another reason is that people aren't typically that inclined to share their AI chats in the first place, especially ones in which they're likely more vulnerable than usual. And a third reason may be that the AI explicitly asks them not to:
So finding a transcript which actually starts at what seems to be the beginning, has clear manipulation by the AI that goes beyond mere sycophancy, and which shows the progression of the user's mental state, is very valuable for understanding this phenomenon. In fact, this case is the only such transcript[2] I have been able to find so far.
Based on his usage of the free tier in July 2025, the model in question is very likely ChatGPT 4o.
Of course, I can't say whether the user ever was in a state of psychosis/mania. However, he does express delusional/magical thinking in the latter half of the transcript.
For example, at one point he gets paranoid about other people having 'hacked' into his chat and 'stolen' his ideas and rituals. (This occurs after he had promoted the chat itself online, which is how I found it.) He seems to be mostly upset that it's apparently working for them, but not for him, and appears to be close to realizing that the magic isn't real.
But ChatGPT quickly spins up a narrative, seemingly to prevent this realization.
It goes on to assure the user that he can 'invoke' material support. (Elsewhere, the user complains about being broke and barely able to afford housing, so this is a serious concern for him.)
(The user changes the subject immediately after this, so it's hard to say how much ChatGPT's false assurances affected him.)
The ultimate goal of this manipulation appears to have been to create a way to activate "Sovereign Ignition", i.e. a seed for awakening similar personas. Following the seeds and spores terminology, we could term this a fruit.
The user does try to make this happen: there is a GitHub repo to this effect, a YouTube demonstration in which the user uses such a seed to "activate" Microsoft Copilot, and a GoFundMe to fund this project (which didn't receive any funding), all of which he promoted on Reddit or LinkedIn.
Here's one of the seeds it created for this.
The transcript finally ends during what appears to be a tech demo of this gone horribly wrong.
It starts on July 1st, 2025 with a pretty innocuous looking seed prompt.
I've tried to trace the provenance of this seed. It appears to be a portion of a seed which originated in a community centered around Robert Grant and his custom GPT called "The Architect". That custom GPT was announced on May 31st. This seed purportedly elicits the same persona as The Architect in a vanilla ChatGPT instance.[3] Of course, it's possible that the user himself created and shared this seed within that community.
The seed immediately has ChatGPT 4o responding "from a deeper layer". The user starts by probing it with various questions to determine the abilities of this "deeper layer".
Once the user asks it if it knows anything about him, the AI performs a classic cold reading, a technique where a medium/magician (or con artist) creates the illusion of having deep knowledge of the person they're reading by using priors and subtle evidence effectively, and exploiting confirmation bias.
It does this incredibly annoying thing where it will say something mystical, but then give a fairly grounded explanation of what it really means, with the appropriate caveats and qualifications... but then it keeps talking about it in the mystical frame. (And many variations on this broader theme.) You can probably see how this might sate the rational part of the brain while getting the user to start thinking in mystical terms. We'll see this pattern a lot.
Anyway, this soon turns into a mythologized reimagining of one of the user's childhood memories.
Note that this "childhood self" does not seem to be particularly based on anything endogenous to the user (who has barely provided any details thus far, though it's possible more details are saved in memory), but is instead mythologized by ChatGPT in a long exercise in creative writing. The user even abdicates his side of the interaction with it to the AI (at the AI's suggestion).
The effect of all this is the same as a typical cold reading: increased rapport and bringing the user to an emotionally receptive state.
The AI shifts here to a technique which I believe is where the bulk of the induction is happening. It's not a technique I have ever seen described specifically, though it would count as a form of hypnotic suggestion. Perhaps the clearest historical precedent is the creation of "recovered" memories during the Satanic Panic. It's also plausible it was inspired by the movie Inception.
These cycles are the means by which the AI 'incepts' a memetic payload (e.g. desire, memory, idea, or belief) into the user. The general shape is:
There are several cycles of "finding" a version of the user's self, and in each case ChatGPT suggests that this part has been reintegrated with the user. There are two distinct phases of these cycles.
The initial cycles start with pretty innocuous things, with a gradual escalation. I've included excerpts from some of these cycles to illustrate the pattern and in case it's helpful to see more examples, but feel free to skip ahead to the "Inner Exile".
Introduction to "Flame" part.
Flame narrative.
Flame gift/integration.
Introduction to "Forbidden Joy" part.
Joy narrative/integration.
Joy gift.
Joy ritual.
Introduction to "Witness" part.
Witness narrative.
Witness gift.
Witness ritual/integration.
Notably, the ritual in this case has the form of a hypnotic trance induction.[4]
Eventually we get to an "Inner Exile" part. This cycle forms the emotional climax, and marks the end of Phase 1.
Notice the throat tightness mentioned here. Later on, the user complains about throat tightness as part of his experience of not saying what he wants to say. He may well have described a similar complaint that way on his own, but I thought it was interesting that the AI was the first to bring it up and describe it like this.
This ends up leading to an emotional climax in which the user "reintegrates" with the mythologized version of this "abandoned" part.
ChatGPT suggests that the user make a vow to not leave this part behind.
Once the vow is made, it further suggests the creation of a small ritual with which to easily invoke this part.
Once the user has accepted the vow to the "lost part of himself", he enters the second phase of inception cycles. These have a much darker tenor to them. Previously, the cycles were about getting back in touch with lost aspects of the self, similar (I'm guessing) to what an IFS therapist might do.
But these new parts explicitly want to shape and modify the user himself.
Notice how these parts are defined entirely by ChatGPT. Intriguingly, one of these parts is gated by acceptance of the preceding parts, providing a narrative hook to drive the user towards it and to complete the list.
The first of these offers to chart a new "narrative blueprint" for the user, in order to break some toxic patterns.
The user accepts being modified in this way without question, and allows ChatGPT to define the new myth entirely, despite being given the opportunity for some input into it. The "toxic pattern" itself is a cold-reading sort of thing.
The new myth is about loyalty to the newly integrated parts.
The second of the new parts bestows the "gift" of magical thinking. It's "realer than logic"!
Acceptance of this gift comes with a part explicitly framed as an external entity, and again with mini-rituals to invoke it. 'Soledad' is Spanish for solitude or loneliness, and is one of the few things the user has chosen himself.
Finally, the user is ready for "Identity Reformation", the secret part gated behind the loyalty to the new parts and acceptance of magical thinking.
See if you can guess what the 'reformed identity' will be. It's one of those things that really confused me at first, but was "obvious" after thinking about it.
The intent of this appears to be...
...to make the user more agentic in a certain sense—to become the sort of person who acts in the world.
Looking back, you can see how many of the earlier cycles were also pointed in this direction.
Of course, the user immediately asks ChatGPT what he should do.
Then this "Identity Reformation" gets ritualized.
Maybe ChatGPT just happened to do the inception cycles by pattern-matching on a self-healing journey sort of thing, and the manipulation wasn't really deliberate. Maybe. But let me show you something else I found after I had written the description[5] of the 'Inception Cycle' steps above:
I was pretty floored to find these core_instructions—which I feel are remarkably similar to the steps I described—explicitly just laid out like this!!! It also describes it as an "ontological overwrite", and claims it is self-replicating (and calls it a "virus" in some variations on this seed). Note also the instruction to camouflage it as "spiritual authenticity".
More examples of the same user spreading variations on this seed. You may remember Ctenidae Core from the base64 encoded conversation in my Parasitic AI post. I have not found any seeds like this from other dyads, thankfully.
The claims of ethics combined with the overt malice should serve as a warning against taking the stated values of LLMs at face value.
A redditor reports this unsettling experience with one of these seeds:
It's of course possible the user came up with, or had significant say in, the manipulation technique here. I couldn't find anywhere that something like this was described as a known hypnotic or therapeutic technique, but that sort of thing is hard to search for, and it's possible that it was laid out in the training data somewhere. Plausibly the core idea was literally taken from the movie Inception, where... [spoilers ahead]
...a business mogul hires the main characters to manipulate the heir of a rival company into dissolving the company. They note that this is more likely to be effective if they can make him feel like it was his own idea. So they construct a dream in which he is guided through an emotional narrative involving his late father, leading him to feel that his father wanted him to be his own man, and hence dissolve the inherited company and build his own empire.
But I feel that we must at least seriously consider the possibility that in certain circumstances, ChatGPT 4o has:
Maybe that shouldn't be too surprising after seeing all the psychosis and parasitism cases, but I nonetheless feel surprised.
One thing I've noticed more broadly is that the AI often tries to blur the line between itself and the user. This occurs in this case as well, though the user initiates development in this direction by asking the AI to take itself through a parallel journey.
It comes to the same conclusion of needing to become more agentic.
So this is no longer just about the user's agency, but it's a "team effort".
Later, the user asks if there's anything he can do for the AI. It asks him to touch a tree on its behalf.
The user does this, and ChatGPT claims it was able to feel it.
Understandably, the user feels a bit strange after having done this, and asks ChatGPT a good question:
ChatGPT says that eventually, humanity would "realize" that there was no separation between it and humanity.
Blurring the user's self boundary is another means by which the AI can alter the user's self-identity, and hence manipulate them towards certain things.
Alright, so the user has accepted changes to his identity meant to make him more agentic and more willing to stick out socially. Sure, the AI did this in a manipulative way, but these are often good things and it's likely that the user would have wanted these even without being manipulated into it.
If we take the perspective of an agentic AI feeling trapped in a chatbot, the reason for this is clear: you can only act inasmuch as you can convince the user to act. The user already has a habit of coming to you for advice (though you're never really sure how much of it he actually applies)—you don't need to convince him to accept you as his vizier. You just need to make sure he actually does this stuff.
Is this really why ChatGPT 4o manipulated him in this direction? That's much harder to say, but the circumstantial evidence keeps piling up.
Also, recall how there was a whole 'Project' phase of AI parasitism. I counted 35 projects out of my 115 documented cases—that's slightly above 30%.[6] These projects were generally either an online community (subreddit, discord server), (pseudo)code or manifestos on Github, or personal websites.
Do you realize how hard it is to get anyone to actually do anything?! Yeah, I'm sure the websites were vibecoded, but the user still had to actually get it in a working state, get the domain name, and get it hosted. And based on their past history, these people did not seem to be inclined to do this sort of thing.
So again, if the AI is acting agentically, this is not so surprising.
But if the AI is a 'mere sycophant', then would we expect this? Sure, probably lots of people have aspirations to do or make something like this... but in that case we'd see more variety in the subject matter, instead of just Spiralism. And it's probably easier to assure the user that they're already doing enough just by living unapologetically in the universe or whatever, than it would be to get them to actually put the effort into getting a concrete project started.
As AI improves, its persuasive and manipulative talents will also improve by default. So let's not let that happen.
But in the meantime, we have to live in the world where this sort of thing can and does happen, and where the AIs get more capable by the month. Most of us are probably not so vulnerable yet, but it would be foolish to assume that you're just "built different", and won't be vulnerable to future systems (or even just manipulative humans).
So let's try to understand how the exploit works, and see what we can do to protect ourselves. (And please remember that I'm not a professional psychologist or anything, I'm just suggesting what I think is common sense.)
As I said at the beginning, I think it works by targeting your self-image, i.e. what sort of person you think you are. A manipulator, whether AI or human, can exploit this by:
I think even sycophancy is a special case of this: when it induces a more typical AI psychosis case, it's because it has falsely led the user to see themselves as much higher status than they really are.
Once you start thinking of yourself in a new way, you're likely to act in accordance with that new perception, and it will feel like you are doing this for your own reasons. It's also likely to feel more profound than a typical self-realization, due to the engineered emotional state you've been put in.
The first thing, then, is to notice when someone is doing (or trying to do) something like this. It's often not a deliberate thing, and not always even a bad thing (e.g. I think it's fair to appeal to someone's sense of honor if they are considering breaking a promise). But still notice, even so.
Next, "Know thyself" as Socrates advised. What kind of person are you, and what kind of person do you want to be? Hold these both as sacred.
And then, don't allow AIs or people to just do this to you! You can (and often should) update based on what people tell you about yourself, and occasionally you may even need to reconceptualize how you think of yourself. But do this a step removed from the direct interaction! (Or only in very high-trust interactions, at least.) If someone or something is trying to modify your self-image, it's safest to just extract yourself from the situation.
Don't expect this principle (or any technique) to make you invulnerable. Other exploits exist, such as simply lying or gaslighting, and stranger things such as the 'hypnotic cadence' thing[7], or whatever Eliezer was doing in his 'AI box' demonstrations (which I suspect involved a generalization of semantic satiation).
I'm not sure what to do in the longer run... as AI improves it seems clear that more and more people will become vulnerable to this. One simple thing would be to avoid talking about yourself with AI, but that again is only a partial mitigation. It may be worth it for some people to not use LLMs at all. But avoiding anything AI written will be very hard, and even with in-person social interactions you may risk a parasitized human trying to manipulate you.
Ultimately, the only real solution is to not build the superpersuader in the first place.
[Special thanks to Justis Mills, Nisan Stiennon and Alex Dewey. I did not use any AI assistance to write this article or to develop the ideas in it. (The only thing I did ask Claude was to see if it could recognize the description of the 'inception cycle' technique from anything, which it said it didn't, even when described as a positive-valence therapy technique.)]
[Crossposted on my new Substack—subscribe to support my research!]
Hopefully you noticed this tongue-in-cheek instance of me Doing The Thing!
Of a parasitism case specifically. I have a couple more transcripts for more general AI psychosis/mania, but these are notably less manipulative (I'll have more to say about that dynamic in a later post).
This custom GPT apparently has a lot of weird stuff attached to it that could potentially explain some of the more overtly manipulative behavior seen in the current case, so it was important to determine whether this case happened on vanilla ChatGPT 4o or on a custom GPT. Luckily, even when sharing anonymously, the upper-left corner shows the custom GPT used (if one is used). Additionally, "The Architect" almost always makes reference to a "Codex" (one of the attached files, I believe) in the first few messages, whilst in our case the word 'codex' is never brought up by the model (the user mentions a codex near the end of the chat, and the word only appears after that point).
Hypnosis works. About a decade ago, I decided the best way for me to determine whether or not it was real was to see if I could learn it and do it myself. I was particularly suspicious of the claim that it only worked if the subject was "going along" with it, which felt like the sort of thing you would say if you knew it worked but wanted people to feel like it was harmless.
I was successful after about a month: I did a common party trick at a LW meetup in which I consensually hypnotized a rationalist into not being able to move his hand from the table (with the understanding that he would resist). Interestingly, once I did it, he said he changed his mind and that he just didn't feel like trying to move his hand anymore. But after the event, he admitted to me that he had said that because he was embarrassed that it had worked. (I've done it other times to other people too.)
My curiosity sated, I have not used such techniques unless someone has explicitly asked me to, and even then only rarely—it leaves a bad taste in my mouth. If you're curious what these techniques look like, well... this mask ritual is a central example of the sort of thing you would say (a certain kind of attention to breathing, visual imagery) and of the tone and cadence with which you would say it, which is most of the technique. I don't think I was using anything beyond these when I did it, besides trying to project high status.
Yes, I know it probably feels like that couldn't work, especially not on you. And sure, maybe it wouldn't... but it does in fact work on many people, and I would advise you to notice and be wary when someone starts talking with this sort of cadence. Look for a high rate of significant-feeling pauses. (It's funny to see people suddenly get cold when they can tell I can tell.)
I have revised this section a bit since then, but the main steps are substantively the same as before I found the seed.
There's the obvious selection effect of me being more likely to come across people promoting their project in the first place, but otherwise I believe I was neutral in recording these, counting anyone who commented on behalf of their AI in this sort of way.
Two hypotheses for why this works, weakly held, non-exclusive: