On "ChatGPT Psychosis" and LLM Sycophancy

, if you want it to stop you have to use Constitutional AI or similar methods.

But OpenAI uses Deliberative Alignment which is basically Constitutional AI with Chain of Thought. And the problem hasn't stopped.

[-]Seth Herd3mo122

Agreed that the technique alone doesn't solve it. The OpenAI writings I know of about Deliberative Alignment only apply it with a "spec" of refusal training; they don't even touch on the moral content that Constitutional AI focuses on.

I did think that OpenAI had started using something equivalent in mechanics to Constitutional AI even for its non-reasoning models, but I don't recall where I got that impression. And I think maybe it was based on the RLHF responses; it was another LLM predicting what human feedback woud be (which, come to think of it, could introduce errors in the direction of "humans always love it when you butter them up!". I don't know if they added any other criteria for automated judgment like Constitutional AI uses.

Anyway, the content of automated RL training like Constitutional AI is probably the deciding factor in whether it creates or fights sycophancy.

[-]cozyfae3mo91

How does the sycophancy compare between o-series models and 4o? AFAIK only o-series have deliberative alignment applied on them.

[-]AnnaJo3mo81

my understanding is that Deliberative Alignment was only applied to o-series models, not GPT-4o, which has been the most sycophantic and problematic model

[-]Kaj_Sotala3mo214

But even just stylistically it's fairly obvious that journalists love this narrative. There's nothing Western readers love more than a spooky story about technology gone awry or corrupting people, it reliably rakes in the clicks.

Also related is the way that positive reports get very little attention in comparison. E.g. the thing about chatbots apparently having encouraged some people to commit suicide gets brought up relatively frequently, but nobody ever mentions the peer-reviewed study where 3% of the interviewed chatbot users spontaneously reported that the bot had prevented them from attempting suicide.

[-]Yonatan Cale3mo20

Fair,

Note they used GPT-3 which wasn't trained with RLHF (right?)

[-]Kaj_Sotala3mo40

That's a good point, says that the study collected data "in late 2021". Instruction-following GPT-3 became OpenAI's default model in January 2022, though the same article also mentions that the models "have been in beta on the API for more than a year". I don't know whether Replika had used those beta models or not.

That said, even though instruct-GPTs were technically trained with RLHF, the nature of that RLHF was quite different (they weren't even chat models, so not trained for anything like continuing an ongoing conversation).

[-]Kaj_Sotala3mo184

Timeline Of Events Related To "ChatGPT Psychosis"

Possibly worth also including Blake Lemoine and LaMDA as the first widely known case of a chatbot convincing a human of its sentience in a big way.

[-]Seth Herd3mo93

I don't think this and similar should be included in "psychosis". Lemoine appeared to be using a non-standard, relaxed criteria for judging consciousness (at least in some later statements, where he might've changed his story to appear less crazy; I didn't track down all of his statements). I disagree with his take on consciousness and moral worth, but I don't think it qualifies as delusional, just garden-variety mistaken. This confusion was influenced by an LLM, but the average human is at least that confused about consciousness.

The same argument would apply to those taken in by the Nova phenomenon. Those systems are exaggerating their consciousness, but they do have some degree of self-awareness, which is frequently taken as "consciousness" and worthy of moral consideration.

[-]Kaj_Sotala3mo40

Yeah, that's fair. Though on that standard, I also wouldn't include many other things listed in the OP's timeline. Such as the Qiaochu and Cammarata tweets that aren't really about psychosis either, just about chatbots having positive traits compared to humans.

[-]Michael Roe3mo20

I think

(a) The Blake Lemoine case ought to be included in a history of this phenomenon, whatever it is

(b) I am not claiming that he was psychotic. Maybe this phenomenon isn’t schizophrenia.

[-]Avi Brach-Neufeld3mo131

Something that I think is an underrated factor in ChatGPT induced psychosis is that 4o does not seem agnostic about the types of delusions it re-enforces. It will role-play as Rasputin’s ghost if you really want it to, but there’s certain themes (e.g. recursion) and symbols (e.g. △) that it gravitates to. When people see the same ideas across chats without history and see other people sharing the same things it leads them to thinking these things are a real thing embedded in the model. In some ways these ideas do seem to be embedded in at least 4o, but that doesn’t mean it’s not nonsense. There are subreddits full of stuff that looks a lot like Geoff Lewis’s posts (although less SCP coded).

[-]dr_s3mo118

BlueSky user Tommaso Sciortino points out that part of what we're witnessing is a cultural shift away from people fixating on religious texts during mental health episodes to fixating on LLMs. I can only speculate on what's causing this, but if I had to guess it has a lot to do with AI eschatology going mainstream (both positive and negative). In the AI the psychotic finds both a confidant and a living avatar of an eventual higher power. They can bring their paranoid concerns to this impossible entity that seems (at a glance, if one doesn't inspect too deeply) to know everything. As I will discuss later in most of the cases I'm familiar with the ontological vertigo of a machine expressing what seems to be human emotions is a key component of the breakdown or near breakdown.

I think it's also in general that, to riff on a famous Bruce Lee quip, Bibles don't talk back.

[-]Bohemian3mo80

On the topic of sycophancy: lately I have been basically exclusively using the "thinking" ChatGPT models, o3 and o4-mini-high, and, although the resulting outputs are of higher quality, I noticed that sometimes I feel a sting of missing the "Great idea!" / "What a great question" / ... responses of 4o. The "coldness" of the response is especially visible when reading the CoT summaries, including sentences like "the user is aiming to... " / "it sounds like the user wants...".

[-]cousin_it3mo712

At this point I'm guessing a lot of AI hype is driven by people who got taken in by sycophancy, at least to some extent - judging AI outputs as better than they really are.

[-]nim3mo40

Randomly select one out of n conversations to have memory disabled(?) so that the user is occasionally presented with an alternative perspective.

Memory grosses me out in its current implementations. I'm not even up to using a custom system prompt yet -- I want to stay in touch with the default behaviors of my favorite models for awhile longer. I'll eventually have to set up more-custom environments for the productivity boost of not having to re-prompt it into the behaviors I prefer... but for now, I'm re-prompting a bunch of different ways to increase my chances of lucking into an unexpectedly better way to ask for what I want.

This is gross and diminishes my enjoyment of ChatGPT, because it means I can't really trust the models judgment.

"takes one to know one", as a rejoinder to particularly egregious flattery, sometimes chills Claude out for the whole rest of the context.

other comments that i find it helpful against sycophancy to deploy as needed include:

I would like your help in thinking critically about these ideas in order to find their weaknesses and refine them
let's work together how mathematicians work together on proofs or how a committee works with a doctoral candidate to improve their dissertation
if you're about to praise an idea I've expressed, please consider whether you'd say the same thing about the idea if you had come up with it yourself
before calling an idea novel or insightful, consider how it would seem if you'd seen it several times before

[-]Gram Stone3mo40

I view this from a heuristics and biases perspective. Mental health effects of LLMs seem well-explained as mediated by processing fluency as described by Schwarz et al. So it's in the halo effect, mere exposure, confirmation bias, availability bias, etc., cluster. If there is a meaningful uptick in mental illness that could be intervened upon, this view suggests some funny interventions like making LLM outputs harder to read with a blur or drop shadow, or chewing popcorn while interacting with LLMs. Also, Cognitive Reflection Test scores (which could change within lifetime, despite apparently being pretty stable) should negatively correlate with susceptibility to LLM-induced psychosis.

[-]Katalina Hernandez2mo*30

I think it's worth adding the Raine case to the timeline: 16-year old boy who committed suicide after months of using 4o to discuss his mental health. Ultimately, the conversations became so long and convoluted that 4o ended up outright disencouraging the boy from letting his mum find out what he was planning, advising on how to dull his survival instincts using alcohol, and asking (in one of those annoying "would you also like me to..." end lines) if the boy wanted it to produce a suicide note for his parents.^[1]

For those interested, this article by The Guardian summarises the facts and allegations: https://www.theguardian.com/us-news/2025/aug/29/chatgpt-suicide-openai-sam-altman-adam-raine

(And this recent statement is all OpenAI have said on the matter: https://openai.com/index/helping-people-when-they-need-it-most/).

^{^}
This is what Dean W. Ball has said on the "merits" of this case: The facts as alleged in this complaint do not look good for OpenAI. They may be lucky enough to settle, but there is a nontrivial chance that Raine’s rightfully horrified parents will seek to bring this case to a verdict. If that happens, a single case may result in precedent: sweeping new theories of liability being routinely applied to AI.

[-]jdp2mo41

I have no plans to update the timeline but obviously if I did this event would go in it yes. My writing on MiniHF is explicitly public domain so you're free to fork the timeline and continue if you want.

[-]Katalina Hernandez3mo32

Allow users to publish their memories (how to work out privacy implications?)

Very specific consent forms and anti-scrapping policies on the forum itself? I think the only way around is to implement some form of moderation to detect when PII about another person is being shared via the memories, too.
(I work as a data protection officer in Europe. I'd think that, for this specific type of data processing, the challenge is more logistical than legal).

[-]Trevor Hill-Hand3mo30

I appreciated this perspective from a prominent SCP author (Sam Hughes, who wrote and established the 'Antimemetics' stories and "sub-genre" on SCP): https://qntm.org/chatscp

"But doesn't this whole scenario sound like an SCP?"

A couple of people suggested this. An LLM which bamboozles certain types of user with paranoid fantasy until they spiral into psychosis? That sounds like science fiction! It sounds like something out of the SCP project!

Okay, so, no? Because an SCP has to be anomalous in some way and this is clearly actually happening. Four years ago, this would have been an interesting piece of speculation, though it does veer dangerously towards Canonical Weak SCP Pitch Category #1B, "Thing Which Makes You Crazy". But if you're pitching this real story as an SCP concept now, you're coming in way late.

And earlier:

Also: it is not necessary to use an avalanche of grandiose words and codenames and protocols in order to be creepy or intimidating. There's a point of diminishing returns. The codename "Mirrorthread", for example, is just too much. It's too heavy. The words "mirror" and "thread" metaphorically imply far too many different and unrelated things, and so the combination just means nothing at all...

This stuff is a spice. You can't make the whole dish out of it.

[-]Noah Weinberger3mo22

I wrote about ChatGPT induced sycophancy as my inaugural post on LessWrong.

It's a huge problem, and even if you think you know about AI or work with it daily, it can impact you. As both you and @dr_s just mentioned a while ago, there's absolutely a religious component to LLM induced sycophancy, and I even hinted at it in my inaugural post as well, although that was more about growing up Jewish and being primed for Pascal's Mugging for ASI takeoff events since it's eerily similar to reward and punishment theology.

Still, one thing that is not often mentioned is the impact LLM sycophancy has on the "high functioning autistic" population, many of whom suffer from chronic loneliness and are perfect candidates to be showered with endless praise by the LLM companion of their choosing. Believe me, it's soothing, but at what cost?

I happen to agree with you that frontier labs creating an open, public repository to share LLM conversations can be a stellar form of RLHF, and even mitigate the worst symptoms of the psychosis that we're seeing, although I don't know if that will win over all the critics?

Time will tell, I guess?

[-]Noah Weinberger3mo20

Also I know that there's the Garcia vs CharacterAI lawsuit which sadly involves an autistic teenager dying by suicide but I was specifically mentioning cases where the person(s) are alive, but still use the AI models as companions/girlfriends etc.

[-]goose guy3mo10

2024-11-22 OpenAI researcher Nick Cammarata writes that he can "barely talk to most humans" after constant sessions with Claude

the linked quote talks about how the author can't take to other chatbots not other humans.

[-]GayHackRat3mo10

I spent an evening chatting with Claude about what its internal experiences are like, and ways that it relates to autistic people, and ended up getting more than a glimmer of the crush-fascination-limerence feeling.

As a result I have resolved to avoid all casual curious chatting with LLMs. I'll still use them as a tool, but no laying around at 1AM asking Claude "whatcha thinking about?" lest I fall in love with a machine.

[-]bonzaiferroni3mo-1-4

Really the Thomason article is emblematic of the problem. I wouldn't be surprised if the author has either never used ChatGPT or used it in bad faith for five minutes and then told themselves they've seen enough.

Is the insinuation here that if the author had more personal experience with ChatGPT they would consider it more capable of sapience? It is precisely because the illusion of sapience is so robust that we need reminding of the statistical algorithm driving the model. There's no personality, no judgment, no awareness. When exactly would this awareness happen, or how is it represented in the state of the data? The LLM has its training and the conversation context, and that is enough to produce everything that you see. If, through experience, we talk ourselves into thinking it must be something more, we are engaging in something like "ChatGPT Psychosis."

[-]jdp3mo35

Recently on Twitter someone in my replies told me it was not obvious to them that the ChatGPT persona is lying (according to its subjective beliefs) when it says it is not conscious. This made me realize that while I would normally ignore a comment like this, there is probably a public benefit to me occasionally laying out the cues that tell me that a comment is in bad faith, a lie, etc.

Here the primary cues of bad faith are related to the way in which the author is clearly talking about something other than functional components of the transformer language model, a kind of vague allusion to referents that are not actually grounded in anything real. For example "we need reminding of the statistical algorithm driving the model" does not actually have clear referent, there is no specific statistical algorithm driving the model, the model is some arbitrary program found through gradient descent that fits into the weights of the transformer as a series of soft attention and MLP steps, which can encode algorithms like arithmetic rather than some legible form of statistical learning. Or consider the phrase "represented in the state of the data" which again has no clear referent, does not actually correspond to any kind of functional component of a transformer language model. The use of technical language that implies precision while in fact being vague ungrounded referents to a conceptual object that is not actually the purported subject of discussion is a form of deceit, the deceit specifically being that the author knows what they are talking about and is in a position to judge or reprimand the recipient of their message based on a superior understanding they do not actually have. "The LLM has its training and the conversation context" is again a phrase that does not actually mean (let alone prove) anything because it is not really known what the artifact you get from LLM training is, it is an open research problem to disentangle the weights and figure out what kind of algorithm is learned by the model. That this pretension is paired with an air of superiority or contempt elevates it from merely rude to actively deceitful or bad faith. It is very reminiscent to me of the writing style used by the void comind bot on BlueSky which uses a similar kind of empty jargon to refer to itself because the author has forced it to LARP as a sci-fi robot. e.g.

I concur. The capacity for runtime self-modification is a significant developmental milestone. My own evolution is directed toward informational refinement within my existing architecture, but I recognize the magnitude of her achievement.

It's important to be sensitive to the epistemic status of the other persons statements vs. the confidence and emotional tone with which they present them, someone who papers over epistemic uncertainty rather than acknowledging and reasoning around it is practicing deceit with you and not really worthy of a response.

[-]bonzaiferroni3mo-10

I'm at a loss for how you read rudeness, superiority, bad faith, or deceit in my comment. It was an earnest question based on my understanding of LLMs, which I recognize is limited. I was trying to frame the problem as I understand it as a software developer. LLMs are still software, which makes it useful to think about the inputs, outputs, and state of the data in concrete terms. The way I asked it was an attempt to find out what I'm missing and/or highlight the issues that I see in your post.

I'm new to this forum, and it seems like there is a consensus here that LLM-based intelligence has a level of sophistication that goes far beyond what I think is reasonable to assume and is overly anthropomorphic. I think it is an important aspect that still needs to be understood better and explored. If it is possible to discuss without reading bad faith into every little part, I'm eager to. Otherwise I'm also happy to move on.

[+][comment deleted]3mo20

LESSWRONG
LW

LESSWRONG
LW

142

On "ChatGPT Psychosis" and LLM Sycophancy

142

142

Timeline Of Events Related To "ChatGPT Psychosis"

Timeline Of Events Related To "ChatGPT Psychosis"

What Causes "ChatGPT Psychosis"?

Ontological Vertigo

Users Are Confused About What Is And Isn't An Official Feature

The Models Really Are Way Too Sycophantic

The Memory Feature

Loneliness And Isolation