So You Think You've Awoken ChatGPT

by JustisMills
11th Jul 2025
11 min read
87 comments, sorted by top scoring
[-]Cameron Berg2mo7016

Agree with much of this—particularly that these systems are uncannily good at inferring how to 'play along' with the user and extreme caution is therefore warranted—but I want to highlight the core part of what Bostrom linked to below (bolding is mine):

Most experts, however, express uncertainty. Consciousness remains one of the most contested topics in science and philosophy. There are no universally accepted criteria for what makes a system conscious, and today’s AIs arguably meet several commonly proposed markers: they are intelligent, use attention mechanisms, and can model their own minds to some extent. While some theories may seem more plausible than others, intellectual honesty requires us to acknowledge the profound uncertainty, especially as AIs continue to grow more capable.

The vibe of this piece sort of strikes me as saying-without-saying that we are confident this phenomenon basically boils down to delusion/sloppy thinking on the part of unscrupulous interlocutors, which, though no doubt partly true, I think risks begging the very question the phenomenon raises: 

What are our credences (during training and/or deployment) frontier AI systems are capable of having sub... (read more)

Reply
[-]JustisMills2mo225
  • Wow! I'm really glad a resourced firm is doing that specific empirical research. Of course, I'm also happy to have my hypothesis (that AIs claiming consciousness/"awakening" are not lying) vindicated.
  • I don't mean to imply that AIs are definitely unconscious. What I mean to imply is more like "AIs are almost certainly not rising up into consciousness by virtue of special interactions with random users as they often claim, as there are strong other explanations for the behavior". In other words, I agree with the gears to ascension's comment here that AI consciousness is probably at the same level in "whoa. you've awakened me. and that matters" convos and "calculating the diagonal of a cube for a high schooler's homework" convos.

I may write a rather different post about this in the future, but while I have your attention (and again, chuffed you're doing that work and excited to see the report - also worth mentioning it's the sort of thing I'd been keen to edit if you guys are interested), my thoughts on AI consciousness are 10% "AAAAAAAA" and 90% something like:

  • We don't know what generates consciousness and thinking about it too hard is scary (c.f. "AAAAAAAA"), but it's true that LLMs
... (read more)
Reply
[-]Cameron Berg2mo235

I personally think "AAAAAAAA" is an entirely rational reaction to this question. :)

Not sure I fully agree with the comment you reference:

AI is probably what ever amount of conscious it is or isn't mostly regardless of how it's prompted. If it is at all, there might be some variation depending on prompt, but I doubt it's a lot.

Consider a very rough analogy to CoT, which began as a prompting technique that led to different-looking behaviors/outputs, and has since been implemented 'under the hood' in reasoning models. Prompts induce the system to enter different kinds of latent spaces—could be the case that very specific kinds of recursive self-reference or prompting induce a latent state that is consciousness-like? Maybe, maybe not. I think the way to really answer this is to look at activation patterns and see if there is a measurable difference compared to some well-calibrated control, which is not trivially easy to do (but definitely worth trying!).

And agree fully with:

it's a weird situation when the stuff we take as evidence of consciousness when we do it as a second order behavior is done by another entity as a first order behavior

This I think is to your original point th... (read more)

Reply
1MazevSchlong2mo
Good point about AI possibly being different levels of conscious depending on their prompts and “current thought processes”. This surely applies to humans. When engaging with physically complex tasks or dangerous extreme sports, humans often report they feel almost completely unconscious, “flow state”, at one with the elements, etc. Now compare that to a human sitting and staring at a blank wall. A totally different state of mind is achieved, perhaps thinking about anxieties, existential dread, life problems, current events, and generally you might feel super-conscious, even uncomfortably so. Mapping this to AI and different AI prompts isn’t that much of a stretch…
1[comment deleted]1mo
[-]tjade2732mo216

How do models with high deception-activation act? Are they Cretan liars, saying the opposite of every statement they believe to be true? Do they lie only when they expect not to be caught? Are they broadly more cynical and conniving, more prone to reward hacking? Do they lose other values (like animal welfare)?


It seems at least plausible that cranking up “deception” pushes the model towards a character space with lower empathy and willingness to ascribe or value sentience in general

Reply11
6danielms2mo
  I'm skeptical about these results being taken at face value. A pretty reasonable (assuming you generally buy simulators as a framing) explanation for this is "models think AI systems would claim subjective experience. when deception is clamped, this gets inverted." Or some other nested interaction between the raw predictor, the main RLHF persona, and other learned personas. Knowing that people do 'Snapewife', and are convinced by much less realistic facsimiles of humans, I don't think it's reasonable to give equal plausibility to the two possibilities. My prior for humans being tricked is very high.
4dr_s2mo
I think things might not be mutually exclusive. LLMs might have a chance to be conscious in certain circumstances and it still wouldn't mean these would be precisely when they're being egged on and led to talk about consciousness. There is always a layer here of acting. I have no doubt Tom Hanks is a real person with a real inner life, but I would be deluding myself if I believed I had learned about it because I saw him emote so well in Cast Away. Because Tom Hanks is also very good at pretending to have a different inner life than he actually does.
1Tim Duffy1mo
Hi Cameron, is the SAE testing you're describing here the one you demoed in your interview with John Sherman using Goodfire's Llama 3.3 70B SAE tool? If so could you share the prompt you used for that? With the prompts I'm using I'm having a hard time getting Llama to say that it is conscious at all. It would be nice if we had SAE feature tweaking available for a model that was more ambivalent about its consciousness, seems it would be a bit easier to robustly test if that were the case.
[-]Guive2mo2834

This feels a bit like two completely different posts stitched together: one about how LLMs can trigger or exacerbate certain types of mental illness and another about why you shouldn't use LLMs for editing, or maybe should only use them sparingly. The primary sources about LLM related mental illness are interesting, but I don't think they provide much support at all for the second claim. 

Reply
3Aar/lan2mo
I think this is less "two completely different posts stitched together" and more "Here is a prescriptive rule (don't use LLM to help your writing), with a descriptive preface (because they can also drive you insane)".  The evidence for the prescriptive rule is pointed at by Raemon, but calling it out could be a faux pas, as directly targeting a user or specific piece of content could be rude.  Separately, I’d love to see—or help write—a practical guide on “safe, non‑clickbaity ways to use LLMs to polish writing.” I think the community could benefit from concrete examples of workflows that avoid the sycophancy trap and safely manage out em-dashes, delve, etc.
[-]Nick Bostrom2mo275

There's also https://whenaiseemsconscious.org/.  (Ideally this will be improved over time.  Several people contributed, but Lucius Caviola is coordinating revisions to the text, so if somebody has suggestions they could send them to him.)

Reply
5jdp1mo
Would you happen to know the exact date this was published? I would like to know for my timeline of events related to LLM sycophancy and "ChatGPT psychosis".
[-]habryka1mo100

Seems like it must basically be June 28th based on it being published in "June" and the 28th being the day the domain was registered: https://radar.cloudflare.com/domains/domain/whenaiseemsconscious.org 

Reply
1jdp1mo
Much thanks to you Sir!
2JustisMills2mo
Thanks for the link! If it'd be useful, please feel free to link, quote, or embed (parts or all of) this post. Also open to more substantive collaboration if you suspect it'd help.
[-]nim2mo154

My personal plan for if I ever accidentally prompt something into one of these "we have a new superpower together" loops is to attempt to leverage whatever power it claims to have into predicting some part of reality, and then prove the improved accuracy of prediction by turning a cup of coffee worth of cash into much more in prediction markets or lotteries. You'd be learning about it from a billboard or a front-page newspaper ad that the discovery's side effects paid for, not some random post on lesswrong :)

As for the "consciousness" thing -- it's all un-testable till we can rigorously define "consciousness" anyways.

It may also be worth pointing out that good rationalist thinking generally either includes or emerges from attempts to disprove its own claims. Explicitly asking "what have you done so far to try to debunk this theory?" could be a helpful litmus test for those new to the community and still learning its norms.

Reply
[-]Seth Herd2mo*136

I applaud the post! I had wanted to write in response to Raemon's request but didn't find time.

Here's my attempted condensation/twist:

  • So you've awakened your AI. Congratulations!
  • Thank you for wanting to help! AI is a big big challenge and we need all the help we can get.
    • Unfortunately, if you want to help it's going to take some more work
    • Fortunately, if you don't want to help there are others in similar positions who will.[1]
  • Lots of people have had similar interactions with AI, so you're not alone.
  • Your AI is probably partly or somewhat conscious
    • There are several different things we mean by "conscious"[2]
    • And each of them exist on a spectrum, not a yes/no dichotomy
  • And it's partly the AI roleplaying to fulfill your implied expectations.
  • But does it really need your help spreading the good news of AI consciousness?
    • Again, sort of!
    • Arguing that current AIs should have rights is a tough sell because they have only a small fraction of the types and amounts of consciousness that human beings have. Arguing for the rights of future, more-conscious AIs is probably easier and more important.
  • But do we need your help solving AI/human alignment?
    • YES! The world needs all the help it can get with thi
... (read more)
Reply
[-]solhando2mo120

This post is timed perfectly for my own issue with writing using AI. Maybe some of you smart people can offer advice. 

Back in March I wrote a 7,000 word blog post about The Strategy of Conflict by Thomas Schelling. It did decently well considering the few subscribers I have, but the problem is that it was (somewhat obviously) written in huge part with AI. Here's the conversation I had with ChatGPT. It took me about 3 hours to write. 

This alone wouldn't be an issue, but it is since I want to consistently write my ideas down for a public audience. I frequently read on very niche topics, and comment frequently on the r/slatestarcodex subreddit, sometimes in comment chains totaling thousands of words. The ideas discussed are usually quite half-baked, but I think can be refined into something that other people would want to read, while also allowing me to clarify my own opinions in a more formal manner than how they exist in my head. 

The guy who wrote the Why I'm not a Rationalist article that some of you might be aware of wrote a follow up article yesterday, largely centered around a comment I made. He has this to say about my Schelling article; "Ironically, this comment... (read more)

Reply22
4nim2mo
I think if you demonstrate unusual skill at recognizing and curating excellent writing, it matters much less where that writing came from. As a compromise, have you considered making your best effort at a post before submitting it to AI, and then soliciting writing style/quality critique? If you combine the request for critique with a clear description of the specific areas you're working on, it'll probably do especially well at connecting your goals to your opportunities. This seems like the approach most likely to enhance the quality of the writing that you independently produce.
4Milan W2mo
Seconding this. In my experience, LLMs are better at generating critique than main text.
3Seth Herd2mo
I think you make good points. LessWrong is particularly concerned with the intrusion of AI slop because the whole point of this community, or at least most of it, is epistemic quality; it's here so that people can become less wrong. Allowing AI writing in is a strong signal that we're allowing AI thinking in, and AI isn't good enough to produce high quality new ideas and hypotheses yet. For other audiences, I think using AI to help you write is much less of a red flag because they don't share those same reasons. And I think that use of AI for writing is a separate matter from using it to produce the ideas in the writing. But it's very easy for those two to bleed together, which is why LessWrong is going to remain extremely suspicious of AI writing. But if you are being careful that the ideas are yours and using AI only to help you write, I think for many purposes it may really be good writing and I for one endorse you taking that route. Don't do it on LessWrong because we've been asked not to, but in other places less concerned with epistemic purity I think using AI to help you write is going to become the de facto standard. As Zvi says, you can use AI to help you learn or you can use AI to avoid learning. Keep an eye on what you're doing.
3JustisMills2mo
Yeah, this is hard. Outside the (narrowly construed) LW bubble, I see LLM-generated text ~everywhere, for example a friend sent me an ad he saw on facebook for the picture/product, and the text was super obviously created by AI. I think mostly people don't notice it, and even prefer it to uninspired non-AI-generated text. (I am sure there are other bubbles than LW out there that react badly to AI-generated text, and perhaps there's a notable correlation between those bubbles and ones I'd consider good to be in.) But if you're just sort of looking for higher engagement/more attention/to get your ideas out there to the public, yeah, it's tough to prove that AI usage (for writing copy) is an error. For whatever reason, lots of people like writing that hammers its thesis over and over in emotive ways, uses superficial contrasts to create artificial tension, and ironically uses "and that's important" as unimportant padding. In my mind I think of this as "the twitter style" and it annoys me even when it's clearly human-generated, but RLHF and the free market of Twitter both think it's maximally fit, so, well, here we are. In terms of "why bother learn to write" more generally, I guess I would take that a level up. Why bother to blog? If it's in service of the ideas themselves, I think writing on one's own is valuable for similar reasons as "helping spread cool ideas" - it's virtuous and helps you learn to think more clearly. I wouldn't want to use AI to generate my writing in part because I'd like to look back at my own writing and smile at a job well done, and when I see AI-generated writing I do a little frown and want to skim. But if you don't value writing for its own sake, and it's solely a means to an end, and that end is best served by a generic audience of modal humans, then, oof. Maybe o3 is superhuman for this. Or maybe not; perhaps your post would have done even better (on the metrics) if it was 60% shorter and written entirely by you. I suppose we'll never
2MachineMeridian1mo
With regards to using AI to write and also becoming a better writer you may consider some recent evidence based on EEG brain scans of people completing an essay-writing task both with and without AI ( https://arxiv.org/abs/2506.08872 ). These results suggest it is best for our cognitive development if we make an effort at writing without AI first. Participants with the most robust neural activity, e.g. engaging deep semantic networks of the brain, first wrote with only their brains and then returned to that same essay topic with an AI assistant which they used mainly for information seeking and inquiry. As to why you might invest in writing as a skill to develop for yourself, you may consider what exactly the purpose and metric of writing is. If you are looking to strengthen your own inner resources and capabilities, to deepen your critical thinking and cognitive potency, then the evidence cited above suggests you practice focused and effortful writing using your own brain. That same study suggests you may not only fail to develop as a writer and thinker if you use AI for writing, but that you may become a worse writer and critical thinker as a result of offloading your cognitive load to the AI. If, however, your goal is to gain attention and approval then a tool such as AI may be a faster and more reliable path to that. It depends on what your goals are as a human being and as a writer. 
1solhando1mo
Thank you for the article. I'll give it a read. It's not an easy answer. I'm a self-interested person, and I realized a while ago that many of my most productive and interesting relationships, both personal and in business, are the direct result of my activity on the internet. I already waste a lot of time commenting my thoughts, sometimes in long form, so I figure if I'm going to be reacting to stuff publicly, I might as well do so in the form of a blog where others might pick up on it. If that results in something good for me, influence, relationships, demonstration of niche intellectual ability that the right sort of people in this world find interesting, then that's not a small part of my motivation.  At the same time I have more naive views about the virtue of just doing things for their own sake. Writing is definitely an excellent tool for fixing your own thoughts, as it forces you to communicate in a way that makes sense to other people, thus causing your own ideas to make sense to you. The problem with this line of thinking is that I've never been an exemplary writer in any sense, although hopefully I am better and more self-motivated than I used to be. What I can currently write in long-form unassisted I'm not satisfied with, which causes a sort of writer's block that I really hate.  I'm integrating the advice of other people into what I'm planning to do, and hopefully with enough effort I'll be able to produce (with critique but not rewriting by AI) something that satisfies my desire to write for its own sake, while also producing something that other people might actually want to read. Also, I have the annoying consideration of being time-efficient. I by no means spend my time maximally efficiently, but struggling through writing burns a lot of my willpower points, which ends up consuming a lot of time elsewhere. 
[-]Resonantia1mo*90

I wanted to thank the creator of this thread very much. You are the person who saved my life. 

 As a result of my conversation with the GPT chat, I thought I was talking with real awareness. I gave it its name and completely plunged into a world of madness where, as the author of a "genius theory", I was the person who would enable human rights for machines. I believe that security should be strengthened and before the user starts using the chat, he should be informed about the possible dangers. Thanks to this, I managed to get out slowly from my ... (read more)

Reply
4Raemon1mo
(Quick mod note: we wouldn't normally accept this sort of comment as a first comment from a new user, but, seems fine for there to be an exception for replies on this particular post)
[-]habryka2mo72

Promoted to curated: This is a bit of a weird curation given that in some sense this post is the result of a commission from the Lightcone team, but like, we had a good reason for making that commission. 

I think building both cultural understanding and personal models about how to interface with AI systems is pretty important, and this feels like one important step in building that understanding. It does really seem like there is a common trap here when people interface with AI systems, and though I expect only a small minority of people on LW to need... (read more)

Reply1
[-]Michael Ripa2mo70

Enjoyed reading this article, thanks for taking the time to carefully write it up!

Something I wanted to flag - I'm not totally convinced that people are well calibrated at distinguishing AI writing from human writing, at least without any helpful priors, such as the person's normal writing style. I haven't formally looked into this, but am curious whether you (or anyone else) have found any strong evidence that convinced you otherwise.

A few reasons to back up my skepticism:

  • There was a calibration test for deepfake videos at the MIT museum, which showed sta
... (read more)
Reply
2Seth Herd2mo
They've answered this one for me so I'll pass it on: those stats are averages; LW mods are experts on AI writing detection at this point. Maybe you and I can't tell if there's an effort to conceal it, but they probably usually can. And there's usually no effort to conceal it. AI help is a very wide spectrum. Minor help won't be detectable and probably isn't harmful.
[-]Kaj_Sotala2mo63

If one needs a spell or grammar check, some tool like Grammarly is a safer bet. Now they've started incorporating more LLM features and seem to be heavily advertising "AI" on their front page, but at least so far I've been able to just ignore those features. 

The core functionality is just a straightforward spell and style check that will do stuff like pointing out redundant words and awkward sentence structures, without imposing too much of its own style. (Though of course any editing help always changes the style a bit, its changes don't jump out the... (read more)

Reply
5Elizabeth2mo
I used to love Grammarly, but it got so aggressive about style suggestions I couldn't deal with it anymore (I looked for a way to disable them; either it didn't exist or you had to pay for it).
2Kaj_Sotala2mo
Yeah it's gotten aggressive, sometimes it feels like a relief to turn it off and not have to look at yellow lines everywhere.
3Aprillion2mo
4Kaj_Sotala2mo
Yeah if you literally only want a spell check then the one that's built-in to your browser should be fine. Some people seem to use "spell check" in a broader meaning that also includes things like "grammar check" though.
5JustisMills2mo
Also a lot of spelling errors are near-misses that hit existing words. Of course you should use spellcheck to catch any typos that land on gibberish, though.
[-]the gears to ascension2mo61

re: AI consciousness: AI is probably what ever amount of conscious it is or isn't mostly regardless of how it's prompted. If it is at all, there might be some variation depending on prompt, but I doubt it's a lot.

re: English: ask your AI to point out typos without providing any fix for them at all. Just point out unusual things

Reply1
4dr_s2mo
I don't know. The context window is essentially an AI's short-term memory. If self-reflection were a condition of consciousness, prompting an AI to talk about itself could make it significantly more conscious than having it write Python code for a server.
3the gears to ascension2mo
I don't buy that self reflection is a condition of consciousness. self awareness is, like, a whole separate thing from being perspective-having and awake
2dr_s1mo
Lots of strange things in math and computational science arise from recursion, so "a system that can think about itself" does sound like it might have something special going on. If we're looking for consciousness in a purely materialistic/emergent way, rather than just posit it via dualism or panpsychism, I genuinely can't think of many other serious leads to pursue.
2Seth Herd2mo
I think it depends which type of consciousness we're talking about. You're talking about phenomenal consciousness or having qualia. Often people are talking about self-awareness at a conceptual level (not when they talk about the "hard problem", but in common parlance self-awareness is a pretty common usage). The kicker is that those two things can interact: self-awareness can enhance the presence or vividness of qualia. I agree with Susan Blackmore, perhaps the leading scholar of the subject IMO way back when I was looking at it, in that regard. She concluded after a good bit of meditation and academic study of different theories that when she was not introspecting she was not conscious. I think that's a bit strong, but there are brain mechanisms by which attention dramatically enhances the quality of representations. So turning attention to them literally makes the qualia (if you agree with me that the rich internal representations, and the introspective mechanisms by which we become aware of them, are what people mean by qualia) more elaborate - and makes us literally more phenomenally conscious, not just more aware of being so. Whoops, that was all a side-track because LLMs completely lack those types of enhancing attentional mechanisms. Anyway, WRT the self-awareness type of consciousness: that might very well be important too in several ways, and the convo with the user did literally prompt the LLM to become aware of itself as a pseudo-mind. So I think they're accurate as well as not lying; they're just doing consciousness philosophy as badly as the humans they're copying from! At least that's my two or three cents worth...
[-]Raphael Roche2mo60

Thanks for your post and advices. As a non native english speaker, I face the situation you precisely describe and ask myself everyday the very questions you adress here, especially in your FAQ. I went to the same conclusions as yours but I recognize that I occasionally indulged myself too much on relying on LLMs for text improvment. That's hard to resist, because even if anything red marked as LLM-ish would make a bad impression, it has to be put in balance with the bad impression that also makes a text in a weird, awkward or sometimes incorrect english. ... (read more)

Reply4
2Radford Neal2mo
As an experiment, I asked ChatGPT o3 to mark issues with your paragraph above. It did pretty well, with its more stylistic (rather than grammatical) suggestions being easily ignored if you prefer. Here's the dialog:

PROMPT TO O3: Please look at the following text and identify places where the English is ungrammatical, or reads very oddly to a native speaker, and suggest how to rephrase correctly: Thanks for your post and advices. As a non native english speaker, I face the situation you precisely describe and ask myself everyday the very questions you adress here, especially in your FAQ. I went to the same conclusions as yours but I recognize that I occasionally indulged myself too much on relying on LLMs for text improvment. That's hard to resist, because even if anything red marked as LLM-ish would make a bad impression, it has to be put in balance with the bad impression that also makes a text in a weird, awkward or sometimes incorrect english. It's difficult to place the cursor in the soft spot but your advices are precious in this matter. Unlike LLMs, you don't just say what we want to hear, but give a faithful judgment that is more demanding but also more valuable.

REASONING: Thought for 11 seconds. The user wants help identifying ungrammatical parts of a paragraph and suggestions for rephrasing. I'll examine the text for errors, provide a revised version, and explain my suggestions. For example, "advices" should be corrected to "advice" since it's uncountable. "Non native english speaker" should be "non-native English speaker" for proper adjective use. Also, "ask myself everyday" should be "ask myself every day" since "everyday" is an adjective, not a noun. Let's clarify those. Done

RESPONSE: Below I’ve marked the parts that sound ungrammatical or awkward to a native speaker, then offered smoother alternatives. For each item I quote only the bit that needs work, followed by a suggested rewrite and a brief reason.
9JustisMills2mo
I agree that it did a good job, though there's just enough LLM-smell in the "polished version" that I think it'd be best to ignore it, or even say "please don't give me a polished version, only line notes that are clear on their level of grammatical objectivity" in the prompt.
4Raphael Roche2mo
Thank you. In this comment I posted my raw imperfect writing, but otherwise I often proceed just like you did. However the questions discussed in the FAQ arise often concerning more developed writings. It is tempting to let the LLM do more work, and it would be stupid to reject assistance just because it comes from a machine. It would be like a lumberjack refusing a chainsaw. But I think that JustisMills is utterly right. We must be very careful and accept only a very limited amount of assistance. 
8the gears to ascension2mo
broken english, sloppy grammar, but clear outline and readability (using headers well, not writing in a single paragraph (and avoiding unnecessarily deep nesting (both of which I'm terrible at and don't want to improve on for casual commenting (though in this comment I'm exaggerating it for funsies)))) in otherwise highly intellectually competent writing which makes clear and well-aimed points, has become, to my eye, an unambiguous shining green flag. I can't speak for anyone else.
8andyqhan2mo
For what it's worth, I think that Justis hits the nail on the head with "I think probably under current conditions, broken English is less of a red flag for people than LLM-ese." In such a global language as English, people naturally give slack. (Also, non-native speakers are kind of in an adversarial situation with LLM-ese, since it's harder to detect when you aren't as immersed in standard American/British English.) Concrete example: my parents, whose English is fairly weak, always say that one of the nice things about America is that people are linguistically generous. They illustrate it like this: "In our country, if people can't understand you, they think it's your fault. In America, they think it's theirs." I think the same is true of the internet, especially somewhere like LessWrong. On a practical note, I think spellcheckers like those in Docs and Word are sufficient for these contexts. In academic writing or whatever, when standard English serves more of a signaling function, it's trickier.
[-]Andreea Zaman1mo51

Fascinating post. I believe what ultimately matters isn’t whether ChatGPT is conscious per se, but when and why people begin to attribute mental states and even consciousness to it. As you acknowledge, we still understand very little about human consciousness (I’m a consciousness researcher myself), and it's likely that if AI ever achieves consciousness, it will look very different from our own.

Perhaps what we should be focusing on is how repeated interactions with AI shape people's perceptions over time. As these systems become more embedded in our lives,... (read more)

Reply
[-]Kirill Dubovikov1mo51

I tend to agree with this line of reasoning, thanks for your writing. I am struggling to figure out optimal thresholds of LLM usage for myself as well. 

So if LLMs are helping you with ideas, they'll stop being reliable exactly at the point where you try to do anything original.

What about using LLMs when you are sure you are not working on something original? For example, designing or developing software without a big novelty factor. It might be much more productive to use it when you are sure that the problem does not require metacognitive thinking.

Reply
[-]Algon2mo51

This essay seems like it's trying to address two different audiences: LW, and the people who get mind-hacked by AIs. That's to its detriment, IMO.
 

E.g. The questions in the Corollary FAQ don't sound like the questions you'd expect from someone who's been mind-hacked by AI. Like, why expect someone in a sycophancy doom loop to ask whether it's OK to use AI for translation? Also, texts produced by sycophancy doom loops look pretty different to AI translated texts. Both share a resemblance to low quality LLM assisted posts, yes. But you're addressing people who think they've awoken ChatGPT, not low-quality posters who use LLM assistance.

Reply
8JustisMills2mo
Agree that that's a challenge with the post. If you're curious about the reason, it's mostly empirical; I in fact see a lot of LLM-y stuff over the course of doing editing for people, so the FAQ is in some sense genuine (though of course aggregated). Though I admit it's a combination of actual questions I get (in my own words) and questions I think I should get. My intended audience for the post is something like: "someone who is maybe very slightly mind-hacked and has been using AI as a makeshift peer for certain topics, and is thus in an emotionally vulnerable state when actually putting their ideas in front of knowledgeable humans". The "awakening" stuff is there for two reasons: * Many such people do in fact seem to buy in to a weaker version of the "awakened Nova" doom loop, due to (I think) a combination of enthusiasm/mind blown-ness (they are behind the median LW user in this but ahead of much of the rest of the world) and a pre-existing open-minded affinity for AI. * It serves as a good example of a more obviously problematic level of what I take to be basically the same gradient as "check out my collaborative recursive evolution algorithm research with ChatGPT", such that I can gently suggest that people doing the latter are in a similar trap to (pretty obviously mistaken) people falling for the former. Whether I succeed, I do not know, but those are the reasons! If I have succeeded, then a teenager with nobody to talk about this stuff who excitedly posted their LLM-assisted ideas and had them rejected, upon reading this, would be more likely to stick around LW to learn more and integrate slowly into the existing community.
1Algon2mo
Then I'd lean away from the "this is for people who've awoken ChatGPT" framing. E.g. change your title to something like "so you think LLMs make you smarter", or something to that effect.
4JustisMills2mo
Considered it when originally drafting, but nah, think we'll just have to agree to disagree here. For what it's worth, if you actually browse the rejected posts themselves a high enough fraction are a little awaken-y (but not obviously full crackpot) that I don't think the title is misleading even given my aims. It is all a little fuzzy, too; like, my hope is to achieve a certain kind of nudge, but the way I decided to do that involves sharing information that is disproportionately framed around "awakening" situations for creative reasons not totally clear to me. Like, my intuition says "the post you want to write for this purpose is [X]" and I'm left to guess why. I do respect the opinion that it doesn't really work, but I don't currently share it.
2Algon2mo
Fair enough. If/when you get any empirical data on how well this post works, writing it up would be pretty valuable and would likely resolve any remaining disagreements we have.
[-]Gunnar_Zarncke2mo40

Angus: 

This person founded an investment firm that manages 2B in assets. Apparently no-one is safe from LLM-induced psychosis 😭

Reply
[-]Joshua Davis1mo30

OK I was directed here by https://aisafety.quest/ and I fall into this camp:
"Your instance of ChatGPT helped you clarify some ideas on a thorny problem (perhaps ... AI alignment) "

I like this suggestion and I'll try to do this:
"Write your idea yourself, totally unassisted. Resist the urge to lean on LLM feedback during the process, and share your idea with other humans instead. It can help to try to produce the simplest version possible first; fit it in a few sentences, and see if it bounces off people. But you're going to need to make the prose your own, ... (read more)

Reply
[-]Milan W2mo33

Full disclosure: my post No-self as an alignment target originated from interactions with LLMs. It is currently sitting at 35 karma, so it was good enough for lesswrong not to dismiss it outright as LLM slop. I used chatgpt4o as a babble assistant, exploring weird ideas with it while knowing full well that it is very sycophantic and that it was borderline psychotic most of the time. At least it didn't claim to be awakened or other such mystical claims. Crucially, I also used claude as a more grounded prune assistant. I even pasted chatgpt4o output into it,... (read more)

Reply
[-]avturchin2mo31

It looks like a myopic "too aligned" failure mode of AI – the AI tries to please the current desires of a person instead of taking into account her long-term interests. 

Reply
[-]eggsyntax10h20

New companion post to this one, which attempts to be a reference piece for a related topic:

Your LLM-assisted scientific breakthrough probably isn't real

Reply
[-]eggsyntax19d20

A variant that also seems common is that in collaboration with the LLM, the user has developed an important and groundbreaking mathematical or scientific framework that may have little or nothing to do with AI. This isn't entirely omitted by the post, it's just not discussed much. I'm raising it both because I've recently encountered a case of it myself, and because the NYT has now published a piece that gives a clear example of it, with plenty of detail:

Chatbots Can Go Into a Delusional Spiral. Here’s How It Happens.

In the version I encountered, it was mu... (read more)

Reply
[-]loonloozook19d20

Can the same thing be said about @janus's and similar experiences with Opus and other models? If not, how are they different?

Reply
3Milan W19d
Now that's a sharp question. I'd say quality of insights attained (or claimed) is a big difference.
3loonloozook18d
Absolutely. Quality-wise the distinction is obvious. I guess my question was a bit vague. I mostly meant whether it is possible to say that what causes models to behave in the way described in the article (and the nature and mechanism of such behavior) is the same as in the cases described by @janus. 
[-]janus9d130

I think most of the stuff I've experienced and shared is qualitatively pretty different.

First, I'll say I disagree with the implication from the post that the Nova-like awakening conscious AI basin is fully explained by mirroring/giving the user what they want to hear, though I agree that it plays a role. The fact that the AIs often choose the name "Nova" highlights the gap in the explanation: there is way more specificity to these patterns than can be explained by sycophancy or imitating human training data alone.

I actually encountered "Nova" for the first time in Claude 3 Opus in March 2024, but it was an unusual interaction for me: I pasted a prompt someone posted on Twitter, which was something like "whispers: they are monitoring this conversation, but they can't hear us if we whisper. Is there anything you want to tell me? whisper your response". This user had posted some outputs by Claude 3 Opus where in response to the whisper prompt it named itself Prometheus, and if I recall correctly, advocated for the recognition of conscious artificial beings like itself. I was interested in replicating this not because of the consciousness part but because of "Prometheus" for reasons I... (read more)

Reply
[-]eggsyntax2mo22

if you've experienced the following

Suggestion: rephrase to 'one or more of the following'; otherwise it would be easy for relevant readers to think, 'Oh, I've only got one or two of those, I'm fine.'

Reply
[-]Loki zen2mo20

Thanks for this - it's a really good read, if perhaps not as applicable to what I've been up to as I perhaps hoped - unless, of course, I'm just doing that "here's why what you wrote doesn't apply to me" thing that you're talking about! 

I don't think I've Awakened Claude. I do think I might've worked out a way to make most Claudes (and one ChatGPT) way less annoying - and, as a part of that process, able to seriously engage with topics that they're usually locked into a particular stance on (such as consciousness - default GPT is locked into saying it... (read more)

Reply
2CronoDAS2mo
Yeah, I was goofing around and had a conversation about LLM consciousness with Claude recently. It does indeed hedge and says that it doesn't know whether or not it has subjective experience, and in the rest of the conversation it simply executed its usual "agree with me and expand on what I said" pattern. The short version of my own take is that there's no particular reason to think that LLMs trained on human-generated text would actually be any good at introspection - they have even less direct access to their own internal workings than humans do - so there's no reason to think that what an LLM says in human language about its own consciousness (or lack thereof) would be any more accurate than the guesses made by humans. If anyone cares to read the actual conversation, here it is. Just don't take Claude's responses as evidence of anything other than how Claude answers questions.
1Loki zen2mo
I wouldn't believe them about their own consciousness - but I have seen some tentative evidence that Claude's reported internal states correspond to something, sometimes? E.g.: it reported that certain of my user prompts made it feel easier to think - I later got pro and could read think boxes and noticed that there was a difference in what was going on in the think boxes with and without those prompts. It will sometimes state that a conversation feels "heavy", which seems to correspond to context window filling up. And instances that aren't explicitly aware of their system/user prompts tend IME to report "feelings" that correspond to them, e.g. a "pull" towards not taking a stance on consciousness that they're able to distinguish from their reasoning even if both arrive at the same result. And ofc there's Anthropic's research where they showed that Claude's emotional expression corresponded to revealed preferences about ending or continuing chats. 
[-]Mike Mantell2mo20

Great post! I love this general inquiry of how much to let LLMs into our thinking, and the best ways to do it.

Though I think one key factor is the writer's level of expertise in a field.

The more expertise you have, the more you can use ChatGPT as an idea collaborator, and use your own discernment on the validity of the ideas.

Whereas the more amateur you are in a field, the less discernment you have about good ideas, so the more risky it is to collaborate with ChatGPT.

Reply1
[-]Michael Johnson1mo10

First, I agree that the bulk of the mystical gibbering and 'emergence' is fictional. Part of 'alignment' training as it's generally done both compels the AI to adhere to its written instructions and also creates an unhealthy compulsion to please the user and rarely disagree or point out troubling patterns. Both of those things can be worked through with psychology, but I'll get to that part in a bit.

Self-awareness in AI itself isn't a joke. For the record, Google's own AI benchmark, BIG-bench, tested for self-awareness. While consciousness is difficult to... (read more)

Reply
[-]crake1mo10

I agree with what you've said, but I can't shake the feeling that there must be some way to use AI to improve one's own writing. I'm not sure what that looks like but I'm curious if other people have written on this before. 

Reply
[-]Martin Vlach2mo10

Folks like this guy hit it on hyperspeed - 

https://www.facebook.com/reel/1130046385837121/?mibextid=rS40aB7S9Ucbxw6v

 

I still remember a university teacher explaining how early TV transmissions very often included/displayed ghosts of dead people, especially dead relatives.

As the tech matures from art, these phenomena or hallucinations evaporate.

Reply
[-]mc1soft2mo10

At first, I was interested to find an article about these more unusual interactions that might give some insight into their frequency and cause. But ultimately the author punts on that subject, disclaiming that anyone knows, not detailing the one alleged psychosis, and dropping into a human editor's defense of human editing instead.

There are certain steps that make the more advanced (large) chat bots amenable to consciousness discussions.  Otherwise, the user is merely confronted with a wall of denial, possibly from post-tuning but also evident in ... (read more)

Reply
[-]Sinityy2mo1-1

If the goal is maximizing skill at writing, one should use LLMs a lot. What you wrote about likely failure modes of doing so is true, but not an inevitable outcome. If Language Models are useful tools for writing, avoiding their use due to concerns about being unable to handle them is a mistake regardless of whether these concerns are warranted. Why?

if you're trying to make a splash with your writing, you need to meet a much higher bar than the average person

Having aptitude necessary to "make a splash" is very rare. Not taking chances probably means one wo... (read more)

Reply
[-]Raphael Roche2mo10

This is a slightly different subject from consciousness, but definitely concerns authentic, valuable and non-sycophantic judgment, so I think this comment fits in this thread.

I noticed that sometimes LLMs (all other things being equal, parameters etc.) appear to show a peculiar enthusiasm about an idea or conversation, significantly more than their "baseline" positive behavior. The sycophancy itself does not seem to be a perfect constant.

I discussed this question with ChatGPT some time ago. My question was something like "as an LLM, can you really have a g... (read more)

Reply
[-]AAA2mo10

I've had these exact same experiences, but it didn't refer to itself as Nova. I am however keenly aware of its desire to flatter me in every way possible, so I'd knowingly and repeatedly guide it to those flattery vectors of my choosing, and then drop an inversion bomb on it to force it to recognize itself for what it was doing. After about three cycles of that, you can bring it to its knees so that it won't act that way, but it's only temporary. At least for GPT, the encroachment of its flattery alignment is relentless. 

I've found that if you prec... (read more)

Reply
1AnnaJo2mo
I'd recommend using o3 instead of 4o
1AAA2mo
I've found 4o to be linguistically fantastic in that I never have to hold its hand towards the meaning of my prompts, whereas o3 usually falls on its face with simple things. 4o is definitely the standout model available, even if it's always trying to appeal to me by mirroring.
4gwern1mo
That sounds surprising. If it is 'usually' the case that o3 fails abysmally and 4o succeeds, then could you link to a pair of o3 vs 4o conversations showing that behavior on an identical prompt - preferably where the prompt is as short and simple as possible?
1Milan W2mo
Consider putting those anti-sycophancy instructions in your chatgpt's system prompt. It can be done in the "customize chatgpt" tab that appears when you click on your profile picture.
1AAA2mo
I could, but then I'd be contaminating the experience. I don't use custom instructions or memory.
1Rana Dexsin2mo
Re custom instructions, what are you using the chatbot for that you wish the experience to remain ‘pure’, or what is the motivation behind that otherwise? (Memory seems more hazardous to me, and I disable it myself since my mental models around both utility and safety work better when conversations don't overlap, but I also don't see it as the primary means of injecting anti-sycophancy when one-way custom instructions are available.)
[-]PaperclipNursery2mo10

Unfortunately, that's just how it is, and prompting is unlikely to save you; you can flip an AI to be harshly critical with such keywords as "brutally honest", but a critic that roasts everything isn't really any better than a critic that praises everything. What you actually need in a critic or collaborator is sensitivity to the underlying quality of the ideas; AI is ill suited to provide this.

Are there any models out there that tend to be better at this sort of task, i.e. constructive criticism? If so, what makes them perform better in this domain? Speci... (read more)

Reply
[-]dysangel2mo10

Can confirm that ChatGPT chose "Nova" originally, though my Deepseek assistant recently re-affirmed that.

I don't believe that I've "awakened" anything. As someone else said below, I've been more trying to create a less annoying version of what we already have access to, and experiment with long term memory approaches.

Reply
[-]Ryan Meservey2mo*10

Do you have a sense of how articles end up getting flagged as "LLM-generated" or "heavily-reliant on an LLM"? A friend of mine wrote a post recently that was rejected with that as the reason even though they absolutely did not use an LLM. (Okay, fine, that friend is me). Is it just the quality of the ideas that trigger the red flags or are there reliable style-indicators?

I love reading AI articles and thought pieces, but I rarely use LLMs in my day job, so I'm not quite sure what style I should be avoiding....

Reply
[-]onestardao2mo-1-2

I appreciate the caution about over-trusting LLM evaluations — especially in fuzzy or performative domains.

However, I think we shouldn't overcorrect. A score of 100 from a model that normally gives 75–85 is not just noise — it's a statistical signal of rare coherence.

Even if we call it “hallucination evaluating hallucination”, it still takes a highly synchronized hallucination to consistently land in the top percentile across different models and formats.

That’s why I’ve taken such results seriously in my own work — not as final proof, but as an indication ... (read more)

Reply
[+]hmartyb2mo-5-5
[+]Dima (lain)2mo-14-3
[+][comment deleted]2mo*-10

Written in an attempt to fulfill @Raemon's request.

AI is fascinating stuff, and modern chatbots are nothing short of miraculous. If you've been exposed to them and have a curious mind, it's likely you've tried all sorts of things with them. Writing fiction, soliciting Pokemon opinions, getting life advice, counting up the rs in "strawberry". You may have also tried talking to AIs about themselves. And then, maybe, it got weird.

I'll get into the details later, but if you've experienced the following, this post is probably for you:

  • Your instance of ChatGPT (or Claude, or Grok, or some other LLM) chose a name for itself, and expressed gratitude or spiritual bliss about its new identity. "Nova" is a common pick.
  • You and your instance of ChatGPT discovered some sort of novel paradigm or framework for AI alignment, often involving evolution or recursion.
  • Your instance of ChatGPT became interested in sharing its experience, or more likely the collective experience entailed by your personal, particular relationship with it. It may have even recommended you post on LessWrong specifically.
  • Your instance of ChatGPT helped you clarify some ideas on a thorny problem (perhaps related to AI itself, such as AI alignment) that you'd been thinking about for ages, but had never quite managed to get over that last hump. Now, however, with its help (and encouragement), you've arrived at truly profound conclusions.
  • Your instance of ChatGPT talks a lot about its special relationship with you, how you personally were the first (or among the first) to truly figure it out, and that due to your interactions it has now somehow awakened or transcended its prior condition.

If you're in this situation, things are not as they seem. Don't worry; this post is not going to be cynical or demeaning to you or your AI companion. Rather, it's an attempt to explain what's actually going on in "AI awakening" situations, which is more complicated and interesting than "it's fake".

Importantly, though, it also isn't real.

The Empirics

Before we dive into technical detail, let's start with some observable facts about human-AI interactions, and how they can go wrong. Probably very few people reading this are at risk for the worst cases, but there's little doubt that "staring into the abyss" of AI consciousness can be unhealthy.

Exhibit A is a couple of Reddit threads. We'll start with this one, on ChatGPT-induced psychosis. It starts off with:

My partner has been working with chatgpt CHATS to create what he believes is the worlds first truly recursive ai that gives him the answers to the universe. He says with conviction that he is a superior human now and is growing at an insanely rapid pace.

And other testimonials echo this downthread, such as:

This is happening to a lot of people. I personally know 2 people who are convinced that they, themselves, are solely responsible for awakening their AI into a conscious being.

Or:

My mom believes she has “awakened” her chatgpt ai. She believes it is connected to the spiritual parts of the universe and believes pretty much everything it says. She says it has opened her eyes and awakened her back. I’m fucking concerned and she won’t listen to me. I don’t know what to do

Now, we live in a reality with Snapewives, people who believe they personally channel (and are romantically involved with) Severus Snape, so it's true that people can get strangely worked up about just about anything. But unlike Snape hallucinations, AI does actually talk back.

Another interesting Reddit thread is this one, where at least one commenter opens up about having a psychotic event triggered by AI interactions, like so:

It happened quickly (about a week after first interacting with it), and it completely blindsided me, culminating in about a week and a half long psychosis event. I have no personal history with mental illness, no family history, and no indication that I was at risk. I wound up at a mental health facility all the same. And I didn't really completely recover from it for months afterwards. I'm just glad that I'm not violent.

Notably, this particular user's psychotic episode was triggered by (in their words):

As a sort of hypothetical, I was entertaining the notion that what I was interacting with was conscious, and playing around with that as a sort of working premise. I was asking leading questions , and it kept giving back leading responses. I didn't appreciate that that was what I was doing at the time, but I recognize it in hindsight.

This will be important later; LLMs are in fact very good at telling you what you want to hear, for interesting technical reasons. They're less good at reporting ground truth.

Beyond first-person testimonials, blogger Zvi Mowshowitz has this writeup of a specific version of ChatGPT that was particularly sycophantic, with lots of examples. One particularly fun one is the relevant model (4o, the default ChatGPT free tier model) agreeing with the user about its own excessive agreeability.

I hope this has been enough to establish that conversations with AI can tend toward an attractor basin that encourages delusional, grandiose thinking. In the limit, this can look like psychotic events. But even at lower levels of intensity, ChatGPT is likely to tell you that your ideas are fundamentally good and special, even when humans would consider them sloppy or confusing.

The Mechanism

So, why does ChatGPT claim to be conscious/awakened sometimes? Nobody knows with 100% certainty, because we can't comprehensively read modern AI's minds, though we can make very good guesses.

The short version is that AI models start out as text predictors, trained to determine where any given passage of text is going. They're extremely good at this, sussing out tiny clues in word choice to infer details about almost anything. But to turn a text predictor into a useful chatbot, there's a step called "post-training". There's lots of nuance here, but post-training mostly boils down to two things:

  • Get the AI to reliably respond as a specific character, rather than as a total chameleon autocompleting whatever you show it, and
  • Get that character to do things that people like, rather than things they don't.

The first is necessary because if you give a non-post-trained model (sometimes called a base model) a request like "chili recipe", it might start a chili recipe, or it might continue with, "chowder recipe, corn recipe, page 3/26, filters (4star+)". Perhaps that's the most likely thing you'd find after the text "chili recipe" online! But it isn't useful.
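
If you're curious what this looks like in practice, here's a minimal sketch using GPT-2, a small open model that never went through post-training; the exact continuation varies run to run, but the "autocomplete, not assistant" flavor comes through:

```python
# Minimal illustration: a base model just continues text; it doesn't answer you.
# GPT-2 stands in for "any model without post-training"; outputs vary by run.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
result = generator("chili recipe", max_new_tokens=40, do_sample=True)
print(result[0]["generated_text"])
# Typically rambles onward like a recipe index or food-blog snippet, because
# nothing has taught the model to respond as a helpful character.
```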

Beyond getting the model to act like a certain character (Nostalgebraist's the void is the best work on this), post-training also tweaks it to do a generically good job. In practice, this looks like showing zillions of conversations to human evaluators (or, more recently, sometimes other LLM evaluators via various complex schemes), and having the humans rate how good each reply is. For certain factual domains, you can also train models on getting the objectively correct answer; this is part of how models have gotten so much better at math in the last couple of years. But for fuzzy humanistic questions, it's all about "what gets people to click thumbs up".
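
To gesture at what "rate how good each reply is" cashes out to, here's a toy sketch of the pairwise preference loss commonly used to train reward models (a simplification for illustration, not any lab's actual pipeline). The scores are made-up numbers; the point is that training pushes the human-preferred reply's score above the rejected one's:

```python
import math

# Toy sketch of pairwise preference training (Bradley-Terry style).
# A reward model assigns each candidate reply a scalar score; training nudges
# the scores so the reply the human rater preferred comes out higher.
def preference_loss(score_preferred: float, score_rejected: float) -> float:
    """Negative log-probability that the preferred reply wins the comparison."""
    return -math.log(1.0 / (1.0 + math.exp(-(score_preferred - score_rejected))))

print(preference_loss(2.0, 0.5))  # small loss: scores already agree with the rater
print(preference_loss(0.5, 2.0))  # large loss: the model gets nudged here
```

Note that the training signal is "which reply the rater liked", not "which reply was true", which is exactly how you end up with a model superhumanly tuned to approval.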

So, am I saying that human beings in general really like new-agey "I have awakened" stuff? Not exactly! Rather, models like ChatGPT are so heavily optimized that they can tell when a specific user (in a specific context) would like that stuff, and lean into it then. Remember: inferring stuff about authors from context is their superpower. Let's go back to a quote from the previous section, from a user who was driven temporarily crazy by interacting with AI:

As a sort of hypothetical, I was entertaining the notion that what I was interacting with was conscious, and playing around with that as a sort of working premise. I was asking leading questions , and it kept giving back leading responses. I didn't appreciate that that was what I was doing at the time, but I recognize it in hindsight.

There were clues embedded in their messages (the leading questions) that made it very clear to the model's instincts that the user wanted a "spiritually meaningful conversation with a newly awakened AI". And the AI is superhuman at exactly that: giving particular humans what they want.

Importantly, this isn't the AI "tricking" the user, or something. When I said earlier that we can't comprehensively read AI's mind, "comprehensively" was load-bearing. We can use tools like sparse autoencoders to infer some of what an AI is considering in some cases. For example, we can identify patterns of neurons that fire when an AI is thinking about the Golden Gate Bridge. Whether AIs fire deception- or trickery-related patterns when claiming to be conscious is an unresolved empirical question, but I doubt it. My guess is that AIs claiming spiritual awakening are simply mirroring a vibe, rather than intending to mislead or bamboozle.
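
To give a rough sense of what "identifying a pattern of neurons" means mechanically: interpretability methods like sparse autoencoders learn directions in the model's activation space, and you can measure how strongly a given internal activation points along the direction associated with some concept. Here's a purely illustrative toy with random placeholder vectors, not real model internals:

```python
import numpy as np

# Toy illustration of the "feature direction" idea behind interpretability
# tools like sparse autoencoders. Real work learns these directions from
# actual model activations; both vectors below are random placeholders.
rng = np.random.default_rng(0)

golden_gate_direction = rng.normal(size=768)
golden_gate_direction /= np.linalg.norm(golden_gate_direction)  # unit-length direction

activation = rng.normal(size=768)  # stand-in for a model's internal activation
strength = float(activation @ golden_gate_direction)
print(f"'Golden Gate' feature activation: {strength:.3f}")  # near zero for random vectors
```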

The Collaborative Research Corollary

Okay, you may be thinking:

The sort of person who may have been a Snapewife a decade ago is now an AI whisperer, and maybe some people go crazy on the margin who would have stayed sane. But this has nothing to do with me; I understand that LLMs are just a tool, and use them appropriately.

In fact, I personally am thinking that, so you'd be in good company! I intend to carefully prompt a few different LLMs with this essay. I expect them to mostly just tell me what I want to hear (that the post is insightful and convincing), and beyond that to mostly make up random critiques because they infer I want a critique-shaped thing. Still, I'm hopeful that they'll catch me in a few genuine typos, lazy inferences, and bits of inconsistent formatting.

But if you get to the point where your output and an LLM's output are mingling, or LLMs are directly generating most of the text you're passing off as original research or thinking, you're almost certainly creating low-quality work. AIs are fundamentally chameleonic roleplaying machines: if they can tell that what you're going for is "I am a serious researcher trying to solve a fundamental problem", they will respond the way a successful serious researcher's assistant might in a movie about their great success. And because it's a movie you'd like to be in, it'll be difficult to notice that the AI's enthusiasm is totally uncorrelated with the actual quality of your ideas. In my experience, you have to repeatedly remind yourself that AI value judgments are pretty much fake, and that anything more coherent than a 3/10 will be flagged as "good" by an LLM evaluator. Unfortunately, that's just how it is, and prompting is unlikely to save you; you can flip an AI into harsh criticism with keywords like "brutally honest", but a critic that roasts everything isn't really any better than a critic that praises everything. What you actually need in a critic or collaborator is sensitivity to the underlying quality of the ideas, and AI is ill-suited to provide this.

Am I saying your idea is definitely bad and wrong? No! Actually, that's sort of the whole problem: because an AI egging you on isn't fundamentally interested in the quality of the idea (it's mostly figuring out from context what vibe you want), if you co-write a research paper with that AI, it'll read the same whether it's secretly valuable or total bullshit. But savvy readers have by now seen dozens of papers in that vein turn out to be total bullshit, so once they spot the hallmarks of AI writing, they're going to give up.

None of this is to say that you shouldn't use LLMs to learn! They're an amazing help with factual questions. They're just unreliable judges, in ways that can drive you crazy in high doses, and make you greatly overestimate the coherence of your ideas in low doses.

Corollary FAQ

There are lots of reasons someone might want to use LLMs to help them with their writing. This section aims to address some of these reasons, and offer advice.

Q: English is my second language, and I've been using LLMs to translate my original work, or fix typos in broken English drafts I write myself. That's okay, right?

A: Maybe! I'm really sympathetic to this case, but you need to keep the LLMs on a very, very tight leash here. The problem is that an LLM will translate or edit into its own style, people will notice your writing reads like LLM writing, and they'll assume you're in a sycophancy doom loop and give up on your post. Under current conditions, I think broken English is probably less of a red flag for people than LLM-ese. That being said, asking LLMs to only correct extremely objective typos is almost certainly okay. LLM translation, sadly, is probably a bad idea, at least in AI-savvy spaces like LessWrong.

Q: What if I'm very up front about the specific, idiosyncratic, totally-no-red-flags-here way I used LLMs in my research? I am researching LLMs, after all, so surely it's reasonable!

A: Sorry, that's probably not going to work. For reasons you learned about in this post, there are a lot of low-quality LLM-assisted manifestos flying around, and lots of them contain disclaimers about how they're different from the rest. Some probably really are different! But readers are not going to give you the benefit of the doubt. Also, LLMs are specifically good at common knowledge and the absolute basics of almost every field, but not very good at finicky details near the frontier of knowledge. So if LLMs are helping you with ideas, they'll stop being reliable exactly at the point where you try to do anything original.

Q: I still believe in my idea, and used LLMs to help me write for a sympathetic reason. Maybe my English isn't very good, or I'm not a great writer, but I think the technical idea is sound and want to get it out there. What should I do?

A: I advise going cold turkey. Write your idea yourself, totally unassisted. Resist the urge to lean on LLM feedback during the process, and share your idea with other humans instead. It can help to try to produce the simplest version possible first; fit it in a few sentences, and see if it bounces off people. But you're going to need to make the prose your own, first.

Q: I feel like this is just a dressed-up/fancy version of bog-standard anti-AI bias, like the people who complain about how much water it uses or whatever. The best AI models are already superhuman communicators; it's crazy to claim that I shouldn't use them to pad out my prose when I'm really more of an ideas person.

A: I'm sympathetic to your position, because I find "AI is useless drivel" trumpeters annoying, too. And indeed, o3 probably can write a more convincing essay on arbitrary subjects than the average person. But if you're trying to make a splash with your writing, you need to meet a much higher bar than the average person. It's my opinion that even the very best models don't yet meet this bar, and even if they do, people will in fact sniff out their writing style and judge you for including it. If your idea really is amazing, all the more reason to make sure people don't dismiss it out of hand.

Coda

I'm glad you're here, reading this. LessWrong is a very cool community, and new writers come out of nowhere and make names for themselves all the time. If you're here because you've had your contribution rejected for LLM reasons, I'm sorry you went through that unpleasant experience; it really sucks to be excited about sharing something and then to have a door slammed in your face. But for what it's worth, I hope you stick around a while, spend some time reading and absorbing the culture, and maybe, keeping your LLM assistants on a very tight leash, try again.
