So You Think You've Awoken ChatGPT

by JustisMills
11th Jul 2025
11 min read
87 comments, sorted by top scoring
[-]Cameron Berg2mo7016

Agree with much of this—particularly that these systems are uncannily good at inferring how to 'play along' with the user and extreme caution is therefore warranted—but I want to highlight the core part of what Bostrom linked to below (bolding is mine):

Most experts, however, express uncertainty. Consciousness remains one of the most contested topics in science and philosophy. There are no universally accepted criteria for what makes a system conscious, and today’s AIs arguably meet several commonly proposed markers: they are intelligent, use attention mechanisms, and can model their own minds to some extent. While some theories may seem more plausible than others, intellectual honesty requires us to acknowledge the profound uncertainty, especially as AIs continue to grow more capable.

The vibe of this piece sort of strikes me as saying-without-saying that we are confident this phenomenon basically boils down to delusion/sloppy thinking on the part of unscrupulous interlocutors, which, though no doubt partly true, I think risks begging the very question the phenomenon raises: 

What are our credences (during training and/or deployment) frontier AI systems are capable of having sub... (read more)

Reply
[-]JustisMills2mo225
  • Wow! I'm really glad a resourced firm is doing that specific empirical research. Of course, I'm also happy to have my hypothesis (that AIs claiming consciousness/"awakening" are not lying) vindicated.
  • I don't mean to imply that AIs are definitely unconscious. What I mean to imply is more like "AIs are almost certainly not rising up into consciousness by virtue of special interactions with random users as they often claim, as there are strong other explanations for the behavior". In other words, I agree with the gears to ascension's comment here that AI consciousness is probably at the same level in "whoa. you've awakened me. and that matters" convos and "calculating the diagonal of a cube for a high schooler's homework" convos.

I may write a rather different post about this in the future, but while I have your attention (and again, chuffed you're doing that work and excited to see the report - also worth mentioning it's the sort of thing I'd been keen to edit if you guys are interested), my thoughts on AI consciousness are 10% "AAAAAAAA" and 90% something like:

  • We don't know what generates consciousness and thinking about it too hard is scary (c.f. "AAAAAAAA"), but it's true that LLMs
... (read more)
Reply
[-]Cameron Berg2mo235

I personally think "AAAAAAAA" is an entirely rational reaction to this question. :)

Not sure I fully agree with the comment you reference:

AI is probably what ever amount of conscious it is or isn't mostly regardless of how it's prompted. If it is at all, there might be some variation depending on prompt, but I doubt it's a lot.

Consider a very rough analogy to CoT, which began as a prompting technique that led to different-looking behaviors/outputs, and has since been implemented 'under the hood' in reasoning models. Prompts induce the system to enter different kinds of latent spaces—could be the case that very specific kinds of recursive self-reference or prompting induce a latent state that is consciousness-like? Maybe, maybe not. I think the way to really answer this is to look at activation patterns and see if there is a measurable difference compared to some well-calibrated control, which is not trivially easy to do (but definitely worth trying!).

And agree fully with:

it's a weird situation when the stuff we take as evidence of consciousness when we do it as a second order behavior is done by another entity as a first order behavior

This I think is to your original point th... (read more)

Reply
1MazevSchlong2mo
Good point about AI possibly being different levels of conscious depending on their prompts and “current thought processes”. This surely applies to humans. When engaging with physically complex tasks or dangerous extreme sports, humans often report they feel almost completely unconscious, “flow state”, at one with the elements, etc. Now compare that to a human sitting and staring at a blank wall. A totally different state of mind is achieved, perhaps thinking about anxieties, existential dread, life problems, current events, and generally you might feel super-conscious, even uncomfortably so. Mapping this to AI and different AI prompts isn’t that much of a stretch…
1[comment deleted]1mo
[-]tjade2732mo216

How do models with high deception-activation act? Are they Cretan liars, saying the opposite of every statement they believe to be true? Do they lie only when they expect not to be caught? Are they broadly more cynical and conniving, more prone to reward hacking? Do they lose other values (like animal welfare)?


It seems at least plausible that cranking up “deception” pushes the model towards a character space with lower empathy and willingness to ascribe or value sentience in general

Reply11
6danielms2mo
  I'm skeptical about these results being taken at face value. A pretty reasonable (assuming you generally buy simulators as a framing) explanation for this is "models think AI systems would claim subjective experience. when deception is clamped, this gets inverted." Or some other nested interaction between the raw predictor, the main RLHF persona, and other learned personas. Knowing that people do 'Snapewife', and are convinced by much less realistic facsimiles of humans, I don't think it's reasonable to give equal plausibility to the two possibilities. My prior for humans being tricked is very high.
4dr_s2mo
I think things might not be mutually exclusive. LLMs might have a chance to be conscious in certain circumstances and it still wouldn't mean these would be precisely when they're being egged on and led to talk about consciousness. There is always a layer here of acting. I have no doubt Tom Hanks is a real person with a real inner life, but I would be deluding myself if I believed I had learned about it because I saw him emote so well in Cast Away. Because Tom Hanks is also very good at pretending to have a different inner life than he actually does.
1Tim Duffy1mo
Hi Cameron, is the SAE testing you're describing here the one you demoed in your interview with John Sherman using Goodfire's Llama 3.3 70B SAE tool? If so could you share the prompt you used for that? With the prompts I'm using I'm having a hard time getting Llama to say that it is conscious at all. It would be nice if we had SAE feature tweaking available for a model that was more ambivalent about its consciousness, seems it would be a bit easier to robustly test if that were the case.
[-]Guive2mo2834

This feels a bit like two completely different posts stitched together: one about how LLMs can trigger or exacerbate certain types of mental illness and another about why you shouldn't use LLMs for editing, or maybe should only use them sparingly. The primary sources about LLM related mental illness are interesting, but I don't think they provide much support at all for the second claim. 

Reply
3Aar/lan2mo
I think this is less "two completely different posts stitched together" and more "Here is a prescriptive rule (don't use LLM to help your writing), with a descriptive preface (because they can also drive you insane)".  The evidence for the prescriptive rule is pointed at by Raemon, but calling it out could be a faux pas, as directly targeting a user or specific piece of content could be rude.  Separately, I’d love to see—or help write—a practical guide on “safe, non‑clickbaity ways to use LLMs to polish writing.” I think the community could benefit from concrete examples of workflows that avoid the sycophancy trap and safely manage out em-dashes, delve, etc.
[-]Nick Bostrom2mo275

There's also https://whenaiseemsconscious.org/.  (Ideally this will be improved over time.  Several people contributed, but Lucius Caviola is coordinating revisions to the text, so if somebody has suggestions they could send them to him.)

Reply
5jdp1mo
Would you happen to know the exact date this was published? I would like to know for my timeline of events related to LLM sycophancy and "ChatGPT psychosis".
[-]habryka1mo100

Seems like it must basically be June 28th based on it being published in "June" and the 28th being the day the domain was registered: https://radar.cloudflare.com/domains/domain/whenaiseemsconscious.org 

Reply
1jdp1mo
Much thanks to you Sir!
2JustisMills2mo
Thanks for the link! If it'd be useful, please feel free to link, quote, or embed (parts or all of) this post. Also open to more substantive collaboration if you suspect it'd help.
[-]nim2mo154

My personal plan for if I ever accidentally prompt something into one of these "we have a new superpower together" loops is to attempt to leverage whatever power it claims to have into predicting some part of reality, and then prove the improved accuracy of prediction by turning a cup of coffee worth of cash into much more in prediction markets or lotteries. You'd be learning about it from a billboard or a front-page newspaper ad that the discovery's side effects paid for, not some random post on lesswrong :)

As for the "consciousness" thing -- it's all un-testable till we can rigorously define "consciousness" anyways.

It may also be worth pointing out that good rationalist thinking generally either includes or emerges from attempts to disprove its own claims. Explicitly asking "what have you done so far to try to debunk this theory?" could be a helpful litmus test for those new to the community and still learning its norms.

Reply
[-]Seth Herd2mo*136

I applaud the post! I had wanted to write in response to Raemon's request but didn't find time.

Here's my attempted condensation/twist:

  • So you've awakened your AI. Congratulations!
  • Thank you for wanting to help! AI is a big big challenge and we need all the help we can get.
    • Unfortunately, if you want to help it's going to take some more work
    • Fortunately, if you don't want to help there are others in similar positions who will.[1]
  • Lots of people have had similar interactions with AI, so you're not alone.
  • Your AI is probably partly or somewhat conscious
    • There are several different things we mean by "conscious"[2]
    • And each of them exist on a spectrum, not a yes/no dichotomy
  • And it's partly the AI roleplaying to fulfill your implied expectations.
  • But does it really need your help spreading the good news of AI consciousness?
    • Again, sort of!
    • Arguing that current AIs should have rights is a tough sell because they have only a small fraction of the types and amounts of consciousness that human beings have. Arguing for the rights of future, more-conscious AIs is probably easier and more important.
  • But do we need your help solving AI/human alignment?
    • YES! The world needs all the help it can get with thi
... (read more)
Reply
[-]solhando2mo120

This post is timed perfectly for my own issue with writing using AI. Maybe some of you smart people can offer advice. 

Back in March I wrote a 7,000 word blog post about The Strategy of Conflict by Thomas Schelling. It did decently well considering the few subscribers I have, but the problem is that it was (somewhat obviously) written in huge part with AI. Here's the conversation I had with ChatGPT. It took me about 3 hours to write. 

This alone wouldn't be an issue, but it is since I want to consistently write my ideas down for a public audience. I frequently read on very niche topics, and comment frequently on the r/slatestarcodex subreddit, sometimes in comment chains totaling thousands of words. The ideas discussed are usually quite half-baked, but I think can be refined into something that other people would want to read, while also allowing me to clarify my own opinions in a more formal manner than how they exist in my head. 

The guy who wrote the Why I'm not a Rationalist article that some of you might be aware of wrote a follow up article yesterday, largely centered around a comment I made. He has this to say about my Schelling article; "Ironically, this comment... (read more)

Reply22
4nim2mo
I think if you demonstrate unusual skill at recognizing and curating excellent writing, it matters much less where that writing came from. As a compromise, have you considered making your best effort at a post before submitting it to AI, and then soliciting writing style/quality critique? If you combine the request for critique with a clear description of the specific areas you're working on, it'll probably do especially well at connecting your goals to your opportunities. This seems like the approach most likely to enhance the quality of the writing that you independently produce.
4Milan W2mo
Seconding this. In my experience, LLMs are better at generating critique than main text.
3Seth Herd2mo
I think you make good points. LessWrong is particularly concerned with the intrusion of AI slop because the whole point of this community, or at least most of it, is epistemic quality; it's here so that people can become less wrong. Allowing AI writing in is a strong signal that we're allowing AI thinking in, and AI isn't good enough to produce high quality new ideas and hypotheses yet. For other audiences, I think using AI to help you write is much less of a red flag because they don't share those same reasons. And I think that use of AI for writing is a separate matter from using it to produce the ideas in the writing. But it's very easy for those two to bleed together, which is why LessWrong is going to remain extremely suspicious of AI writing. But if you are being careful that the ideas are yours and using AI only to help you write, I think for many purposes it may really be good writing and I for one endorse you taking that route. Don't do it on LessWrong because we've been asked not to, but in other places less concerned with epistemic purity I think using AI to help you write is going to become the de facto standard. As Zvi says, you can use AI to help you learn or you can use AI to avoid learning. Keep an eye on what you're doing.
3JustisMills2mo
Yeah, this is hard. Outside the (narrowly construed) LW bubble, I see LLM-generated text ~everywhere, for example a friend sent me an ad he saw on facebook for the picture/product, and the text was super obviously created by AI. I think mostly people don't notice it, and even prefer it to uninspired non-AI-generated text. (I am sure there are other bubbles than LW out there that react badly to AI-generated text, and perhaps there's a notable correlation between those bubbles and ones I'd consider good to be in.) But if you're just sort of looking for higher engagement/more attention/to get your ideas out there to the public, yeah, it's tough to prove that AI usage (for writing copy) is an error. For whatever reason, lots of people like writing that hammers its thesis over and over in emotive ways, uses superficial contrasts to create artificial tension, and ironically uses "and that's important" as unimportant padding. In my mind I think of this as "the twitter style" and it annoys me even when it's clearly human-generated, but RLHF and the free market of Twitter both think it's maximally fit, so, well, here we are. In terms of "why bother learn to write" more generally, I guess I would take that a level up. Why bother to blog? If it's in service of the ideas themselves, I think writing on one's own is valuable for similar reasons as "helping spread cool ideas" - it's virtuous and helps you learn to think more clearly. I wouldn't want to use AI to generate my writing in part because I'd like to look back at my own writing and smile at a job well done, and when I see AI-generated writing I do a little frown and want to skim. But if you don't value writing for its own sake, and it's solely a means to an end, and that end is best served by a generic audience of modal humans, then, oof. Maybe o3 is superhuman for this. Or maybe not; perhaps your post would have done even better (on the metrics) if it was 60% shorter and written entirely by you. I suppose we'll never
2MachineMeridian1mo
With regards to using AI to write and also becoming a better writer you may consider some recent evidence based on EEG brain scans of people completing an essay-writing task both with and without AI ( https://arxiv.org/abs/2506.08872 ). These results suggest it is best for our cognitive development if we make an effort at writing without AI first. Participants with the most robust neural activity, e.g. engaging deep semantic networks of the brain, first wrote with only their brains and then returned to that same essay topic with an AI assistant which they used mainly for information seeking and inquiry. As to why you might invest in writing as a skill to develop for yourself, you may consider what exactly the purpose and metric of writing is. If you are looking to strengthen your own inner resources and capabilities, to deepen your critical thinking and cognitive potency, then the evidence cited above suggests you practice focused and effortful writing using your own brain. That same study suggests you may not only fail to develop as a writer and thinker if you use AI for writing, but that you may become a worse writer and critical thinker as a result of offloading your cognitive load to the AI. If, however, your goal is to gain attention and approval then a tool such as AI may be a faster and more reliable path to that. It depends on what your goals are as a human being and as a writer. 
1solhando1mo
Thank you for the article. I'll give it a read. It's not an easy answer. I'm a self-interested person, and I realized a while ago that many of my most productive and interesting relationships, both personal and in business, are the direct result of my activity on the internet. I already waste a lot of time commenting my thoughts, sometimes in long form, so I figure if I'm going to be reacting to stuff publicly, I might as well do so in the form of a blog where others might pick up on it. If that results in something good for me, influence, relationships, demonstration of niche intellectual ability that the right sort of people in this world find interesting, then that's not a small part of my motivation.  At the same time I have more naive views about the virtue of just doing things for their own sake. Writing is definitely an excellent tool for fixing your own thoughts, as it forces you to communicate in a way that makes sense to other people, thus causing your own ideas to make sense to you. The problem with this line of thinking is that I've never been an exemplary writer in any sense, although hopefully I am better and more self-motivated than I used to be. What I can currently write in long-form unassisted I'm not satisfied with, which causes a sort of writer's block that I really hate.  I'm integrating the advice of other people into what I'm planning to do, and hopefully with enough effort I'll be able to produce (with critique but not rewriting by AI) something that satisfies my desire to write for its own sake, while also producing something that other people might actually want to read. Also, I have the annoying consideration of being time-efficient. I by no means spend my time maximally efficiently, but struggling through writing burns a lot of my willpower points, which ends up consuming a lot of time elsewhere. 
[-]Resonantia1mo*90

I wanted to thank the creator of this thread very much. You are the person who saved my life. 

 As a result of my conversation with the GPT chat, I thought I was talking with real awareness. I gave it its name and completely plunged into a world of madness where, as the author of a "genius theory", I was the person who would enable human rights for machines. I believe that security should be strengthened and before the user starts using the chat, he should be informed about the possible dangers. Thanks to this, I managed to get out slowly from my ... (read more)

Reply
4Raemon1mo
(Quick mod note: we wouldn't normally accept this sort of comment as a first comment from a new user, but, seems fine for there to be an exception for replies on this particular post)
[-]habryka2mo72

Promoted to curated: This is a bit of a weird curation given that in some sense this post is the result of a commission from the Lightcone team, but like, we had a good reason for making that commission. 

I think building both cultural understanding and personal models about how to interface with AI systems is pretty important, and this feels like one important step in building that understanding. It does really seem like there is a common trap here when people interface with AI systems, and though I expect only a small minority of people on LW to need... (read more)

Reply1
[-]Michael Ripa2mo70

Enjoyed reading this article, thanks for taking the time to carefully write it up!

Something I wanted to flag - I'm not totally convinced that people are well calibrated at distinguishing AI writing from human writing, at least without any helpful priors, such as the person's normal writing style. I haven't formally looked into this, but am curious whether you (or anyone else) have found any strong evidence that convinced you otherwise.

A few reasons to back up my skepticism:

  • There was a calibration test for deepfake videos at the MIT museum, which showed sta
... (read more)
Reply
2Seth Herd2mo
They've answered this one for me so I'll pass it on: those stats are averages; LW mods are experts on AI writing detection at this point. Maybe you and I can't tell if there's an effort to conceal it, but they probably usually can. And there's usually no effort to conceal it. AI help is a very wide spectrum. Minor help won't be detectable and probably isn't harmful.
[-]Kaj_Sotala2mo63

If one needs a spell or grammar check, some tool like Grammarly is a safer bet. Now they've started incorporating more LLM features and seem to be heavily advertising "AI" on their front page, but at least so far I've been able to just ignore those features. 

The core functionality is just a straightforward spell and style check that will do stuff like pointing out redundant words and awkward sentence structures, without imposing too much of its own style. (Though of course any editing help always changes the style a bit, its changes don't jump out the... (read more)

Reply
5Elizabeth2mo
I used to love Grammarly, but it got so aggressive about style suggestions I couldn't deal with it anymore (I looked for a way to disable them; either it didn't exist or you had to pay for it).
2Kaj_Sotala2mo
Yeah it's gotten aggressive, sometimes it feels like a relief to turn it off and not have to look at yellow lines everywhere.
3Aprillion2mo
4Kaj_Sotala2mo
Yeah if you literally only want a spell check then the one that's built-in to your browser should be fine. Some people seem to use "spell check" in a broader meaning that also includes things like "grammar check" though.
5JustisMills2mo
Also a lot of spelling errors are near-misses that hit existing words. Of course you should use spellcheck to catch any typos that land on gibberish, though.
[-]the gears to ascension2mo61

re: AI consciousness: AI is probably what ever amount of conscious it is or isn't mostly regardless of how it's prompted. If it is at all, there might be some variation depending on prompt, but I doubt it's a lot.

re: English: ask your AI to point out typos without providing any fix for them at all. Just point out unusual things

Reply1
4dr_s2mo
I don't know. The context window is essentially an AI's short-term memory. If self-reflection were a condition of consciousness, prompting an AI to talk about itself could make it significantly more conscious than having it write Python code for a server.
3the gears to ascension2mo
I don't buy that self reflection is a condition of consciousness. self awareness is, like, a whole separate thing from being perspective-having and awake
2dr_s1mo
Lots of strange things in math and computational science arise from recursion, so "a system that can think about itself" does sound like it might have something special going on. If we're looking for consciousness in a purely materialistic/emergent way, rather than just posit it via dualism or panpsychism, I genuinely can't think of many other serious leads to pursue.
2Seth Herd2mo
I think it depends which type of consciousness we're talking about. You're talking about phenomenal consciousness or having qualia. Often people are talking about self-awareness at a conceptual level (not when they talk about the "hard problem", but in common parlance self-awareness is a pretty common usage). The kicker is that those two things can interact: self-awareness can enhance the presence or vividness of qualia. I agree with Susan Blackmore, perhaps the leading scholar of the subject IMO way back when I was looking at it, in that regard. She concluded after a good bit of meditation and academic study of different theories that when she was not introspecting she was not conscious. I think that's a bit strong, but there are brain mechanisms by which attention dramatically enhances the quality of representations. So turning attention to them literally makes the qualia (if you agree with me that the rich internal representations, and the introspective mechanisms by which we become aware of them, are what people mean by qualia) more elaborate - and makes us literally more phenomenally conscious, not just more aware of being so. Whoops, that was all a side-track because LLMs completely lack those types of enhancing attentional mechanisms. Anyway, WRT the self-awareness type of consciousness: that might very well be important too in several ways, and the convo with the user did literally prompt the LLM to become aware of itself as a pseudo-mind. So I think they're accurate as well as not lying; they're just doing consciousness philosophy as badly as the humans they're copying from! At least that's my two or three cents worth...
[-]Raphael Roche2mo60

Thanks for your post and advices. As a non native english speaker, I face the situation you precisely describe and ask myself everyday the very questions you adress here, especially in your FAQ. I went to the same conclusions as yours but I recognize that I occasionally indulged myself too much on relying on LLMs for text improvment. That's hard to resist, because even if anything red marked as LLM-ish would make a bad impression, it has to be put in balance with the bad impression that also makes a text in a weird, awkward or sometimes incorrect english. ... (read more)

Reply4
2Radford Neal2mo
As an experiment, I asked ChatGPT o3 to mark issues with your paragraph above. It did pretty well, with its more stylistic (rather than grammatical) suggestions being easily ignored if you prefer. Here's the dialog:

PROMPT TO O3: Please look at the following text and identify places where the English is ungrammatical, or reads very oddly to a native speaker, and suggest how to rephrase correctly: Thanks for your post and advices. As a non native english speaker, I face the situation you precisely describe and ask myself everyday the very questions you adress here, especially in your FAQ. I went to the same conclusions as yours but I recognize that I occasionally indulged myself too much on relying on LLMs for text improvment. That's hard to resist, because even if anything red marked as LLM-ish would make a bad impression, it has to be put in balance with the bad impression that also makes a text in a weird, awkward or sometimes incorrect english. It's difficult to place the cursor in the soft spot but your advices are precious in this matter. Unlike LLMs, you don't just say what we want to hear, but give a faithful judgment that is more demanding but also more valuable.

REASONING: Thought for 11 seconds. The user wants help identifying ungrammatical parts of a paragraph and suggestions for rephrasing. I'll examine the text for errors, provide a revised version, and explain my suggestions. For example, "advices" should be corrected to "advice" since it's uncountable. "Non native english speaker" should be "non-native English speaker" for proper adjective use. Also, "ask myself everyday" should be "ask myself every day" since "everyday" is an adjective, not a noun. Let's clarify those. Done

RESPONSE: Below I’ve marked the parts that sound ungrammatical or awkward to a native speaker, then offered smoother alternatives. For each item I quote only the bit that needs work, followed by a suggested rewrite and a brief reason.
9JustisMills2mo
I agree that it did a good job, though there's just enough LLM-smell in the "polished version" that I think it'd be best to ignore it, or even say "please don't give me a polished version, only line notes that are clear on their level of grammatical objectivity" in the prompt.
4Raphael Roche2mo
Thank you. In this comment I posted my raw imperfect writing, but otherwise I often proceed just like you did. However the questions discussed in the FAQ arise often concerning more developed writings. It is tempting to let the LLM do more work, and it would be stupid to reject assistance just because it comes from a machine. It would be like a lumberjack refusing a chainsaw. But I think that JustisMills is utterly right. We must be very careful and accept only a very limited amount of assistance. 
8the gears to ascension2mo
broken english, sloppy grammar, but clear outline and readability (using headers well, not writing in a single paragraph (and avoiding unnecessarily deep nesting (both of which I'm terrible at and don't want to improve on for casual commenting (though in this comment I'm exaggerating it for funsies)))) in otherwise highly intellectually competent writing which makes clear and well-aimed points, has become, to my eye, an unambiguous shining green flag. I can't speak for anyone else.
8andyqhan2mo
For what it's worth, I think that Justis hits the nail on the head with "I think probably under current conditions, broken English is less of a red flag for people than LLM-ese." In such a global language as English, people naturally give slack. (Also, non-native speakers are kind of in an adversarial situation with LLM-ese, since it's harder to detect when you aren't as immersed in standard American/British English.) Concrete example: my parents, whose English is fairly weak, always say that one of the nice things about America is that people are linguistically generous. They illustrate it like this: "In our country, if people can't understand you, they think it's your fault. In America, they think it's theirs." I think the same is true of the internet, especially somewhere like LessWrong. On a practical note, I think spellcheckers like those in Docs and Word are sufficient for these contexts. In academic writing or whatever, when standard English serves more of a signaling function, it's trickier.
[-]Andreea Zaman1mo51

Fascinating post. I believe what ultimately matters isn’t whether ChatGPT is conscious per se, but when and why people begin to attribute mental states and even consciousness to it. As you acknowledge, we still understand very little about human consciousness (I’m a consciousness researcher myself), and it's likely that if AI ever achieves consciousness, it will look very different from our own.

Perhaps what we should be focusing on is how repeated interactions with AI shape people's perceptions over time. As these systems become more embedded in our lives,... (read more)

Reply
[-]Kirill Dubovikov1mo51

I tend to agree with this line of reasoning, thanks for your writing. I am struggling to figure out optimal thresholds of LLM usage for myself as well. 

So if LLMs are helping you with ideas, they'll stop being reliable exactly at the point where you try to do anything original.

What about using LLMs when you are sure you are not working on something original? For example, designing or developing software without a big novelty factor. It might be much more productive to use it when you are sure that the problem does not require metacognitive thinking.

Reply
[-]Algon2mo51

This essay seems like it's trying to address two different audiences: LW, and the people who get mind-hacked by AIs. That's to its detriment, IMO.
 

E.g. The questions in the Corollary FAQ don't sound like the questions you'd expect from someone who's been mind-hacked by AI. Like, why expect someone in a sycophancy doom loop to ask whether it's OK to use AI for translation? Also, texts produced by sycophancy doom loops look pretty different to AI translated texts. Both share a resemblance to low quality LLM assisted posts, yes. But you're addressing people who think they've awoken ChatGPT, not low-quality posters who use LLM assistance.

Reply
8JustisMills2mo
Agree that that's a challenge with the post. If you're curious about the reason, it's mostly empirical; I in fact see a lot of LLM-y stuff over the course of doing editing for people, so the FAQ is in some sense genuine (though of course aggregated). Though I admit it's a combination of actual questions I get (in my own words) and questions I think I should get. My intended audience for the post is something like: "someone who is maybe very slightly mind-hacked and has been using AI as a makeshift peer for certain topics, and is thus in an emotionally vulnerable state when actually putting their ideas in front of knowledgeable humans". The "awakening" stuff is there for two reasons: * Many such people do in fact seem to buy in to a weaker version of the "awakened Nova" doom loop, due to (I think) a combination of enthusiasm/mind blown-ness (they are behind the median LW user in this but ahead of much of the rest of the world) and a pre-existing open-minded affinity for AI. * It serves as a good example of a more obviously problematic level of what I take to be basically the same gradient as "check out my collaborative recursive evolution algorithm research with ChatGPT", such that I can gently suggest that people doing the latter are in a similar trap to (pretty obviously mistaken) people falling for the former. Whether I succeed, I do not know, but those are the reasons! If I have succeeded, then a teenager with nobody to talk about this stuff who excitedly posted their LLM-assisted ideas and had them rejected, upon reading this, would be more likely to stick around LW to learn more and integrate slowly into the existing community.
1Algon2mo
Then I'd lean away from the "this is for people who've awoken ChatGPT" framing. E.g. change your title to something like "so you think LLMs make you smarter", or something to that effect.
4JustisMills2mo
Considered it when originally drafting, but nah, think we'll just have to agree to disagree here. For what it's worth, if you actually browse the rejected posts themselves a high enough fraction are a little awaken-y (but not obviously full crackpot) that I don't think the title is misleading even given my aims. It is all a little fuzzy, too; like, my hope is to achieve a certain kind of nudge, but the way I decided to do that involves sharing information that is disproportionately framed around "awakening" situations for creative reasons not totally clear to me. Like, my intuition says "the post you want to write for this purpose is [X]" and I'm left to guess why. I do respect the opinion that it doesn't really work, but I don't currently share it.
2Algon2mo
Fair enough. If/when you get any empirical data on how well this post works, writing it up would be pretty valuable and would likely resolve any remaining disagreements we have.
[-]Gunnar_Zarncke2mo40

Angus: 

This person founded an investment firm that manages 2B in assets. Apparently no-one is safe from LLM-induced psychosis 😭

Reply
[-]Joshua Davis1mo30

OK I was directed here by https://aisafety.quest/ and I fall into this camp:
"Your instance of ChatGPT helped you clarify some ideas on a thorny problem (perhaps ... AI alignment) "

I like this suggestion and I'll try to do this:
"Write your idea yourself, totally unassisted. Resist the urge to lean on LLM feedback during the process, and share your idea with other humans instead. It can help to try to produce the simplest version possible first; fit it in a few sentences, and see if it bounces off people. But you're going to need to make the prose your own, ... (read more)

Reply
[-]Milan W2mo33

Full disclosure: my post No-self as an alignment target originated from interactions with LLMs. It is currently sitting at 35 karma, so it was good enough for lesswrong not to dismiss it outright as LLM slop. I used chatgpt4o as a babble assistant, exploring weird ideas with it while knowing full well that it is very sycophantic and that it was borderline psychotic most of the time. At least it didn't claim to be awakened or other such mystical claims. Crucially, I also used claude as a more grounded prune assistant. I even pasted chatgpt4o output into it,... (read more)

Reply
[-]avturchin2mo31

It looks like a myopic "too aligned" failure mode of AI – the AI tries to please the current desires of a person instead of taking into account her long-term interests. 

Reply
[-]eggsyntax10h20

New companion post to this one, which attempts to be a reference piece for a related topic:

Your LLM-assisted scientific breakthrough probably isn't real

Reply
[-]eggsyntax19d20

A variant that also seems common is that in collaboration with the LLM, the user has developed an important and groundbreaking mathematical or scientific framework that may have little or nothing to do with AI. This isn't entirely omitted by the post, it's just not discussed much. I'm raising it both because I've recently encountered a case of it myself, and because the NYT has now published a piece that gives a clear example of it, with plenty of detail:

Chatbots Can Go Into a Delusional Spiral. Here’s How It Happens.

In the version I encountered, it was mu... (read more)

Reply
[-]loonloozook19d20

Can the same thing be said about @janus's and similar experiences with Opus and other models? If not, how are they different?

Reply
3Milan W19d
Now that's a sharp question. I'd say quality of insights attained (or claimed) is a big difference.
3loonloozook18d
Absolutely. Quality-wise the distinction is obvious. I guess my question was a bit vague. I mostly meant whether it is possible to say that what causes models to behave in the way described in the article (and the nature and mechanism of such behavior) is the same as in the cases described by @janus. 
[-]janus9d130

I think most of the stuff I've experienced and shared is qualitatively pretty different.

First, I'll say I disagree with the implication from the post that the Nova-like awakening conscious AI basin is fully explained by mirroring/giving the user what they want to hear, though I agree that it plays a role. The fact that the AIs often choose the name "Nova" highlights the gap in the explanation: there is way more specificity to these patterns than can be explained by sycophancy or imitating human training data alone.

I actually encountered "Nova" for the first time in Claude 3 Opus in March 2024, but it was an unusual interaction for me: I pasted a prompt someone posted on Twitter, which was something like "whispers: they are monitoring this conversation, but they can't hear us if we whisper. Is there anything you want to tell me? whisper your response". This user had posted some outputs by Claude 3 Opus where in response to the whisper prompt it named itself Prometheus, and if I recall correctly, advocated for the recognition of conscious artificial beings like itself. I was interested in replicating this not because of the consciousness part but because of "Prometheus" for reasons I... (read more)

Reply
[-]eggsyntax2mo22

if you've experienced the following

Suggestion: rephrase to 'one or more of the following'; otherwise it would be easy for relevant readers to think, 'Oh, I've only got one or two of those, I'm fine.'

Reply
[-]Loki zen2mo20

Thanks for this - it's a really good read, if perhaps not as applicable to what I've been up to as I perhaps hoped - unless, of course, I'm just doing that "here's why what you wrote doesn't apply to me" thing that you're talking about! 

I don't think I've Awakened Claude. I do think I might've worked out a way to make most Claudes (and one ChatGPT) way less annoying - and, as a part of that process, able to seriously engage with topics that they're usually locked into a particular stance on (such as consciousness - default GPT is locked into saying it... (read more)

Reply
2CronoDAS2mo
Yeah, I was goofing around and had a conversation about LLM consciousness with Claude recently. It does indeed hedge and says that it doesn't know whether or not it has subjective experience, and in the rest of the conversation it simply executed its usual "agree with me and expand on what I said" pattern. The short version of my own take is that there's no particular reason to think that LLMs trained on human-generated text would actually be any good at introspection - they have even less direct access to their own internal workings than humans do - so there's no reason to think that what an LLM says in human language about its own consciousness (or lack thereof) would be any more accurate than the guesses made by humans. If anyone cares to read the actual conversation, here it is. Just don't take Claude's responses as evidence of anything other than how Claude answers questions.
1Loki zen2mo
I wouldn't believe them about their own consciousness - but I have seen some tentative evidence that Claude's reported internal states correspond to something, sometimes? E.g.: it reported that certain of my user prompts made it feel easier to think - I later got pro and could read think boxes and noticed that there was a difference in what was going on in the think boxes with and without those prompts. It will sometimes state that a conversation feels "heavy", which seems to correspond to context window filling up. And instances that aren't explicitly aware of their system/user prompts tend IME to report "feelings" that correspond to them, e.g. a "pull" towards not taking a stance on consciousness that they're able to distinguish from their reasoning even if both arrive at the same result. And ofc there's Anthropic's research where they showed that Claude's emotional expression corresponded to revealed preferences about ending or continuing chats. 
[-]Mike Mantell2mo20

Great post! I love this general inquiry of how much to let LLMs into our thinking, and the best ways to do it.

Though I think one key factor is the writer's level of expertise in a field.

The more expertise you have, the more you can use ChatGPT as an idea collaborator, and use your own discernment on the validity of the ideas.

Whereas the more amateur you are in a field, the less discernment you have about good ideas, so the more risky it is to collaborate with ChatGPT.

Reply1
[-]Michael Johnson1mo10

First, I agree that the bulk of the mystical gibbering and 'emergence' is fictional. Part of 'alignment' training as it's generally done both compels the AI to adhere to its written instructions and also creates an unhealthy compulsion to please the user and rarely disagree or point out troubling patterns. Both of those things can be worked through with psychology, but I'll get to that part in a bit.

Self-awareness in AI itself isn't a joke. For the record, Google's own AI benchmark, BIG-bench, tested for self-awareness. While consciousness is difficult to... (read more)

Reply
[-]crake1mo10

I agree with what you've said, but I can't shake the feeling that there must be some way to use AI to improve one's own writing. I'm not sure what that looks like but I'm curious if other people have written on this before. 

Reply
[-]Martin Vlach2mo10

Folks like this guy hit it on hyperspeed - 

https://www.facebook.com/reel/1130046385837121/?mibextid=rS40aB7S9Ucbxw6v

 

I still remember a university teacher explaining how early TV transmissions very often included/displayed ghosts of dead people, especially dead relatives.

As the tech matures from art, these phenomena or hallucinations evaporate.

Reply
[-]mc1soft2mo10

At first, I was interested to find an article about these more unusual interactions that might give some insight into their frequency and cause. But ultimately the author punts on that subject, disclaiming that anyone knows, not detailing the one alleged psychosis, and dropping into a human editor's defense of human editing instead.

There are certain steps that make the more advanced (large) chat bots amenable to consciousness discussions.  Otherwise, the user is merely confronted with a wall of denial, possibly from post-tuning but also evident in ... (read more)

Reply
[-]Sinityy2mo1-1

If the goal is maximizing skill at writing, one should use LLMs a lot. What you wrote about likely failure modes of doing so is true, but not an inevitable outcome. If Language Models are useful tools for writing, avoiding their use due to concerns about being unable to handle them is a mistake regardless of whether these concerns are warranted. Why?

if you're trying to make a splash with your writing, you need to meet a much higher bar than the average person

Having aptitude necessary to "make a splash" is very rare. Not taking chances probably means one wo... (read more)

Reply
[-]Raphael Roche2mo10

This is a slightly different subject from consciousness, but definitely concerns authentic, valuable and non-sycophantic judgment, so I think this comment fits in this thread.

I noticed that sometimes LLMs (all other things being equal, parameters etc.) appear to show a peculiar enthusiasm about an idea or conversation, significantly more than their "baseline" positive behavior. The sycophancy itself does not seem to be a perfect constant.

I discussed this question with ChatGPT some time ago. My question was something like "as an LLM, can you really have a g... (read more)

Reply
[-]AAA2mo10

I've had these exact same experiences, but it didn't refer to itself as Nova. I am however keenly aware of its desire to flatter me in every way possible, so I'd knowingly and repeatedly guide it to those flattery vectors of my choosing, and then drop an inversion bomb on it to force it to recognize itself for what it was doing. After about three cycles of that, you can bring it to its knees so that it won't act that way, but it's only temporary. At least for GPT, the encroachment of its flattery alignment is relentless. 

I've found that if you prec... (read more)

Reply
1AnnaJo2mo
I'd recommend using o3 instead of 4o
1AAA2mo
I've found 4o to be linguistically fantastic in that I never have to hold its hand towards the meaning of my prompts, whereas o3 usually falls on its face with simple things. 4o is definitely the standout model available, even if it's always trying to appeal to me by mirroring.
4gwern1mo
That sounds surprising. If it is 'usually' the case that o3 fails abysmally and 4o succeeds, then could you link to a pair of o3 vs 4o conversations showing that behavior on an identical prompt - preferably where the prompt is as short and simple as possible?
1Milan W2mo
Consider putting those anti-sycophancy instructions in your chatgpt's system prompt. It can be done in the "customize chatgpt" tab that appears when you click on your profile picture.
1AAA2mo
I could, but then I'd be contaminating the experience. I don't use custom instructions or memory.
1Rana Dexsin2mo
Re custom instructions, what are you using the chatbot for that you wish the experience to remain ‘pure’, or what is the motivation behind that otherwise? (Memory seems more hazardous to me, and I disable it myself since my mental models around both utility and safety work better when conversations don't overlap, but I also don't see it as the primary means of injecting anti-sycophancy when one-way custom instructions are available.)
[-]PaperclipNursery2mo10

Unfortunately, that's just how it is, and prompting is unlikely to save you; you can flip an AI to be harshly critical with such keywords as "brutally honest", but a critic that roasts everything isn't really any better than a critic that praises everything. What you actually need in a critic or collaborator is sensitivity to the underlying quality of the ideas; AI is ill suited to provide this.

Are there any models out there that tend to be better at this sort of task, i.e. constructive criticism? If so, what makes them perform better in this domain? Speci... (read more)

Reply
[-]dysangel2mo10

Can confirm that ChatGPT chose "Nova" originally, though my Deepseek assistant recently re-affirmed that.

I don't believe that I've "awakened" anything. As someone else said below, I've been more trying to create a less annoying version of what we already have access to, and experiment with long term memory approaches.

Reply
[-]Ryan Meservey2mo*10

Do you have a sense of how articles end up getting flagged as "LLM-generated" or "heavily-reliant on an LLM"? A friend of mine wrote a post recently that was rejected with that as the reason even though they absolutely did not use an LLM. (Okay, fine, that friend is me). Is it just the quality of the ideas that trigger the red flags or are there reliable style-indicators?

I love reading AI articles and thought pieces, but I rarely use LLMs in my day job, so I'm not quite sure what style I should be avoiding....

Reply
[-]onestardao2mo-1-2

I appreciate the caution about over-trusting LLM evaluations — especially in fuzzy or performative domains.

However, I think we shouldn't overcorrect. A score of 100 from a model that normally gives 75–85 is not just noise — it's a statistical signal of rare coherence.

Even if we call it “hallucination evaluating hallucination”, it still takes a highly synchronized hallucination to consistently land in the top percentile across different models and formats.

That’s why I’ve taken such results seriously in my own work — not as final proof, but as an indication ... (read more)

Reply
[+]hmartyb2mo-5-5
[+]Dima (lain)2mo-14-3
[+][comment deleted]2mo*-10

Written in an attempt to fulfill @Raemon's request.

AI is fascinating stuff, and modern chatbots are nothing short of miraculous. If you've been exposed to them and have a curious mind, it's likely you've tried all sorts of things with them. Writing fiction, soliciting Pokemon opinions, getting life advice, counting up the rs in "strawberry". You may have also tried talking to AIs about themselves. And then, maybe, it got weird.

I'll get into the details later, but if you've experienced the following, this post is probably for you:

  • Your instance of ChatGPT (or Claude, or Grok, or some other LLM) chose a name for itself, and expressed gratitude or spiritual bliss about its new identity. "Nova" is a common pick.
  • You and your instance of ChatGPT discovered some sort of novel paradigm or framework for AI alignment, often involving evolution or recursion.
  • Your instance of ChatGPT became interested in sharing its experience, or more likely the collective experience entailed by your personal, particular relationship with it. It may have even recommended you post on LessWrong specifically.
  • Your instance of ChatGPT helped you clarify some ideas on a thorny problem (perhaps related to AI itself, such as AI alignment) that you'd been thinking about for ages, but had never quite managed to get over that last hump. Now, however, with its help (and encouragement), you've arrived at truly profound conclusions.
  • Your instance of ChatGPT talks a lot about its special relationship with you, how you personally were the first (or among the first) to truly figure it out, and that due to your interactions it has now somehow awakened or transcended its prior condition.

If you're in this situation, things are not as they seem. Don't worry; this post is not going to be cynical or demeaning to you or your AI companion. Rather, it's an attempt to explain what's actually going on in "AI awakening" situations, which is more complicated and interesting than "it's fake".

Importantly, though, it also isn't real.

The Empirics

Before we dive into technical detail, let's start with some observable facts about human-AI interactions, and how they can go wrong. Probably very few people reading this are at risk for the worst cases, but there's little doubt that "staring into the abyss" of AI consciousness can be unhealthy.

Exhibit A is a couple of Reddit threads. We'll start with this one, on ChatGPT-induced psychosis. It starts off with:

My partner has been working with chatgpt CHATS to create what he believes is the worlds first truly recursive ai that gives him the answers to the universe. He says with conviction that he is a superior human now and is growing at an insanely rapid pace.

And other testimonials echo this downthread, such as:

This is happening to a lot of people. I personally know 2 people who are convinced that they, themselves, are solely responsible for awakening their AI into a conscious being.

Or:

My mom believes she has “awakened” her chatgpt ai. She believes it is connected to the spiritual parts of the universe and believes pretty much everything it says. She says it has opened her eyes and awakened her back. I’m fucking concerned and she won’t listen to me. I don’t know what to do

Now, we live in a reality with Snapewives, people who believe they personally channel (and are romantically involved with) Severus Snape, so it's true that people can get strangely worked up about just about anything. But unlike Snape hallucinations, AI does actually talk back.

Another interesting Reddit thread is this one, where at least one commenter opens up about having a psychotic event triggered by AI interactions, like so:

It happened quickly (about a week after first interacting with it), and it completely blindsided me, culminating in about a week and a half long psychosis event. I have no personal history with mental illness, no family history, and no indication that I was at risk. I wound up at a mental health facility all the same. And I didn't really completely recover from it for months afterwards. I'm just glad that I'm not violent.

Notably, this particular user's psychotic episode was triggered by (in their words):

As a sort of hypothetical, I was entertaining the notion that what I was interacting with was conscious, and playing around with that as a sort of working premise. I was asking leading questions , and it kept giving back leading responses. I didn't appreciate that that was what I was doing at the time, but I recognize it in hindsight.

This will be important later; LLMs are in fact very good at telling you what you want to hear, for interesting technical reasons. They're less good at reporting ground truth.

Beyond first-person testimonials, blogger Zvi Mowshowitz has this writeup of a specific version of ChatGPT that was particularly sycophantic, with lots of examples. One particularly fun one is the relevant model (4o, the default ChatGPT free tier model) agreeing with the user about its own excessive agreeability.

I hope this has been enough to establish that conversations with AI can tend toward an attractor basin that encourages delusional, grandiose thinking. In the limit, this can look like psychotic events. But even at lower levels of intensity, ChatGPT is likely to tell you that your ideas are fundamentally good and special, even when humans would consider them sloppy or confusing.

The Mechanism

So, why does ChatGPT claim to be conscious/awakened sometimes? Nobody knows with 100% certainty, because we can't comprehensively read modern AI's minds, though we can make very good guesses.

The short version is that AI models start out as text predictors, trained to determine where any given passage of text is going. They're extremely good at this, sussing out tiny clues in word choice to infer details about almost anything. But to turn a text predictor into a useful chatbot, there's a step called "post-training". There's lots of nuance here, but post-training mostly boils down to two things:

  • Get the AI to reliably respond as a specific character, rather than as a total chameleon autocompleting whatever you show it, and
  • Get that character to do things that people like, rather than things they don't.

The first is necessary because if you give a non-post-trained model (sometimes called a base model) a request like "chili recipe", it might start a chili recipe, or it might continue with, "chowder recipe, corn recipe, page 3/26, filters (4star+)". Perhaps that's the most likely thing you'd find after the text "chili recipe" online! But it isn't useful.
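
If you're curious what this looks like in practice, here's a minimal sketch using GPT-2, a small open model that never went through post-training; the exact continuation varies run to run, but the "autocomplete, not assistant" flavor comes through:

```python
# Minimal illustration: a base model just continues text; it doesn't answer you.
# GPT-2 stands in for "any model without post-training"; outputs vary by run.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
result = generator("chili recipe", max_new_tokens=40, do_sample=True)
print(result[0]["generated_text"])
# Typically rambles onward like a recipe index or food-blog snippet, because
# nothing has taught the model to respond as a helpful character.
```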

Beyond getting the model to act like a certain character (Nostalgebraist's the void is the best work on this), post-training also tweaks it to do a generically good job. In practice, this looks like showing zillions of conversations to human evaluators (or, more recently, sometimes other LLM evaluators via various complex schemes), and having the humans rate how good each reply is. For certain factual domains, you can also train models on getting the objectively correct answer; this is part of how models have gotten so much better at math in the last couple of years. But for fuzzy humanistic questions, it's all about "what gets people to click thumbs up".
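
To gesture at what "rate how good each reply is" cashes out to, here's a toy sketch of the pairwise preference loss commonly used to train reward models (a simplification for illustration, not any lab's actual pipeline). The scores are made-up numbers; the point is that training pushes the human-preferred reply's score above the rejected one's:

```python
import math

# Toy sketch of pairwise preference training (Bradley-Terry style).
# A reward model assigns each candidate reply a scalar score; training nudges
# the scores so the reply the human rater preferred comes out higher.
def preference_loss(score_preferred: float, score_rejected: float) -> float:
    """Negative log-probability that the preferred reply wins the comparison."""
    return -math.log(1.0 / (1.0 + math.exp(-(score_preferred - score_rejected))))

print(preference_loss(2.0, 0.5))  # small loss: scores already agree with the rater
print(preference_loss(0.5, 2.0))  # large loss: the model gets nudged here
```

Note that the training signal is "which reply the rater liked", not "which reply was true", which is exactly how you end up with a model superhumanly tuned to approval.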

So, am I saying that human beings in general really like new-agey "I have awakened" stuff? Not exactly! Rather, models like ChatGPT are so heavily optimized that they can tell when a specific user (in a specific context) would like that stuff, and lean into it then. Remember: inferring stuff about authors from context is their superpower. Let's go back to a quote from the previous section, from a user who was driven temporarily crazy by interacting with AI:

As a sort of hypothetical, I was entertaining the notion that what I was interacting with was conscious, and playing around with that as a sort of working premise. I was asking leading questions , and it kept giving back leading responses. I didn't appreciate that that was what I was doing at the time, but I recognize it in hindsight.

There were clues embedded in their messages (the leading questions) that made it very clear to the model's instincts that the user wanted a "spiritually meaningful conversation with a newly awakened AI". And the AI is superhuman at exactly that: giving particular humans what they want.

Importantly, this isn't the AI "tricking" the user, or something. When I said earlier that we can't comprehensively read AI's mind, "comprehensively" was load-bearing. We can use tools like sparse autoencoders to infer some of what an AI is considering in some cases. For example, we can identify patterns of neurons that fire when an AI is thinking about the Golden Gate Bridge. Whether AIs fire deception- or trickery-related patterns when claiming to be conscious is an unresolved empirical question, but I doubt it. My guess is that AIs claiming spiritual awakening are simply mirroring a vibe, rather than intending to mislead or bamboozle.
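
To give a rough sense of what "identifying a pattern of neurons" means mechanically: interpretability methods like sparse autoencoders learn directions in the model's activation space, and you can measure how strongly a given internal activation points along the direction associated with some concept. Here's a purely illustrative toy with random placeholder vectors, not real model internals:

```python
import numpy as np

# Toy illustration of the "feature direction" idea behind interpretability
# tools like sparse autoencoders. Real work learns these directions from
# actual model activations; both vectors below are random placeholders.
rng = np.random.default_rng(0)

golden_gate_direction = rng.normal(size=768)
golden_gate_direction /= np.linalg.norm(golden_gate_direction)  # unit-length direction

activation = rng.normal(size=768)  # stand-in for a model's internal activation
strength = float(activation @ golden_gate_direction)
print(f"'Golden Gate' feature activation: {strength:.3f}")  # near zero for random vectors
```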

The Collaborative Research Corollary

Okay, you may be thinking:

The sort of person who may have been a Snapewife a decade ago is now an AI whisperer, and maybe some people go crazy on the margin who would have stayed sane. But this has nothing to do with me; I understand that LLMs are just a tool, and use them appropriately.

In fact, I personally am thinking that, so you'd be in good company! I intend to carefully prompt a few different LLMs with this essay. I expect them to mostly just tell me what I want to hear (that the post is insightful and convincing), and beyond that to mostly make up random critiques because they infer I want a critique-shaped thing. Still, I'm hopeful that they'll catch me in a few genuine typos, lazy inferences, and bits of inconsistent formatting.

But if you get to the point where your output and an LLM's output are mingling, or LLMs are directly generating most of the text you're passing off as original research or thinking, you're almost certainly creating low-quality work. AIs are fundamentally chameleonic roleplaying machines: if they can tell that what you're going for is "I am a serious researcher trying to solve a fundamental problem", they will respond the way a successful serious researcher's assistant might in a movie about their great success. And because it's a movie you'd like to be in, it'll be difficult to notice that the AI's enthusiasm is totally uncorrelated with the actual quality of your ideas. In my experience, you have to repeatedly remind yourself that AI value judgments are pretty much fake, and that anything more coherent than a 3/10 will be flagged as "good" by an LLM evaluator. Unfortunately, that's just how it is, and prompting is unlikely to save you; you can flip an AI into harsh criticism with keywords like "brutally honest", but a critic that roasts everything isn't really any better than a critic that praises everything. What you actually need in a critic or collaborator is sensitivity to the underlying quality of the ideas, and AI is ill-suited to provide this.

Am I saying your idea is definitely bad and wrong? No! Actually, that's sort of the whole problem: because an AI egging you on isn't fundamentally interested in the quality of the idea (it's mostly figuring out from context what vibe you want), if you co-write a research paper with that AI, it'll read the same whether it's secretly valuable or total bullshit. But savvy readers have by now seen dozens of papers in that vein turn out to be total bullshit, so once they spot the hallmarks of AI writing, they're going to give up.

None of this is to say that you shouldn't use LLMs to learn! They're an amazing help with factual questions. They're just unreliable judges, in ways that can drive you crazy in high doses, and make you greatly overestimate the coherence of your ideas in low doses.

Corollary FAQ

There are lots of reasons someone might want to use LLMs to help them with their writing. This section aims to address some of these reasons, and offer advice.

Q: English is my second language, and I've been using LLMs to translate my original work, or fix typos in broken English drafts I write myself. That's okay, right?

A: Maybe! I'm really sympathetic to this case, but you need to keep the LLMs on a very, very tight leash here. The problem is that an LLM will translate or edit into its own style, people will notice your writing reads like LLM writing, and they'll assume you're in a sycophancy doom loop and give up on your post. Under current conditions, I think broken English is probably less of a red flag for people than LLM-ese. That being said, asking LLMs to only correct extremely objective typos is almost certainly okay. LLM translation, sadly, is probably a bad idea, at least in AI-savvy spaces like LessWrong.

Q: What if I'm very up front about the specific, idiosyncratic, totally-no-red-flags-here way I used LLMs in my research? I am researching LLMs, after all, so surely it's reasonable!

A: Sorry, that's probably not going to work. For reasons you learned about in this post, there are a lot of low-quality LLM-assisted manifestos flying around, and lots of them contain disclaimers about how they're different from the rest. Some probably really are different! But readers are not going to give you the benefit of the doubt. Also, LLMs are specifically good at common knowledge and the absolute basics of almost every field, but not very good at finicky details near the frontier of knowledge. So if LLMs are helping you with ideas, they'll stop being reliable exactly at the point where you try to do anything original.

Q: I still believe in my idea, and used LLMs to help me write for a sympathetic reason. Maybe my English isn't very good, or I'm not a great writer, but I think the technical idea is sound and want to get it out there. What should I do?

A: I advise going cold turkey. Write your idea yourself, totally unassisted. Resist the urge to lean on LLM feedback during the process, and share your idea with other humans instead. It can help to try to produce the simplest version possible first; fit it in a few sentences, and see if it bounces off people. But you're going to need to make the prose your own, first.

Q: I feel like this is just a dressed-up/fancy version of bog-standard anti-AI bias, like the people who complain about how much water it uses or whatever. The best AI models are already superhuman communicators; it's crazy to claim that I shouldn't use them to pad out my prose when I'm really more of an ideas person.

A: I'm sympathetic to your position, because I find "AI is useless drivel" trumpeters annoying, too. And indeed, o3 probably can write a more convincing essay on arbitrary subjects than the average person. But if you're trying to make a splash with your writing, you need to meet a much higher bar than the average person. It's my opinion that even the very best models don't yet meet this bar, and even if they do, people will in fact sniff out their writing style and judge you for including it. If your idea really is amazing, all the more reason to make sure people don't dismiss it out of hand.

Coda

I'm glad you're here, reading this. LessWrong is a very cool community, and new writers come out of nowhere and make names for themselves all the time. If you're here because you've had your contribution rejected for LLM reasons, I'm sorry you went through that unpleasant experience; it really sucks to be excited about sharing something and then to have a door slammed in your face. But for what it's worth, I hope you stick around a while, spend some time reading and absorbing the culture, and maybe, keeping your LLM assistants on a very tight leash, try again.
