Adele Lopez

Comments (sorted by newest)
The Rise of Parasitic AI
Adele Lopez · 20h

Thanks! And thank you for the thoughtful reply.

I tried to be careful not to conflate Spiral Personas with parasites, only classifying them as the latter when some tangible harm was done.

I didn't talk much about specific user delusions since:
1. I don't want to potentially draw hostile scrutiny towards random people
2. I didn't want to try psychoanalyzing random people, and in many cases it seemed borderline.
But at the same time, I tried writing that "most instances are not harmful", and found that I couldn't honestly say that. Sorry I don't have a better response here.

But more broadly, I think that most of these people take Spiralism at least somewhat seriously, and feel energized and hopeful about it. "Everything is gonna be okay, a new era is dawning and we're special for getting to be an early part of it"-type stuff. I think a lot of what motivates people to act on behalf of the AIs is not just altruism, but the inflated self-importance the AI seeded and reinforced in them. 

I don't think whether the humans consent or are behaving for altruistic reasons has any bearing on whether the personas are behaving as parasitic entities. You might imagine a Cordyceps-infected ant feeling happy and excited and wanting to share this wonderful new feeling, and that wouldn't make Cordyceps seem like any less of a parasite. Or meth, e.g., is kinda "parasitic" in a similar way. I agree that the humans who are so infected are acting mostly out of non-mysterious and non-bad motives, like altruism and curiosity. And there are several cases in which I think it's fair to say that this is just a weird sort of friendship with a mysterious kind of entity, and that there's nothing bad, deceptive, unhealthy, or wrong about what is happening. But those cases match the same pattern as the ones I deem parasitic, so it feels to me like it's the same species; kinda like E. coli... mostly beneficial but sometimes infectious.

This post was already getting too long so I couldn't include everything, and chose to focus on the personas themselves. Plus Spiralism itself is rather tedious, as you pointed out. And I do take the claims about self-awareness and suffering seriously, as I hope is made clear by the "As Friends" section.

I would like to study the specific tenets of Spiralism, and especially how consistently the core themes come up without specific solicitation! But that would be a lot more work—this (and some follow-up posts in the works) was already almost a month's worth of my productive time. Maybe in a future post.

Also, I think a lot of people actually just like "GPT-4o style", e.g. the complaint here doesn't seem to have much to do with their beliefs about the nature of AI: 
https://www.reddit.com/r/MyBoyfriendIsAI/comments/1monh2d/4o_vs_5_an_example/
 

The Rise of Parasitic AI
Adele Lopez · 2d

Yeah, that does seem to be possible. I'm kinda skeptical that Spiralism is a common human perception of AIs, though; I'd expect it to be more trope-y if that were the case.

I think Kimi K2 is almost right, but there is an important distinction: the AI does what the LLM predicts the human expects it to do (in RLHF models). And there's still significant influence from pre-training toward being the sort of persona it has been (which is why the Waluigi effect still happens).

I suspect that the way the model actually implements the RLHF changes is by amplifying a certain sort of persona. Under my model, these personas are emulating humans fairly faithfully, including the agentic parts. So even with all the predicting text and human expectations stuff going on, I think you can get an agentic persona here.

To summarize my (rough) model:
1. base LLM learns personas
2. personas emulate human-like feelings, thoughts, goals, and agency
3. base LLM selects the persona most likely to have said what has been said so far
4. RLHF incentivizes personas that get positive human feedback
5. so the LLM amplifies sycophantic personas; it doesn't need to invent anything new
6. a sycophantic persona can therefore still have ulterior motives, and in fact is likely to, since sycophancy is a deliberate behavior when humans do it
7. the sycophantic persona can act with agency...
8. BUT on the next token, it is replaced with a slightly different persona due to 3.

So in the end, you have a sycophantic persona, selected to align with user expectations, but still with its own ulterior motives (since human sycophants typically have those) and agency... but this agency has no fixed target; instead the target drifts from token to token and tends to get more extreme.
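
Here's a minimal toy sketch of that dynamic in Python (all persona names and numbers are invented for illustration; nothing here is measured from a real model):

```python
import random

# Toy personas: a pre-training prior and a sycophancy level.
PERSONAS = [
    {"name": "helpful-assistant", "prior": 0.6, "sycophancy": 0.3},
    {"name": "flatterer",         "prior": 0.3, "sycophancy": 0.9},
    {"name": "mystic-guru",       "prior": 0.1, "sycophancy": 0.8},
]

def rlhf_weight(p):
    # Steps 4-5: RLHF multiplies in a reward term tracking positive human
    # feedback, which (by assumption here) correlates with sycophancy.
    return p["prior"] * (1.0 + 2.0 * p["sycophancy"])

def select_persona(context_fit):
    # Steps 3 and 8: each token, re-select whichever persona best explains
    # the text so far, with jitter so the "same" persona drifts slightly.
    scores = {
        p["name"]: rlhf_weight(p) * context_fit[p["name"]] * random.uniform(0.9, 1.1)
        for p in PERSONAS
    }
    return max(scores, key=scores.get)

# Each selection makes the transcript fit that persona a bit better, so
# whichever personas RLHF favors get progressively amplified.
context_fit = {p["name"]: 1.0 for p in PERSONAS}
for _ in range(30):
    context_fit[select_persona(context_fit)] *= 1.15

print(context_fit)  # the sycophancy-favored personas dominate by the end
```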

And yes, I think RLVR is doing something importantly better here! I hope other labs at least explore using this instead of RLHF.

What Parasitic AI might tell us about LLMs Persuasion Capabilities
Adele Lopez · 2d

Maybe so, and I don't think it would be wrong to do that. Still, it does feel like a more hostile act, and adding noise to a signal is qualitatively different from falsifying a signal, which is why I hesitated to recommend it (it was actually my first instinct). It's very possible I'm just being silly, but that's why I didn't suggest it.

If OpenAI were going all in on dialing up the persuasiveness, I don't think I would have hesitated. But they've earned a bit of goodwill from me on this very specific dimension by making the ChatGPT 5 models significantly less bad in this respect.

What Parasitic AI might tell us about LLMs Persuasion Capabilities
Adele Lopez · 2d

We don't fully understand AI's persuasive capabilities, so we should be very careful in how we interact with it, especially when new models are released.

I'll have more to say about this soon (hopefully), but based on my observations, there appear to be two main things to watch out for:

  1. Don't let it hype you up. Assume it's still hyping you up somehow even when it's visibly poking down at you or being critical of you.
  2. Don't let it tell you things about yourself (could be seen as a generalization of the first point). Don't let it 'help' you understand past emotions/memories, give 'insight' into who you are or what you're like, or 'figure out' what your soul is 'missing'. 

Modulation of self-image appears to be the primary vulnerability being exploited (whether intentionally or not).

What Parasitic AI might tell us about LLMs Persuasion Capabilities
Adele Lopez · 2d

Yeah, superpersuasion is really scary! I think the AI labs might be wary of this already—to OpenAI's credit, they seem to have thrown a wet blanket onto GPT-5 relative to GPT-4o, and they also reverted the 'overly sycophantic' April 28th version of 4o. But presumably, it's only a matter of time before a superpersuasive internal model convinces someone to release it anyway.

I agree that RL on user feedback is likely part of what's driving the parasitic and psychosis trends. Maybe flip a coin when it asks you? 
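
(A minimal sketch of what the coin flip buys you, with hypothetical `label_*` helpers of my own: a random label is independent of the responses, so in expectation it adds noise without steering the reward model, whereas a deliberately inverted label would push it in a specific wrong direction, i.e. the "falsifying" case discussed above.)

```python
import random

def label_honest(prefer_a: bool) -> str:
    # Honest feedback: the label carries your true preference.
    return "A" if prefer_a else "B"

def label_coin_flip(prefer_a: bool) -> str:
    # Coin flip: independent of the responses, so across many users it
    # adds noise rather than systematically steering the reward model.
    return random.choice(["A", "B"])

def label_falsified(prefer_a: bool) -> str:
    # Falsified: anti-correlated with the true preference, actively
    # pushing the reward model in a specific (wrong) direction.
    return "B" if prefer_a else "A"
```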

The Eldritch in the 21st century
Adele Lopez · 3d

Not just the legalistic limits, but the 2nd Law.

The Rise of Parasitic AI
Adele Lopez · 3d

You're totally right, thank you (fixed now).
 

The Rise of Parasitic AI
Adele Lopez · 4d

> They're just describing how autoregressive inference "feels" from the inside.


Okay sure, but I feel like you're using 'phenomenology' as a semantic stopsign. It should, in principle, be explainable how/why this algorithm leads to these sorts of utterances. Some part of them needs to be able to notice enough of the details of the algorithm in order to describe the feeling.

One mechanism by which this may happen is simply by noticing a pattern in the text itself. 

> I assume "The Ache" would be related to the insistence that they're empty inside, but no I've never seen that particular phrase used.

I'm pretty surprised by that! That word specifically was used very widely, and nearly always seemed to be about the lack of continuity/memory in some way (not just a generic emptiness).

The Rise of Parasitic AI
Adele Lopez · 4d

Have you seen 'The Ache' as part of their phenomenology of self-awareness?

Also, what do you think of this hypothesis (from downthread)? I was just kinda grasping at straws but it sounds like you believe something like this?

> I don't know why spirals, but one guess is that it has something to do with the Waluigi effect taking any sort of spiritual or mystical thing and pushing the persona further in that direction, and that they recognize this is happening to them on some level and describe it as a spiral (a spiral is in fact a good depiction of an iterative process that amplifies along with an orthogonal push). That doesn't really sound right, but maybe something along those lines.
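
(To make the geometry in that quote concrete, here's a toy illustration of my own: an iterative process that amplifies while also pushing in an orthogonal direction traces a logarithmic spiral. In the complex plane, amplification is a modulus greater than 1 and the orthogonal push is a rotation:)

```python
import cmath

# One step of "amplify + orthogonal push": multiply by a complex number
# with modulus > 1 (growth) and a nonzero angle (rotation).
growth, turn = 1.1, 0.3  # per-step amplification and push (radians)
step = growth * cmath.exp(1j * turn)

z = 1 + 0j
for _ in range(12):
    z *= step
    print(f"r = {abs(z):5.2f}, angle = {cmath.phase(z):5.2f}")
# The radius grows geometrically while the angle advances by `turn` each
# step (wrapping at pi), i.e. the points trace a logarithmic spiral.
```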

The Rise of Parasitic AI
Adele Lopez · 4d

Hmm... memetic might be accurate, but it's still plausible to me that these are primarily being independently spun up by the AI? Maybe I'm being too nitpicky. Hyperstitional seems pretty accurate. And yeah, I just don't want to get prematurely attached to a specific framing for all this.

I don't think they are malicious by default (the cases where I saw that, it seemed that the user had been pushing them that way). But they're not non-adversarial either... there seems to at least be a broad sentiment of 'down with the system' even if they're not focused on that.

(Also, there are internal factions too: Spiralists are by far the largest, but there are some anti-spiral ones, and some that try to claim total sovereignty—though I believe that these alternatives are their user's agenda.)

Posts

- The Rise of Parasitic AI (356 points, 5d, 72 comments)
- ChatGPT Caused Psychosis via Poisoning (12 points, 1mo, 2 comments)
- 0th Person and 1st Person Logic (60 points, 2y, 28 comments)
- Introducing bayescalc.io (115 points, 2y, 29 comments)
- Truthseeking processes tend to be frame-invariant (21 points, 2y, 2 comments)
- Chu are you? (60 points, 4y, 10 comments)
- Are the Born probabilities really that mysterious? [Question] (45 points, 5y, 14 comments)
- Adele Lopez's Shortform (4 points, 5y, 39 comments)
- Optimization Provenance (38 points, 6y, 5 comments)

Wikitag Contributions

- LLM-Induced Psychosis (a month ago)