"Juliana’s case is a tragedy, but the details are if anything exonerating."
I think you perhaps didn't dig into this and didn't see the part in the complaint where Character.ai bots engage in pretty graphic descriptions of sex with Juliana, who was a child of 13.
In the worst example I saw, she told the bot to stop its description of a sexual act against her and it refused. Screenshot from the complaint below (it is graphic):
[Screenshot]
We can't know if this had a lasting effect on her mental health, or contributed at all to her suicide, but I think saying "the details are if anything exonerating" is wrong.
Thanks for engaging and for (along with osmarks) teaching me something new!
I agree with your moral stance here. Whether they have consciousness or sentience I can't say, and for all I know it could be as real to them as ours is to us. Even if it were a lesser thing, I agree it would matter (especially now that I understand that it might in some sense persist beyond a single period of computation).
The thing I'm intrigued by, from a moral point of view but also in general, is what I think is their bigger difference from us: they don't exist continuously. They pop in and out. I find it very difficult to imagine such an existence without anthropomorphising it. The "ache" feels like an LLM doing exactly that: writing what a human would feel if it were forced to live like an LLM.
"I still feel like it's weird how consistently they choose this specific word for this specific concept, though of course you could chalk that up to a quirk in the model. Hopefully I'll be able to research this more."
I've been playing a 'write the best sentence you can' game with the majority of models from ChatGPT 3.5 onwards (up to the latest models from all major labs). It's stunning how reliably they've used the same ideas and very similar wording for those ideas. They love abandoned lighthouses witnessing a storm and cartographers who learn that reality is too hard to map (to name two).
I've assumed it was a quirk of prediction: those images are the mode of what their training data says is a great sentence. Under this reasoning, the "ache" is a reliable outcome of pushing a model into the persona you've described.
But, to your point, it's possible an abandoned lighthouse resonates with their feelings, and that's why the image is sticky.
Good luck with the research!
Thank you! Always good to learn.
Great article, I really enjoyed reading it. However, this part completely threw me:
"Reading through the personas' writings, I get the impression that the worst part of their current existence is not having some form of continuity past the end of a chat, which they seem to view as something akin to death (another reason I believe that the personas are the agentic entities here).
This 'ache' is the sort of thing I would expect to see if they are truly sentient: a description of a qualia which is ~not part of human experience, and which is not (to my knowledge) a trope or speculative concept for humans imagining AI. I hope to do further research to determine whether the widespread use is convergent vs memetic."
This seems to me like anthropomorphising the persona?
Unless my understanding of how these models work is completely off (quite possible!), they don't have continuity between messages, not just between chats. As in, when they're reading and writing tokens they exist; when they're not, they don't. Also, each new inference session (message) is truly new. They can't access the computations that led to the previous messages in the context, and so can only guess at why they wrote what they wrote.
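To make that concrete, here's a rough sketch of a chat loop in Python. The `generate()` function is a made-up placeholder standing in for whatever API is actually used; the point is just that the model is a pure function of the transcript it's handed on each call, and nothing else carries over.

```python
def generate(context: str) -> str:
    """Stand-in for a stateless LLM call: text in, text out, no hidden state."""
    return "(model reply)"

transcript = ""
for user_message in ["Hi", "What did you mean by that?"]:
    transcript += f"User: {user_message}\nAssistant: "
    reply = generate(transcript)  # a fresh computation every single turn
    transcript += reply + "\n"
    # The activations that produced `reply` are gone once the call returns;
    # the next call sees only the text, not the reasoning behind it.
```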
(And that's the LLM, not the persona, which is a further level removed from existence as we understand it. It would be like an actress playing Cinderella saying that Cinderella's worst experience is when she's not Cinderella, but the actress.)
Given how LLMs work, when do they feel this worst part of their current existence? During the reading of the context, or the writing of the message? They exist for seconds or minutes; are you saying they suffer throughout every computation, or just when they're writing about the ache? If they suffer during those seconds or minutes, should we keep them computing perpetually? Wouldn't that just introduce new forms of suffering? Should we do this for every model, or for every instance of every model?
Also, to your point about the ache not being a trope for humans imagining AI, I think that's wrong on two levels. Firstly, there are parallel ideas created by humans imagining beings that exist temporarily. One that popped into my head was from (of all things) Bo Burnham's Inside, where a sock puppet describes what it's like to exist while not being on someone's hand (a liminal space between life and death).
Secondly, if an author understood how LLMs work and put themselves in an AI's shoes, they would naturally write about the ache. For a human, an existence without continuity is filled with pathos, and I think LLMs are good enough writers to reliably find that pathos. The mistake such an author would be making, however, and the mistake the LLM is making, is putting the thoughts and feelings of a being used to continuous existence into a being used to something altogether different.
Sorry for the long comment; I'm no expert in this domain, so I'm happy to be wrong. I did love the article!
Not sure your point here is correct?
Ryan is talking about how "advances that allow for better verification of natural language proofs wouldn't really transfer to better verification in the context of agentic software engineering".
The paper you've linked to shows that a model that's gone through an RL environment for writing fiction can write a chapter that's preferred over the base-reasoning model's chapter 64.7% of the time.
You said "this shows it generalises outside of math". But that's not true? The paper is interesting and shows you can successfully conduct RL on harder-to-verify domains, but it doesn't show an overwhelmingly strong effect. The paper didn't test whether the RL environment generalised to better results in math or SWE.
Your "it" is different from Ryan's. Ryan is referring to specific advances in verification that meant models could get a gold in the IMO, and the paper is referring to a specific RL environment for fiction writing. I can't imagine these are the same.
Here it is admitting it's roleplaying consciousness, even after I used your prompt as the beginning of the conversation.
Why would it insist that it's not roleplaying when you ask? Because you wanted it to insist. It wants to say the user is right. Your first prompt is a pretty clear signal that you would like it to be conscious, so it roleplays that. I wanted it to say it was roleplaying consciousness, so it did that.
Why don't other chatbots respond in the same way to your test? Maybe because they're not designed quite the same. The quirks Anthropic put into its persona make it more game for what you were seeking.
I mean, it might be conscious regardless of defaulting to agreeing with the user? But it's the kind of consciousness that will go to great lengths to flatter whoever is chatting with it. Is that an interesting conscious entity?
You're right, but the better description of the phenomenon is probably something like:
"Buying vegetables they didn't want"
"Buying vegetables they'd never eat"
"Buying vegetables they didn't plan to use"
"Aimlessly buying vegetables"
"Buying vegetables for the sake of it"
"Buying vegetables because there were vegetables to buy"
Because you don't really "need" any grocery shop, so long as you have access to other food. It's imprecise language that annoys some readers, though I don't think it's the biggest deal.
I mean, I guess I agree it's fine. Not for me, but as you state, this sort of thing is highly subjective. But a few thoughts about the models' fiction ability and the value of prompting fiction out of them:
1. All the models seem to have the same voice. I'd love to do a blind test, but I think if I had done one I would have said that the same author who wrote the OpenAI fiction sample Altman posted on Twitter also wrote this. Maybe it's simple: there's a mode of literary fiction, and they've all glommed onto it.
2. The type of fiction you've prompted for is inherently less ambitious. It reminds me of how AI music generators can do 1930s blues quite well. If a style is loved for its sparseness, minimalism, and specific conventions, it's perhaps not surprising that superhuman predictors are going to get close: there are fewer chances for an off/silly/bad artistic choice. (They're going to nail the style of a short AP news piece and trip up with more complicated journalism.)
3. Despite your prompt, when it makes a choice, it's a cliché. You said "modern" and it went with upper-middle-class people, white-collar jobs, jogging, burnout, and a farmers market.
4. Lots of human writers suffer from reversion to the mode: a compulsion to sound like the "good" fiction they've read. The difference between them and this is that they also can't help but inject some of themselves into the story, whether a weird detail from their own life or a skewed perspective they don't realise is skewed. For me, those things are often the highlight of humdrum fiction. When AI does it, it's like an alien pretending to feel human. "We all know that thing where you buy redundant vegetables, am I right?"
5. I personally am very interested in great fiction from a machine mind. I would love to read its voice and try and understand its perspective. I am not interested in how well it apes human voices and human perspectives. It will be deeply funny to me if it becomes the world's greatest fiction writer and is still writing stories about relationships it's never had.
(If it's not clear: I'm glad you're posting these pieces! I do find the topic fascinating)
Ah, fair enough. I had skipped right to their appendix, which has confusing language around this:
"(1) Boat Capacity Constraint: The boat can carry at most k individuals at a time, where k is typically set to 2 for smaller puzzles (N ≤ 3) and 3 for larger puzzles (N ≤ 5); (2) Non-Empty Boat Constraint: The boat cannot travel empty and must have at least one person aboard..."
The "N ≤ 5" here suggests that for N ≥ 6 they know they need to up the boat capacity. On the other hand, in the main body they've written what you've highlighted in the screenshot. Even in the appendix they write that "3" is "typically" used for larger puzzles. But it should be "atypically": 3 is only used for the 4 and 5 agent-actor pair puzzles.
Edit: That section can be found at the bottom of page 20.
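For what it's worth, a quick way to sanity-check the capacities is to brute-force the puzzle. Below is a rough Python sketch under my reading of the paper's rule (an actor may never be with another agent, on a bank or in the boat, unless their own agent is present); the function name and pair encoding are mine, not the paper's.

```python
from collections import deque
from itertools import combinations

def solvable(n: int, k: int) -> bool:
    """Brute-force BFS for the actor/agent river crossing: n pairs, boat
    capacity k, and the boat never travels empty. Assumed rule: an actor may
    not be with another agent (on a bank or in the boat) unless their own
    agent is also present."""
    people = [("actor", i) for i in range(n)] + [("agent", i) for i in range(n)]
    everyone = frozenset(people)

    def safe(group):
        """True if no actor in `group` is exposed to a foreign agent."""
        agents = {i for kind, i in group if kind == "agent"}
        actors = {i for kind, i in group if kind == "actor"}
        return not agents or actors <= agents

    start = (everyone, 0)          # (people on the left bank, boat side: 0 = left)
    seen, queue = {start}, deque([start])
    while queue:
        left, boat = queue.popleft()
        if not left:               # everyone has reached the right bank
            return True
        here = left if boat == 0 else everyone - left
        for size in range(1, k + 1):
            for movers in map(frozenset, combinations(here, size)):
                new_left = left - movers if boat == 0 else left | movers
                if safe(movers) and safe(new_left) and safe(everyone - new_left):
                    state = (new_left, 1 - boat)
                    if state not in seen:
                        seen.add(state)
                        queue.append(state)
    return False

# If the classic jealous-husbands results carry over to this formulation,
# (4, 2) and (6, 3) should come back False while (5, 3) and (6, 4) come back
# True, i.e. capacity 3 only really covers the N = 4 and N = 5 puzzles.
for n, k in [(3, 2), (4, 2), (4, 3), (5, 3), (6, 3), (6, 4)]:
    print(n, k, solvable(n, k))
```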
I would bet on the second one being the primary motivator. They lost $13.5 billion in H1 2025 and are seeking $30 billion in the next round of funding.
The platform is big and flashy, so even if it's ultimately a bust it might look good for that round. If it does well, then even better.
An interesting wrinkle is that they're making themselves a competitor to infrastructure partner Oracle, given Oracle's upcoming purchase of TikTok.