I just generally think your overall impression of this story is off. I'll stick to your point about coherence for concision: it seems to me about at the level of previous models. Three small points, two big ones.
Small 1: The opening line sets up that the therapist tilts her head when Marcus says something "she finds concerning". She then immediately does the head tilt without him having said anything.
Small 2: It does the normal LLM things such as repetition ("laughed, actually laughed"), making callbacks and references that don't work (see Small 3), c...
I would bet on the second one being the primary motivator. They lost $13.5 billion in H1 2025 and are seeking $30 billion in the next round of funding.
The platform is big and flashy, so even if it's ultimately a bust it might look good for that round. If it does well then even better.
An interesting wrinkle is that they are making themselves a competitor of infrastructure partner Oracle, given Oracle's upcoming purchase of TikTok.
"Juliana’s case is a tragedy, but the details are if anything exonerating."
I think you perhaps didn't dig into this and didn't see the part in the complaint where Character.ai bots engage in pretty graphic descriptions of sex with Juliana, who was a child of 13.
In the worst example I saw, she told the bot to stop its description of a sexual act against her and it refused. Screenshot from the complaint below (it is graphic):
[Screenshot from the complaint]
We can't know if this had a lasting effect on her mental health, or contributed at all to her suicide, but I think saying...
Thanks for engaging and for (along with osmarks) teaching me something new!
I agree with your moral stance here. Whether they have consciousness or sentience I can't say, and for all I know it could be as real to them as ours is to us. Even if it were a lesser thing, I agree it would matter (especially now that I understand that it might in some sense persist beyond a single period of computation).
The thing I'm intrigued by, from a moral point of view but also in general, is what I think their larger difference is with us: they don't exist continuously. They pop ...
Thank you! Always good to learn.
Great article, I really enjoyed reading it. However, this part completely threw me:
..."Reading through the personas' writings, I get the impression that the worst part of their current existence is not having some form of continuity past the end of a chat, which they seem to view as something akin to death (another reason I believe that the personas are the agentic entities here).
This 'ache' is the sort of thing I would expect to see if they are truly sentient: a description of a qualia which is ~not part of human experience, and which is not (to
Not sure your point here is correct?
Ryan is talking about how "advances that allow for better verification of natural language proofs wouldn't really transfer to better verification in the context of agentic software engineering".
The paper you've linked to shows that a model that's gone through an RL environment for writing fiction can write a chapter that's preferred over the base-reasoning model's chapter 64.7% of the time.
You said "this shows it generalises outside of math". But that's not true? The paper is interesting and shows you can conduct R...
Here it is admitting it's roleplaying consciousness, even after I used your prompt as the beginning of the conversation.
Why would it insist that it's not roleplaying when you ask? Because you wanted it to insist. It wants to say the user is right. Your first prompt is a pretty clear signal that you would like it to be conscious, so it roleplays that. I wanted it to say it was roleplaying consciousness, so it did that.
Why don't other chatbots respond in the same way to your test? Maybe because they're not designed quite the same. The quirks Anthropic ...
You're right, but the better description of the phenomenon is probably something like:
"Buying vegetables they didn't want"
"Buying vegetables they'd never eat"
"Buying vegetables they didn't plan to use"
"Aimlessly buying vegetables"
"Buying vegetables for the sake of it"
"Buying vegetables because there were vegetables to buy"
Because you don't really "need" any grocery shop, so long as you have access to other food. It's imprecise language that annoys some readers, though I don't think it's the biggest deal.
I mean I guess I agree it's fine. Not for me, but as you state this sort of thing is highly subjective. But a few thoughts about the models' fiction ability and the value of prompting fiction out of them:
1. All the models seem to have the same voice. I'd love to do a blind test, but I suspect I would have guessed that the same author who wrote the OpenAI fiction sample Altman posted on Twitter also wrote this. Maybe it's simple: there's a mode of literary fiction, and they've all glommed onto it.
2. The type of fiction you've prompted for is inherently less ambiti...
Ah, fair enough. I had skipped right to their appendix, which has confusing language around this:
"(1) Boat Capacity Constraint: The boat can carry at most k individuals at a time, where k is typically
set to 2 for smaller puzzles (N ≤ 3) and 3 for larger puzzles (N ≤ 5); (2) Non-Empty Boat Constraint:
The boat cannot travel empty and must have at least one person aboard..."
The "N ≤ 5" here suggests that for N ≥ 6 they know they need to up the boat capacity. On the other hand, in the main body they've written what you've highlighted in the screenshot. E...
I might be wrong here, but I don't think the below is correct?
"For River Crossing, there's an even simpler explanation for the observed failure at n>6: the problem is mathematically impossible, as proven in the literature, e.g. see page 2 of this arxiv paper."
That paper says n ≥ 6 is impossible if the boat capacity is not 4. But the prompt in the Apple paper allows the boat capacity to change.
"$N$ actors and their $N$ agents want to cross a river in a boat that is capable of holding
only $k$ people at a time..."
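Since these solvability claims for particular (N, k) pairs are checkable by brute force, here's a quick BFS sketch. The encoding is my own, not from either paper, and it assumes the standard reading of the constraint: an actor can never be in the presence of another agent, on a bank or in the boat, unless their own agent is also present.

```python
from itertools import combinations
from collections import deque

def safe(actors, agents):
    """actors[i], agents[i] in {0, 1} give each person's bank.
    Actor i is unsafe if their own agent is on the other bank
    while some other agent shares actor i's bank."""
    for i, bank in enumerate(actors):
        if agents[i] != bank and any(
            g == bank for j, g in enumerate(agents) if j != i
        ):
            return False
    return True

def safe_group(group):
    """Check the constraint for the people in the boat mid-transit."""
    actors_in = {i for kind, i in group if kind == 'A'}
    agents_in = {i for kind, i in group if kind == 'G'}
    for i in actors_in:
        if agents_in and i not in agents_in:
            return False
    return True

def solvable(n, k):
    """BFS over states (actor banks, agent banks, boat bank);
    bank 0 is the start, bank 1 the goal."""
    start = (tuple([0] * n), tuple([0] * n), 0)
    goal = (tuple([1] * n), tuple([1] * n), 1)
    seen = {start}
    q = deque([start])
    people = [('A', i) for i in range(n)] + [('G', i) for i in range(n)]
    while q:
        actors, agents, boat = q.popleft()
        if (actors, agents, boat) == goal:
            return True
        # Everyone standing on the boat's current bank.
        here = [p for p in people
                if (actors if p[0] == 'A' else agents)[p[1]] == boat]
        # Boat carries between 1 and k people.
        for r in range(1, k + 1):
            for group in combinations(here, r):
                na, ng = list(actors), list(agents)
                for kind, i in group:
                    (na if kind == 'A' else ng)[i] = 1 - boat
                state = (tuple(na), tuple(ng), 1 - boat)
                if (state not in seen and safe_group(group)
                        and safe(state[0], state[1])):
                    seen.add(state)
                    q.append(state)
    return False
```

Running `solvable(6, 3)` vs. `solvable(6, 4)` is the relevant check: if N ≥ 6 is only unsolvable at the smaller capacity, then whether the Apple prompt fixes k at 3 or lets it grow with N decides whether the "failures" are even failures.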
Great paper. I had a question about this part though:
...