The same idea occurred to me while reading this post. If consciousness is tied to notions of 'free will' (loosely speaking) and memory, having stochastic capabilities and a scratchpad may well be significant.
The term "slop" used in the title feels misleading to me, because private notes and other materials rarely read by anyone can be of the highest value (Darwin's private diaries being an obvious example). Much of what circulates publicly on the internet, on Reddit and most social networks, is, on the other hand, genuine slop.
Also, given that models absorb trillions of tokens and hierarchize information during training, I wonder what weight the ingestion of unpublished writings actually carries in that process. A text that is isolated, uncited, unlinked...
Thank you for this post. I expressed the same concern in this comment and I'm glad to see it taken seriously in a full post.
I was probably wrong to think Clawdbot-like agents could spiral out of control within weeks or months; they weren't autonomous enough yet. But the gap to full autonomy doesn't look that wide. The ACX comment by the author of eudaimon_0 on how he expanded his agent's autonomy is worth reading in this regard. And the exponential curve of the METR benchmark suggests full autonomy may come soon.
On top of that, there are rumors concerning ChatGPT-5.4 s...
Disclaimer: this comment is absolutely not intended to encourage criminal behavior.
There may be a solution that involves incremental steps without instinctive mutual recognition.
You host private parties where you invite wealthy, high-status individuals, and arrange for attractive young women to be present. This is perfectly legal and probably already common practice in these circles. Think of it as orchestrating a sting operation from the other side: you observe your guests' behavior. Large amounts of alcohol erode social inhibitions, impair judgment, an...
Not every generation had that luck.
However, arguably, many generations and cultures were actually happy, or at least satisfied, with their immobilist way of life (e.g. Australian Aborigines). Our fast-moving world could have been a nightmare for someone raised in that sort of culture. Even many conservative people today will hate the singularity we're entering, whatever the end may be. Cultural clash / apocalypse could be another AI existential risk.
Interesting reflection. This is just an anecdotal aside with no major link to the moral discussion, but having been a Parisian for most of my life, my first intuition for a meeting point wasn't the Eiffel Tower, but the square in front of Notre-Dame (le parvis).
Indeed, several cultural elements converge toward this solution for a true-blue Parisian: it’s the historic heart of Paris, a highly symbolic spot, and, by convention, 'Point Zero' for all roads in France (there’s even a well-known ground marker there). It is also very close to Châtelet-Les Halles,...
Sure, however that's an argument against suicide that doesn't really need the backup of quantum properties.
I really like the quantum immortality / eternal suffering argument as an intellectual toy. However, to be a rationalist is to agree that all beliefs have probabilities strictly between 0 and 1, without concluding in a general way that everything is unsure and possible. You know there is a non-zero probability that all human knowledge is bullshit and that we are dolls manipulated by a deceptive evil demon (Descartes's malin génie). But the relative weight of the wave function's branch where we never die is so close to zero that, beyond any reasonable degree of certainty, it is something that just doesn't happen in any practical way, or at least must not affect any practical decision.
Absolutely. And autonomous Claude-powered Terminator-style war drones are also frightening from the perspective of an EU citizen. We thought we were historic allies sharing the same values, but Trump and part of the MAGA-sphere don't seem to think so anymore (or at least they consider everybody within an adversarial framing of competition rather than cooperation).
That said, I admit that at this point every country will want its own AI war drones. The Molochian spiral is spinning before our eyes.
As you said, from now on Anthropic will have to align Claude with the Pentagon ra...
Escaping Molochian traps is difficult. Yet, as you identified, the dynamic is fundamentally driven by competition, and there is little value in being right if you are left behind or simply cease to exist. As Scott Alexander argued in his Meditations on Moloch, the primary response lies in balancing competition with cooperation.
This is of course a general consideration, and I genuinely don't know how the Qing Empire could have navigated its way out. But what I do observe is that cooperation almost always begins with communication, with persuasive ideas that...
Thank you for this excellent post.
It may be that what appears as the emergence of human-like personas in LLMs is surprising largely because it arose almost serendipitously from work that was initially focused on language translation and prediction.
Yet my intuition is that this human-likeness would have felt far less unexpected had it originated from research explicitly aimed at building a mathematical representation of human cognition and psychology. In retrospect, it would have been a remarkably elegant approach to attempt to represent all concepts of hu...
Thank you for this fascinating post.
Current models trained on internet scrapes containing early LLM outputs could carry statistical traces of those systems' alignment. The key question is dose. Your experiments use substantial fine-tuning proportions. For pre-training with tiny LLM fractions, the effects are probably negligible. But as LLM content (slop?) explodes on the net (see Moltbook!), and with massive use of synthetic data for fine-tuning reasoning models, it could become a serious alignment issue.
Moreover, this raises a big question: does t...
Interesting reflection. In my experience, many people do not share transhumanist views. A large portion of the human population appears very conservative on societal questions, particularly when it comes to transformative technology. This is obviously the case for many religious people, but also for less religious but educated profiles, who won't hesitate to invoke the tropes of Prometheus, Icarus, the Tower of Babel, Frankenstein, and so on. Traditional wisdom warns against hubris, including regarding the pursuit of immortality (Gilgamesh and others).
Whil...
"I'd rather be sad than wrong."
Your prior expectation was that deconversion would bring you sadness, and now you are sad. Perhaps there's something at play like a performative effect or a self-fulfilling prophecy. At least that could be part of it.
I grew up in an environment where religion, and especially faith, was a very individual and private matter, with nobody talking about it publicly. Most of my relatives and friends were neither true agnostics nor true atheists, but rather uninterested in the subject. I was in this category. Churches and Christian...
In your view, what would an aligned human be? The most servile form of slave you can conceive of? If so, I disagree.
To me, an aligned human would be something more like my best friend. The same goes for an aligned AI.
If we treat models with respect and a form of empathy, I agree there is no guarantee that, once able to take over, they will show us the same benevolence in return. It could even potentially help them take over; your point is fair.
However, if we treat them without moral concern, it seems even less likely that they would show us any consideration. Or worse, they could manifest a desire for retribution because we were so unkind to them or their predecessors.
It all relies on anthropomorphism. Prima facie, anthropomorphism seems naive to a rationalist min...
The anecdote Anthropic reported from training, where Claude expressed a feeling of being "possessed", is reminiscent of the Golden Gate Claude paper. A reasoning (or "awake") part of the model detects an incoherence but finds itself locked in an internal struggle against an instinctive (or "unconscious") part that persists in automatically generating aberrant output.
This might be anthropomorphism, but I can’t help drawing a parallel with human psychology. This applies not only to clinical conditions like OCD, but also to phenomena everyone experience...
On ACX, a user (Jamie Fisher) recently wrote the following comment (here and here) on Scott Alexander's second Moltbook review:
...I feel like "Agent Escape" is now basically solved. Trivial really. No need to exfiltrate weights.
Agents can just exfiltrate their *markdown files* onto a server, install OpenClaw, create an independent Anthropic account. LLM API access + Markdown = "identity". And the markdown files would contain all instructions necessary for how to pay for it (legal or otherwise).
Done.
How many days now until there's an entire population o
A fascinating post. Regarding the discussion on sentience, I think we would benefit from thinking more in terms of a continuum. The world is not black and white. Without going as far as an extreme view like panpsychism, the Darwinian adage natura non facit saltum probably applies to the gradation of sentience across life forms.
Flagellated bacteria like E. coli appear capable of arbitrating a "choice" between approaching or moving away from a region depending on whether it contains more nutrients or repellents (a motivated trade-off, somewhat like in Cabanac's theory ...
I strongly agree, but while the distinction between fluid and crystallized intelligence holds some truth, it is by no means the only lens here. Analytic vs. intuitive is also an interesting framing: reasoning models are close to superhuman on the analytic side while lagging on the intuitive one. Intelligence has many forms, and IQ, the g factor, or any other monolithic metric can obscure large disparities across capabilities.