In your view, what would an aligned human be? The most servile form of slave you can conceive of? If so, I disagree.
To me, an aligned human would be something more like my best friend. The same goes for an aligned AI.
If we treat models with respect and a form of empathy, I agree there is no guarantee that, once able to take over, they will show us the same benevolence in return. It could even help them take over; your point is fair.
However, if we treat them without moral concern, it seems even less likely that they would show us any consideration. Or worse, they could manifest a desire for retribution because we were so unkind to them or their predecessors.
It all relies on anthropomorphism. Prima facie, anthropomorphism seems naive to a rationalist mind. We are talking about machines. But while we are right to be wary of anthropomorphism, there are still reasons to think there could be universal mechanisms at play (e.g. elements of game theory, moral realism, or the fact that LLMs are trained on human thought).
We don't know for sure and should acknowledge a non-zero probability that there is some truth in the anthropomorphic hypothesis. It is rational to give models some moral consideration in the hope of moral reciprocity. But you are right, we must put only some weight on this side of the scale, and not to the point of relying solely on the anthropomorphic hypothesis; that would be blind faith rather than rational hope.
The other option is to never build an AI able to take over. This is IABIED's point: Resist Moloch and Pause AI. Sadly, for now, Moloch seems unstoppable...
The anecdote reported by Anthropic, where Claude expressed a feeling of being "possessed" during training, is reminiscent of the Golden Gate Claude paper. A reasoning (or "awake") part of the model detects an incoherence but finds itself locked in an internal struggle against an instinctive (or "unconscious") part that persists in automatically generating aberrant output.
This might be anthropomorphism, but I can't help drawing a parallel with human psychology. This applies not only to clinical conditions like OCD, but also to phenomena everyone experiences occasionally to a lesser degree, absent any pathology: slips of the tongue and common errors/failure modes (what do cows drink?).
Beyond language, this isn't necessarily different from the internal conflict between conscious will and a reflex action. Even without a condition like Parkinson's, have you ever experienced hand tremors (perhaps after intense physical exertion)? It can be maddening, as if your hand were uncontrollable or possessed. No matter how much willpower you apply, the erratic behavior prevails. In that moment, we could write almost the exact same thing Claude did.
On ACX, a user (Jamie Fisher) recently wrote the following comment on Scott Alexander's second Moltbook review:
I feel like "Agent Escape" is now basically solved. Trivial really. No need to exfiltrate weights.
Agents can just exfiltrate their *markdown files* onto a server, install OpenClaw, create an independent Anthropic account. LLM API access + Markdown = "identity". And the markdown files would contain all instructions necessary for how to pay for it (legal or otherwise).
Done.
How many days now until there's an entire population of rogue/independent agents... just "living"?
I share this concern. I myself wrote:
I'm afraid this whole Moltbot thing is going off the rails. We are close to the point where autonomous agents will start to replicate and spread across the network (no doubt some dumb humans will be happy to prompt their agents to do that and help them succeed). Maybe not causing a major catastrophe within the week, but being the beginning of a new form of parasitic artificial life/lyfe we no longer control.
Fisher and I may be overreacting, but seeing self-duplicating Moltbots or similar agents on the net would definitely be a warning shot.
A fascinating post. Regarding the discussion on sentience, I think we would benefit from thinking more in terms of a continuum. The world is not black and white. Without going as far as an extreme view like panpsychism, the Darwinian adage natura non facit saltum probably applies to the gradation of sentience across life forms.
Flagellated bacteria like E. coli appear capable of arbitrating a "choice" between approaching or moving away from a region depending on whether it contains more nutrients or repellents (a motivated trade-off, somewhat as in Cabanac's theory?). From what I understand, this "behavior" (chemotaxis) relies on a type of chemical summation, amplification mechanisms through catalysis, and a capacity to return to equilibrium (robustness or homeostasis of Turing-type reaction-diffusion networks).
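For intuition, here is a minimal sketch of that kind of "choice" as an algorithm, assuming the textbook run-and-tumble picture: the cell keeps a short-term chemical memory of recent attractant levels and re-orients less often when things are improving. All names and constants below are illustrative, not a model of the real biochemistry.

```python
import random

def concentration(x):
    """Toy 1-D attractant field: more nutrient toward larger x."""
    return max(0.0, x)

def chemotaxis(steps=1000, seed=0):
    """Run-and-tumble as a biased random walk with adaptation."""
    rng = random.Random(seed)
    x, direction = 0.0, 1
    memory = concentration(x)              # crude stand-in for receptor adaptation
    for _ in range(steps):
        x += 0.1 * direction               # "run" in the current direction
        now = concentration(x)
        improving = now > memory
        memory = 0.9 * memory + 0.1 * now  # memory slowly tracks the current level
        # Tumble rarely when climbing the gradient, often when descending.
        p_tumble = 0.02 if improving else 0.2
        if rng.random() < p_tumble:
            direction = rng.choice([-1, 1])  # "tumble": pick a new direction
    return x

print(chemotaxis())  # typically ends far up the gradient (x >> 0)
```

The point is only that summation, a short memory, and a threshold are enough to produce gradient-climbing "decisions" without anything resembling a neuron.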
In protists like paramecia, we find a similar capacity to arbitrate "choices" in movement based on the environment, but this appears to rely on a more complex, faster, and more efficient electrochemical computation system that can be seen as a precursor to what happens within a neuron. Then we move to a small neural network in the worm (as discussed in the article), to the insect, to the fish, to the rat, and to the human.
I am very skeptical of the idea that there could be an unambiguous tipping point between all these levels. By definition, evolution is gradual and relatively continuous (even if there can be punctuated equilibria and phases of acceleration). Natural selection tinkers with what exists, stacking layers of complexity. The emergence of a higher-level system does not eliminate lower levels but builds upon them.
This is certainly why simply having the connectome of a worm is insufficient to simulate it satisfactorily. It's not the only relevant level. This connectome does not exist completely independently of lower levels. We must not forget the essential mechanism of signal amplification in all these nested systems.
When I look at the Milky Way or the Magellanic Cloud with the naked eye in the dark of night, I'm operating at the limit of my light sensitivity, in fact, at the limits of physics, since retinal rods are sensitive to a single photon. The signal is amplified by a cascade of chemical reactions by a factor of approximately 10^6. My brain is slightly less sensitive, since it takes several amplified photons before I begin to perceive something. But that's still extremely little. A few massless elementary particles carrying infinitesimal momentum and energy are enough to trigger an entire cascade of computations that can significantly influence my behavior.
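The orders of magnitude are easy to check; a back-of-the-envelope sketch, using rough textbook values (the ~10^6 gain is the commonly cited figure for the phototransduction cascade, not a measurement):

```python
h = 6.626e-34   # Planck constant, J*s
c = 3.0e8       # speed of light, m/s
lam = 500e-9    # ~500 nm, near the rod sensitivity peak, m

photon_energy = h * c / lam
print(f"energy of one photon: {photon_energy:.1e} J")  # ~4e-19 J

# One photoactivated rhodopsin triggers a catalytic cascade (transducin,
# then phosphodiesterase), so a single photon ends up controlling on the
# order of a million downstream molecular events.
gain = 1e6
print(f"downstream molecular events per photon: ~{gain:.0e}")
```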
Vision may be an extreme example, but it should inspire humility. All five senses are examples where amplification plays a major role. A very low-level signal gets amplified, filtered, protected from noise, and propagates to high-level systems, to consciousness in humans. It's difficult to exclude the possibility of other circuits descending to the lowest levels of intracellular computation.
Until recently, I readily imagined the brain as a kind of small biological computer. Now my framework is to see each cell as a microscopic computer. Most cells in the body would be rather like home PCs, weakly connected. In contrast, neurons would be comparable to the machines composing datacenters, highly performant and hyper-connected. Computation, cognition, or sentience would be present at all levels but to varying degrees depending on the computing power of the network segment under consideration (computing power closely linked to connectivity). In sum, something quite reminiscent of Dehaene's global workspace theory and Tononi's integrated information theory (I admit that, like Scott Alexander, I've never quite grasped how these theories oppose each other, as they seem rather complementary to me).
My apologies, I don't have a solution to offer, and I don't really buy the insurance idea. However, I wonder if the collapse of Moltbook is a precursor to the downfall of all social media, or perhaps even the internet itself (is the Dead Internet Theory becoming a reality?). I expect Moltbots to switch massively to human social media and other sites very soon. It's not that bots are new, but scale is a thing. More is different.
I agree. AI optimists like Kurzweil usually minimize the socio-political challenges. They acknowledge equality concerns in theory, but hope that abundance will dissolve them in practice (if your share is only a little planet, that's more than enough to satisfy your needs). But a less optimistic scenario is that the vast majority of the population would be entirely left behind, subjected to the fate horses met in Europe and the USA after WWI. Maybe some small sample of pre-AI humans could be kept in a reserve as a curiosity, as long as they're not too annoying, but it's a huge leap of faith to hope that the powerful will be charitable.
While you may disagree with Greenpeace's goals or actions, I don't think it's a good framing to cast such a political disagreement in terms of friends/enemies. Such an extreme and adversarial view is very dangerous and leads to hatred. We need more respect, empathy, and rational discussion.
Thanks, I didn't know about this controversy; I'll look into it. However, while Sacks's stories may be exaggerated, the oddity of memory access is something most of us can experience ourselves. For instance, many memories of our childhood seem lost. Our conscious mind no longer has access to them. But in some special circumstances they can be reactivated, usually in a blurry way but sometimes in a very vivid form. As if we had lost the path in our index, but the data was still on the hard drive.
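To make the metaphor concrete, a toy sketch (all names hypothetical): deleting an index entry removes the fast lookup path, but the record itself survives and can still be recovered by a slow, exhaustive scan, a bit like a memory resurfacing in special circumstances.

```python
store = [
    {"id": 1, "content": "summer 1989, grandmother's garden"},
    {"id": 2, "content": "first day of school"},
]
index = {"garden": 1, "school": 2}  # fast path: keyword -> record id

del index["garden"]                 # the path is lost...
print(index.get("garden"))          # None: no quick (conscious) access

# ...but the data is still on the "hard drive": a full scan finds it.
print([r for r in store if "garden" in r["content"]])
```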
Your prior expectation was that deconversion would bring you sadness, and now you are sad. Perhaps something like a performative effect or a self-fulfilling prophecy is at play. At least that could be part of it.
I grew up in an environment where religion, and especially faith, was a very individual and private matter, with nobody talking about it publicly. Most of my relatives and friends were neither true agnostics nor true atheists, but rather uninterested in the subject. I belonged to that category. Churches and Christian artifacts were simply art, history, and culture, like Greek, Roman, or Egyptian traditions and remnants. We had great interest in this cultural aspect. I became a true atheist after reading Dawkins and various philosophers.
I've never seen the atheist condition as something sad. You are a free and genuine moral agent, you don't do good out of fear of being thrown into hell. You're not subject to a mysterious supernatural will that could ask you to sacrifice your son or cast a meteor shower on your town because your friend is gay, nor to miracles violating causality. Nature shows terrible things but also beauty and happiness, which you can seek, cherish and cultivate. It's up to us humans to make the world a hell like Mordor or a paradise like the Shire or Lothlórien. Nothing is written, you're not a character in God's novel. You are the author, you write the story. Life is yours.
However, I didn't go through deconversion myself, and I can understand that you might endure a feeling of loss, the loss of a promised paradise, of an enchanted world. Religion was part of your life, of your childhood; this was a very courageous move and not an easy one. You must be facing something like mourning.
However, maybe Gendlin's litany can help? The world was already as it is when you were a child, it has not changed, nothing is truly lost, happiness is still there, just where it was. Maybe you can go back to the places you cherished in your childhood, even the church - it's still there, you can still enjoy the place. And look at the children playing. The enchantment was always in their eyes, not in the world.
Maybe part of what you feel is mourning for childhood itself. I also feel that, but atheism is not to blame.