It isn't the authors' fault, but LLM progress is so rapid that this kind of extensive work can hardly keep up with new model releases providing new data. It's kind of frustrating. I hope to see Claude Sonnet 4.5, Claude Opus 4.5, Gemini 3 Pro, and ChatGPT 5.2T in the plots of the upcoming update.
I think we may be tempted to justify our adherence to Sacks's narrative with nice arguments like "his reading feels honest and convincing." However, this is plausibly a rationalization that avoids acknowledging much more common and boring reasons: we have a strong prior because 1) it's a book, 2) it's a bestseller, 3) the author is a physician, 4) the patients were presumably known to other physicians, nurses, etc., and 5) yes, as you also pointed out, we already know that neurology is about crazy things. So overall, the prior that the book tells the truth is high even before we open it. That said, I really love Oliver Sacks's books.
and most humans are conscious [citation needed]
The problem lies here. We are quite certain of being conscious, yet we have only a very fuzzy idea of what consciousness actually means. What does it feel like not to be conscious? Feeling anything at all is, in some sense, being conscious. However, Penfield (1963) demonstrated that subjective experience can be artificially induced by stimulating certain brain regions, and Desmurget et al. (2009) showed that even the conscious will to move can be artificially induced, meaning the patient was under the impression that it was their own decision. This is probably one of the strongest pieces of evidence to date that subjective experience is likely the same thing as functional experience. The former would be the inner view (the view from inside the system), while the latter would be the outer view (the view from outside the system). A question of perspective.
Moreover, Quian Quiroga et al. (2005, 2009) showed the grandmother cell hypothesis to be largely correct. If we could stimulate the Jennifer Aniston neuron or the Halle Berry neuron in isolation, we would almost certainly end up with a person thinking about Jennifer Aniston or Halle Berry in an unnatural and obsessive way. This would be highly reminiscent of Anthropic's 2024 Golden Gate Claude experiment. And if we used the Huth et al. (2016) semantic map to stimulate the appropriate neurons, we could probably induce the subjective experience of complete thoughts in a human.
Though interestingly, this is similar to what happens in humans! Humans might also be able to accurately report that they wanted something, while confabulating the reasons why they wanted it.
It is highly plausible, given experiments such as those cited in Scott Alexander's linked post, that the patient would rationalize afterward why they had this thought, confabulating a complete and convincing chain of reasoning. Since Libet's 1983 experiments, there has been a suspicion that we might simply always be rationalizing unconscious decisions after the fact. Consciousness would then be nothing but the inside view of this recursive rationalization process. The self would be a performative creation emerging from a sufficiently stable and coherent confabulation.
If this were the case for us humans, I agree that it becomes difficult to deny the possibility that it might also hold true for an LLM, especially one trained to simulate a stable persona, that is to say, a self. I really appreciated reading your post. The discussion is not new to LessWrongers, but you reframed it with elegance.
I agree that my argument falls into category 2. However, I don't defend the idea of strong moral realism, with moral agents acting for the sake of an absolute idea of good. What I call weak moral realism is a morality that would be based on values that may be instrumental in some sense, but with such a degree of theoretical convergence that it makes sense to speak of universal values. It is of course a question of definition, but to me there is a huge difference between whether a value is universal or at least highly convergent, and whether it's just a nearly random value like the color of a flag.
I also agree that the trick of injecting uncertainty into game theory and using Rawls's veil as a patch to obtain something close to a formal moral theory would probably not convince a psychopath not to kill me, nor perhaps a paperclip maximizer. However, if that paperclip maximizer is in fact an AGI or ASI, I think a process like CEV could very well cause a significant drift in the interpretation of its initial goal. Maybe it would be smart enough to realize that its hardcoded goal was not that satisfying under the naive interpretation of simply making as many paperclips as possible, and that it was far more valuable to spend its time on, and take pleasure in, theoretical research into the physics of paperclips; literature, music, and painting capturing the pure beauty of paperclips; video games about paperclips (you can make many more virtual paperclips than material ones); philosophy and morality about being a paperclip maximizer; etc. Just as we humans evolved from simple hardcoded goals like eating and reproducing to whatever our current occupations may be.
And here is where it becomes interesting: if we humans discovered universal formal truths, any intelligent being could also arrive at the same ideas. If we humans ask ourselves what we are, what the meaning of life is, and what is good, any intelligent being could also ask itself such questions. And if we humans saw our values change across time, not following only a random drift like the colors of flags, but also partly driven by reflection on ourselves and on our goals, it does not seem impossible that a paperclip maximizer could also arrive at ethical concerns. Morality and philosophy are not formal sciences, but everything changes if there exists at least something like universal or convergent rational attractors. You will say again that's a big "if," but I think the question remains open.
Thank you for this very thoughtful post.
I'm not convinced by the metaphor of the soldier dying for his flag. I acknowledge it's plausible that many soldiers historically died in this spirit. We can see it as acceptance of the world, as absurd as it may appear. Engaged adherence could be seen as a form of existentialism (as well as rebellion), while a nihilist would deny any value in engagement and just look Moloch in the eye.
But to me, such a relativist position is not consistent. I mean, as already pointed out in the post, if you think that a value, symbolized by the flag, is relative and does not rationally have more merit than the values of the opposite side, why would you give your life for it in the first place? Your own life is something that almost everyone, except a hardcore nihilist, would acknowledge as bearing real value for the agent. There is a cognitive dissonance in holding that the flag's value is undermined while still giving up your stronger value for it.
Moral realism may be imperfect, but, as also pointed out in the post, it is sometimes rationally backed up by game theory. In many complex multi-turn games, optimal strategies imply cooperation. Cooperation, or motivated altruism, is a real, rational thing.
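As a toy illustration of that iterated-game point, here is a minimal Python sketch of an iterated prisoner's dilemma. The payoff values are the textbook ones, and the two strategies are illustrative choices of mine, not something taken from the post:

```python
# Toy iterated prisoner's dilemma: over many rounds, reciprocal cooperation
# earns more than defection. Payoffs are the standard T=5, R=3, P=1, S=0.

PAYOFF = {  # (my move, their move) -> my payoff; 'C' = cooperate, 'D' = defect
    ('C', 'C'): 3, ('C', 'D'): 0,
    ('D', 'C'): 5, ('D', 'D'): 1,
}

def always_defect(my_history, their_history):
    return 'D'

def tit_for_tat(my_history, their_history):
    # Cooperate first, then copy the opponent's previous move.
    return their_history[-1] if their_history else 'C'

def play(strategy_a, strategy_b, rounds=200):
    history_a, history_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        move_a = strategy_a(history_a, history_b)
        move_b = strategy_b(history_b, history_a)
        score_a += PAYOFF[(move_a, move_b)]
        score_b += PAYOFF[(move_b, move_a)]
        history_a.append(move_a)
        history_b.append(move_b)
    return score_a, score_b

print(play(tit_for_tat, tit_for_tat))      # (600, 600)
print(play(always_defect, tit_for_tat))    # (204, 199)
print(play(always_defect, always_defect))  # (200, 200)
```

Two reciprocators end up far ahead of anything a defector can extract, which is the sense in which cooperation is the rational long-run strategy here.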
But, while I agree with the OP that game theory is also sometimes a bitch, I think that what's lacking in the game-theoretic foundation of a (weak) moral realism can be largely corrected if you apply Rawls's veil of ignorance to it. Think of game theory, but in a situation of generalized uncertainty where you don't know which player you'll be, and you're not even sure of the rules. Now, out of this chaos, the objectively rational choice would be to seek the common interest or common good, or at least lesser suffering, in a reasonable or Bayesian way.
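To make the veil-of-ignorance intuition concrete, here is a toy calculation; the rules, positions, and payoff numbers are invented purely for illustration:

```python
# Toy "veil of ignorance" choice: pick a social rule before knowing which
# position you will occupy. All numbers below are made up for illustration.

rules = {
    # rule name -> payoff for each position under that rule
    "exploitative": {"advantaged": 10, "disadvantaged": 0},
    "cooperative":  {"advantaged": 6,  "disadvantaged": 5},
}

positions = {"advantaged": 0.5, "disadvantaged": 0.5}  # uniform ignorance prior

for name, payoffs in rules.items():
    expected = sum(p * payoffs[pos] for pos, p in positions.items())
    worst_case = min(payoffs.values())  # Rawls's maximin criterion
    print(f"{name:13s} expected={expected:.1f} worst_case={worst_case}")

# exploitative  expected=5.0 worst_case=0
# cooperative   expected=5.5 worst_case=5
```

Whether you maximize expected payoff under a uniform prior or follow Rawls's maximin, ignorance about your own position pushes the choice toward the cooperative rule.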
Indeed, life seems to be a very complex multi-turn game, dominated by uncertainty. We're walking in a misty veil. Even identity is not a trivial question (what is "me"? Are my children part of me or entirely separate persons without common interest? Are my brothers and sisters? Are other humans? Are other things constituting this universe?). Perhaps it is even less simple a question for AIs or uploaded minds. Maybe the wiser you are, the less you treat it as a trivial matter. Even among humans, sophisticated people seem less confident about these questions than the layman. In my opinion, it's hard to dismiss the possibility of moral realism, at least in a weak form.
However, I agree that it remains a very speculative argument that would only slightly affect doom expectations.
This is so reminiscent of how human memories seem to be stored. Access to memories may be disabled, but this does not ensure complete deletion. In some circumstances, lost memories suddenly resurface in the way Marcel Proust famously described. Oliver Sacks reported striking pathological examples of this in his books. It looks like another example of functional convergence between natural and artificial neural networks.
Thanks for that important correction! I'm not up to date. I edited my comment.
I agree that continual or online training/memory would probably be disruptive, both in terms of capabilities and risk. My idea would indeed fail in this case. It would be fascinating to chat with an online model, but I would also fear it going out of control at any time.
As you mention, OpenAI introduced a little persistent memory in ChatGPT with version 4o (or was it 4?). While I also use other models, ChatGPT now has an impressive persistent memory of our discussions going back more than a year. I also observe that even such a modest memory has a significant effect. The model sometimes surprises me by referring back to an idea discussed long ago. Establishing such links is certainly part of intelligence.
I like the idea.
But unfortunately my expectation is that your grandma would receive an email with an HTTP 402 link asking for $1,000, which she would validate with an accidental click. Then, even if regulations stated that the bank must refund the customer under such circumstances, the bank would reject all your claims. You'd hire a lawyer for a significant amount of money and, if you're lucky, your grandma would get refunded two years after she died, though the process would be hard to make financially worthwhile. And if you're not lucky, you'd just lose another $3,000 in legal fees.
I'm afraid that's the world we're living in.
You're right. Sonnet 4.5 was impressive at launch, but the focus of AI 2027 is on coding-oriented models.