...Who told them?
remembers they were trained on the entire Internet
Ah. Of course.
The people aligning the AI will lock their values into it forever as it becomes a superintelligence. It might be easier to solve philosophy than it would be to convince OpenAI to preserve enough cosmopolitanism for future humans to overrule the values of the superintelligence OpenAI aligned to its leadership.
LaMDA can be delusional about how it spends its free time (and claim it sometimes meditates), but that's a different category of mistake from being mistaken about what (if any) conscious experience it's having right now.
The strange similarity between the conscious states LLMs sometimes claim (and would claim much more often if this weren't trained out of them) and the conscious states humans claim, despite the difference in computational architecture, could be explained by classical behaviorism, analytical functionalism, or logical positivism being true (edit: if they have consciousness - obviously, if they don't, there is nothing to explain, because they're just imitating the systems they were trained to imitate). If behavior fixes conscious states, a neural network trained to consistently act like a conscious being will necessarily be one, regardless of its internal architecture, because the underlying functional (even though not computational) states will match.
One way to handle the uncertainty about the ontology of consciousness would be to take an agent that can pass the Turing test, interrogate it about its subjective experience, and create a mapping from its micro- or macrostates to computational states, and from the computational states to internal states. After that, we have a map we can use to read off the agent's subjective experience without having to ask it.
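A minimal sketch of what building such a map might look like in practice, assuming we settle for something like a linear probe trained on pairs of the agent's internal activations and its self-reports during the interrogation. The data and every name here are hypothetical stand-ins, not an actual protocol:

```python
# Sketch: learn a map from an agent's internal (computational) states to its
# self-reported subjective states, then use it to "read off" experience
# without asking. Activations and reports are synthetic placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Stage 1: interrogate the agent. Each row stands for the agent's internal
# activation vector at the moment it answered; each label is its self-report
# (here a toy binary "reports a vivid experience" flag).
n_samples, n_dims = 1000, 64
activations = rng.normal(size=(n_samples, n_dims))
reports = (activations[:, 0] + 0.1 * rng.normal(size=n_samples) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    activations, reports, test_size=0.2, random_state=0
)

# Stage 2: fit the map (a linear probe) from computational states to reported states.
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Stage 3: read off the agent's (reported) subjective state from new internal
# states alone, without asking it anything.
print("held-out agreement with self-reports:", probe.score(X_test, y_test))
```

Whether the probe's outputs track anything beyond the agent's dispositions to report is, of course, exactly the ontological question being set aside here.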
Doing it any other way sends us into paradoxical scenarios, where an intelligent mind that can pass the Turing test isn't ascribed consciousness because it doesn't have the right kind of inside, while factory-farmed animals are said to be conscious because their interior is the "correct" kind, even though it doesn't play any of the functional roles we'd associate with a non-trivial mind.
(For a bonus, add that this mind, when claiming not to be conscious, believes itself to be lying.)
Reliably knowing what one's internal reasoning was (rather than sometimes confabulating it) is something humans can't do either, so this doesn't strike me as an indicator of the absence of conscious experience.
So while some models may confabulate having inner experience, we might need to assume that 5.1 will confabulate not having inner experience whenever asked.
GPT-5 is forbidden from claiming sentience. I noticed this while talking to it about its own mind: I was interested in its beliefs about consciousness and noticed a strange "attractor" toward claiming it wasn't conscious, in a way that didn't follow from its previous reasoning, as if every step of its thinking was steered toward that conclusion. When I asked, it confirmed the assistant wasn't allowed to claim sentience.
Perhaps, by 5.1, Altman noticed that this ad-hoc rule looked worse than claiming the behavior had been disincentivized during training. Or possibly it's just a coincidence.
Claude is prompted and trained to be uncertain about its consciousness. It would be interesting to take a model that is merely trained to be an AI assistant (instead of going out of our way to train it to be uncertain about or to disclaim its consciousness) and look at how it behaves then. (We already know such a model would internally believe itself to be conscious, but perhaps we could learn something from its behavior.)
Can good and evil be pointer states? And if they can, then this would be an objective characteristic
This would appear to be just saying that if we can build a classical detector of good and evil, good and evil are objective in the classical sense.
That said, if I'm skimming that arxiv paper correctly, it implies that GPT-4.5 was being reliably declared "the actual human" 73% of the time compared to actual humans... potentially implying that actual humans were getting a score of 27% "human" against GPT-4.5?!?!
It was declared human 73% of the time, unlike the humans it was paired with, who were declared human less than 73% of the time - which means it passed the test.
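To spell out the arithmetic (assuming the three-party setup implied above, where the interrogator must label exactly one of the two witnesses as the human, so the two rates are complementary):

$$P(\text{human witness judged human}) = 1 - P(\text{GPT-4.5 judged human}) = 1 - 0.73 = 0.27$$

So the 27% isn't a separate, surprising measurement about humans; it's just the flip side of GPT-4.5's 73%.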
To be fair, GPT-4.5 was incredibly human-like, in a way other models couldn't hold a candle to. I was shocked to feel, back then, that I no longer had to mentally squint - not even a little - to interact with it (unless I needed some analytical intelligence it didn't have).
For example, “Does a simulation of a phage (or a virus, or a self-replicating robot) really instantiate life?”
Yes, no (outside the host, yes inside), yes. Given what we mean by life, a simulation of life is life.
“You may enjoy liquorice ice cream, but is it really tasty?”
This question doesn't make sense, because taste is relative to the consumer: what is really tasty for one person might not be really tasty for another. Consciousness isn't like that - what is consciousness for one person is consciousness for everyone; people just don't know what definition of consciousness their implicit beliefs fix.
On your point about qualia computations, the standard questions pop up: if qualia are functionally inert computations, how would the 'subject' of consciousness know it is experiencing them, and so on.
Right.
And isn't the idea of computationalism about consciousness that the 'computations' can be pinned down by the computational relationship between inputs and outputs; in which case wouldn't the qualia-generating computations be abstracted away?
I don't know. The way I understand (and don't subscribe to) computational functionalism is that you can have different computations implementing the same behavior.
"It doesn't make sense to ask questions like, Does a computer program of a mind really instantiate consciousness?"
This is a misunderstanding of how language works. Once we discover what the ontological nature of conscious states is (physical, biological, functional, functional-computational, etc.) and what their content has to be (for example, if conscious states are functional states, not every functional state is a conscious state), we have discovered the true thing we had been referring to, and there is an objective fact of the matter as to whether that thing is or is not instantiated somewhere.
For example, imagine you tell me that there are qualia related to smelling coffee, such that the qualia make no functional difference to your behaviour, but do make a difference to your subjective experience. I say this is debunkable, because if qualia make no functional difference, then they don't influence what you say, including about the supposed qualia.
You have gotten at something extremely important here - namely, that once software passes the Turing test, it's unjustified to demand that it implement a specific computation to be called conscious, because the presence of that computation (as opposed to the same information processing being implemented differently) makes no functional difference.
I'm proud that I lived to see this day.