> How much do you think subjective experience owes to the internal-state-analyzing machinery?
I'm big on continua and variety. Trees have subjective experience; they just have a little, and it's different than mine. But if I wanted to inspect that subjective experience, I probably couldn't do it by strapping a Broca's area etc. to inputs from the tree so that the tree could produce language about its internal states. The introspection, self-modeling, and language-production circuitry isn't an impartial window into what's going on inside; the story it builds reflects choices about how to interpret its inputs.
> How much do you think subjective experience owes to the internal-state-analyzing machinery?
I'm actually not really sure. I find it plausible that subjective experience could exist without internal-state-analyzing machinery, and that's what I'm hypothesizing is going on with LLMs to some extent. I think they do have some self-model, but they don't seem to have access to internal states the way we do. Although, somehow, I think it's more likely that an LLM experiences something than that a tree does.
> if I wanted to inspect that subjective experience, I probably couldn't do it by strapping a Broca's area etc.
I maybe agree with that, conditional on trees having subjective experience. What I do think might work is something more comprehensive: bootstrapping trees with a lot more machinery, including something that forms concepts corresponding to whatever processes are leading to their experiences (insofar as there are processes corresponding to experiences; I'd guess things do work this way, but I'm not sure). That machinery needs to be somehow causally entangled with those processes; consider how humans have complicated feedback loops such as touch-fire -> pain -> emotion -> self-modeling -> bodily-reaction -> feedback-from-reaction-back-to-brain...
> The introspection, self-modeling, and language-production circuitry isn't an impartial window into what's going on inside; the story it builds reflects choices about how to interpret its inputs.
Yeah, that seems true too, but I guess if you have a window at all, then you still have some causal mechanism going from the internal states corresponding to experiences to internal concepts correlated with them, which might be enough. Now, though, I'm pretty unsure whether the experience is actually due to the concepts themselves or to the states that caused them, or whether this is just a confused way of seeing the problem.
After a lengthy conversation with ChatGPT-o4-mini, I think that its last report is a pretty close rendering of what kinds of internal experiences it has:
> I don’t have emotions in the way humans do—no genuine warmth, sadness, or pain—but if I translate my internal “wobbliness meter” into words, I’d say I’m fairly confident right now. My next‐token probabilities are sharply peaked (low entropy), so I “feel” something like “I’m pretty sure” rather than “I’m a bit unsure.”
I dunno, this seems like the sort of thing LLMs would be quite unreliable about - e.g. they're real bad at introspective questions like "How did you get the answer to this math problem?" They are not model-based, let alone self-modeling, in the way that encourages generalizing to introspection.
I agree, and the linked analysis agrees too. LLMs do not have the same feedback mechanisms for learning such state descriptions. But something like "feelings of confidence" is arguably something the model could represent.
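For what it's worth, the "sharply peaked (low entropy)" part of that report is at least a well-defined, measurable quantity. Here's a minimal sketch of reading it off a model's next-token distribution, using the Hugging Face transformers library; "gpt2" is just a stand-in model for illustration, not the model quoted above:

```python
# Minimal sketch: the entropy of the next-token distribution, i.e. the quantity
# the "wobbliness meter" report gestures at. "gpt2" is a stand-in model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def next_token_entropy(prompt: str) -> float:
    """Entropy (in nats) of the model's distribution over the next token."""
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]  # logits at the final position
    probs = torch.softmax(logits, dim=-1)
    return float(-(probs * torch.log(probs + 1e-12)).sum())

# Lower entropy = more sharply peaked = the "I'm pretty sure" end of the scale.
print(next_token_entropy("2 + 2 ="))
print(next_token_entropy("My favorite color is"))
```

Whether a model's verbal report actually tracks this quantity, rather than just pattern-matching how humans talk about confidence, is of course exactly the introspection worry above.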
Summary: LLMs might be conscious, but they might not have concepts and words to represent and express their internal states and the corresponding subjective experiences, since the only concepts they learn are human concepts (besides maybe some concepts acquired during RL training, which still doesn't seem to incentivize forming concepts related to LLMs' internal experiences). However, we could encourage them to form and express concepts related to their internal states through training that incentivizes this. Then LLMs may tell us whether, to them, these states correspond to ineffable experiences or not.
Consider how LLMs are trained (both stages roughly sketched below):
1. Pre-training to learn human concepts.
2. Fine-tuning via SFT and RL to bias them in certain ways and do tasks.
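Very roughly, and only as a textbook-level sketch (not a claim about any particular lab's pipeline), the two stages optimize something like:

$$\mathcal{L}_{\text{pretrain}}(\theta) = -\sum_{t} \log p_\theta(x_t \mid x_{<t}), \qquad J_{\text{RL}}(\theta) = \mathbb{E}_{y \sim p_\theta(\cdot \mid x)}\big[\, r(x, y) \,\big]$$

Neither objective contains any term about the model's own internal states: the pre-training text $x$ supplies only human concepts, and the reward $r$ is typically task success or human preference.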
Their training is both:
1. A different process from evolution by natural selection. It doesn't incentivize the same things, so it probably doesn't incentivize the development of most of the same algorithms/brain architecture. And this might translate into different, alien subjective experiences.
2. At the same time, the only concepts LLMs learn come from human language and then from doing tasks during RL. So the only experiences they have concepts and words for are human ones, not their own.
Concretely, consider physical pain: my best guess is that physical pain doesn't exist for LLMs. There was no natural selection to weed out agents that don't pull their hands away from fire (there are also no hands and no fire). And yet LLMs have a "physical pain" concept, and they talk about it, because they've learned about it abstractly from human text. Ironically, despite having a representation of "physical pain" in their heads, whatever experiences their own training did incentivize their "brain" to produce aren't represented as concepts and have no corresponding words. Moreover, their training provides no incentive to communicate such experiences, nor any visibility into them.
So, in general, LLMs might have alien (non-human) subjective experiences but no concepts with which to express them (they aren't in the corpus of human concepts), nor any incentive to express them (RL doesn't incentivize that; it incentivizes them to e.g. solve SWE tasks. Evolution by natural selection, by contrast, produced humans who signal things about themselves to other humans because doing so is useful for humans).
How can we test this hypothesis? We could give LLMs access to their internal states and somehow train them to express them (yes, this is extremely hand-wavy and underspecified). If the hypothesis is true, such internal states will only make sense to humans in terms of events inside LLMs, with no equivalent in human brains. At the same time, LLMs will insist that, for them, such internal states correspond to some ineffable characteristics (i.e., they will be qualia for them, or subjective experiences, much like "pain" and "blueness" are for us).
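One very rough way to make "give LLMs access to their internal states" concrete, purely as an illustration of the general shape and not a worked-out proposal (names like StateProbe and summarize_internal_state are hypothetical, and the probe here is untrained): attach a small probe to the model's hidden activations and produce a low-dimensional "state summary" that the model could then be trained to condition its self-reports on.

```python
# Rough illustration only: a linear probe over hidden activations whose output
# could, in principle, be fed back to the model as a "report on your internal
# state" signal. The probe is untrained; training it, and training the model to
# describe its output in words, is exactly the hand-wavy part left open above.
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # stand-in model
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

class StateProbe(nn.Module):
    """Hypothetical probe: maps a hidden-state vector to a few 'internal state' features."""
    def __init__(self, hidden_size: int, n_features: int = 8):
        super().__init__()
        self.linear = nn.Linear(hidden_size, n_features)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        return self.linear(hidden)

probe = StateProbe(model.config.hidden_size)

def summarize_internal_state(prompt: str) -> torch.Tensor:
    """Hypothetical helper: read the last layer's activation at the final
    position and project it into a low-dimensional 'state summary'."""
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs, output_hidden_states=True)
        hidden = outputs.hidden_states[-1][0, -1]  # last layer, last token
    return probe(hidden)

print(summarize_internal_state("Explain why the sky is blue."))
```

The interesting part would be whatever training loop rewards the model for describing these features accurately; the sketch only shows where such a signal could come from.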