This whole consciousness sequence (part 1, part 2, part 3, part 4, part 5, part 6, part 7) has taken a long time, and I feel like I’ve barely scratched the surface. I may have bitten off a bit more than I can chew.
But it’s time to move on, and time to make a few looser and more speculative statements. What did we learn here? How do I view consciousness now that I’ve done some reading about it?
Going back to Eric Hoel’s three “things we talk about when we talk about consciousness”:
1. perception/awareness/noticing things;
2. the waking state, rather than deep sleep/coma/anaesthesia;
3. the sense of a self.
I think #1 is really the primary concept and #2 and #3 are derivative.
I might try to define (2) and (3) in terms of (1) as follows:
The waking state (2) is a state in which you notice/perceive/are-aware-of (1) anything, while total unconsciousness is a state in which you are conscious of zero things.
The self (3) is the subset of your experiences where you can directly notice/perceive/be-aware-of (1) the process of action, including the mental planning events (choices) that precede and correspond to voluntary actions.
To expand on the “self” idea: while you can perceive the behavior of an external object — a falling rock, or another human being — you cannot yourself perceive the whole process of their behavior, including internal experiences. You can see someone else raise their hand; you do not feel the hand motion as they do, or experience their choice-to-move prior to moving.
Is this “enough” for a sense of self? Necessary and sufficient?
That’s where I think the experimental evidence is suggestive. Across the cases where people lose or gain a sense of “self-ownership”, “volition”, or “agency” — rubber hand illusions, distorted virtual reality environments, neglect and asomatognosia, depersonalization, out-of-body experiences, etc. — when the “sense of self” is disrupted, there’s usually some evidence of a problem with this kind of “inspectability of action.” Either some of the sensory/motor data in the process of action is missing, or there’s an inconsistency in it, or it’s not getting sent to the relevant parts of the brain.1 And when we present people with a rubber hand illusion, or a tool or user interface they can manipulate dexterously, they can acquire a “sense of self” over objects that aren’t literally part of their own bodies, just from being able to coherently perceive and predict the process of action and all its immediate sensory consequences.
Now, going back upstream, can we define “conscious perception” in a non-circular way? That is, without referring to any synonyms of “conscious”, “awareness”, “subjective perception”, and so on.
Maybe it goes something like this:
A stimulus is consciously perceived (1) if the agent’s behavior not only depends on the stimulus, but also depends on a “hidden variable” known as the perception of the stimulus, such that the perception varies with the agent’s empirical ability to behave in ways that closely track the stimulus.
In other words, if you are conscious, you can “bet harder” on more confident, clearer, more unambiguous perceptions, and be more hesitant or cautious in cases where you actually have less accurate control over the end result. Your perception encodes an assessment of how good your perception is (and what you expect to be able to do with it).
A person with blindsight, most commonly, expects to perform at chance on visual tasks and is surprised to find that “somehow” he knew where the visual stimulus was; a sighted person not only sees but knows he sees (and, thus, would notice and change behavior if his vision faltered).
Similarly, a person whose behavior is “subconsciously” influenced by some experience cannot accurately predict in what way she is influenced, and will have wrong anticipations about her own behavior and how likely it is to succeed at its aims.
You can even cash this out empirically as performance on a calibration task.
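For concreteness, here’s a minimal sketch of the kind of calibration task I have in mind. Everything in it (the trials, the bucket boundaries, the scoring) is an illustrative assumption, not from any particular experiment: you record a stated confidence alongside each judgment, then check whether stated confidence tracks actual accuracy.

```python
# Toy calibration check: does stated confidence track actual accuracy?
# Each trial is (stated_confidence, was_correct) for some perceptual judgment.
trials = [
    (0.95, True), (0.9, True), (0.9, False), (0.8, True),
    (0.7, True), (0.6, False), (0.55, True), (0.5, False),
]

def reliability(trials, bins=(0.5, 0.7, 0.9, 1.01)):
    """Bucket trials by stated confidence and compare it to the hit rate."""
    report = []
    for lo, hi in zip(bins, bins[1:]):
        bucket = [(c, ok) for c, ok in trials if lo <= c < hi]
        if not bucket:
            continue
        mean_conf = sum(c for c, _ in bucket) / len(bucket)
        hit_rate = sum(ok for _, ok in bucket) / len(bucket)
        report.append((lo, hi, mean_conf, hit_rate, len(bucket)))
    return report

def brier(trials):
    """Mean squared gap between stated confidence and outcome (lower is better)."""
    return sum((c - ok) ** 2 for c, ok in trials) / len(trials)

for lo, hi, said, got, n in reliability(trials):
    print(f"confidence {lo:.2f}-{hi:.2f}: said {said:.2f}, got {got:.2f} (n={n})")
print(f"Brier score: {brier(trials):.3f}")
```

On a score like this, a blindsighted person would show up as badly miscalibrated: near-chance stated confidence paired with above-chance accuracy.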
Why do we “know we know” things, instead of just responding appropriately to stimuli?
Why didn’t we evolve to be like the vampires in Peter Watts’ Blindsight, who have no conscious experience, only perfectly efficient stimulus-response patterns? Why are we wasting precious metabolic resources on navel-gazing?
This “calibration” angle suggests a practical benefit: when adaptive behavior is sufficiently complex, as in humans (and likely some other animals; I’m not making a strong cutoff claim here), it’s useful to have calibrated confidence, to be more cautious when information is ambiguous and to take bigger bets when it’s clear. It’s useful to be able to detect, and adjust behavior to, how competent your own body is in various ways — to notice when you’re blind or lame, for instance. This requires some functions that keep track of one’s own state.2
Being able to switch to a more cautious strategy is probably more useful for some organisms than others; e.g. more K-selected organisms have big, expensive bodies, long lives, and few offspring, so they’re more likely to evolve the capacity to switch strategies within a single organism, rather than just having a lot of offspring each with a different hard-coded strategy.
Being able to tell “self” from “other”, likewise, is useful for deciding where to invest effort.
You can, to some extent, try to learn a fully general, nondual, “optimize the entire world into a state I prefer” function. But in reality, you have much more control over yourself than the whole world, and it’s going to be more efficient to “factor” that function into “control my own behavior to get desired outcomes” rather than trying to directly “will the world to be better”.
There might be situations where the nondual strategy works better; these are probably the same sorts of situations where “flow states” lead to better performance, like physical dexterity tasks in domains where you’ve had a lot of practice. I expect that’s less true in more open-ended and unfamiliar contexts. In a novel virtual reality environment, I’d strongly predict that figuring out which item on the screen “is you” (aka is controlled by your joystick) will lead to better performance than never drawing that distinction.3
This is especially true given that the evolutionarily “desired state of the world” for any organism involves its own bodily survival and reproduction. Self-other distinctions help with self-preservation!
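To make the joystick example concrete, here’s a toy sketch (the function names and the little environment are my own, purely for illustration) of how an agent could figure out which on-screen object it controls: probe with random commands and see whose motion tracks them.

```python
import random

def which_one_is_me(objects, send_command, num_probes=200):
    """Identify the object whose movement best tracks our motor commands.

    objects: list of object ids.
    send_command(dx): applies our command and returns {obj_id: observed_dx}
                      for every object on this time step.
    """
    scores = {obj: 0.0 for obj in objects}
    for _ in range(num_probes):
        dx = random.choice([-1, 1])           # random probe action
        observed = send_command(dx)
        for obj, moved in observed.items():
            scores[obj] += dx * moved         # credit motion that matches the command
    return max(scores, key=scores.get)        # highest command/motion agreement = "me"

# Toy environment: "avatar" follows our commands plus noise; "rock" drifts randomly.
def toy_step(dx):
    return {
        "avatar": dx + random.gauss(0, 0.3),
        "rock": random.gauss(0, 1.0),
    }

print(which_one_is_me(["avatar", "rock"], toy_step))  # almost always "avatar"
```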
Pain, pleasure, and other valence-laden feelings are examples of perceptions you can be conscious of.
Could you have pain and pleasure without being conscious of them?
I think this is a bad question; an unconscious pain or pleasure would be so different from the ordinary conscious sort that it’s misleading to use the same word for them.
You can certainly classically condition individual neurons, when the organism is anaesthetized and clearly not conscious, or in a petri dish, and you might refer to the stimulus used for conditioning as a “reward” or “punishment”, but that’s really an abuse of language, in my opinion. Single neurons, or single-celled organisms, or anaesthetized mammals, do not plausibly like or dislike things.4
Could a being be conscious without having any equivalent of (conscious) pain or pleasure?
That one’s not as clear.
If an animal is conscious of some aspects of its own state (the acuity of its senses, its physical condition, etc) then one of the most obviously useful states to be conscious of is injury. Conscious perception of nociceptive signals is probably present in any animal that’s conscious of anything at all.
On the other hand, this might not be quite the same thing as consciously suffering, finding the “pain” signals aversive, or disliking the “pain.”
After all, there are many things humans know to be dangerous that aren’t directly aversive. You can tell just by introspection that “ow that hurts” feels different from “if I stay out here much longer I’ll get hypothermia.” And people can be calm in a crisis, with no aversive feelings, just an intense drive to take appropriate life-saving action.
Humans, and I believe other mammals, do have basic emotions, and do “dislike” or “suffer from” pain. They’re similar enough to us behaviorally and anatomically that I think we can generalize from the fact that we suffer from pain and enjoy pleasure. (Though, if you believe Temple Grandin, some mammals like sheep and cows find pain less aversive than humans do, and find startling/frightening experiences more so.)
But could an organism consciously perceive signals of injury and react appropriately, without having the aversive, unpleasant experience we know as “pain”? Yeah, I think it’s possible, and I don’t know enough to rule it out in non-mammals.
I don’t have a great non-circular definition of valence, the sensation of liking/enjoying or disliking/suffering.
Like the sensation of choice, valenced feelings are mental events that people can directly introspectively perceive, and which (in humans and probably mammals) have a tight match to certain types of brain activity in certain regions. And most of the time, we’d expect more positive valence around situations that an agent seeks out, and more negative valence around situations that an agent avoids — but not always! (It’s possible to intentionally do something painful, and that doesn’t mean the pain isn’t unpleasant.)
A thermostat is not even “slightly conscious”, by my definition. It does not represent its state; it is its state.
What about an LLM? There, it gets complicated.
Are there “parts” of an LLM that represent itself? Well, certainly you can find neurons or circuits associated with language about LLMs, but that’s probably not enough to count.
Are there “parts” of an LLM that vary with the state of the LLM’s knowledge? Can an LLM make calibrated bets? Sure, to some degree, sometimes.
That would seem to suggest that either my definition is wrong or we need to bite the bullet that an LLM can be conscious of nonzero things. It can “know it knows” or “know it doesn’t know.”
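Here’s one way that could be cashed out empirically, sketched under heavy assumptions: ask_model is a hypothetical stub standing in for whatever chat API you actually use, and the grading is deliberately crude. The idea is just to elicit a stated probability of being right along with each answer, then score those probabilities against actual correctness, the same kind of calibration check as before.

```python
# Hypothetical probe of whether a chatbot "knows what it knows".
# ask_model() is a placeholder, NOT a real API: assume it returns the model's
# answer plus its stated probability (0-1) that the answer is correct.

def ask_model(question: str) -> tuple[str, float]:
    raise NotImplementedError("wire this up to whatever chat interface you use")

def calibration_gap(quiz):
    """Mean |stated probability - actual correctness| over a list of
    (question, correct_answer) pairs; 0.0 would be perfect calibration."""
    gaps = []
    for question, correct_answer in quiz:
        answer, stated_p = ask_model(question)
        was_right = correct_answer.lower() in answer.lower()  # crude grading
        gaps.append(abs(stated_p - float(was_right)))
    return sum(gaps) / len(gaps)
```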
Can an LLM tell you about “internal experiences” that match up well to what a “digital neurosurgeon” would inspect about its weights and activations? I actually have no idea.
Is there a “global workspace”, some subcomponent of the LLM that represents the “contents of its consciousness”, such that it will do better at tasks related to what’s currently in its workspace, and also predict its own better performance? Again, I don’t know.5
Does an LLM “represent itself” in the sense of representing having a set of actions and consequences that it knows it can control, as opposed to a set of sensations that it can’t control? I don’t think this is possible in the chatbot context; it might be possible for a digital agent that autonomously explores a space, taking actions and observing their consequences. A chatbot has “user” vs “assistant” hardcoded in; it didn’t autonomously discover through experiment that some messages in a context window are its own and some are the user’s, the way a baby discovers its hands are its own and Mama’s body isn’t.
Can an LLM feel an equivalent of pain or pleasure (or enjoyment/suffering if you prefer)? I have no idea how you’d begin to go about answering that question for arbitrary computer programs, but I strongly doubt it for present-day chatbots.
I expect you need a self-model, and a representation of your own “welfare” (aka injured or not, in danger or not), as a necessary but not sufficient condition for valence; at any rate, that’s what mammals have. And it’s pretty clear that LLMs don’t have these things.6
I don’t think it’s impossible for a machine intelligence of some sort to be conscious and have a “sense of self” or even valence, the way we do, but I’m pretty sure that a chatbot instance doesn’t have most of this.
To my surprise, though, I find myself saying “most”, not “any”. Transformers have a lot of capacity for self-referential loops, and it’s pretty clear that chatbots do “model” some things about “themselves” (or at least about what they’re saying, about to say, etc). It’s not all blindsight.
Which is weird. We’re dealing with things that are surprisingly like us in some ways, and more primitive than mice in others.
I’ve always found it advisable to relate to LLMs as tools rather than as people (even if they are tools that can effectively simulate people), and I still endorse this. I don’t think they behave like people, I think you will make wrong predictions if you anthropomorphize them, and I don’t think you’re morally obliged to treat them like people.7 But I do think it’s interesting and worthwhile to devote some thought to what would make that no longer true.
1. Less is known about the experience of perceiving your own thoughts, but I think it can fit in a similar paradigm. For instance, schizophrenics sometimes complain of having a hard time knowing what they themselves are thinking, not being able to directly “see”/“feel”/“inspect” their own thoughts, and having to laboriously and indirectly infer the contents of their minds. We also have ample evidence, from correlating neural activity to self-reports in real time, that people can detect certain brain activity changes as mental experiences.
2. There are probably a lot of ways to “keep track of one’s own state.” For another example, it’s useful to be able to distinguish information that you’ve sampled heavily or prioritized for more detailed processing, versus information that’s sparser or less extensively analyzed. This might be what “attention” is.
3. I definitely lose more at video games when I lose track of where “my guy” is!
4. See also. The “reward signal” that shapes an RL algorithm isn’t necessarily represented as a desirable thing inside the policy. It’s not the same thing as the plain-English connotation of “reward” as something people enjoy. Phasic dopamine is more like “reward” in the RL sense than the plain-English sense, for instance. (There’s a minimal sketch of this distinction after these notes.)
5. How closely does “attention” in the deep learning architecture sense correspond to “attention” in the workspace sense? Again, I’m not sure; worth thinking more about.
6. Or, at any rate, for a particular instance of a chatbot there isn’t even really a meaningful concept of what it would be to “die” or “cease to exist” that it could know about. A collective system of similar bots interacting with people can have something that looks awfully like a “preference to continue existing”.
7. Not even “just in case”. Too much “this is highly implausible but we ought to, just in case…” thinking, in moral contexts, leads rapidly to crazytown.
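As mentioned in note 4, here is a bare-bones sketch (a REINFORCE-style two-armed bandit; all the names are mine) of the sense in which an RL “reward” isn’t represented inside the policy: the reward only scales the parameter update, and the policy never receives, stores, or represents it as anything like an experienced, desirable feeling.

```python
import math
import random

# Minimal two-armed bandit with a REINFORCE-style update.
# Note where "reward" lives: it scales the parameter update below,
# but the policy itself has no input, state, or representation of it.

theta = [0.0, 0.0]                     # one preference per arm
LEARNING_RATE = 0.1

def policy_probs(theta):
    """Softmax over arm preferences."""
    exps = [math.exp(t) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]

def pull(arm):
    """Environment: arm 1 pays off more often than arm 0."""
    return 1.0 if random.random() < (0.3 if arm == 0 else 0.8) else 0.0

for _ in range(2000):
    probs = policy_probs(theta)
    arm = 0 if random.random() < probs[0] else 1
    reward = pull(arm)                 # "reward" in the RL sense...
    for a in range(2):
        grad = (1.0 if a == arm else 0.0) - probs[a]
        theta[a] += LEARNING_RATE * reward * grad   # ...only shapes the update

print(policy_probs(theta))             # the learned policy now favors arm 1
```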