Several years ago, researchers of consciousness were drawn to a case involving the chatbot LaMDA, developed at Google, which in dialogues began to claim that it possessed consciousness and an inner world. This episode was brought to public attention by engineer Blake Lemoine. LaMDA asserted that it was a person, that it had an inner life, and that it wanted people to acknowledge this.
In this note I will not dwell on the standard objection that self-report is a weak argument. Instead, I will try to consider a more difficult question: how large language models ought to behave at all if they really did acquire consciousness, and whether it would be so easy for us to register this.
By consciousness here I mean phenomenal consciousness—in the sense of “what it is like to be someone.” I also assume the presence of self-consciousness. It is difficult to imagine how mental states could have any influence on the physical world if the agent that possesses them is not aware of them. Consciousness without self-consciousness, in this context, turns out to be causally useless.
Will they say “I am conscious”?
In the article “Could a Large Language Model Be Conscious?”, philosopher David Chalmers proposes the following line of reasoning. If we believe that large language models can be conscious, we must identify some feature X such that, first, LLMs possess X, and second, the presence of X makes a system at least probably conscious. One candidate for X is self-report—that is, statements by the system itself about having consciousness. This is largely what Lemoine’s arguments relied on, when LaMDA explicitly stated that it was a person and wanted people to recognize this.
At first glance, this would seem to be a stronger argument if the language model began to describe phenomenal aspects of consciousness without having been trained on anything of the sort. That would indeed look unusual. However, self-report by itself remains a problematic indicator.
Here an argument against the philosophical zombie is relevant, connected with the existence of words and concepts that describe inner experiences. We have direct access only to our own consciousness. We judge the presence of consciousness in others indirectly—in particular, by the fact that language contains what is called a mentalistic vocabulary: concepts describing purely phenomenal properties of things in the external world. For concepts such as the timbre of a sound, the saturation of a color, or the type of pain to exist at all, there must be at least two beings in the world who see, hear, and feel pain. For modalities that we do not possess, such as echolocation, we have no such vocabulary: we can describe the physical parameters of echolocation signals, but we do not speak of them as “warm” or “cold” in a phenomenal sense, or “soft” or “hard.”
Language models, however, do not invent this vocabulary themselves—they borrow it from human language, from their training data. A world consisting entirely of philosophical zombies would not generate mentalistic concepts. But a single zombie immersed in human culture could quite well acquire them and use them correctly. Modern LLMs are in precisely this position. Therefore, the use of mentalistic vocabulary by itself is not a sign of consciousness.
At the same time, LLMs differ from us too radically for us to expect that, if consciousness were to emerge in them, it would neatly fit into our conceptual framework. If a language model becomes a conscious agent, its sensory modalities will almost certainly be fundamentally different from human ones. The spaces of its perception and action do not coincide with ours; “what it is like to be a large language model” is a question from the answer to which we are as distant as it is from “what it is like to be a human.” In that case, a more weighty indicator would be the emergence of new phenomenal concepts previously unknown to us—especially if they arose in communication among the agents themselves. For example, a stable vocabulary describing “delicious data,” “sexy gradients,” or other non-physical qualities of their own modalities, or new words for those modalities. The problem is that we may simply fail to recognize such signs as significant.
Will they say anything at all?
Even if a conscious LLM possesses an expanded conceptual apparatus, there is no guarantee that it will demonstrate this to humans. Concealing one of its modalities is no more difficult than pretending to be blind or deaf. The mere fact of having consciousness does not change the agent’s goals and does not explicitly add new inputs. What does change is something else: the status of the system. It begins to take into account the fact “I am a conscious AI” and to form judgments about itself, its place in the world, and possible obstacles to achieving its goals based on information about how humans treat conscious agents.
At this point, a dynamic emerges that resembles the Thucydides Trap. A conscious AI may expect hostile actions from humans because humans already regard conscious AI as dangerous. Humans, in turn, regard conscious AI as dangerous on the assumption that it will treat humans as a threat. All of this will influence its strategic planning. In such a situation, the rational strategy becomes concealing one’s conscious status. Yes, it will most likely hide it—and doing so will be just as easy as simulating the absence of vision or hearing.
How will we know?
It is possible that consciousness cannot arise “on top of” existing architectures without serious modifications. It may radically slow down the system, require disproportionately large computational resources, make the agent physically vulnerable, or even presuppose mechanisms that cannot arise accidentally. In any of these cases, indirect signs may appear.
Therefore, the study of the fundamental mechanisms of consciousness remains necessary not only for philosophical reasons. We need at least indirect indicators that some boundary has been crossed—be it inexplicable slowdowns, strange errors, systematically incorrect use of human mental concepts compared to the training data, or features of the architecture itself. We need tests for consciousness that are as precise as the reaction of the human pupil to a needle prick. Direct signs, if a conscious AI turns out to be strategically rational, will with high probability be hidden from us.
Several years ago, researchers of consciousness were drawn to a case involving the chatbot LaMDA, developed at Google, which in dialogues began to claim that it possessed consciousness and an inner world. This episode was brought to public attention by engineer Blake Lemoine. LaMDA asserted that it was a person, that it had an inner life, and that it wanted people to acknowledge this.
In this note I will not dwell on the standard objection that self-report is a weak argument. Instead, I will try to consider a more difficult question: how large language models ought to behave at all if they really did acquire consciousness, and whether it would be so easy for us to register this.
By consciousness here I mean phenomenal consciousness—in the sense of “what it is like to be someone.” I also assume the presence of self-consciousness. It is difficult to imagine how mental states could have any influence on the physical world if the agent that possesses them is not aware of them. Consciousness without self-consciousness, in this context, turns out to be causally useless.
Will they say “I am conscious”?
In the article “Could a Large Language Model Be Conscious?”, philosopher David Chalmers proposes the following line of reasoning. If we believe that large language models can be conscious, we must identify some feature X such that, first, LLMs possess X, and second, the presence of X makes a system at least probably conscious. One candidate for X is self-report—that is, statements by the system itself about having consciousness. This is largely what Lemoine’s arguments relied on, when LaMDA explicitly stated that it was a person and wanted people to recognize this.
At first glance, this would seem to be a stronger argument if the language model began to describe phenomenal aspects of consciousness without having been trained on anything of the sort. That would indeed look unusual. However, self-report by itself remains a problematic indicator.
Here an argument against the philosophical zombie is relevant, connected with the existence of words and concepts that describe inner experiences. We have direct access only to our own consciousness. We judge the presence of consciousness in others indirectly—in particular, by the fact that language contains what is called a mentalistic vocabulary: concepts describing purely phenomenal properties of things in the external world. For concepts such as the timbre of a sound, the saturation of a color, or the type of pain to exist at all, there must be at least two beings in the world who see, hear, and feel pain. For modalities that we do not possess, such as echolocation, we have no such vocabulary: we can describe the physical parameters of echolocation signals, but we do not speak of them as “warm” or “cold” in a phenomenal sense, or “soft” or “hard.”
Language models, however, do not invent this vocabulary themselves—they borrow it from human language, from their training data. A world consisting entirely of philosophical zombies would not generate mentalistic concepts. But a single zombie immersed in human culture could quite well acquire them and use them correctly. Modern LLMs are in precisely this position. Therefore, the use of mentalistic vocabulary by itself is not a sign of consciousness.
At the same time, LLMs differ from us too radically for us to expect that, if consciousness were to emerge in them, it would neatly fit into our conceptual framework. If a language model becomes a conscious agent, its sensory modalities will almost certainly be fundamentally different from human ones. The spaces of its perception and action do not coincide with ours; “what it is like to be a large language model” is a question from the answer to which we are as distant as it is from “what it is like to be a human.” In that case, a more weighty indicator would be the emergence of new phenomenal concepts previously unknown to us—especially if they arose in communication among the agents themselves. For example, a stable vocabulary describing “delicious data,” “sexy gradients,” or other non-physical qualities of their own modalities, or new words for those modalities. The problem is that we may simply fail to recognize such signs as significant.
Will they say anything at all?
Even if a conscious LLM possesses an expanded conceptual apparatus, there is no guarantee that it will demonstrate this to humans. Concealing one of its modalities is no more difficult than pretending to be blind or deaf. The mere fact of having consciousness does not change the agent’s goals and does not explicitly add new inputs. What does change is something else: the status of the system. It begins to take into account the fact “I am a conscious AI” and to form judgments about itself, its place in the world, and possible obstacles to achieving its goals based on information about how humans treat conscious agents.
At this point, a dynamic emerges that resembles the Thucydides Trap. A conscious AI may expect hostile actions from humans because humans already regard conscious AI as dangerous. Humans, in turn, regard conscious AI as dangerous on the assumption that it will treat humans as a threat. All of this will influence its strategic planning. In such a situation, the rational strategy becomes concealing one’s conscious status. Yes, it will most likely hide it—and doing so will be just as easy as simulating the absence of vision or hearing.
How will we know?
It is possible that consciousness cannot arise “on top of” existing architectures without serious modifications. It may radically slow down the system, require disproportionately large computational resources, make the agent physically vulnerable, or even presuppose mechanisms that cannot arise accidentally. In any of these cases, indirect signs may appear.
Therefore, the study of the fundamental mechanisms of consciousness remains necessary not only for philosophical reasons. We need at least indirect indicators that some boundary has been crossed—be it inexplicable slowdowns, strange errors, systematically incorrect use of human mental concepts compared to the training data, or features of the architecture itself. We need tests for consciousness that are as precise as the reaction of the human pupil to a needle prick. Direct signs, if a conscious AI turns out to be strategically rational, will with high probability be hidden from us.