Whether an AI understands is a topic of debate. Seldom debated, however, are the limits of our own understanding. We imagine our understanding to be complete and establish this as the baseline for AI. A different framing appears when we concede that our understanding is incomplete and apply that concession to AI.
The fundamental issue goes like this:
An AI may describe an apple as "sweet" though it has never tasted one, "red" though it has never seen one, and "smooth" though it has never felt one. So while an AI may skillfully apply the word “apple”, in what way, if any, can we say an AI understands the word “apple”?
Stevan Harnad named this the “symbol grounding problem.” He reasoned that the symbols we use as references must be grounded in phenomena, their referents, to be meaningful. After all, the word “apple” written in a book has no meaning on its own. The meaning of symbols is “parasitic on the meanings in our heads,” he writes. To Harnad, the AI doesn’t understand at all; only we do, as only we can bridge the gap between reference and referent.
Yet now we have AIs that seem to ground text in images. Have we proved Harnad wrong? Do these AIs understand something previous AIs did not? Predating Harnad, John Searle, with keen foresight, addressed this issue. In Minds, Brains, and Programs, Searle presents and then rebuts the “Robot Reply” argument. He describes an AI that would “have a television camera attached to it that enabled it to 'see,' it would have arms and legs that enabled it to 'act,' and all of this would be controlled by its computer 'brain'.” Yet to Searle, this “adds nothing” to the argument. We are simply introducing more symbols, and exploiting correlations in symbols between modalities is no different from exploiting correlations within a modality. To Searle, grounding was neither the problem nor the solution when it comes to semantics arising from syntax.
So just what is a grounded AI lacking? On this question, Harnad and Searle are silent. Less silent, though, were the empiricists predating them. We approach our question by asking what they asked: what are grounded human minds lacking?
Back to the apple. The empiricists accept that the sweetness, redness and smoothness of an apple are manifestations of our minds. Outside the mind, apples have no taste, color or texture. These properties make up the phenomena of apples. To know what an “apple” is outside the mind is to know what Kant called the noumenon of an apple: the apple “in itself.” Kant acknowledged that the existence of things outside the mind can only be “accepted on faith” and famously called this dilemma the “scandal to philosophy.” Hume considered our belief in external objects a “gross absurdity”, and Berkeley denied not only our knowledge of external reality, but its very existence. Several centuries later, the issue is far from settled.
Back to our framing question: what is a grounded human mind lacking? One answer is that we lack understanding of the noumena.
Our brains receive only representations of reality: electrical stimuli, whether through the optic nerve or ocular cilia, and so on. We have no hope of knowing what these stimuli are at the most fundamental level; their nature eludes us. Our brains feast on physical reality and know it not.
AIs receive only representations of our phenomenological experience: word tokens for language, RGB matrices for images, and so on. An AI has no hope of knowing what our experience is at the most fundamental level; its nature eludes the AI. AIs feast on computational reality and know it not.
Just as electrical stimuli in vacuo carry no meaning, neither do the tokens we give an AI. The inputs into brains are asemic: they are the syntax of “squiggles and squoggles” Searle describes. Meaning is created by minds. What is peculiar about the situation of AI is that its input is a representation of the very thing we find meaningful. Is not the token embedding of “apple” a given representation of our semantic concept? Is not an image of an apple a representation of our visual phenomena? Our realities are nested. Our phenomena are the AI’s noumena.
The philosopher Colin McGinn is the leading proponent of the epistemological position called “Mysterianism.” It supposes that certain properties can be both real and fundamentally unknowable. All that is knowable to a mind is its “cognitive closure.” Reflecting on what can be outside cognitive closure, McGinn writes, “what is noumenal for us may not be miraculous itself. We should therefore be alert to the possibility that a problem that strikes us as deeply intractable...may arise from an area of cognitive closure in our way of representing the world.” External reality, McGinn argues, is among the things that lie in such an area.
How, then, can we fault an AI without being hypocritical ourselves? We are asking AI to grasp what is outside its cognitive closure, which we ourselves cannot do. A different framing would be to argue that, relative to an AI, we have transcendent knowledge. There is an explanatory gap the AI will never bridge, but one so natural to us that we cross it effortlessly. Our cognitive closure encircles the AI's. We are like mystics, aware of a reality beyond the comprehension of another.
In the Hebrew rabbinical interpretations of the Book of Genesis, the Bereishit Rabbah, when God created Adam and gave him language to name the animals, the Angels in Heaven could not understand this revelation. Being of spirit, their perception was confined to the spiritual. Only a corporeal being, like Adam, masked by the flesh from perceiving the noumena directly, could comprehend these representations. Adam was the semantic link between the Heavens and the Earth. Unlike derived languages today, this Adamic language is thought to be in harmony with the essential nature of its referent: the semantic collapse of sign and signified.
When we experience words, we experience them conceptually. When we experience images, we experience them visually. Like the Adamic language, these representations are in direct correspondence with their referents. This is their phenomenal nature revealed to us without mediation: the semantic collapse of sign and signified. This is also their noumenal quality concealed from an AI. An AI is masked from perceiving our reality by a computational flesh. For its own comprehension, a representation, cast in the likeness of our own, is needed. Like the Angels, who only experience semantics directly, we are baffled by these representations, unable to grasp their meanings.
Does an AI understand? Yes, but differently. It comprehends a reality grounded in the computational, which is itself grounded in the phenomenological, which is itself grounded in the incomprehensible.
Harnad, Stevan (1990). The Symbol Grounding Problem
OpenAI, DALL·E 2 https://openai.com/dall-e-2/
Searle, John (1980). Minds, Brains, and Programs
Kant, Immanuel (1781/1787). Critique of Pure Reason
Hume, David (1739). A Treatise of Human Nature
Berkeley, George (1710). A Treatise concerning the Principles of Human Knowledge
It is an unfortunate accident of recent history that the first AI to capture our imagination of an AGI was a language model. For in a language model the input and the output take the same form, though their semantic quality differs.
McGinn, Colin (1989). Can We Solve the Mind-Body Problem?
"He (God) brought before them (Angels) beast and animal and bird. He said to them: This one, what is his name? and they didn’t know. He made them pass before Adam. He said to him: This one, what is his name? (Adam) said: This is ox/shor, and this is donkey/chamor and this is horse/sus and this is camel/gamal." https://www.sefaria.org/Bereishit_Rabbah.17.4