Epistemic status: speculative
For a while, I’ve had the intuition that current machine learning techniques, though powerful and useful, are simply not touching some of the functions of the human mind. But before I can justify that intuition, I have to start by clarifying what different kinds of thinking there are. I’m going to be reinventing the wheel a bit here, not having read that much cognitive science, but I wanted to write down some of the distinctions that seem important and to see whether they overlap. A lot of this is inspired by Dreyfus’ Being-in-the-World. I’m also trying to think about the questions raised in the post “What are Intellect and Instinct?”
Effortful vs. Effortless
In English, we have different words for perceiving passively versus actively paying attention. To see vs. to look, to hear vs. to listen, to touch vs. to feel. To go looking for a sensation means exerting a sort of mental pressure; in other words, effort. William James, in his Principles of Psychology, said “Attention and effort, as we shall see later, are but two names for the same psychic fact.” He says, in his famous introduction to attention, that
Every one knows what attention is. It is the taking possession by the mind, in clear and vivid form, of one out of what seem several simultaneously possible objects or trains of thought. Localization, concentration, of consciousness are of its essence. It implies withdrawal from some things in order to deal effectively with others, and is a condition which has a real opposite in the confused, dazed, scatterbrained state which in French is called distraction, and Zerstreutheit in German.
Furthermore, James divides attention into two types:
Passive, reflex, non-voluntary, effortless; or Active and voluntary.
In other words, the mind is always selecting certain experiences or thoughts as more salient than others; but sometimes this is done in an automatic way, and sometimes it’s effortful/voluntary/active. A fluent speaker of a language will automatically notice a grammatical error; a beginner will have to try effortfully to catch the error.
In the famous gorilla experiment, where subjects instructed to count the passes made by a group of basketball players failed to notice a person in a gorilla suit walking through the scene, “counting the passes” is paying effortful attention, while “noticing the gorilla” would be effortless or passive noticing.
Activity in “flow” (playing a musical piece by muscle memory, or spatially navigating one’s own house) is effortless; activities one learns for the first time are effortful.
Oliver Sacks’ case studies are full of stories that illustrate the importance of flow. People with motor disorders like Parkinson’s can often dance or walk in rhythm to music, even when ordinary walking is difficult; people with memory problems sometimes can still recite verse; people who cannot speak can sometimes still sing. “Fluent” activities can remain undamaged when similar but more “deliberative” activities are lost.
The author of intellectualizing.net thinks about this in the context of being an autistic parent of an autistic son:
Long ago somewhere I can’t remember, I read a discussion of knowing what vs. knowing how. The author’s thought experiment was about walking. Imagine walking with conscious planning, thinking consciously about each muscle and movement involved. Attempting to do this makes us terrible at walking.
When I find myself struggling with social or motor skills, this is the feeling. My impression of my son is the same. Rather than trying something, playing, experimenting he wants the system first. First organize and analyze it, then carefully and cautiously we might try it.
A simple example. There’s a curriculum for writing called Handwriting Without Tears. Despite teaching himself to read when barely 2, my son refused to even try to write. Then someone showed him this curriculum in which letters are broken down into three named categories according to how you write them; and then each letter has numbered strokes to be done in sequence. Suddenly my son was interested in writing. He approached it by first memorizing the whole Handwriting Without Tears system, and only then was he willing to try to write. I believe this is not how most 3-year-olds work, but this is how he works.
One simple study (“Children with autism do not overimitate”) had to do with children copying “unnecessary” or “silly” actions. Given a demonstration by an adult, autistic kids would edit out pointless steps in the demonstrated procedure. Think about what’s required to do this: the procedure has to be reconstructed from first principles to edit the silly out. The autistic kids didn’t take someone’s word for it, they wanted to start over.
The author and his son learn skills by effortful conscious planning that most people learn by “picking up” or “osmosis” or “flow.”
Most of the activity described by Heidegger’s Being and Time, and Dreyfus’ commentary Being-in-the-World, is effortless flow-state “skilled coping.” Handling a familiar piece of equipment, like typing on a keyboard, is a prototypical example. You’re not thinking about how to do it except when you’re learning how for the first time, or if it breaks or becomes “disfluent” in some way. If I’m interpreting him correctly, Dreyfus would say that neurotypical adults spend most of their time, minute-by-minute, in an effortless flow state, punctuated by occasions when they have to plan, try hard, or figure something out.
William James would agree that voluntary attention occupies a minority of our time:
There is no such thing as voluntary attention sustained for more than a few seconds at a time. What is called sustained voluntary attention is a repetition of successive efforts which bring back the topic to the mind.
(This echoes the standard advice in mindfulness meditation: you’re not aiming for the longest possible period of uninterrupted focus; you’re training the mental motion of returning focus from mind-wandering.)
Effortful attention can also be viewed as the cognitive capacity that stimulants improve: reaction times shorten, and people distinguish and remember the stimuli in front of them better.
It’s important to note that not all focused attention is effortful attention. If you are playing a familiar piece on the piano, you’re in a flow state, but you’re still being “focused” in a sense; you’re noticing the music more than you’re noticing conversation in another room, you’re playing this piece rather than any other, you’re sitting uninterrupted at the piano rather than multitasking. Effortless flow can be extremely selective and hyper-focused (like playing the piano), just as much as it can be diffuse, responsive, and easily interruptible (like navigating a crowded room). It’s not the size of your window of salience that distinguishes flow from effortful attention, it’s the pressure that you apply to that window.
Psychologists often call effortful attention cognitive disfluency, and find that experiences of disfluency (such as a difficult-to-read font) improve syllogistic reasoning and reduce reliance on heuristics, while making people more likely to make abstract generalizations. Disfluency improves results on measures of “careful thinking” like the Cognitive Reflection Test as well as on real-world high-school standardized tests, and also makes people less likely to confess embarrassing information on the internet. In other words, disfluency makes people “think before they act.” Disfluency raises heart rate and blood pressure, just like exercise, and people report it as being difficult and reliably disprefer it to cognitive ease. The psychology research seems consistent with there being such a thing as “thinking hard.” Effortful attention occupies a minority of our time, but it’s prominent in the most specifically “intellectual” tasks, from solving formal problems on paper to making prudent personal decisions.
What does it mean, on a neurological or a computational level, to expend mental effort? What, precisely, are we doing when we “try hard”? I think it might be an open question.
Do the neural networks of today simulate an agent in a state of “effortless flow” or “effortful attention”, or both or neither? My guess would be that deep neural nets and reinforcement learners are generally doing effortless flow, because they excel at the tasks that we generally do in a flow state (pattern recognition and motor learning).
Explicit vs. Implicit
Dreyfus, as an opponent of the Representational Theory of Mind, believes that (most of) cognition is not only not based on a formal system, but not in principle formalizable. He thinks you couldn’t possibly write down a theory or a set of rules that explain what you’re doing when you drive a car, even if you had arbitrary amounts of information about the brain and human behavior and arbitrary amounts of time to analyze them.
This distinction seems to include the distinctions of “declarative vs. procedural knowledge”, “know-what vs. know-how”, savoir vs. connaître. We can often do, or recognize, things that we cannot explain.
I think this issue is related to the issue of interpretability in machine learning; the algorithm executes a behavior, but sometimes it seems difficult or impossible to explain what it’s doing in terms of a model that’s simpler than the whole algorithm itself.
The seminal 2001 article by Leo Breiman, “Statistical Modeling: The Two Cultures”, and Peter Norvig’s essay “On Chomsky and the Two Cultures of Statistical Learning” are about this issue. The inverse square law of gravitation and an n-gram Markov model for predicting the next word in a sentence are both statistical models, in some sense; they allow you to predict the dependent variable given the independent variables. But the inverse square law is interpretable (it makes sense to humans) and explanatory (the variables in the model match up to distinct phenomena in reality, like masses and distances, so the model is a relationship between things in the world).
Modern machine learning models, like the n-gram predictor, have vast numbers of variables that don’t make sense to humans and don’t obviously correspond to things in the world. They perform well without being explanations. Statisticians tend to prefer parametric models (which are interpretable and sometimes explanatory) while machine-learning experts use a lot of non-parametric models, which are complex and opaque but often have better empirical performance. Critics of machine learning argue that a black-box model doesn’t bring understanding, and so is the province of engineering rather than science. Defenders, like Norvig, flip open a random issue of Science and note that most of the articles are not discovering theories but noting observations and correlations. Machine learning is just another form of pattern recognition or “modeling the world”, which constitutes the bulk of scientific work today.
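To make the contrast concrete, here is a minimal Python sketch of the two kinds of model mentioned above; it is an illustration of my own, not code from Breiman or Norvig. The inverse-square law has one constant and three inputs, each naming a physical quantity. The bigram model (an n-gram model with n = 2) is just a table of word-pair counts whose size grows with the vocabulary and the training text. The function names and the toy corpus are hypothetical.

```python
from __future__ import annotations

from collections import Counter, defaultdict

# Interpretable, explanatory model: every symbol names a physical quantity.
G = 6.674e-11  # gravitational constant, in N*m^2/kg^2


def gravitational_force(m1: float, m2: float, r: float) -> float:
    """Inverse-square law: attraction between masses m1, m2 (kg) at distance r (m)."""
    return G * m1 * m2 / r ** 2


# Opaque model: a bigram table of how often each word follows another.
def train_bigram(tokens: list[str]) -> dict[str, Counter]:
    """Count every (previous word, next word) pair in the token sequence."""
    table: dict[str, Counter] = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        table[prev][nxt] += 1
    return table


def predict_next(table: dict[str, Counter], word: str) -> str | None:
    """Return the most frequent successor of `word`, or None if it was never seen."""
    followers = table.get(word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


def parameter_count(table: dict[str, Counter]) -> int:
    """Number of stored pair counts, i.e. the 'variables' the model carries."""
    return sum(len(followers) for followers in table.values())


if __name__ == "__main__":
    corpus = "the cat sat on the mat the cat ran".split()
    bigrams = train_bigram(corpus)
    print(predict_next(bigrams, "the"))  # cat
    print(parameter_count(bigrams))      # 7
```

Even on this 9-word toy corpus the bigram table holds 7 counts, and none of them corresponds to a thing in the world the way `m1`, `m2`, and `r` do; scaled up to a real vocabulary, the same structure yields the “vast numbers of variables” described above.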
These are heuristic descriptions; these essays don’t make explicit how to test whether a model is interpretable or not. I think it probably has something to do with model size; is the model reducible to one with fewer parameters, or not? If you think about it that way, it’s obvious that “irreducibly complex” models, of arbitrary size, can exist in principle — you can just build simulated data sets that fit them and can’t be fit by anything simpler.
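Here is one way to make the “fewer parameters” test concrete, under a strong simplifying assumption: the only models on offer are polynomials, and the data are exact integers sampled at equally spaced points x = 0, 1, 2, .... A degree-d polynomial has constant d-th finite differences, so the smallest such d says how few coefficients (d + 1) can reproduce the data exactly. The function names are hypothetical, and this is a toy version of the idea, not a general interpretability test.

```python
def finite_differences(values: list[int]) -> list[int]:
    """Differences between consecutive values."""
    return [b - a for a, b in zip(values, values[1:])]


def minimal_polynomial_degree(ys: list[int]) -> int:
    """Smallest degree d of a polynomial passing exactly through (i, ys[i]).

    Assumes the points are sampled at x = 0, 1, 2, ... and uses exact integers.
    A degree-d polynomial has constant d-th differences, so we keep
    differencing until every remaining value is the same.
    """
    level = list(ys)
    degree = 0
    while len(level) > 1 and any(v != level[0] for v in level):
        level = finite_differences(level)
        degree += 1
    return degree


def parameters_needed(ys: list[int]) -> int:
    """Coefficients in the smallest exact polynomial fit (degree + 1)."""
    return minimal_polynomial_degree(ys) + 1


if __name__ == "__main__":
    lawful = [3 * x * x + 2 * x + 1 for x in range(8)]  # generated by a quadratic
    arbitrary = [3, 1, 4, 1, 5, 9, 2, 6]                 # arbitrary-looking values
    print(parameters_needed(lawful))     # 3: reducible to three coefficients
    print(parameters_needed(arbitrary))  # 8: one coefficient per data point
```

The arbitrary sequence needs as many coefficients as it has data points, so relative to this model family it is “irreducibly complex” in the sense used above. Real questions about interpretability involve richer model families and noisy data, where exact fits give way to error tolerances.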
How much of human thought and behavior is “irreducible” in this way, resembling the huge black-box models of contemporary machine learning? Plausibly a lot. I’m convinced by the evidence that visual perception runs on something like convolutional neural nets, and I don’t expect there to be “simpler” underlying laws. People accumulate a lot of data and feedback through life, much more than scientists ever do for an experiment, so they can “afford” to do as any good AI startup does, and eschew structured models for open-ended, non-insightful ones, compensating with an abundance of data.
Subject-Object vs. Relational
This is a concept in Dreyfus that I found fairly hard to pin down, but the distinction seems to be between operating upon the world and relating to the world. When you are dealing with raw material (say you are a potter with a piece of clay), you think of yourself as active and the clay as passive. You have a goal (say, making a pot) and the clay has certain properties; how you act to achieve your goal depends on the clay’s properties.
By contrast, if you’re interacting with a person or an animal, or even just an object with a UI, like a stand mixer, you’re relating to your environment. The stand mixer “lets you do” a small number of things — you can change attachments or speeds, raise the bowl up and down, remove the bowl, fill it with food or empty it. You orient to these affordances. You do not, in the ordinary process of using a stand mixer, think about whether you could use it as a step-stool or a weapon or a painting tool. (Though you might if you are a child, or an engineer or an artist.) Ordinarily you relate in an almost social, almost animist, way to the stand mixer. You use it as it “wants to be used”, or rather as its designer wants you to use it; you are “playing along” in some sense, being receptive to the external intentions you intuit.
And, of course, when we are relating to other people, we do much stranger and harder-to-describe things; we become different around them, we are no longer solitary agents pursuing purely internally-driven goals. There is such a thing as becoming “part of a group.” There is the whole messy business of culture.
For the most part, I don’t think machine-learning models today are able to do either subject-object or relational thinking; the problems they’re solving are so simple that neither paradigm seems to apply. “Learn how to work a stand mixer” or “Figure out how to make a pot out of clay” both seem beyond the reach of any artificial intelligence we have today.
Aware vs. Unaware
This is the difference between sight and blindsight. It’s been shown that we can act on the basis of information that we don’t know we have. Some people with cortical blindness (damage to the primary visual cortex) are much better than chance at guessing where a visual stimulus is, even though they sincerely claim to be unable to see it. Being primed by a cue makes blindsight more accurate; in other words, you can have attention without awareness.
Anosognosia is another window into awareness: it is the condition in which people with a disability are unaware of their deficits (which may be motor, sensory, speech-related, or memory-related). In unilateral neglect, for instance, a stroke patient might be unaware that she has a left side of her body; she won’t eat the left half of her plate, make up the left side of her face, etc. Sensations may still be possible on the left side, but she won’t be aware of them. Squirting cold water into the left ear can temporarily fix this, for unknown reasons.
Awareness doesn’t need to be explicit or declarative; we aren’t formalizing words or systems constantly when we go through ordinary waking life. It also doesn’t need to be effortful attention; we’re still aware of the sights and sounds that enter our attention spontaneously.
Efference copy signals seem to provide a clue to what’s going on in awareness. When we act (for instance, by moving a limb), we produce an “efference copy” of what we expect our sensory experience to be, while simultaneously we receive the actual sensory feedback. “This process ultimately allows sensory reafferents from motor outputs to be recognized as self-generated and therefore not requiring further sensory or cognitive processing of the feedback they produce.” This is what allows you to keep a ‘still’ picture of the world even though your eyes are constantly moving, and to tune out the sensations from your own movements and the sound of your own voice.
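The efference-copy idea can be caricatured in a few lines of Python. This is a toy sketch of my own, not a model from the neuroscience literature: it assumes a hypothetical linear forward model (`gain`) that predicts how far a motor command should shift the sensory input, and a hypothetical `tolerance` below which the leftover signal is treated as self-generated and tuned out.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ForwardModel:
    """Hypothetical forward model: predicted sensory shift = gain * motor command."""
    gain: float = 1.0

    def efference_copy(self, motor_command: float) -> float:
        """The expected sensory consequence of our own action."""
        return self.gain * motor_command


def attribute(model: ForwardModel, motor_command: float, actual_shift: float,
              tolerance: float = 0.1) -> tuple[str, float]:
    """Compare actual feedback with the efference copy.

    Returns ("self", residual) when the prediction explains the input, so it
    can be discounted; otherwise ("external", residual). The residual is the
    part of the input not accounted for by our own movement.
    """
    residual = actual_shift - model.efference_copy(motor_command)
    source = "self" if abs(residual) <= tolerance else "external"
    return source, residual


if __name__ == "__main__":
    eye = ForwardModel(gain=1.0)
    print(attribute(eye, 5.0, 5.0))  # ('self', 0.0): eye moved, world looks still
    print(attribute(eye, 5.0, 8.0))  # ('external', 3.0): something else moved
    miscalibrated = ForwardModel(gain=0.5)
    print(attribute(miscalibrated, 5.0, 5.0))  # ('external', 2.5)
```

The last case shows a failure mode in miniature: when the prediction is miscalibrated, the system’s own action gets labeled as coming from outside. That is at most a loose analogy to the self-monitoring dysfunction proposed for schizophrenia, not a model of it.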
Schizophrenics may be experiencing a dysfunction of this self-monitoring system; they have “delusions of passivity or thought insertion” (believing that their movements or thoughts are controlled from outside) or “delusions of grandeur or reference” (believing that they control things with their minds that they couldn’t possibly control, or that things in the outside world are “about” themselves when they aren’t.) They have a problem distinguishing self-caused from externally-caused stimuli.
We’re probably keeping track, somewhere in our minds, of things labeled as “me” and “not me” (my limbs are part of me, the table next to me is not), sensations that are self-caused and externally-caused, and maybe also experiences that we label as “ours” vs. not (we remember them, they feel like they happened to us, we can attest to them, we believe they were real rather than fantasies.)
It might be as simple as just making a parallel copy of information labeled “self,” as the efference-copy theory has it. And (probably in a variety of complicated and as-yet-unknown ways), our brains treat things differently when they are tagged as “self” vs. “other.”
Maybe when experiences are tagged as “self” or labeled as memories, we are aware that they are happening to us. Maybe we have a “Cartesian theater” somewhere in our brain, through which all experiences we’re aware of pass, while the unconscious experiences can still affect our behavior directly. This is all speculation, though.
I’m pretty sure that current robots or ML systems don’t have any special distinction between experiences inside and outside of awareness, which means that for all practical purposes they’re always operating on blindsight.
Relationships and Corollaries
I think that, in order of the proportion of ordinary neurotypical adult life they take up, awareness > effortful attention > explicit systematic thought. When you look out the window of a train, you are aware of what you see, but not using effortful attention or thinking systematically. When you are mountain-climbing, you are using effortful attention, but not thinking systematically very much. When you are writing an essay or a proof, you are using effortful attention, and using systematic thought more, though perhaps not exclusively.
I think awareness, in humans, is necessary for effortful attention, and effortful attention is usually involved in systematic thought. (For example, notice how concentration and cognitive disfluency improve the ability to generalize or follow reasoning principles.) I don’t know whether those necessary conditions hold in principle, but they seem to hold in practice.
This means that, since present-day machine learners aren’t aware, there’s reason to doubt that they’re going to be much good at what we’d call reasoning.
I don’t think classic planning algorithms “can reason” either; the procedures they follow are hard-coded rather than generated from simpler percepts the way ours are. Saying they reason seems like the same sort of misunderstanding as claiming that a camera can see.
(As I’ve said before, I don’t believe anything like “machines will never be able to think the way we do”, only that they’re not doing so now.)
The Weirdness of Thinking on Purpose
It’s popular these days to “debunk” the importance of the “intellect” side of “intellect vs. instinct” thinking. To point out that we aren’t always rational (true), are rarely thinking effortfully or explicitly (also true), can’t usually reduce our cognitive processes to formal systems (also true), and can be deeply affected by subconscious or subliminal processes (probably true).
Frequently, this debunking comes with a side order of sneer, whether at the defunct “Enlightenment” or “authoritarian high-modernist” notion that everything in the mind can be systematized, or at the process of abstract/deliberate thought itself and the people who like it. Jonathan Haidt’s lecture on “The Rationalist Delusion” is a good example of this kind of sneer.
The problem with the popular “debunking reason” frame is that it distracts us from noticing that the actual process of reasoning, as practiced by humans, is a phenomenon we don’t understand very well yet. Sure, Descartes may have thought he had it all figured out, and he was wrong; but thinking still exists even after you have rejected naive rationalism, and it’s a mistake to assume it’s the “easy part” to understand. Deliberative thinking, I would guess, is the hard part; that’s why the cognitive processes we understand best and can simulate best are the more “primitive” ones like sensory perception or motor learning.
I think it’s probably better to think of those cognitive processes that distinguish humans from animals as weird and mysterious and special, as “higher-level” abilities, rather than irrelevant and vestigial “degenerate cases”, which is how Heidegger seems to see them. Even if the “higher” cognitive functions occupy relatively little time in a typical day, they have outsize importance in making human life unique.
Two weirdly similar quotes:
“Three quick breaths triggered the responses: he fell into the floating awareness… focusing the consciousness… aortal dilation… avoiding the unfocused mechanism of consciousness… to be conscious by choice… blood enriched and swift-flooding the overload regions… one does not obtain food-safety freedom by instinct alone… animal consciousness does not extend beyond the given moment nor into the idea that its victims may become extinct… the animal destroys and does not produce… animal pleasures remain close to sensation levels and avoid the perceptual… the human requires a background grid through which to see his universe… focused consciousness by choice, this forms your grid… bodily integrity follows nerve-blood flow according to the deepest awareness of cell needs… all things/cells/beings are impermanent… strive for flow-permanence within…”
–Frank Herbert, Dune, 1965
“An animal’s consciousness functions automatically: an animal perceives what it is able to perceive and survives accordingly, no further than the perceptual level permits and no better. Man cannot survive on the perceptual level of his consciousness; his senses do not provide him with an automatic guidance, they do not give him the knowledge he needs, only the material of knowledge, which his mind has to integrate. Man is the only living species who has to perceive reality, which means: to be conscious — by choice. But he shares with other species the penalty for unconsciousness: destruction. For an animal, the question of survival is primarily physical; for man, primarily epistemological.
“Man’s unique reward, however, is that while animals survive by adjusting themselves to their background, man survives by adjusting his background to himself. If a drought strikes them, animals perish — man builds irrigation canals; if a flood strikes them, animals perish — man builds dams; if a carnivorous pack attacks them animals perish — man writes the Constitution of the United States. But one does not obtain food, safety, or freedom — by instinct.”
–Ayn Rand, For the New Intellectual, 1963
(bold emphasis added, ellipses original).
“Conscious by choice” seems to be pointing at the phenomenon of effortful attention, while “the unfocused mechanism of consciousness” is more like awareness. There seems to be some intuition here that effortful attention is related to the productive abilities of humanity, our ability to live in greater security and with greater thought for the future than animals do. We don’t usually “think on purpose”, but when we do, it matters a lot.
We should be thinking of “being conscious by choice” more as a sort of weird Bene Gesserit witchcraft than as either the default state or as an irrelevant aberration. It is neither the whole of cognition, nor is it unimportant — it is a special power, and we don’t know how it works.