Human instincts, symbol grounding, and the blank-slate neocortex

by steve2152, 2nd Oct 2019


Intro: What is Common Cortical Algorithm (CCA) theory, and why does it matter for AGI?

As I discussed in "Jeff Hawkins on neuromorphic AGI within 20 years", and as was discussed earlier on LessWrong in "The brain as a universal learning machine", there is a theory, due originally to Vernon Mountcastle in the 1970s, that the neocortex (75% of the human brain) consists of ~150,000 interconnected copies of a little module, the "cortical column", each of which implements the same algorithm. Following Jeff Hawkins, I'll call this the "common cortical algorithm" (CCA) theory. (I don't think that terminology is standard.)

So instead of saying that the human brain has a vision processing algorithm, motor control algorithm, language algorithm, planning algorithm, and so on, in CCA theory we say that (to a first approximation) we have a massive amount of "general-purpose neocortical tissue", and if you dump visual information into that tissue, it does visual processing, and if you connect that tissue to motor control pathways, it does motor control, etc.

Whether and to what extent CCA theory is true is, I think, very important for AGI forecasting, strategy, and both technical and non-technical safety research directions; see my answer here for more details.

Should we believe CCA theory?

CCA theory, as I'm using the term, is a simplified model. There are almost definitely a couple of caveats to it:

  1. There are sorta "hyperparameters" on the generic learning algorithm which seem to be set differently in different parts of the neocortex. For example, some areas of the cortex have higher or lower density of particular neuron types. I don't think this significantly undermines the usefulness or correctness of CCA theory, as long as these changes really are akin to hyperparameters, as opposed to specifying fundamentally different algorithms. So my reading of the evidence is that if you put, say, motor nerves coming out of visual cortex tissue, the tissue could do motor control, but it wouldn't do it quite as well as the motor cortex does.[1]
  2. There is almost definitely a gross wiring diagram hardcoded in the genome, i.e., a set of connections among different neocortical regions, and between those regions and other parts of the brain. These connections later get refined and edited during learning. Again, we can ask how much the existence of this innate gross wiring diagram undermines CCA theory. How complicated is the wiring diagram? Is it millions of connections among thousands of tiny regions, or just tens of connections among a few regions? Would the brain work at all if you started with a random wiring diagram? I don't know for sure, but for various reasons, my current belief is that this initial gross wiring diagram is not carrying much of the weight of human intelligence, and thus that this point is not a significant problem for the usefulness of CCA theory.

Going beyond these caveats, I found pretty helpful literature reviews on both sides of the issue:

  • The experimental evidence for CCA theory: see chapter 5 of Rethinking Innateness (1996)
  • The experimental evidence against CCA theory: see chapter 5 of The Blank Slate by Steven Pinker (2002).

I won't go through the debate here, but after reading both of those I wound up feeling that CCA theory (with the caveats above) is probably right, though not 100% proven. Please comment if you've seen any other good references on this topic, especially more up-to-date ones.

CCA theory vs human-universal traits and instincts

The main topic for this post is:

If Common Cortical Algorithm theory is true, then how do we account for all the human-universal instincts and behaviors that evolutionary psychologists talk about?

Indeed, we know that there is a diverse set of remarkably specific human instincts and mental behaviors that evolved by natural selection. Again, Steven Pinker's The Blank Slate is a popularization of this argument; it ends with Donald E. Brown's giant list of "human universals", i.e. behaviors that are observed in every human culture.

Now, 75% of the human brain is the neocortex, but the other 25% consists of various subcortical ("old-brain") structures like the amygdala, and these structures are perfectly capable of implementing specific instincts. But these structures do not have access to an intelligent world-model—only the neocortex does! So how can the brain implement instincts that require intelligent understanding? For example, maybe the fact that "Alice got two cookies and I only got one!" is represented in the neocortex as the activation of neural firing pattern 7482943. There's no obvious mechanism to connect this arbitrary, learned pattern to the "That's so unfair!!!" section of the amygdala. The neocortex doesn't know about unfairness, and the amygdala doesn't know about cookies. Quite a conundrum!

This is really a symbol grounding problem, which is the other reason this post is relevant to AI alignment. When the human genome builds a human, it faces the same problem as a human programmer building an AI: how can one point a goal system at things in the world, when the internal representation of the world is a complicated, idiosyncratic, learned data structure? As we wrestle with the AI goal alignment problem, it's worth studying what human evolution did here.

List of ways that human-universal instincts and behaviors can exist despite CCA theory

Finally, the main part of this post. I don't know a complete answer, but here are some of the categories I've read about or thought of, and please comment on things I've left out or gotten wrong!

Mechanism 1: Simple hardcoded connections, not implemented in the neocortex

Example: Enjoying the taste of sweet things. This one is easy. I believe the nerve signals coming out of taste buds branch, with one branch going to the cortex to be integrated into the world model, and another branch going to subcortical regions. So the genes merely have to wire up the sweetness taste buds to the good-feelings subcortical regions.

Mechanism 2: Subcortex-supervised learning

Example: Wanting to eat chocolate. This is different from the previous item because "sweet taste" refers to a specific innate physiological thing, whereas "chocolate" is a learned concept in the neocortex's world-model. So how do we learn to like chocolate? Because when we eat chocolate, we enjoy it (Mechanism 1 above). The neocortex learns to predict a sweet taste upon eating chocolate, and thus paints the world-model concept of chocolate with a "sweet taste" property. The supervisory signal is multidimensional, such that the neocortex can learn to paint concepts with various labels like "painful", "disgusting", "comfortable", etc., and generate appropriate behaviors in response. (See the DeepMind paper Prefrontal cortex as a meta-reinforcement learning system for a more specific discussion along these lines.)
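To make the idea concrete, here is a minimal toy sketch of the "painting concepts with labels" step. It is my own illustration, not a claim about how the brain actually implements this: the signal channels, the concept names, the lookup table standing in for the subcortex, and the simple delta-rule update are all assumptions chosen for brevity.

```python
# Toy sketch of subcortex-supervised learning (illustrative assumptions only).
SIGNALS = ["sweet", "painful", "disgusting"]  # hypothetical hardwired channels
CONCEPTS = ["chocolate", "hot_stove", "rotten_egg", "pebble"]  # learned concepts

# Stand-in for what the subcortex reports when each thing is actually encountered.
GROUND_TRUTH = {
    "chocolate": [1.0, 0.0, 0.0],
    "hot_stove": [0.0, 1.0, 0.0],
    "rotten_egg": [0.0, 0.0, 1.0],
    "pebble": [0.0, 0.0, 0.0],
}


def train(episodes=200, learning_rate=0.1):
    """Delta-rule learner: nudge each concept's predicted signals toward
    the signals the subcortex actually reported."""
    predictions = {c: [0.0] * len(SIGNALS) for c in CONCEPTS}
    for step in range(episodes):
        concept = CONCEPTS[step % len(CONCEPTS)]  # fixed, repeating experience stream
        reported = GROUND_TRUTH[concept]
        for i, target in enumerate(reported):
            error = target - predictions[concept][i]
            predictions[concept][i] += learning_rate * error
    return predictions


def labels_for(predictions, concept, threshold=0.5):
    """Labels that the learned model has 'painted' onto a concept."""
    return [s for s, p in zip(SIGNALS, predictions[concept]) if p > threshold]


predictions = train()
print(labels_for(predictions, "chocolate"))
print(labels_for(predictions, "pebble"))
```

The point of the sketch is only that the genome never has to know which learned pattern means "chocolate": it supplies the signal channels, and the learner works out which of its own concepts predict them.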

Mechanism 3: Same learning algorithm + same world = same internal model

Possible example: Intuitive biology. In The Blank Slate you can find a discussion of intuitive biology / essentialism, which "begins with the concept of an invisible essence residing in living things, which gives them their form and powers." Thus preschoolers will say that a dog altered to look like a cat is still a dog, yet a wooden toy boat cut into the shape of a toy car has in fact become a toy car. I think we can account for this very well by saying that everyone's neocortex has the same learning algorithm, and when they look at plants and animals they observe the same kinds of things, so we shouldn't be surprised that they wind up forming similar internal models and representations. I found a paper that tries to spell out how this works in more detail; I don't know if it's right, but it's interesting: free link, official link.

Mechanism 4: Human-universal memes

Example: Fire. I think this is pretty self-explanatory. People learn about fire from each other. No need to talk about neurons, beyond the more general issues of language and social learning discussed below.

Mechanism 5: "Two-process theory"

Possible example: Innate interest in human faces.[2] The subcortex-supervised learning mechanism above (Mechanism 2) can be thought of more broadly as an interaction between a hardwired subcortical system that creates a "ground truth", and a cortical learning algorithm that then learns to relate that ground truth to its complex internal representations. Here, Johnson's "two-process theory" for faces fits this same mold, but with a more complicated subcortical system for ground truth. In this theory, a subcortical system gets direct access to a low-resolution version of the visual field, and looks for a pattern with three blobs in locations corresponding to the eyes and mouth of a blurry face. When it finds such a pattern, it passes information to the cortex that this is a very important thing to attend to, and over time the cortex learns what faces actually look like (and suppresses the original subcortical template circuitry). Anyway, Johnson came up with this theory partly based on the observation that newborns are equally entranced by pictures of three blobs versus actual faces (both of which were much more interesting than other patterns), but after a few months the babies were more interested in actual face pictures than the three-blob pictures. (Not sure what Johnson would make of this twitter account.)

(Other possible examples of instincts formed by two-process theory: fear of snakes, interest in human speech sounds, sexual attraction.)
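Here is a toy sketch of how a crude hardwired template plus a generic learner could produce the newborn-versus-older-baby pattern described above. Everything in it is an assumption of mine for illustration (the 5x5 grid, the three template cells, the running-average "cortex", and the `takeover_after` threshold); it is not Johnson's actual model.

```python
# Toy sketch of a two-process arrangement (illustrative assumptions only).
SIZE = 5
TEMPLATE = [(1, 1), (1, 3), (3, 2)]  # hypothetical "eyes" and "mouth" cells


def make_image(dark_cells):
    return [[1.0 if (r, c) in dark_cells else 0.0 for c in range(SIZE)]
            for r in range(SIZE)]


THREE_BLOBS = make_image(set(TEMPLATE))
REAL_FACE = make_image(set(TEMPLATE) | {(0, 1), (0, 3), (2, 2)})  # adds brows, nose
SCRAMBLED = make_image({(0, 0), (2, 4), (4, 1)})


def subcortical_detector(image):
    """Hardwired check: are all three template cells dark?"""
    return all(image[r][c] > 0.5 for r, c in TEMPLATE)


class Cortex:
    def __init__(self):
        self.prototype = [[0.0] * SIZE for _ in range(SIZE)]
        self.examples_seen = 0

    def learn(self, image):
        """Running average of the images the subcortex flagged as important."""
        self.examples_seen += 1
        for r in range(SIZE):
            for c in range(SIZE):
                delta = image[r][c] - self.prototype[r][c]
                self.prototype[r][c] += delta / self.examples_seen

    def match(self, image):
        """1.0 for a perfect match to the learned prototype, lower otherwise."""
        diff = sum(abs(image[r][c] - self.prototype[r][c])
                   for r in range(SIZE) for c in range(SIZE))
        return 1.0 - diff / (SIZE * SIZE)


def interest(cortex, image, takeover_after=10):
    # Early on, only the hardwired template drives interest; once the cortex has
    # seen enough flagged examples, its learned model takes over.
    if cortex.examples_seen < takeover_after:
        return 1.0 if subcortical_detector(image) else 0.0
    return cortex.match(image)


newborn = Cortex()
older = Cortex()
for _ in range(20):
    for image in (REAL_FACE, SCRAMBLED):  # the world contains real faces, not blobs
        if subcortical_detector(image):   # only flagged images are learned from
            older.learn(image)

for name, img in [("blobs", THREE_BLOBS), ("face", REAL_FACE), ("scrambled", SCRAMBLED)]:
    print(name, interest(newborn, img), interest(older, img))
```

In this toy version the "newborn" rates the three-blob picture and the face picture identically, while the "older" learner, having been pointed at real faces by the template, prefers the face picture to the blobs.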

Mechanism 6: Time-windows

Examples: Filial imprinting in animals, incest repulsion (Westermarck effect) in humans. Filial imprinting is a famous result where newborn chicks (and the young of many other species) form a permanent attachment to the most conspicuous moving object that they see in a certain period shortly after hatching. In nature, they always imprint on their mother, but in lab experiments, chicks can be made to imprint on a person, or even a box. As with other mechanisms here, time-windows provide a nice solution to the symbol grounding problem, in that the genes don't need to know what precise collection of neurons corresponds to "mother"; they only need to set up a time window and a way to point to "conspicuous moving objects", which is presumably easier. The brain mechanism of filial imprinting has been studied in detail for chicks, and consists of the combination of time-windows plus the two-process model (Mechanism 5 above). In fact, I think the two-process model was proven in chick brains before it was postulated in human brains.

There likewise seem to be various time-window effects in people, such as the Westermarck effect, a sexual repulsion between two people raised together as young children (an instinct which presumably evolved to reduce incest).
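As a rough illustration of why a time window sidesteps the symbol grounding problem, here is a toy sketch of imprinting. The class name, the `size * speed` proxy for "conspicuous moving object", and the window length are all hypothetical choices of mine, not anything taken from the chick literature.

```python
# Toy sketch of time-window imprinting (illustrative assumptions only).
class ImprintingCircuit:
    def __init__(self, window_length=3):
        self.window_length = window_length  # hypothetical number of time steps
        self.age = 0
        self.target = None
        self.best_score = 0.0

    @staticmethod
    def conspicuousness(obj):
        # Crude innate proxy for "conspicuous moving object".
        return obj["size"] * obj["speed"]

    def observe(self, objects):
        """Update the attachment target, but only while the window is open."""
        if self.age < self.window_length:
            for obj in objects:
                score = self.conspicuousness(obj)
                if score > self.best_score:
                    self.best_score = score
                    self.target = obj["name"]
        self.age += 1
        return self.target


box = {"name": "box", "size": 2.0, "speed": 1.0}
pebble = {"name": "pebble", "size": 1.0, "speed": 0.0}
hen = {"name": "hen", "size": 3.0, "speed": 2.0}

lab_chick = ImprintingCircuit()
for _ in range(3):
    lab_chick.observe([box, pebble])  # the window is open; no hen is present
lab_chick.observe([hen, box])         # the hen shows up after the window closes

wild_chick = ImprintingCircuit()
wild_chick.observe([hen, pebble])

print(lab_chick.target, wild_chick.target)
```

Nothing in the circuit refers to "mother". The lab chick ends up attached to the box even after a more conspicuous hen appears, because the hen arrived too late, while the wild chick imprints on the hen simply because she was the most conspicuous moving thing around at the right time.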

Mechanism 7 (speculative): Empathetic grounding of intuitive psychology

Possible example: Social emotions (gratitude, sympathy, guilt,...) Again, the problem is that the neocortex is the only place with enough information to, say, decide when someone slighted you, so there's no "ground truth" to use for subcortex-supervised learning. At first I was thinking that the two-process model for human faces and speech could be playing a role, but as far as I know, deaf-blind people have the normal suite of social emotions, so that's not it either. I looked in the literature a bit and couldn't find anything helpful. So, I made up this possible mechanism (warning: wild speculation).

Step 1 is that a baby's neocortex builds a "predicting my own emotions" model using normal subcortex-supervised learning (Mechanism 2 above). Then a normal Hebbian learning mechanism makes two-way connections between the relevant subcortical structures (amygdala) and the cortical neurons involved in this predictive model.

Step 2 is that the neocortex's universal learning algorithm will, in the normal course of development, naturally discover that this same "predicting my own emotions" model from step 1 can be reused to predict other people's emotions (cf. Mechanism 3 above), forming the basis for intuitive psychology. Now, because of those connections-to-the-amygdala mentioned in step 1, the amygdala is incidentally getting signals from the neocortex when the latter predicts that someone else is angry, for example.

Step 3 is that the amygdala (and/or neocortex) somehow learns the difference between the intuitive psychology model running in first-person mode versus empathetic mode, and can thus generate appropriate reactions, with one pathway for "being angry" and a different pathway for "knowing that someone else is angry".

So let's now return to my cookie puzzle above. Alice gets two cookies and I only get one. How can I feel it's unfair, given that the neocortex doesn't have a built-in notion of unfairness, and the amygdala doesn't know what cookies are? The answer would be: thanks to subcortex-supervised learning, the amygdala gets a message that one yummy cookie is coming, but the neocortex also thinks "Alice is even happier", and that thought also recruits the amygdala, since intuitive psychology is built on empathetic modeling. Now the amygdala knows that I'm gonna get something good, but that Alice is gonna get something even better, and that combination (in the current emotional context) triggers the amygdala to send out waves of jealousy and indignation. This is then a new supervisory signal for the neocortex, which allows the neocortex to gradually develop a model of fairness, which in turn feeds back into the intuitive psychology module, and thereby back to the amygdala, allowing the amygdala to execute more complicated innate emotional responses in the future, and so on.

The special case of language

It's tempting to put language in the category of memes (Mechanism 4 above), since we do generally learn language from each other, but it doesn't really belong there, because apparently groups of kids can invent grammatical languages from scratch (e.g. Nicaraguan Sign Language). My current guess is that it combines three things: (1) a two-process mechanism (Mechanism 5 above) that makes people highly attentive to human speech sounds; (2) possibly "hyperparameter tuning" in the language-learning areas of the cortex, e.g. maybe to support taller compositional hierarchies than would be required elsewhere in the cortex; and (3) the fact that language can sculpt itself to the common cortical algorithm rather than the other way around, i.e., maybe "grammatical language" is just another word for "a language that conforms to the types of representations and data structures that are natively supported by the common cortical algorithm".

By the way, lots of people (including Steven Pinker) seem to argue that language processing is a fundamentally different and harder task than, say, visual processing, because language requires symbolic representations, composition, recursion, etc. I don't understand this argument; I think vision processing needs the exact same things! I don't see a fundamental difference between the visual-processing system knowing that "this sheet of paper is part of my notebook", and the grammatical "this prepositional phrase is part of this noun phrase". Likewise, I don't see a difference between recognizing a background object interrupted by a foreground occlusion, versus recognizing a noun phrase interrupted by an interjection. It seems to me like a similar set of problems and solutions, which again strengthens my belief in CCA theory.


When I initially read about CCA theory, I didn't take it too seriously because I didn't see how instincts could be compatible with it. But I now find it pretty likely that there's no fundamental incompatibility. So having removed that obstacle, and also read the literature a bit more, I'm much more inclined to believe that CCA theory is fundamentally correct.

Again, I'm learning as I go, and in some cases making things up as I go along. Please share any thoughts and pointers!

  1. The visual cortex actually does do a bit of motor control: it moves the eyeballs.

  2. See Rethinking Innateness p116, or better yet Johnson's article.