(Not an expert—just quickly writing down where I’m at, after a few hours of research. Happy to chat more in the comments!)

I recently learned (thanks @aysja!) that there’s a thing called “FOXP2-related speech and language disorder”. Here’s the backstory in brief:

…a group at Oxford University led by Simon Fisher identified a locus, or region, on chromosome 7 associated with a severe speech impediment called verbal dyspraxia through the study of one British family, in which the speech disorder was highly prevalent across at least three generations. The disorder is dominantly inherited and results in an inability to articulate basic sounds and syllables. source

The group published their discovery in a 2001 Nature paper. The disorder, as it turned out, was associated with a variation in a gene that codes for a protein called FOXP2.

People (appropriately) are very interested in the fact that humans have language in a way that non-human animals don’t. So when people saw that this FOXP2 variant pretty specifically impacts human speech (or so it appears! see below), they immediately rushed off to try to leverage that discovery into a better understanding of how human language works, and how it evolved, producing a flurry of follow-up research. For example, does FOXP2 variation across the great apes correlate with how they communicate? (maybe?) Was there selection for FOXP2 among early humans, Neanderthals, etc.? (latest study says “apparently not”?) Do mice squeak differently, and do birds chirp differently, when you edit their FOXP2? (yes!)

What’s the mechanism relating FOXP2 to speech?

Now, it would be neat if I could tell you a story like: “The part of the brain responsible for speech is (blah blah), and the FOXP2 protein is a specific essential component of neurons that are exclusively used in that part of the brain.”

But alas, that story is not true.

For one thing, the FOXP2 protein is not directly a component of a machine; instead it functions as a transcription factor, i.e. a protein that affects which other types of proteins are or aren’t manufactured in a cell.

For another thing, FOXP2 is expressed all around the brain and body. If you genetically manipulate mice to have no FOXP2, you don’t get otherwise-healthy mice that fail to squeak. Instead, you get mice that die shortly after birth, with severe defects in their lungs and esophagus, among other things (source).

So, if the above is not the right story, then what is?

After skimming some studies (especially of birds, whose songs are impacted by FOXP2, e.g. here), my impression is that the balance of current evidence points towards the striatum as the main culprit. (The second-most-likely culprit—well, they’re not mutually exclusive—would be the cerebellum, but given how I think about the cerebellum, it would hardly change the following paragraphs. So I’ll just assume striatum for simplicity.)

The striatum is the largest part of the basal ganglia. As background (this is oversimplified & somewhat controversial), I would say that the striatum implements a learning algorithm, that this algorithm involves dopamine as an error signal, and that the striatum (in this context) is trained to assess whether the cortex is issuing reward-maximizing motor-control outputs right now, and if it seems not to be, the striatum nudges the cortex to do something different.
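To make the "error signal" idea a bit more concrete, here is a minimal sketch of a textbook prediction-error update. It is not a model of the striatum, and the function names, the learning rate, and the reward sequence are all assumptions made up for illustration. The only point is that a learner can keep a running estimate of "how good is this" and adjust it by the gap between what it expected and what it got, with that gap playing the role the dopamine signal plays in the (oversimplified) story above.

```python
def update_value(value, reward, learning_rate=0.1):
    """One prediction-error update: move the estimate toward the observed reward."""
    # Hypothetical stand-in for a dopamine-like error signal.
    prediction_error = reward - value
    new_value = value + learning_rate * prediction_error
    return new_value, prediction_error


def train_critic(rewards, learning_rate=0.1, initial_value=0.0):
    """Run the update over a sequence of rewards and record each error."""
    value = initial_value
    errors = []
    for reward in rewards:
        value, error = update_value(value, reward, learning_rate)
        errors.append(error)
    return value, errors


# Illustrative input only: the same reward of 1.0 delivered 50 times.
final_value, errors = train_critic([1.0] * 50)
print(f"final estimate: {final_value:.3f}")
print(f"first error: {errors[0]:.3f}, last error: {errors[-1]:.3f}")
```

With a constant reward, the error shrinks as the estimate catches up, which is the sense in which an error signal can train an assessor.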

So anyway, here are two hypotheses:

  • HYPOTHESIS 1: This FOXP2 variant specifically messes up a subregion of the striatum involved in speech.
  • HYPOTHESIS 2: This FOXP2 variant messes up the whole striatum, but as it turns out, the effects are only noticeable in the case of speech.

I’m pretty fond of Hypothesis 2 right now. Let me elaborate.

ELABORATION OF HYPOTHESIS 2: This FOXP2 variant messes up the whole striatum in a way that makes it generally (marginally) less able to learn to assess / critique extremely fast & complicated & intricate & precise motor control tasks. And as it turns out (according to this hypothesis), the motor control associated with speech is by far the most fast & complicated & intricate & precise motor control task that a human routinely does. So if striatal motor control is slightly messed up, speech ability will be majorly impacted while other aspects of motor control will be only slightly impacted, if at all.

As an analogy: if a car engine is in pretty bad shape, such that the car’s top speed is 80 km/h (50 mph), you won’t notice it on a neighborhood street, but it will be very obvious on the German Autobahn.
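The logic of Hypothesis 2 can be sketched as a toy "ceiling" model: a single capacity limit applies to every task, and a task only suffers by the amount its demand exceeds that limit. Every number below is invented for illustration (the demand scores, the capacity values, and the 0-100 scale are assumptions, not measurements), so treat this as a picture of the argument rather than evidence for it.

```python
def shortfall(task_demand, capacity):
    """How far a task's demand exceeds the available capacity (0 if it fits)."""
    return max(0.0, task_demand - capacity)


# Hypothetical demand scores on an arbitrary 0-100 scale (made up for illustration).
TASK_DEMANDS = {
    "walking": 20.0,
    "buttoning a shirt": 60.0,
    "speech": 95.0,
}


def shortfalls(capacity):
    """Shortfall for every task under one shared capacity limit."""
    return {task: shortfall(demand, capacity) for task, demand in TASK_DEMANDS.items()}


healthy = shortfalls(capacity=100.0)   # assumed unimpaired capacity
impaired = shortfalls(capacity=55.0)   # assumed uniformly reduced capacity

for task in TASK_DEMANDS:
    print(f"{task}: healthy={healthy[task]}, impaired={impaired[task]}")
```

In this toy setup the same global reduction leaves walking untouched, barely affects buttoning, and hits speech hard, simply because speech sits closest to the ceiling.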

Possible evidence for & against Hypothesis 2:

Is non-speech-related fine motor control impacted at all in FOXP2-related speech & language disorder? There seems to be at least some evidence for “yes”, e.g. “fine motor-skill deficits have been reported in some individuals with FOXP2 disruptions” (source), or “Fine motor skills may be impaired (e.g., buttoning clothes, tying shoelaces), yet gross motor skills (walking, running) are normal” (source).   

Is it actually true that speech is by far the most fast & complicated & intricate & precise motor control task that a human routinely does? I mean, I think so? But I’m not sure how to prove or quantify that.

What about chewing? Ah, good point. It does seem true to me that chewing entails some fancy tongue gymnastics etc., although I don’t know how it compares to speech. However, according to my Hypothesis 2, the FOXP2 thing is a forebrain issue, and I believe that human speech is totally reliant on forebrain motor control, whereas chewing is a mostly-brainstem activity that the forebrain helps with but doesn’t micromanage.

What about receptive language deficits? I note that “Expressive language is more severely affected than receptive language” according to here. But there does seem to be a consensus that receptive language is impacted to some extent. My guess here is: (1) FOXP2 impacts expressive language, and then (2) expressive language problems have a knock-on effect on receptive language. I can think of a few possible mechanisms of (2), including:

  • If the kid is struggling with sounds and articulation, they’ll be paying more attention to that, and less to grammar etc.
  • Maybe the supposed receptive language deficits are somewhat illusory, and really reflect things like motivation and testing. More specifically, I think that expressive language problems can easily flow into things like “The kid is less likely to comply with the receptive language test, and observers tend to mistake ‘unwillingness to answer a question’ for ‘inability to answer a question’”. I have some bitter personal experience here.
  • People with expressive language problems will generally get fewer hours of practice per day on receptive language skills, because a major source of such practice is “engaging in conversation”, an activity that they would presumably find frustrating.

Conclusion

So, Hypothesis 2 is my working hypothesis right now, but again, this is just where I’m at after a few hours of research. Happy to chat in the comments!

Addendum May 2023

Hey, I just noticed a paragraph from a neuroscience book I like (Murray, Wise, Graham, Evolution of Memory Systems) that came to a similar conclusion as me:

Regardless of its origins, the era of exaggerated claims about the FOXP2 gene seems to be nearly over, at long last. It is not a specific “language gene,” as celebrated in popular science. For example, in one famous family a mutation in the FoxP2 gene affected both speech and other aspects of orofacial coordination[62]. Brain imaging results in these individuals pointed to structural and activation abnormalities in the dorsal striatum, which expresses this gene. These striatal defects appear to affect coordinated movement sequences generally, not just those for speech. However, the speech impairments are particularly severe, which could reflect any of several factors: a greater requirement for coordination; the fact that the effectors used for speech have more degrees of freedom than other effectors; or the need to integrate auditory feedback into precisely timed articulatory gestures[62]. The latter possibility agrees with the idea that the fundamental function of the basal ganglia involves the use of feedback to adjust ongoing behavior (see Chapter 12, “If not habits, what?”).

(I had no memory of that paragraph when I came across it just now, but I did read the book years before I wrote this blog post, so it’s possible that I was subconsciously plagiarizing, as opposed to independently coming to the same conclusion.)

8 comments

Speaking as a biologist knowledgeable about transcription factors, I would also favor hypothesis 2.

I would also add that the idea of there being a "language gene" is quite outdated. Complex traits such as language are highly polygenic.

Interesting! Speaking as a person not remotely knowledgeable about transcription factors, I’d be interested if you could elaborate on that first sentence, if you get a chance.  :)

Basically, if FOXP2 is mutated and functionally impaired, then it's reasonable to assume all the cells that express FOXP2 will be affected. FOXP2 is expressed in the whole striatum (and other parts of the brain too), not just regions involved in speech. See: https://academic.oup.com/brain/article/126/11/2455/403806

Sounds plausible but this article is evidence against the striatum hypothesis: Region-specific Foxp2 deletions in cortex, striatum or cerebellum cannot explain vocalization deficits observed in spontaneous global knockouts


In short, they edited mice to have Foxp2 deleted in only specific regions of the brain, one of them being striatum. But those mice didn't have the 'speech' defects that mice with whole-body Foxp2 knock-outs showed. So Foxp2's action outside of the striatum seems to play a role. They didn't do a striatum+cerebellum knock-out, though, so it could still be those two jointly (but not individually) causing the problem.

Interesting!!

That paper repeatedly does the annoyingly-common thing of conflating “our tests didn’t find any group differences that pass the p<0.05 threshold” with “our tests positively confirm that there are no group differences”. 😠 I’m not saying they’re necessarily wrong about that, I’m just complaining.

No, actually, I will complain, and they are wrong. Their tests show that, in global knockouts (well, “spontaneous deletion”, but I think it amounts to the same thing?), ultrasonic vocalizations (USV) go significantly down while “click” vocalizations go significantly up. So at that point, one would think that they should do 1-sided t-tests on the region-specific knockouts to see if USVs go down and if clicks go up. But instead they did 1-sided tests to see if USVs go down and if clicks go down. And the clicks actually went up (who would have guessed!), which they reported as “click rates [were not] affected [with] p=0.99”. 🤦🤦🤦

I guess that they were trying to avoid p-hacking by pre-committing to which tests they’d run, which I suppose is admirable, but I still think they were being pretty boneheaded here!! (Unless I misunderstand. I was skimming.)

Anyway, the USV result does indeed look like “no change”, but clicks went up (with “p=0.99” on the incorrectly-oriented 1-sided t-test, which really means p=0.01) for both the Purkinje- & striatum-specific knockouts. (Not cortex.)
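To spell out why "p=0.99" in one direction amounts to roughly p=0.01 in the other, here is a minimal sketch. It assumes a test statistic with a standard normal null distribution, which is a simplification (the paper used t-tests), and the statistic value of 2.33 is a hypothetical number chosen for illustration, not one taken from the paper. For any continuous test statistic, the two one-sided p-values add up to 1.

```python
from statistics import NormalDist


def one_sided_p_values(test_statistic):
    """Return (p_less, p_greater) assuming a standard normal null distribution.

    p_less    tests the alternative "the effect is negative" (rate went down).
    p_greater tests the alternative "the effect is positive" (rate went up).
    """
    p_less = NormalDist().cdf(test_statistic)
    p_greater = 1.0 - p_less
    return p_less, p_greater


# Hypothetical statistic for an observed INCREASE (not a value from the paper).
p_less, p_greater = one_sided_p_values(2.33)
print(f"testing for a decrease: p = {p_less:.2f}")
print(f"testing for an increase: p = {p_greater:.2f}")
```

So a clear increase, run through a test that only looks for a decrease, comes out looking maximally "non-significant".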

That still leaves the USV question. The hypothesis that FOXP2 impacts USV squeaking in mice via the lungs still seems to me like a live possibility, in which case FOXP2→mouse-USV-squeaking would be totally disanalogous to FOXP2→human-speech, I think, i.e. just a funny coincidence. Hmm. The striatum+cerebellum interaction thing you mentioned is also possible AFAIK.

(I have somewhat more confidence that FOXP2-affects-bird-vocalization ↔ FOXP2-affects-human speech is mechanistically analogous, than that FOXP2-affects-mouse-vocalization ↔ FOXP2-affects-human speech is mechanistically analogous. I think the human-vs-bird symptoms are more closely related, not just “hey in both cases it has something to do with vocalizations”. This might be wrong though, I didn’t double-check.)

Human speech and bird song are both cases of vocal learning. They are (at least) largely learned, more complex, and require more precise control, whereas mouse vocalizations are mostly hardwired: https://en.wikipedia.org/wiki/Vocal_learning?wprov=sfla1

Is it actually true that speech is by far the most fast & complicated & intricate & precise motor control task that a human routinely does? I mean, I think so? But I’m not sure how to prove or quantify that

Hmm, I think it is, but anecdotally speaking it seems to me that you can be quite clumsy, have bad coordination, etc., but have no speech impairment.

There was a study where mice were engineered to have human alleles of FOXP2 and grew longer dendrites that made them faster learners, IIRC. Heard about this in Dehaene's book about consciousness.

Also, you might be interested in reading about the Olduvai domain.

Sorry for not linking, I'm on mobile.