Clarification: Behaviourism & Reinforcement

by Zaine, 10th Oct 2012



Disclaimer: The following is but a brief clarification on what the human brain does when one's behaviour is reinforced or punished. Thorough, exhaustive, and scholarly it is not.

Summary: Punishment, reinforcement, etc. of a behaviour creates an association in the mind of the affected party between the behaviour and the corresponding punishment, reinforcement, etc., the nature of which can only be known by the affected party. Take care when reinforcing or punishing others, as you may be effecting an unwanted association.

I've noticed the behaviourist concept of reinforcement thrown around a great deal on this site, and am worried that a fair number of those who frequent it either hold a misconception about, or are simply unaware of, how reinforcement affects the human brain and why it is effective in practice.

In the interest of time, I'm not going to go into much detail on classical black-box behaviourism or behavioural neuroscience; Luke has already covered how one can take advantage of positive reinforcement. Negative reinforcement and punishment are also important, but won't be covered here.


The Couple has a three-year-old son. They are worried that he eats too few micronutrients, and too narrow a variety of them. They want their son to become a world conqueror someday, so a poor diet just won't do. Their son loves DragonBox. Every time they all sit down for a meal, he barely eats anything, so eager is he to play more DragonBox.

He also very much likes Dragons.

The Couple decides to serve their son a balanced meal consisting solely of micronutrient-dense, bioavailable foods. They tell him, "Spawn, you must fletcherise and swallow all that is on your plate if you wish to play more DragonBox." The son understands and acts accordingly; he eats all that is put before him so he may DragonBox.

What happens in the mind of The Couple's son when he is told thus? His brain creates a new association, or connexion, of concepts[2]; in this case, 'eating all that is put before me' becomes associated with 'more DragonBox!' Perhaps, though, he associates 'more DragonBox!' with a different concept: 'eating green things', say, or 'eating brown things'.

In other words, one can never be certain of the precise association another creates when they are reinforced or punished.


I think I can make this explanation clearer.

The psychologist John B. Watson once set out to investigate whether he could make a child fear something the child would not otherwise fear. He placed a fluffy white rat in front of a baby, Little Albert, and made a loud, resounding metallic clang by striking a steel bar. After several such pairings, Little Albert came to associate 'loud scary noise' with 'fluffy white thing', not specifically with 'fluffy white rat'. Afterwards, when presented with a fluffy white bunny, a dog, and even cotton balls, he displayed a fear response.[1]

Humans are constantly creating associations between anything that can be conceptualised: the colour indigo, Herrenhausen, toothpick collecting, anything. Whenever one is made to link one concept to another, by any means, an association is created, and one can never know with certainty the nature of another person's association.
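To make the 'fluffy white thing' point concrete, here is a toy sketch using the Rescorla-Wagner model of classical conditioning. That model is a standard textbook one swapped in purely for illustration; Watson did not use it. The sketch assumes a stimulus can be represented as a set of features, each carrying its own associative strength; the feature names, learning rate, and number of pairings are made up. Because 'fluffy' and 'white' were present on every pairing, they absorb strength alongside 'rat', so any new stimulus sharing them inherits part of the fear.

```python
# Toy Rescorla-Wagner sketch of Little Albert's conditioning (illustrative only).
# Assumption: a stimulus is a set of features, and each feature carries its own
# associative strength V toward the outcome "loud noise".

def train(trials, alpha=0.3, lam=1.0):
    """Apply the Rescorla-Wagner update on each stimulus-plus-noise pairing."""
    V = {}
    for features in trials:
        prediction = sum(V.get(f, 0.0) for f in features)
        error = lam - prediction  # how unexpected the noise still is
        for f in features:
            V[f] = V.get(f, 0.0) + alpha * error
    return V


def predicted_fear(V, features):
    """Summed associative strength of whichever trained features are present."""
    return sum(V.get(f, 0.0) for f in features)


rat = {"fluffy", "white", "rat"}
V = train([rat] * 10)  # ten hypothetical rat-plus-noise pairings

tests = {
    "rat": rat,
    "rabbit": {"fluffy", "white", "rabbit"},
    "cotton balls": {"fluffy", "white", "cotton"},
    "wooden block": {"wooden", "block"},
}
for name, stimulus in tests.items():
    print(f"{name:13s} {predicted_fear(V, stimulus):.2f}")
```

In this toy, the rabbit and the cotton balls each share two of the rat's three features and so receive two thirds of its predicted fear, while the wooden block shares none. Which features a real child actually encodes is what an outside observer cannot see.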

Be careful when using reinforcement and punishment on others. Be ever diligent.


[1] In psychology this is called stimulus generalisation; if you are keen, look into fear learning in the amygdala.

[2] Vide

[3] Richard Kennaway provided a quote epitomising this concern, and further elaborated upon it admirably.


30 comments

Also relevant.

ETA: And it is a cliché that teenagers react to this sort of thing, not by dutifully doing what the behaviorist parents think they are "reinforcing", but by becoming surly and rebellious. IMHO, the whole reinforcement thing is a crock.

BTW, I see the large letters "~" in the middle of the article. Some sort of font encoding problem?

When "don't perform the trainer-desired behavior, go buy the reward at the store yourself" becomes an available behavior, it can get reinforced pretty fast.

What does that statement mean? What role does "reinforcement" play in that? Reinforcement, in behavioural theory, takes repetition to develop. How does it account for a person doing something at the first opportunity to do it?

A newly autonomous kid happens to be at the store, and to have money for cookies. Ey buys the cookie because of explicit cookie-getting planning. This is rewarded by a delicious cookie, and "buy cookies" is reinforced. Later, ey does homework and obtains a cookie, which isn't much of a reward because ey just ate one. "Do homework" is extinguished.

This describes a process of positive feedback, which predicts that the child will end up compulsively buying cookies at every opportunity. This is not what is generally observed.

ETA: I guess this might be what is going on in cases of OCD, except that the behaviours there, such as compulsive hand-washing, are generally not of a particularly gratifying kind. OCD does not seem to result from superstimuli.

It is a simple truth of economics that what one rewards, one gets more of.

What is extremely hard to do is determine what exactly one is rewarding when one responds positively to a particular behavior. When my toddler asks to be picked up, is he seeking approval, attention, or escape from whatever task I've just set for him (e.g. "Sit on the potty chair")? If you don't carefully determine the function of a behavior, you won't be able to accurately determine whether your response is reinforcing or punishing. And if you reinforce what you intended to punish, you'll get more of the behavior instead of less.

To put it slightly differently, one might think that the purpose of teenage rebellion is changing the norms. Or the purpose might be to shock others (and thereby obtain attention). Which responses will decrease the frequency of rebellious behavior depends a great deal on which reason is true in a particular case.

It is a simple truth of economics that what one rewards, one gets more of.

Only if you redefine "what one rewards" to be "more of whatever one gets".

Does paying a worker more "make" them work more, or work less?

Yes, a behaviorist defines reinforcement as that which increases the frequency of a behavior.

If increased pay does not cause more work, then what reasonable usage says that increased pay is "reinforcing" of work? Increasing pay might be punishing of the behavior "quit and find another job."

At least among animal trainers, "reinforcement" can refer to things that increase the frequency of a behavior relative to the expected frequency without reinforcement, in addition to relative to the previous frequency. E.g., if I expect to get a behavior less often when I move the animal outside (due to greater distractions), I might bring rewarding treats with me to offset that. The observed frequency of behavior might then be exactly the same between the two trials, but I would still describe the treats as reinforcement.

It seems the same principle applies to increasing pay in order to ensure the same level of work in scenarios where I expect the level of work to otherwise decline (e.g., I expect the animal to quit), so it seems reasonable to me to refer to that increased pay as "reinforcing" of work even if the frequency of work done doesn't increase.
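The disagreement here comes down to which baseline a change in frequency is measured against: the previous frequency, or the frequency expected without the consequence. A minimal sketch with hypothetical numbers (nothing here comes from a real training log) shows the same observation reading either way:

```python
# Hypothetical numbers illustrating two baselines for "reinforcement".

def change_in_rate(observed, baseline):
    """Difference between an observed response rate and a chosen baseline rate."""
    return observed - baseline


indoors_rate = 0.9            # cue obeyed 9 times in 10 indoors (hypothetical)
outdoors_expected_rate = 0.5  # expected outdoors with no treats (hypothetical)
outdoors_with_treats = 0.9    # observed outdoors with treats (hypothetical)

# Baseline 1: the previous frequency (indoors). No increase, so on this
# reading the treats did not "reinforce" anything.
vs_previous = change_in_rate(outdoors_with_treats, indoors_rate)

# Baseline 2: the expected frequency without treats in the new setting.
# On this reading the treats account for a rise over the counterfactual rate.
vs_counterfactual = change_in_rate(outdoors_with_treats, outdoors_expected_rate)

print(f"vs previous frequency:      {vs_previous:+.1f}")
print(f"vs expected without treats: {vs_counterfactual:+.1f}")
```

The same logic applies to the pay example: a raise that merely holds work steady looks like no reinforcement against the previous rate, but like reinforcement against the expected decline.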

Yes, reinforcement relates to increased frequency of behavior.

I stand by my core assertion that trying to analyze the reinforcement of an intervention before determining the function of the behavior is analytically confused and probably a waste of time.

I stand by my core assertion that trying to analyze the reinforcement of an intervention before determining the function of the behavior is analytically confused and probably a waste of time.

I stand by a different assertion: that trying to predict the response to a disturbance before determining the purpose of the behaviour is analytically confused and certainly a waste of time.

The only difference between us is that you are interpreting simple observations of what people do with a theory that falls apart on close examination, while I am interpreting them with a theory that stands up to close examination.

I'm not sure about the content of our disagreement.

I'm pretty sure that what I mean by "analyze the [amount of] reinforcement of an intervention" is the same as what you mean by "predict the response to a disturbance."

And I'm almost certain that what I mean by "function of the behavior" is the same thing as you mean by "purpose of the behavior."

I certainly think that trying to predict the change in frequency of a behavior after a disturbance/intervention before figuring out the purpose of the behavior is foolish on many levels.

But what does "reinforcement" mean then? The word is a description with a definite meaning in behaviourist psychology, not an arbitrary proper name. Unless the supposed reinforcement is actually reinforcing something, within the terms of that theory, it isn't a reinforcement.

Reinforcement and punishment are not what I would call the units of insight of behaviorism. Behaviorism stands for the proposition that behavior modification does not require a self-reflexive cognitive component.

But making those kinds of changes requires a rigorous analysis of the function / purpose of the behavior (escape, attention, etc). The labels "reinforcement" and "punishment" are intended to focus on the key point of behaviorism: Frequency of behavior - without reference to beliefs or feelings - is the only acceptable data.

Once that point is made, I accept that defining "reinforcement" (as stimuli that increase or sustain the frequency of behavior) does not stand on its own as analytically useful in changing behaviors.

Also, "stimuli" is not the technical term, but I'm not an expert on Applied Behavior Analysis.

Consider the story I heard during a lecture on ABA:

A patient at an in-patient mental health institution was engaging in "garbage talk" that the care providers wanted to extinguish (don't ask me why this was a priority). They determined that the function of the behavior was attention, and implemented a protocol (i.e. told the staff to stop interacting with the patient when she engaged in garbage talk). This intervention reduced the occurrence of garbage talk around all but one attendant.

When this discrepancy was noticed, the attendant was observed, and "ignoring" was implemented by that attendant as follows:

P: Garbage talk.
A: I want to warn you that I'll start ignoring you if you keep doing that.
P: Garbage talk.
A: Ok, I'm going to start ignoring you.
P: Garbage talk.
A: I'm currently ignoring you.

Needless to say, that didn't work at reducing garbage talk. Perhaps if the attendant had worried less about making sure that the patient "understood" the intervention and focused solely on frequency of the behavior, then the attendant wouldn't have made this mistake.
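A toy sketch of why that attendant's version of 'ignoring' failed, assuming, as the care providers concluded, that the behaviour is maintained by attention. The update rule, step size, starting rate, and number of sessions are hypothetical; the only thing the model looks at is whether attention follows the behaviour, not what the attendant says.

```python
# Toy operant-conditioning sketch (hypothetical parameters): garbage talk is
# assumed to be maintained by attention, as the care providers concluded.

def rate_after_sessions(gives_attention, sessions=20, rate=0.5, step=0.1):
    """Return the rate of garbage talk after repeated sessions with one attendant.

    gives_attention: whether the attendant's reply to garbage talk involves
    attending to the patient. Talking back counts, even if the words are
    "I'm currently ignoring you."
    """
    for _ in range(sessions):
        if gives_attention:
            rate += step * (1.0 - rate)  # reinforced: rate moves toward 1
        else:
            rate -= step * rate          # extinction: rate moves toward 0
    return rate


silent_attendant = rate_after_sessions(gives_attention=False)
announcing_attendant = rate_after_sessions(gives_attention=True)
print(f"silent ignoring:    {silent_attendant:.2f}")
print(f"announced ignoring: {announcing_attendant:.2f}")
```

Announcing the ignoring is, from the model's point of view, simply more attention, so the behaviour is reinforced rather than extinguished.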

The word "brain" or "brains" occurs three times in the article, yet does not seem to be doing any work.

Are you referring to the diction, or stating there is no description of how associations are created? I created a footnote to address the latter.

Are you referring to the diction, or stating there is no description of how associations are created?

The latter. You are using physical language to describe mental phenomena, but the physical language is not earning its keep. Using "brain" instead of "mind" does not make this neuroscience.

I created a footnote to address the latter.

It answers the question "Where are associations created & what do they look like?" with:

We haven't really found the answer to that yet, to be honest.

You continue (I summarise, emphasis added): Hippocampal damage results in disturbance to memory function, leading to a "general theory that perhaps" ... "recent findings...may indicate glia also have some role to play in cortical functioning" ... "There have been some attempts to identify" ... "fraught with controversy" ... "no consensus" ... "The more one indulges a behaviour or association, the theory then goes, the broader the pathway or the stronger the association becomes."

It all looks terribly vague, and the last sentence describes runaway positive feedback. You can't make a functioning machine on that basis, except by introducing hacks (e.g. "satiation") to prevent it behaving the way the theory says.
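The objection can be seen in a toy contrast with made-up parameters. A rule where each repetition strengthens a habit in proportion to its current strength (the 'the more one indulges, the stronger it becomes' reading) grows without bound, while adding a satiation term, the kind of fix the comment calls a hack, makes it level off. This illustrates the two dynamics only; it is not a model anyone in the thread proposed.

```python
# Toy contrast with made-up parameters: habit strength under a pure
# "more practice, more strength" rule versus the same rule with satiation.

def naive_strength(repetitions, gain=0.2, strength=1.0):
    """Each repetition adds strength in proportion to current strength."""
    for _ in range(repetitions):
        strength += gain * strength  # positive feedback: growth feeds on itself
    return strength


def satiating_strength(repetitions, gain=0.2, strength=1.0, capacity=5.0):
    """Same rule, but each gain shrinks as strength approaches a satiation cap."""
    for _ in range(repetitions):
        strength += gain * strength * (1.0 - strength / capacity)  # levels off
    return strength


print(f"naive after 30 repetitions:     {naive_strength(30):.1f}")
print(f"satiating after 30 repetitions: {satiating_strength(30):.2f}")
```

Run for longer, the naive version keeps exploding while the satiating one never exceeds its cap; whether such a cap is a principled part of the theory or a patch is what is in dispute here.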

You're right - in trying to demonstrate function I extrapolated from OCD findings too much. That much is fixed.

I think explaining how the HPA axis functions provides an ample example. Vide.

Do let me know - your input has contributed to vast improvements in the quality of the article.

I'm not sure if the post is even needed; I also worry I may have been confusing. Please let me know.

The Little Albert example really makes the point.

I would judge it as 90% good. Which means: good.

Possible improvement: If the core of the article is less than one screen long, it either does not need a summary, or it needs a much shorter one. The summary should be roughly 10% of the article's length, not 30%. And perhaps (other readers may disagree with this) the summary should be at the end of the article, not at the beginning.

[Perhaps] the summary should be at the end of the article, not at the beginning.

First, thank you for your input. I actually switched the order so that some could more easily read the summary if that was all they wanted. I'm very curious whether the original order was indeed preferable.

I'm very curious whether the original order was indeed preferable.

For a long text it probably would be. It probably also depends on what exactly is included in the summary: the question you are going to answer, or also the answer to the question? In other words, does the "summary" contain spoilers? But even then, some people prefer spoilers (because that saves time), and some people prefer to read text without spoilers (because they have the time, and want to think about the problem first).

Perhaps the best solution would be: a short question at the beginning, with a note that the answer is at the end; then the text body; then a short answer at the end. (Just my opinion.)

I tried shortening the summary, but ended up breaking down the second sentence into a subordinate clause of the first; hopefully it makes the summary appear shorter, despite the unchanged length.

I also added a horizontal rule so those who wish to go right into the meat of things can dismiss what's above the rule without missing content.

You may want to distinguish classical conditioning from operant conditioning.

I thought about that; the original DragonBox example involved operant conditioning instead of classical.

Fortunately, Richard's quote provided in footnote three offers an example of operant conditioning. Should I say so in the footnote itself?

Where are associations created & what do they look like?

In brief:

We* haven't really found the answer to that yet, to be honest. There are certain brain regions, like the hippocampus, that, when damaged, lesioned, or hijacked by the HPA axis, disturb memory formation or recall. According to the most prevalent current theory of how associations are stored, neurones or small clusters of neurones may somehow convey the recall of a memory through synchronous activation; hippocampal damage interferes with the cortex's capacity to recall the types of memory the hippocampus specialises in helping to form. More recent findings that some glia respond to neurotransmitters may indicate that glia also have some role to play in cortical functioning. There have been some attempts to identify the structure of cortex-wide neuronal pathways, but to my knowledge no theory is yet accepted by the field as a whole.

In a bit more detail:

The hypothalamic-pituitary-adrenal (HPA) axis, involved in the stress response, increases the synthesis of cortisol by the adrenal glands, which sit atop the kidneys; cortisol helps the body respond to a threat by suppressing immune function and increasing the amount of readily available glucose. The axis's operations begin with an (environmental) stimulus, which is relayed to the hypothalamus by the amygdala and various other inputs. The hypothalamus then secretes CRH (corticotropin-releasing hormone), which is received by the anterior pituitary gland, which secretes ACTH (adrenocorticotropic hormone), which is picked up by the adrenal glands, which secrete cortisol.

The process is halted (inhibited) by the hippocampus. Glucocorticoid receptors (GR) are found all over the body, but the hippocampus has a particularly large number of them. The hippocampus plays an important role in memory formation, specifically in forming new context-heavy memories; if a memory can't be formed in the first place, it can't be stored long-term either. GR are activated by cortisol, and when the hippocampus's GR are activated they hijack the structure's normal processes: GR lie close to the parts of the hippocampus critical to its memory function, so while they are active the hippocampus can't quite execute those functions properly. Keep in mind, though, that GR take only a very brief time to process cortisol; afterwards, the hippocampus signals the hypothalamus to stop producing CRH (the hypothalamus is inhibited), ending the cycle. Only when GR are hyper-active, when the blood contains an exorbitant amount of cortisol, would hippocampal function be noticeably hindered. More often, GR are over-activated by excessive and persistent levels of cortisol in the blood, leading to their failure; as they are part of the hippocampus, the structure itself degenerates and loses volume. This is chronic stress. Degeneration of the hippocampus reduces one's capacity to form new memories and to recall context-heavy ones.
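The negative-feedback loop described above can be sketched as a discrete-time toy model. Everything quantitative here is an assumption: arbitrary units, a single feedback gain standing in for hippocampal GR-mediated inhibition of the hypothalamus, ACTH simply tracking CRH, and a fixed fraction of cortisol cleared each step. Real HPA dynamics involve further feedback sites and time delays that are not modelled.

```python
# Toy discrete-time model of HPA-axis negative feedback (arbitrary units).
# Assumptions: `feedback` lumps hippocampal GR-mediated inhibition of the
# hypothalamus into one gain; ACTH simply tracks CRH; cortisol is cleared at a
# fixed fraction `clearance` per step. Not a physiological model.

def simulate_hpa(steps=60, stress_steps=20, feedback=0.8, clearance=0.2):
    """Return the cortisol level at each step; the stressor lasts `stress_steps`."""
    cortisol = 0.0
    history = []
    for t in range(steps):
        stimulus = 1.0 if t < stress_steps else 0.0
        # Hypothalamus: CRH is driven by the stressor and inhibited by cortisol.
        crh = max(0.0, stimulus - feedback * cortisol)
        # Anterior pituitary: ACTH output assumed proportional to CRH (gain 1).
        acth = crh
        # Adrenal glands: cortisol rises with ACTH and is partly cleared each step.
        cortisol += acth - clearance * cortisol
        history.append(cortisol)
    return history


intact = simulate_hpa()
impaired = simulate_hpa(feedback=0.0)
print(f"peak cortisol, intact feedback:   {max(intact):.2f}")
print(f"peak cortisol, impaired feedback: {max(impaired):.2f}")
```

With the default gain, cortisol plateaus near 1 during the stressor; setting the feedback gain to zero, a crude stand-in for a hippocampus that can no longer inhibit the hypothalamus, lets it climb toward 5 (stimulus divided by clearance rate) before decaying once the stressor ends.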

I describe the HPA axis to show how we know where associations are created. When the hippocampus cannot exercise its role in memory formation and recall, synchronous firing of both newly allocated and recalled associations (neurone clusters) cannot occur; thus perturbations of memory result. On an fMRI scan, one would see two (or more) spatially separated regions 'light up', signifying increased flow of oxygenated blood to, and by inference increased activity in, those regions. What do these associations look like on the cellular level? Right now that depends on your imaging techniques. No definitive answer to that question has yet been accepted by the field.

Research on OCD (which shares features with addictive behaviour) has found that the neuronal pathways in regions related to addictive behaviour (like the ventral tegmental area) show more activation, appearing more brightly lit on an fMRI scan, than in humans without OCD; id est, the pathways are more like an interstate than a country road (not my analogy). The more one indulges the OCD-associated behaviour, the theory then goes, the broader the pathway becomes.

*"We" refers broadly to humans, and specifically to neuroscientists.

Thanks for the thorough reply.

I guess what I'm getting at though is, if we can't point at something & say, "This is an association. It weighs 5 grams & consumes 0.5% of the brain's energy", then how do we quantify an association? Are we referring to behavior? A subjective feeling? A concept? What?

Think of it as an inter-connexion of neurones and neurites (dendrites and axonal fibers). When an association is created, concepts become related.

We have yet to be able to pin down a specific neurone cell body, or a small cluster of neurones, and say, "This neurone contains the face of Bob, and this cluster of neurones stores Bob's name and data on how you know him."

You can look into connectionist models of how neurones operate if interested.
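For a flavour of what a connectionist 'association' is, here is a minimal Hebbian linear associator, a classic textbook model chosen for illustration rather than anything cited above. The +1/-1 patterns standing for Bob's face, Bob's name, and a second person are hypothetical. The association is not stored in any single unit: it is spread across every weight in the matrix, which mirrors the point that no single neurone can be pointed at as holding Bob's face.

```python
# Minimal Hebbian linear associator (textbook connectionist model, used here
# only for illustration). The +1/-1 patterns are hypothetical stand-ins.

def learn(pairs, size):
    """Build a weight matrix by summing outer products name x face (Hebb's rule)."""
    W = [[0.0] * size for _ in range(size)]
    for face, name in pairs:
        for i in range(size):
            for j in range(size):
                W[i][j] += name[i] * face[j]
    return W


def recall(W, cue):
    """Propagate a cue through the weights and threshold each output unit."""
    raw = [sum(w * x for w, x in zip(row, cue)) for row in W]
    return [1 if r > 0 else -1 for r in raw]


bob_face = [1, -1, 1, -1, 1, -1, 1, -1]
bob_name = [1, 1, -1, -1, 1, 1, -1, -1]
alice_face = [1, 1, 1, 1, -1, -1, -1, -1]
alice_name = [1, -1, -1, 1, 1, -1, -1, 1]

W = learn([(bob_face, bob_name), (alice_face, alice_name)], size=8)

noisy_bob_face = [-bob_face[0]] + bob_face[1:]  # one "unit" corrupted
print(recall(W, bob_face) == bob_name)
print(recall(W, noisy_bob_face) == bob_name)
print(recall(W, alice_face) == alice_name)
```

Because the two face patterns are orthogonal, each cue retrieves its own name, and a partly corrupted cue still retrieves Bob's name. Real cortical patterns are neither this clean nor this small, which is one reason such models are at best partial descriptions.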

Personally I think the models aren't an accurate representation of cortical processes - though they might have helped in the recent Blue Brain results. The models are accurate to an extent, but I do not think them comprehensive enough to adequately describe, and then predict, all cortical processes. Again, that is just personal speculation.