Follow-up to Julia Wise on "Don’t Shoot The Dog"

Alex Flint

Julia Wise recently wrote up notes from "Don’t Shoot The Dog", a book by Karen Pryor about behavioral training methods:

The book applies behavioral psychology to training animals and people. The author started off as a dolphin trainer at an aquarium park in the 1960s and moved on to horses, dogs, and her own children. There are a lot of anecdotes about how to train animals (apparently polar bears like raisins). At the time, training animals without violence was considered novel and maybe impossible.

Julia’s notes are wonderful. She skilfully picks interesting anecdotes and key points from the text, adding her own beautiful and concise connections to the practical experience of raising children. It’s a real treat to read.

Julia begins by addressing the perception that systematic training might be in some way unethical:

I can understand not wanting to use behavioral methods on children; the idea can sound overly harsh or reductive. The thing is, we already reinforce behavior all the time, including bad behavior, often without meaning to. So you might as well notice what you’re doing.

What is it about systematic training that seems harsh and reductive? Because it does seem, at least on the surface, potentially harsh and reductive. It may be that this is a mistaken perception, or it may be that it is a fact that we have to live with, but where does this perception come from in the first place?

Togetherness

When I interact with friends, family, lovers, colleagues, children, and even animals, I long for a boundaryless togetherness in which no private plan or intention is held back on either side. But for practical reasons it is very difficult to reach this state of affairs in any connection for any length of time. For example, Julia offers this lovely anecdote:

[...] our four-year-old was eager to go home from the park, and left without us towards the house. I caught up with her and told her not to leave without us. We were halfway to the house, but if I’d continued home with her from there, she would still have achieved what she wanted: getting home sooner. So I took her back to the park and we redid the whole situation: she said "I want to go home" and I walked home with her. Running off on her own didn’t pay, and she hasn’t repeated it.

It would not have been very practical for Julia to operate from a place of "boundaryless togetherness" in this scenario. What would that even mean? For Julia to allow her daughter to wander home alone? To follow her and go straight home, reinforcing the behavior of leaving the park alone? To follow her and, rather than bringing her back to the park, express some deep dilemma in words that her young daughter wouldn’t understand? None of these options seem very practical.

But when we use systematic training, perhaps we should ask ourselves whether it is leading us in the direction of fewer or more numerous boundaries. Consider using systematic training to establish a mutually beneficial chore sharing with a housemate:

Often when we are teaching the behavior [...]. For example, once a pattern of chore sharing has been established, your roommate or spouse may stop at the dry cleaners on the way home without being reinforced each time

This may work as a means to getting one’s roommate or spouse to stop at the dry cleaner’s. But if I interact with my roommate or spouse from a place of strategic reinforcement, then I am spending at least some of my attention when I am with them on thinking about when to offer praise and what its effect will be as a form of reinforcement. The one receiving this systematic training is then going to think about what they are being trained to do and whether that is good for them. Both people then spend some of their time together thinking about training or thinking about being trained by each other, which is at best a distraction and at worst a hindrance to togetherness.

Now it may be necessary to use systematic training. And it may not be possible to achieve boundaryless togetherness in any given connection. But it would be nice to use systematic training in a way that moves us in the direction of reducing, not increasing, the boundaries in our connections.

Fun

Clicker trainers have learned to recognize play behavior in animals as a sign that the learner has become consciously aware of what behavior was being reinforced. When ‘the light bulb goes on,’ as clicker trainers put it, dogs gambol and bark, horses prance and toss their heads, and elephants, I am told, run around in circles chirping. They are happy. They are excited.

Learning can be fun. I enjoyed learning jiu-jitsu during graduate school. There were times when I received praise and times when I received criticism. I guess I felt a tiny bit of pleasure when receiving praise and a tiny bit of pain when receiving criticism, but these feelings were so small compared to the joy of learning something that took me beyond what I had previously been able to do that they just didn’t matter very much. There was a kind of boundarylessness in the training. My teacher would give very energetic and demanding feedback. I knew that I was being trained and I knew that I wanted to be trained. I understood how the reinforcement was shaping me, and I wanted it to shape me. My teacher was aware not just of how the training was affecting me, but also that I wanted it to affect me. In this mutual "yes" to training there was a willingness in both trainer and trainee to hold back less. If I had been an elephant, I definitely would have run around in circles chirping. It was exhilarating.

But consider now the use of a clicker to train pilots:

A flight instructor can also click a student for initiative and for good thinking: for example, for glancing over the instrument panel before being reminded to do so. So the clicker can reward nonverbal behavior nonverbally in the instant it’s occurring.

Wouldn’t it be tedious to be "clicked" each time you glance over the instrument panel while learning to fly an airplane? Wouldn’t it be tedious to be the instructor doing the "clicking"? I imagine this whole enterprise being cold and painful for both sides. Perhaps the trainer and trainee would both see it as necessary in order to engender the kind of rigor needed to fly an aircraft. But is this really the case?

What is it that caused the dogs to gambol and the horses to prance and the elephants to run around in circles chirping? I doubt they were celebrating the food they were receiving as reinforcement. Surely they were celebrating learning!

Learning is joyful. If a trainee never experiences joy, perhaps they are not learning. So one way to evaluate a training setup is to ask: is this setup leading, eventually, to joy? It’s not that the point of the training is to give the trainee joy. It’s that joy is a sign that training is working. It may take some time to get there, but the trainee should eventually get there. If trainees are not getting there then perhaps the training setup should be reworked.

Respect

The training methods described in Pryor’s book are powerful. They allow us to home in on the basic reward/punishment structure of biological brains, and to use it to bring about profound behavior change:

It often happens, especially when training with food reinforcers, that there is absolutely no way you can get the reinforcer to the subject during the instant it is performing the behavior you wish to encourage. If I am training a dolphin to jump, I cannot possibly get a fish to it while it is in midair. If each jump is followed by a thrown fish with an unavoidable delay, eventually the animal will make the connection between jumping and eating and will jump more often. However, it has no way of knowing which aspect of the jump I liked. Was it the height? The arch? Perhaps the splashing reentry? Thus it would take many repetitions to identify to the animal the exact sort of jump I had in mind. To get around this problem, we use conditioned reinforcers.

Breland called the whistle a ‘bridging stimulus,’ because, in addition to informing the dolphin that it had just earned a fish, the whistle bridged the period of time between the leap in midtank—the behavior that was being reinforced—and swimming over to the side to collect one’s pay.

But there is a risk that we will use these techniques to teach the whole world to behave in the way that we imagine we want, and lose track of what the world has in turn to teach us. Training a dolphin to jump through a hoop might be an exhilarating journey that opens up a connection between trainer and dolphin that leads to the trainer appreciating and learning from the dolphin in profound ways. But it could also be a cold and harsh experience in which a trainer exerts a kind of iron will over the dolphin and becomes less and less able to see and learn from the dolphin.

This is the basic dynamic of respect: am I balancing that which I have to teach you with that which I have to learn from you? Am I devoting an appropriate amount of my time and attention to discovering what I have to learn from you, relative to the time and attention I am putting into teaching you?

Not all beings understand how to wield systematic training techniques. I might use systematic training to teach my friend to cut and arrange flowers, but my friend might not in turn be equipped with the skill of using systematic training, so may not respond by offering their own gifts back to me in as clear or forceful a way. The more I teach my friend, the more I ought to look carefully for that which my friend has to teach me.

Trust

If you get into a relationship with someone who is fascinating, charming, sexy, fun, and attentive, and then gradually the person becomes more disagreeable, even abusive, though still showing you the good side now and then, you will live for those increasingly rare moments when you are getting all those wonderful reinforcers: the fascinating, charming, sexy, and fun attentiveness. And paradoxically from a commonsense viewpoint, though obviously from the training viewpoint, the rarer and more unpredictable those moments become, the more powerful will be their effect as reinforcers, and the longer your basic behavior will be maintained. Furthermore, it is easy to see why someone once in this kind of relationship might seek it out again. A relationship with a normal person who is decent and friendly most of the time might seem to lack the kick of that rare, longed-for, and thus doubly intense reinforcer.

But even more painful than a relationship with a disagreeable partner is the fear that every action by every partner might be training us into a pattern that is not good for us. If we view the whole world as a giant reinforcement trainer, some of it random, some of it deliberate, then how can we trust any of it?

And it is not just the world that trains us, but also ourselves:

I found that if I broke down the journey, the first part of the task, into five steps—walking to the subway, catching the train, changing to the next train, getting the bus to the university, and finally, climbing the stairs to the classroom—and reinforced each of these initial behaviors by consuming a small square of chocolate, which I like but normally never eat, at the completion of each step, I was at least able to get myself out of the house, and in a few weeks was able to get all the way to class without either the chocolate or the internal struggle.

Every action I take has some training effect on myself as a positive or negative reinforcement signal. If I take actions that cause myself pleasure then I am positively reinforcing my recent behavior. If I take actions that cause myself pain then I am negatively reinforcing my recent behavior. If I take actions that cause neither pleasure nor pain then I am actively omitting to reinforce my recent behavior, which itself has an effect on training. And in fact the decision to engage in training in the first place, as well as the decisions about what training objectives to pursue, are themselves the result of our past implicit and explicit training. How can we trust any of it?

When we are born, we inherit billions of years of positive and negative reinforcement in the form of our genome. And beyond that the way our parents treat us when we are children, which has a huge training effect on us, is the result of the positive and negative reinforcement that they received over their own lifetimes, including as children from their own parents, and so on backwards through the generations. And then we look at our own actions and see that they are the result of this huge mixture of random and deliberate reinforcement over so many generations, and wonder how we can possibly trust our own actions.

And the answer, so far as I’m concerned, is that there is no answer. This is just a frightening way to look at the world. If we choose to look at the world through frightening lenses, then we shouldn’t be surprised to find ourselves in a state of fright. Perhaps it is necessary to be in a state of fright some of the time. Perhaps not. But we do have the power to choose our lenses. The entire story of systematic training is a story of having the power to shape our own behavior. To then doubt that we have the power to shape the actions that matter most -- the actions of choosing what lenses we use to view the world -- would be deeply ironic.

Graduate school

My favorite of all of Julia’s delightful quotes is the following:

One psychologist jokes that the longest schedule of unreinforced behavior in human existence is graduate school.

The opposite, then, of graduate school is the following hopeful vision:

When you get a whole family, or household, or corporation working on the basis of real stimulus control— when all the people keep their agreements, say what they need, and do what they say— it is perfectly amazing how much gets done, how few orders ever need to be given, and how fast the trust builds up. Good stimulus control is nothing more than true communication— honest, fair communication.

Viliam (3y)

I guess it matters whether we are reinforced by people who have our best interest in mind.

Alex Flint (3y)

Yeah I agree. But when trying to work out whether someone has our best interests in mind, we might wonder whether our reasoning is itself a faulty product of past training, as is often the case in abusive relationships. How can we trust anything in this case?

Viliam (3y)

Yeah, the problem is that "evaluating who has my best interest in mind" seems like exactly the type of question where we are most likely to deceive ourselves, because it has obvious social implications. We will systematically overestimate the friendliness of high-status and/or sexually attractive people.

The only ideas I have about how to mitigate the danger are:

1. Interact with many people, and hope that their impacts may somewhat cancel each other; for example one high-status person may point out that another high-status person is abusing you. There is a problem when high-status people are coordinated about something (for example, if you live in a religious community, all high-status people you interact with are going to be religious, and will approve of conditioning against questioning religion), but even then there is a chance that if 9 high-status people agree on something and 1 high-status person disagrees, it will unlock your mind's ability to secretly disagree. You should interact with people outside your bubble, even if you believe they are mostly wrong: the occasional situation when they are right may be worth it.
2. Spend some time thinking alone, maybe talking to yourself.

Reversed stupidity is not intelligence, but it is not an accident that cults typically try to prevent you from doing these two things.

theme_arrow (3y)

This is a lovely companion writeup to Julia's. I especially liked your section on respect; I think that it's critical to not become Machiavellian when using this kind of method to shape the world.

I also wanted to add some personal thoughts related to your comments on fun. Part of my work involves being a mission controller for spacecraft. Training for that is a long and daunting process involving going from being a trainee during simulated rehearsals, then being in a non-lead role during actual operations, then being in a lead role in simulations, and finally getting to lead actual operations. It can take a year or more between steps in that chain, so bridging stimuli are absolutely essential to making the time between advances not feel endless. I would describe the emotion that the process is headed towards more as "satisfaction" than as "fun," and bridging stimuli are things that let you immediately experience some of the eventual satisfaction of advancement. The kinds of bridging stimuli I can think of might be a thumbs up from a team member, a compliment for something well done, or being temporarily allowed to do something like make a call over the comm net when it's not normally your role. These are a little more personal than the clicker, but I think it's really getting at the same basic idea.