An ML interpretation of Shard Theory

[-]TurnTrout3y20

In a fun intellectual circle, a lot of shard theory / model-free RL in general seems to be people reinventing behaviourism, except this time programming agents for which it is true. For instance, in behaviourism, agents never ‘optimise for reward’ but always simply display ‘conditioned’ behaviours which were associated with reward in the past. There are also various Pavlovian/associative conditioning experiments which might be interesting to do with RL agents.

I think behaviorism is wrong, and importantly different from shard theoretic analyses. (But maybe you mean something like "some parts of the analyses are re-inventing behaviorism"?)

From my shortform:

Notes on behaviorism: After reading a few minutes about it, behaviorism seems obviously false. It views the "important part" of reward to be the external behavior which led to the reward. If I put my hand on a stove, and get punished, then I'm less likely to do that again in the future. Or so the theory goes.
But this seems, in fullest generality, wildly false. The above argument black-boxes the inner structure of human cognition which produces the externally observed behavior.
What actually happens, on my model, is that the stove makes your hand hot, which triggers sensory neurons, which lead to a punishment of some kind, which triggers credit assignment in your brain, which examines your current mental state and judges which decisions and thoughts led to this outcome, and makes those less likely to occur in similar situations in the future.
But credit assignment depends on the current internal state of your brain, which screens off the true state of the outside world for its purposes. If you were somehow convinced that you were attempting to ride a bike, and you got a huge punishment, you'd be more averse to moving to ride a bike in the future -- not averse to touching stoves.
Reinforcement does not directly modify behaviors, and objects are not intrinisically reinforcers or punishments. Reinforcement is generally triggered by reward circuitry, and reinforcement occurs over thoughts which are judged responsible for the reward.
This line of thought seems closer to "radical behaviorism", which includes thoughts as "behaviors." That idea never caught on -- is each thought not composed of further subthoughts? If only they had reduced "thought" into parts, or known about reward circuitry, or about mesa optimizers, or about convergently learned abstractions, or about credit assignment...

[-]Kaj_Sotala3y120

After reading a few minutes about it, behaviorism seems obviously false. It views the "important part" of reward to be the external behavior which led to the reward. If I put my hand on a stove, and get punished, then I'm less likely to do that again in the future. Or so the theory goes.

This is probably true for some versions of behaviorism but not all of them. For instance, the author of Don't Shoot the Dog explicitly identifies her frame as behaviorist and frequently cites academic research on behaviorist psychology as the origin of her theoretical approach. At the same time, she also includes the mental state of the organism being trained as a relevant variable. For example, she talks about how animal training gets faster once the animals figure out how they are being taught, and how they might in some situations realize the trainer is trying to teach them something without yet knowing what that something is:

With most animals, you have to go to some lengths to establish stimulus control at first, but often by the time you start bringing the third or fourth behavior under stimulus control, you will find that the animal seems to have generalized, or come to some conceptual understanding. After learning three or four cued behaviors, most subjects seem to recognize that certain events are signals, each signal means a different behavior, and acquiring reinforcers depends upon recognizing and responding correctly to signals. From then on, establishment of learned stimuli is easy. The subject already has the picture, and all it has to do is learn to identify new signals and associate them with the right behaviors. Since you, as trainer, are helping all you can by making that very clear, subsequent training can itself go much faster than the initial laborious steps. [...]
A special case of the conditioned aversive signal has recently become popular among dog trainers: the noreward marker, often the word "Wrong," spoken in a neutral tone. The idea is that when the dog is trying various behaviors to see what you might want, you can help him by telling him what won't work, by developing a signal that signifies "That will not be reinforced." [...]
I once videotaped a beautiful Arabian mare who was being clicker-trained to prick her ears on command, so as to look alert in the show ring. She clearly knew that a click meant a handful of grain. She clearly knew her actions made her trainer click. And she knew it had something to do with her ears. But what? Holding her head erect, she rotated her ears individually: one forward, one back; then the reverse; then she flopped both ears to the sides like a rabbit, something I didn't know a horse could do on purpose. Finally, both ears went forward at once. Click! Aha! She had it straight from then on. It was charming, but it was also sad: We don't usually ask horses to think or to be inventive, and they seem to like to do it.

The book also has some discussion about reinforcing specific cognitive algorithms, such as "creativity":

Reinforcement has been used on an individual and group basis to foster not just specific behavior but characteristics of value to society—say, a sense of responsibility. Characteristics usually considered to be "innate" can also be shaped. You can, for example, reinforce creativity. My son Michael, while going to art school and living in a loft in Manhattan, acquired a kitten off the streets and reinforced it for "cuteness," for anything it did that amused him. I don't know how the cat defined that, but it became a most unusual cat— bold, attentive, loyal, and full of delightful surprises well into middle age. At Sea Life Park we shaped creativity with two dolphins—in an experiment that has now been much anthologized—by reinforcing anything the animals did that was novel and had not been reinforced before. Soon the subjects caught on and began "inventing" often quite amusing behaviors. One came up with wackier stuff than the other; on the whole, even in animals, degrees of creativity or imaginativeness can vary from one individual to another. But training "shifts" the curve for everyone, so that anyone can increase creativity from whatever baseline he or she began at. [...]
Some owners of clicker-wise dogs have become so accustomed to canine initiative and experimentation that they rely on the dog "offering behaviors," both learned and new, as a standard part of the training process. Many clicker trainers play a game with their dogs that I have nicknamed "101 Things to Do with a Box" (or a chair, or a ball, or a toy). Using essentially the same procedure we used at Sea Life Park to develop "creativity" in a dolphin, in each session the dog is clicked for some new way of manipulating the object. For example, you might put a cardboard box on the floor and click the dog for sniffing it and then for bumping it with his nose, until he's pushing it around the room. The next time, you might let the dog discover that pushing the box no longer gets clicked but that pawing it or stepping over the side and eventually getting into the box is what works. The dog might also come up with dragging the box, or lifting and carrying the box. One dog, faced anew with the challenge of the box game, got all his toys and put them into the box. Click! My Border terrier once tipped the box over onto herself and then scooted around under it, creating the spectacle of a mysterious traveling box. Everyone in the room laughed hysterically, which seemed to please her. Some dogs are just as clever at coming up with new ideas as any dolphin could be; and dogs, like dolphins—and horses—seem to love this challenging clicker game.

(The whole book is available online and is an easy read, I very much recommend it)

[-]LawrenceC3y20

Nitpick: I think this link failed to embed.

[here](https://www.lesswrong.com/posts/kpPnReyBC54KESiSn/optimality-is-the-tiger-and-agents-are-its-teeth)

[-]beren3y10

Whoops! Thanks for spotting. Fixed!

[-]NickGabs3y10

My understanding of Shard Theory is that what you said is true, except sometimes the shards "directly" make bids for outputs (particularly when they are more "reflexive," e. g. the "lick lollipop" shard is activated when you see a lollipop), but sometimes make bids for control of a local optimization module which then implements the output which scores best according to the various competing shards. You could also imagine shards which do a combination of both behaviors. TurnTrout can correct me if I'm wrong.

^{^}

In a fun intellectual circle, a lot of shard theory / model-free RL in general seems to be people reinventing behaviourism, except this time programming agents for which it is true. For instance, in behaviourism, agents never ‘optimise for reward’ but always simply display ‘conditioned’ behaviours which were associated with reward in the past. There are also various Pavlovian/associative conditioning experiments which might be interesting to do with RL agents.

^{^}

Does this happen in the brain? Some potential evidence (and probably some inspiration) for this comes from the brain, and probably the basal ganglia which implements subcortical action selection. The basal ganglia is part of a large-scale loop through the brain of cortex -> BG -> thalamus -> cortex which contains the full sensorimotor loop. The classic story of the BG is model-free RL with TD learning (but I personally have come to largely disagree with this). A large number of RL algorithms are consistent with RPEs including policy gradients as well as more esoteric algorithms. Beyond this dopaminergic neurons are more complicated than just implementing RPEs as well as appear to represent multiple reward functions which can result in highly flexible TD learning algorithms. The BG does appear to have opponent pathways for exciting and inhibiting (the Go and No-Go pathways specific actions/plans, which indicate some level of shard-theory like competition. On the other hand, there also seems to be a fairly clear separation between action selection and action implementation in the brain, where the basal ganglia mostly does action selection and delegates the circuitry to implement the action to the motor cortex or specific subcortical structures. As far as I know, the motor cortex doesn’t have the same level of competition between different potential behaviours as in the basal ganglia, although this has of course been proposed. Behaviourally, there is certainly some evidence for multiple competing behaviours being activated simultaneously and needing to be effortfully inhibited. A classic example is the Stroop task but there is indeed a whole literature studying tasks where people need to inhibit certain attractive behaviours in various circumstances. On the other hand, this is not conclusive evidence for a shard-like architecture, but instead there could be a hybrid architecture of both amortised and iterative inference where the amortised and iterative responses are different.

LESSWRONG
is fundraising!
LW

LESSWRONG
is fundraising!
LW

39

An ML interpretation of Shard Theory

39

39