What is “learning rate”, and why should we expect a learning-rate-modulation mechanism in the brain?
What is “learning rate”? As many readers know, learning algorithms involve a large number of parameters (“weights” in ML, or synapse locations and strengths in brains) that get changed while the learning algorithm runs. These parameters store the information that the algorithm has learned. “Learning rate” is a multiplier on this change process—the higher the learning rate, the more aggressively you change the parameters each step.
So if the learning rate is zero, you don’t change the parameters at all (and it’s no longer a learning algorithm!). If the learning rate is really high, you remember information more reliably with fewer repetitions. But it’s not all good: if the learning rate is too high, you get problems like (depending on the algorithm) instability, more tendency to overwrite old memories, more tendency to overfit (i.e., to “learn” things that are just random noise rather than robust patterns in the environment), and so on.
(Learning rate is an exact synonym of “plasticity”, as far as I can tell.)
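To make the "multiplier" idea concrete, here is a minimal sketch using plain gradient descent on a toy one-parameter problem. The function name `train`, the quadratic loss, and the specific learning-rate values are all assumptions chosen for illustration; nothing here is meant as a model of what the brain does.

```python
def train(learning_rate, steps=20, w=0.0, target=3.0):
    """Run gradient descent on the toy loss 0.5 * (w - target)**2."""
    for _ in range(steps):
        gradient = w - target            # derivative of the loss with respect to w
        w -= learning_rate * gradient    # the learning rate scales every update
    return w


if __name__ == "__main__":
    # 0.0: the parameter never changes, so nothing is learned.
    # 0.1: slow, stable progress toward the target.
    # 1.0: for this particular loss, reaches the target in a single step.
    # 2.5: each step overshoots by more than the last, so the value diverges.
    for lr in (0.0, 0.1, 1.0, 2.5):
        print(f"learning_rate={lr}: w after 20 steps = {train(lr):.3f}")
```

The toy problem only exhibits the instability failure mode. Overwriting old memories and overfitting need a richer setup with multiple training examples to show up.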
Why might an organism benefit from using different learning rates in different situations? Well, as above, the learning rate involves a tradeoff between different considerations, and it would be awfully surprising if the all-things-considered best learning rate winds up being exactly the same for someone sitting by a campfire vs someone fighting a lion. For example, intuitively, when fighting a lion, you’re in a situation where you’re making lots of life-or-death decisions. You probably want an unusually high learning rate here, so that next time you’re fighting a lion you’ll do a much better job of understanding what’s going on. By contrast, sitting by the campfire, maybe it’s not so important that you remember everything, and the balance of considerations pushes towards a low learning rate, which again has better properties in avoiding overwriting old memories, avoiding overfitting, etc.
(I think there’s a connection to pedagogy here. Everyone knows that students retain information better when doing something arousing, like arguing with someone, vs when they’re bored and inactive. I bet the brain sets its learning rate to a higher setting in the first case! But I think there are other things going on here too—for example, emotional memories will get replayed more often, and “more replays” is functionally similar to “higher learning rate”.)
Doesn’t dopamine (reward prediction error) control learning rate / induce plasticity? Well, yes and no. Here’s an algorithm:
If there’s a positive Reward Prediction Error, then whatever you just did, apparently it was pretty awesome! So remember that, and try it again in similar situations in the future.
If that’s the algorithm (and I do think the neocortex has a mechanism like this—see my later post Big Picture Of Phasic Dopamine), and if dopamine is the reward prediction error signal, then more learning will happen when there’s more dopamine. But the dopamine is not the “learning rate” here, it’s just a different input into the learning algorithm. For example, phasic dopamine has a baseline level from which it can swing positive or negative, whereas a learning rate ought to be always nonnegative. As another example, predictive learning (a.k.a. self-supervised learning) has a learning rate, but does not have a reward prediction error.
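Here is a rough sketch of that distinction, assuming a simple delta-rule update. The function names `reward_update` and `predictive_update` are hypothetical, and this is a cartoon for illustration rather than a claim about the actual neocortical algorithm. The point is just that the reward prediction error is a signed input, the learning rate is a separate nonnegative multiplier, and a predictive learner has the second without the first.

```python
def reward_update(value, reward, learning_rate):
    """Nudge a value estimate toward the reward that was actually received."""
    if learning_rate < 0:
        raise ValueError("learning rate must be nonnegative")
    rpe = reward - value                      # reward prediction error: can be + or -
    return value + learning_rate * rpe, rpe


def predictive_update(prediction, observation, learning_rate):
    """Self-supervised update: there is a learning rate, but no reward signal."""
    if learning_rate < 0:
        raise ValueError("learning rate must be nonnegative")
    error = observation - prediction          # sensory prediction error, not an RPE
    return prediction + learning_rate * error


if __name__ == "__main__":
    # Same learning rate, opposite-signed RPEs: the estimate moves in opposite directions.
    print(reward_update(0.5, reward=1.0, learning_rate=0.1))
    print(reward_update(0.5, reward=0.0, learning_rate=0.1))
    # Learning still happens here even though no reward is involved.
    print(predictive_update(2.0, observation=4.0, learning_rate=0.5))
```

In both functions a bigger error produces a bigger change, which is the sense in which "more learning happens when there's more dopamine", but only `learning_rate` plays the role of the multiplier.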
Biological evidence that acetylcholine sets learning rate
Acetylcholine (abbreviation: “ACh”; adjective form: "cholinergic") is a neurotransmitter. I am by no means an acetylcholine expert, and don’t have time to become one, but from a quick skim of the literature, the glove seems to fit:
- “muscarinic ACh...receptor blockade generates severe anterograde amnesia” (ref) (no ACh = zero learning rate = no new memories = anterograde amnesia)
- ACh can be found in all parts of the brain that learn, so far as I can tell (cortex, hippocampus, cerebellum, striatum, basolateral amygdala).
- “Optical stimulation of cholinergic NBM-BLA terminal fibers ... led to more rapid learning ...” (ref)
- This paper (eq. 5a) actually used ACh as a multiplier on the learning rate in their model, exactly as I’m advocating. However, they used dopamine as a learning rate too. See above for why I think dopamine can induce learning but is not technically a "learning rate" per se.
- This paper works out cellular mechanisms of ACh-induced plasticity.
- This paper gave cognitive tests to people taking the acetylcholine-blocking drug scopolamine. If I’m reading it right, they found that the drug did not harm performance in the tasks involving working memory. (Working memory involves neurons staying active for some period of time, but does not involve editing any synapses, as far as I understand.) Ditto with the tasks that required recalling already-existing memories. But for all the tasks that required editing synapses, the subjects did worse when on the drug.
- Conversely, nicotine binds to some ACh receptors, and “Acute nicotine, which may model the initial effects of smoking, enhances learning (76–81).” (ref) (This doesn’t seem to be true for chronic nicotine, if I’m reading right. Presumably the body adapts to the new baseline.)
- Not 100% sure I understand this right, but my impression is that if attention is focused on a particular part of the visual field, for example, then there’s an ACh spike in the corresponding visual processing part of the brain, with great spatiotemporal precision. That would be a very nice mechanism for preferentially learning to discern biologically-relevant things.
(Again, I’m not an ACh expert. I tried not to cherry-pick in the above list, but I dunno.)
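As a toy illustration of the hypothesis, here is a sketch in which an ACh level acts as a multiplier on a base learning rate. This is not the equation from the modeling paper mentioned above. The names `effective_learning_rate`, `learn`, `base_rate`, and `ach_level`, and the numbers used, are all made up for the example.

```python
def effective_learning_rate(base_rate, ach_level):
    """Hypothetical rule: ACh level scales the base learning rate."""
    if base_rate < 0 or ach_level < 0:
        raise ValueError("base_rate and ach_level must be nonnegative")
    return base_rate * ach_level


def learn(weight, target, base_rate, ach_level, steps=10):
    """Repeatedly move a single weight toward a target under a given ACh level."""
    rate = effective_learning_rate(base_rate, ach_level)
    for _ in range(steps):
        weight += rate * (target - weight)
    return weight


if __name__ == "__main__":
    # Blocked ACh: zero learning rate, so the weight never changes
    # (the toy analogue of forming no new memories).
    print(learn(0.0, 1.0, base_rate=0.1, ach_level=0.0))
    # Baseline ACh: partial learning after 10 exposures.
    print(learn(0.0, 1.0, base_rate=0.1, ach_level=1.0))
    # Elevated ACh: the same 10 exposures get much closer to the target.
    print(learn(0.0, 1.0, base_rate=0.1, ach_level=3.0))
```

A single scalar `ach_level` leaves out the spatial and temporal precision described in the last bullet. A fuller sketch would need a separate level per brain region and per moment.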
Does acetylcholine do other things too?
Yes! For one thing, there are tons of little subcortical structures in the brain, and they have specialized mechanisms to do specialized things, and I would not be surprised in the slightest if some of those things involved acetylcholine in a role that has nothing to do with learning rate.
More significantly, in the cortex, I just think evolution is not likely to set up a signaling mechanism which causes one and only one thing to happen. Signals communicate information, and whatever that information is, there are probably multiple processes that "care about" that information, and can thus be improved by having some response to that signaling mechanism. (And then neuroscientists would do experiments and announce that the signal "modulates" whatever that process is.)
Or in this case: if ACh is a signal for how high to set the learning rate, and there’s some other function F such that evolution tends to want high F at more-or-less the same times and same places that evolution tends to want high learning rate, then we should expect ACh to control F too!
The obvious candidates for that other function F are “whatever neuron or network changes are appropriate under conditions of attention and arousal”, since that’s presumably the main condition where you want a higher learning rate. I’m not sure exactly what those changes are. One possible example: This paper (already mentioned above) found that subjects on an ACh-blocking drug did worse on reaction time. Makes sense to me! You can probably save energy by having a slower reaction time most of the time, but you want to speed it up under conditions of attention and arousal. There also seem to be other network-level changes that happen under conditions of attention and arousal (e.g. see here).
You might ask: “Is ACh fundamentally an attention-and-arousal mechanism, and learning-rate-change is piggybacking on that signal? Or is it fundamentally a learning-rate-change mechanism, and other attention-and-arousal-related things are piggybacking on that?” My answer is: I’m not sure that question even has an answer, and if it does, I don’t think it matters.
(Thanks Adam Marblestone for comments on a draft.)