In his post Ugh Fields, Roko discussed "temporal difference learning", the process by which the brain propagates positive or negative feedback to the closest cause it can find for the feedback. For example, if he forgets to pay his bills and gets in trouble, the trouble (negative feedback) propagates back to thoughts about bills. Next time he gets a bill, he might paradoxically have even more trouble paying it, because it's become associated with trouble and negative emotions, and his brain tends to unconsciously flinch away from it.
He links to the associated Wikipedia article:
The TD algorithm has also received attention in the field of neuroscience. Researchers discovered that the firing rate of dopamine neurons in the ventral tegmental area (VTA) and substantia nigra (SNc) appear to mimic the error function in the algorithm. The error function reports back the difference between the estimated reward at any given state or time step and the actual reward received. The larger the error function, the larger the difference between the expected and actual reward. When this is paired with a stimulus that accurately reflects a future reward, the error can be used to associate the stimulus with the future reward.
Dopamine cells appear to behave in a similar manner. In one experiment measurements of dopamine cells were made while training a monkey to associate a stimulus with the reward of juice. Initially the dopamine cells increased firing rates when exposed to the juice, indicating a difference in expected and actual rewards. Over time this increase in firing back propagated to the earliest reliable stimulus for the reward. Once the monkey was fully trained, there was no increase in firing rate upon presentation of the predicted reward. This mimics closely how the error function in TD is used for reinforcement learning.
So if I understand this right, the monkey hears a bell and is unimpressed, having no expectation of reward. Then the monkey gets some juice that tastes really good and activates (opioid dependent?) reward pathways. The dopamine system is pretty surprised, and broadcasts that surprise back to all the neurons that have been especially active recently, most notably the neurons that activated upon hearing the bell. These neurons are now more heavily associated with the dopamine system. So the next time the monkey hears a bell, it has a greater expectation of reward.
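To make the bell-and-juice story concrete, here is a minimal sketch of tabular TD(0) learning, under some loud assumptions: the trial is reduced to a few discrete time steps after the bell, the juice arrives on the last step, the bell itself is unpredictable (so the prediction just before it is pinned at zero), and the names `simulate_conditioning`, `alpha`, `gamma` and `n_steps` are illustrative choices rather than anything from the experiment. It is a toy model of the algorithm, not of real dopamine neurons.

```python
def simulate_conditioning(n_trials=200, n_steps=3, alpha=0.2, gamma=1.0, reward=1.0):
    """Toy tabular TD(0) over a bell -> delay -> juice trial.

    Returns one (error_at_bell, error_at_juice) pair per trial.
    All parameters are hypothetical illustration values.
    """
    # V[t] is the predicted future reward t steps after the bell.
    # V[n_steps] is the terminal state and is never updated, so it stays 0.
    V = [0.0] * (n_steps + 1)
    history = []
    for _ in range(n_trials):
        # Assumption: the bell arrives at an unpredictable time, so the
        # prediction just before it is 0 and the error at the bell is
        # simply the value the bell state has acquired so far.
        delta_bell = gamma * V[0] - 0.0
        delta_juice = 0.0
        for t in range(n_steps):
            r = reward if t == n_steps - 1 else 0.0  # juice only on the last step
            delta = r + gamma * V[t + 1] - V[t]      # TD prediction error
            V[t] += alpha * delta                    # nudge prediction toward target
            if t == n_steps - 1:
                delta_juice = delta
        history.append((delta_bell, delta_juice))
    return history


history = simulate_conditioning()
print("first trial:", history[0])
print("last trial: ", history[-1])
```

On the first trial all of the surprise sits at the juice and none at the bell. After enough trials the pattern flips: the error at the juice shrinks toward zero while the error at the bell grows toward the full reward value, which is the "back propagation to the earliest reliable stimulus" the Wikipedia passage describes.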
And in this case it doesn't matter, because the monkey can't do anything about it. But if it were a circus monkey, and its trainer was trying to teach it to do a backflip to get juice, the association between backflips and juice would be pretty useful. As long as the monkey wanted juice, merely entertaining the plan of doing a backflip would have motivational value that promotes the correct action.
The Sinclair Method is a promising technique for treating alcoholics that elegantly demonstrates these pathways by sabotaging them.
Alcohol produces a surge of opioids in, yes, the ventral tegmental area. The temporal difference algorithm there correctly deduces that the reward is due to alcohol, and so links the dopamine system to things like drinking, planning to drink, et cetera. Rounding to the nearest cliche, dopamine represents "wanting", so this makes people want to drink.
Repeat this process enough, or start with the right (wrong?) chemical structure for your opioid and dopamine receptors, and you become an alcoholic.
So to treat alcoholism, all you should have to do is reverse the process. Drink something, but have it not activate the reward system at all. Those dopaminergic neurons that detect error in your reward predictions start firing like mad and withdrawing their connections to the parts of the brain representing drinking, drinking is no longer associated with "wanting", you don't want to drink, and suddenly you're not an alcoholic any more.
It's not quite that easy. But it might be pretty close.
The Sinclair Method of treating alcoholism is to give patients naltrexone, an opioid antagonist. Then the patients are told they can drink as much as they want. Then they do. Then they gradually stop craving drink.
In these people, alcohol still produces opioids, but the naltrexone prevents them from working and they don't register with the brain's reward system. Drinking isn't "fun" any more. The dopamine system notices there's no reward, and downgrades the connection between reward and drinking, which from the inside feels like a lessened craving to drink.
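The same error-driven update can be run in reverse as a rough sketch of this extinction story. The assumptions here are heavy: drinking is collapsed into a single learned value, naltrexone is modeled as simply setting the effective reward to zero, and `extinguish`, `alpha` and `n_sessions` are hypothetical names and numbers with no clinical meaning.

```python
def extinguish(value=1.0, effective_reward=0.0, alpha=0.1, n_sessions=30):
    """Toy extinction: repeatedly update a learned value toward the
    reward actually experienced. Returns the value after each session.
    """
    trace = [value]
    for _ in range(n_sessions):
        # Prediction error: experienced reward minus expected reward.
        # With the reward blocked this is negative while value > 0.
        delta = effective_reward - value
        value += alpha * delta
        trace.append(value)
    return trace


blocked = extinguish(effective_reward=0.0)    # drinking with the reward blocked
unblocked = extinguish(effective_reward=1.0)  # drinking as usual
print("blocked, final value:  ", round(blocked[-1], 3))
print("unblocked, final value:", round(unblocked[-1], 3))
```

In this toy version the learned value only falls when the drink is actually taken and the expected reward fails to show up. If the reward arrives as predicted there is no error, and the association stays exactly where it was.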
In theory, this same process should be useful against any addiction or unwanted behavior. In practice, research either supports or is still investigating naltrexone use[1] against smoking, self-harm, kleptomania, and overeating (no word yet on Reddit use).
The method boasts a success rate of between 25% and 78% on alcoholics, depending on how you define success. A lot of alcoholism statistics are comparing apples to oranges (did they stay sober for more than a year? Forever? If they just lapsed once or twice, does that still count?) but eyeballing the data[2] makes this look significantly better than either Alcoholics Anonymous or willpower alone.
I'm kind of confused by the whole idea because I don't understand the lack of side effects. Knocking out the brain's learning system to cure alcoholism seems disproportionate, and I would also expect naltrexone to interfere with the ability to experience happiness (which many people seem to like). But I haven't heard anyone mention any side-effects along the lines of "oh, and people on this drug can never learn anything or have fun ever again", and you'd think somebody would have noticed. If anyone on Less Wrong has ever used this method, or used naltrexone for anything else, please speak up.
Since these same pathways control so many cravings besides alcoholism, research in this area will probably uncover more knowledge of what really motivates us.
1: There's a subtle but important difference between the Sinclair Method and simple naltrexone use. As I understand it, most doctors who prescribe naltrexone tell the patient to abstain from alcohol as much as possible, but the Sinclair Method tells the patients to continue drinking normally. There are also some complicated parts about exactly when and how often you take the drug. The theory predicts the Sinclair Method would have better results, and the data seems to bear this out. As far as I know, all the studies on kleptomania, overeating, et cetera have been done on standard naltrexone use, not the Sinclair Method; I predict the Sinclair Method would work better, although there might be some practical difficulties involved in telling a kleptomaniac "Okay, take this tablet once a day while stealing stuff at the same rate you usually do."
2: 27% "never relapse into heavy drinking" and 78% get drinking "below the level of increased risk of morbidity and mortality". There's also an 87% number floating around without any justification or link to a study. I think this guy's statistics on a ~5-10% yearly remission rate from willpower or AA sound plausible.