Related to: Akrasia, hyperbolic discounting, and picoeconomics
If you're tired of studies where you inevitably get deceived, electric shocked, or tricked into developing a sexual attraction to penny jars, you might want to sign up for Brian Wansink's next experiment. He provided secretaries with a month of unlimited free candy at their workplace. The only catch was that half of them got the candy in a bowl on their desk, and half got it in a bowl six feet away. The deskers ate five candies/day more than the six-footers, which the scientists calculated would correspond to a weight gain of over 10 pounds more per year [1].
Beware trivial inconveniences (or, in this case, if you don't want to gain weight, beware the lack of them!). Small modifications to the difficulty of obtaining a reward can make big differences in whether the corresponding behavior gets executed.
The best studied example of this is time discounting. When offered two choices, where A will lead to a small reward now and B will lead to a big reward later, people will sometimes choose smaller-sooner rather than larger-later depending on the length of the delay and the size of the difference. For example, in one study, people preferred $250 today to $300 in a year; it took a promise of at least $350 to convince them to wait.
Time discounting was later found to be "hyperbolic", meaning that the discount amount between two fixed points decreases the further you move those two points into the future. For example, you might prefer $80 today to $100 one week from now, but it's unlikely you would prefer $80 in one hundred weeks to $100 in one hundred one weeks. Yet this is offering essentially the same choice: wait an extra week for an extra $20. So it's not enough to say that the discount rate is a constant 20% per week - the discount rate changes depending on what interval of time we're talking about. If you graph experimentally obtained human discount rates on a curve, they form a hyperbola.
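To make the contrast concrete, here is a minimal sketch of the $80-versus-$100 choice under both kinds of discounting. The one-parameter form amount / (1 + k * delay) is the hyperbola commonly used in this literature; the values k = 0.3 per week and a weekly factor of 0.75 are assumptions chosen purely for illustration, not figures from any study.

```python
def hyperbolic_value(amount, delay_weeks, k=0.3):
    """Present value under hyperbolic discounting (k is an assumed, illustrative rate)."""
    return amount / (1 + k * delay_weeks)


def exponential_value(amount, delay_weeks, weekly_factor=0.75):
    """Present value under exponential discounting (weekly_factor is assumed)."""
    return amount * weekly_factor ** delay_weeks


def prefers_sooner(value_fn, start_week, small=80, large=100):
    """True if $small at start_week is valued above $large one week after that."""
    return value_fn(small, start_week) > value_fn(large, start_week + 1)


for name, fn in [("hyperbolic", hyperbolic_value), ("exponential", exponential_value)]:
    for start in (0, 100):
        pick = "$80 sooner" if prefers_sooner(fn, start) else "$100 later"
        print(f"{name:11s} choice starting at week {start:3d}: {pick}")
```

With these assumed parameters the exponential discounter gives the same answer at both horizons, while the hyperbolic discounter takes the $80 when it is available today but waits for the $100 when both payments are a hundred weeks off.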
Hyperbolic discounting creates the unpleasant experience of "preference reversals", in which people can suddenly change their mind on a preference as they move along the hyperbola. For example, if I ask you today whether you would prefer $250 in 2019 or $300 in 2020 (a choice between small reward in 8 years or large reward in 9), you might say the $300 in 2020; if I ask you in 2019 (when it's a choice between small reward now and large reward in 1 year), you might say no, give me the $250 now. In summary, people prefer larger-later rewards most of the time EXCEPT for a brief period right before they can get the smaller-sooner reward.
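The reversal can be traced year by year. This is only a sketch: it assumes the simple hyperbola amount / (1 + k * delay) with k = 0.3 per year, a value picked for illustration, and the function names are hypothetical.

```python
def hyperbolic_value(amount, delay_years, k=0.3):
    """Present value under a one-parameter hyperbola (k is an assumed rate)."""
    return amount / (1 + k * delay_years)


def choice(years_until_small, small=250, large=300, gap_years=1, k=0.3):
    """Which reward looks better when the smaller one is years_until_small away."""
    sooner = hyperbolic_value(small, years_until_small, k)
    later = hyperbolic_value(large, years_until_small + gap_years, k)
    return "smaller-sooner" if sooner > later else "larger-later"


# Walk from 8 years before the $250 becomes available down to the day itself.
choices = {years: choice(years) for years in range(8, -1, -1)}
for years, picked in choices.items():
    print(f"{years} years before the $250 is available: {picked}")
```

Under these assumptions the $300 wins at every point from eight years out down to two years out, and the $250 takes over only in the last stretch before it becomes available.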
George Ainslie ties this to akrasia and addiction: call the enjoyment of a cigarette in five minutes the smaller-sooner reward, and the enjoyment of not having cancer in thirty years the larger-later reward. You'll prefer to abstain right up until the point where there's a cigarette in front of you and you think "I should smoke this", at which point you will do so.
Discounting can happen on any scale from seconds to decades, and it has previously been mentioned that the second or sub-second level may have disproportionate effects on our actions. Eliezer concentrated on the difficulty of changing tasks, but I would add that any task which allows continuous delivery of small amounts of reinforcement with near zero delay can become incredibly addictive even if it isn't all that fun (this is why I usually read all the way through online joke lists, or stay on Reddit for hours). This is also why the XKCD solution to internet addiction - an extension that makes you wait 30 seconds before loading addictive sites - is so useful.
Effort discounting is time discounting's lesser-known cousin. It's not obvious that it's an independent entity; it's hard to disentangle from time discounting (most efforts usually take time) and from garden-variety balancing benefits against costs (most efforts are also slightly costly). There have really been only one or two good studies on it and they don't do much more than say it probably exists and has its own signal in the nucleus accumbens.
Nevertheless, I expect that effort discounting, like time discounting, will be found to be hyperbolic. Many of these trivial inconveniences involve not just time but effort: the secretaries had to actually stand up and walk six feet to get the candy. If a tiny amount of effort held the same power as a tiny amount of time, it would go even further toward explaining garden-variety procrastination.
TIME/EFFORT DISCOUNTING AND UTILITY
Hyperbolic discounting stretches our intuitive notion of "preference" to the breaking point.
Traditionally, discount rates are viewed as just another preference: not only do I prefer to have money, but I prefer to have it now. But hyperbolic discounting shows that we have no single discount rate: instead, we have different preferences for discount rates at different future times.
It gets worse. Time discount rates seem to be different for losses and gains, and different for large amounts vs. small amounts (I gave the example of $250 now being worth $350 in a year, but the same study found that $3000 now is only worth $4000 in a year, and $15 now is worth a whopping $60 in a year). You can even get people to exhibit negative discount rates in certain situations: offer people $10 now, $20 in a month, $30 in two months, and $40 in three months, and they'll prefer it to $40 now, $30 in a month, and so on - maybe because it's nice to think things are only going to get better?
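The three dollar pairs quoted above imply very different one-year rates, which is the whole point. A quick sketch of the arithmetic, using only those figures (the function name is just illustrative):

```python
def implied_annual_rate(amount_now, amount_in_a_year):
    """Extra fraction demanded for waiting one year."""
    return amount_in_a_year / amount_now - 1


# (amount now, equivalent amount in a year) as quoted from the study above
pairs = [(15, 60), (250, 350), (3000, 4000)]
rates = {now: implied_annual_rate(now, later) for now, later in pairs}

for now, rate in rates.items():
    print(f"${now} now: implied one-year premium of {rate:.0%}")
```

That works out to premiums of 300%, 40%, and about 33% respectively, so no single annual rate fits all three choices.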
Are there utility functions that can account for this sort of behavior? Of course: you can do a lot of things just by adding enough terms to an equation. But what is the "preference" that the math is describing? When I say I like having money, that seems clear enough: preferring $20 to $15 is not a separate preference from preferring $406 to $405.
But when we discuss time discounting, most of the preferences cited are specific: that I would prefer $100 now to $150 later. Generalizing these preferences, when it's possible at all, takes several complicated equations. Do I really want to discount gains more than losses, if I've never consciously thought about it and I don't consciously endorse it? Sure, there might be such things as unconscious preferences, but saying that the unconscious just loves following these strange equations, in the same way that it loves food or sex or status, seems about as contrived as saying that our robot just really likes switching from blue-minimization to yellow-minimization every time we put a lens on its sensor.
It makes more sense to consider time and effort discounting as describing reward functions and not utility functions. The brain estimates the value of reward in neural currency using these equations (or a neural network that these equations approximate) and then people execute whatever behavior has been assigned the highest reward.
1: Also cited in the same Nutrition Action article: if the candy was in a clear bowl, participants ate on average two/day more than if the candy was in an opaque bowl.
That makes the researchers sound pretty silly. Did they just multiply the calories by the number of days worked? More likely: the candies partially spoiled their appetite at lunch, causing them to take 3 fewer bites of the cookies they packed. Or whatever. People tend to maintain stable weights over long periods of time. That implies weight isn't that sensitive to minor changes in environment, or it would fluctuate wildly.
Do any of the studies on hyperbolic discounting attempt to show that it is not just a consequence of combining uncertainty with something like a standard exponential discounting function? That's always seemed the most plausible explanation of hyperbolic discounting to me and it meshes with what seems to be going on when I introspect on these kinds of choices.
Most of the discussions of hyperbolic discounting I see don't even consider how increasing uncertainty for more distant rewards should factor into preferences. Ignoring uncertainty seems like it would be a sub-optimal strategy for agents making decisions in the real world.
I think exponential discounting already assumes uncertainty. You need uncertainty to discount at all - if things are going to stay the same, might as well wait until later. And it doesn't intuitively lead to hyperbolic discounts - if there's a 1% chance you'll die each week, then waiting from now until next week should make you discount the same amount as waiting from ten weeks from now until eleven.
But there is a way to use uncertainty to get from exponential to hyperbolic discounting. You get exponential if you're worried about yourself dying/being unable to use the reward/etc. But if you add in the chance of the reward going away, based on a prior where you don't know anything about how likely that is, you might get hyperbolic discounting. If you don't eat an animal you've killed in the next ten minutes, then it might get stolen by hyenas. But common-sensibly, if it goes a year without being stolen by hyenas or going bad or anything, there's not much chance of hyenas suddenly coming along in the next ten minutes after that.
What the best-known paper on this says is:
It goes on to say:
I'm not sure you need uncertainty to discount at all - in finance exponential discounting comes from interest rates which are predicated on an assumption of somewhat stable economic growth rather than deriving from uncertainty.
As you point out, hyperbolic discounting can come from combining exponential discounting with an uncertain hazard rate. It seems many of the studies on hyperbolic discounting assume they are measuring a utility function directly when they may in fact be measuring the combination of a utility function with reasonable assumptions about the uncertainty of claiming a reward. It's not clear to me that they have actually shown that humans have time inconsistent preferences rather than just establishing that people don't separate the utility they attach to a reward from their expectation of actually receiving it in their responses to these kinds of studies.
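The hazard-rate argument in this thread can be checked numerically. The sketch below makes one specific assumption that nobody above commits to: the unknown hazard rate is given an exponential prior with a chosen mean. Averaging the ordinary exponential survival curve exp(-hazard * t) over that prior then gives exactly 1 / (1 + mean_hazard * t), which is a hyperbola. The function names and the mean of 0.5 per week are illustrative only.

```python
import math


def expected_survival(t, mean_hazard, steps=20_000):
    """Chance the reward is still there at time t, averaged over an
    exponential prior on the hazard rate (midpoint-rule integration)."""
    rate = 1.0 / mean_hazard       # prior density: rate * exp(-rate * hazard)
    upper = 40.0 / (rate + t)      # integrand is negligible beyond this point
    width = upper / steps
    total = 0.0
    for i in range(steps):
        hazard = (i + 0.5) * width
        total += rate * math.exp(-(rate + t) * hazard) * width
    return total


def hyperbola(t, mean_hazard):
    """Closed form of the same average."""
    return 1.0 / (1.0 + mean_hazard * t)


def one_week_factor(t, mean_hazard):
    """How much waiting one more week, starting at week t, shrinks the value."""
    return expected_survival(t + 1, mean_hazard) / expected_survival(t, mean_hazard)


for t in (0, 1, 5, 20):
    print(t, round(expected_survival(t, 0.5), 6), round(hyperbola(t, 0.5), 6))
```

Under this assumed prior the penalty for one extra week of waiting is steep at the start and nearly vanishes far in the future, which is the hyena intuition: a reward that has already survived a long time is probably subject to a low hazard rate.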
This doesn't explain many of the effects described in the post, such as the choice of 10-20-30-40 over 40-30-20-10 and the fact that people's preferences can reverse like you explained, though the latter may just be because evolution found a simpler solution rather than a more accurate one.
Yes. Specifically, I'm always struck by the idea that someone offering me $100 right now, before I let them out of my sight, is more likely to deliver. If it were between me leaving and them mailing (or wiring) $100 later that day (or so they say) vs. $150 next week, clearly I'll take the $150.
But Yvain talks about the reward at time t vs the larger one at time t+1 becoming more tempting only as you get sufficiently close to t - so, if this has been measured in real people, some researchers must have avoided the obvious "we'll get back to you" credibility problem (I didn't follow cites looking for details, however).
Apart from the uncertainty about whether an agency will deliver a future reward at all, there is also the expenditure of resources in keeping track of and following up on a debt. If you have to keep track of a debt for a year, and take action to claim it at the end of that time, that could very well cost more than $50 worth of time and energy, particularly when you take into account the need to spend resources retaliating if the debtor fails to pay. I think it is much more plausible that people use ad hoc heuristics developed to deal with these issues, than that we actually use a hyperbolic discount function.
Pigeons also discount hyperbolically.
... as, according to the paper you linked, do rats. Good point. Can this be explained in similar ways? A pigeon or rat's environment could surely provide uncertainty about whether a future reward will be delivered at all. Time and energy spent keeping track of a future reward might still be a significant cost. Could these factors have selected for neural circuits that just apply a hyperbolic discount function as a default in the absence of specific reason to do otherwise? This does seem to be evidence suggesting the answer might be yes.
Naturally-occurring cases where hyperbolic discounting leads to suboptimal solutions are relatively rare, and at least one seems to have evolved a patch.
Those issues seem pretty common. Why would nature not use hyperbolic discounting - and then use ad hoc heuristics to patch it in cases where it doesn't apply?
Yeah, the paper Yvain linked seems to be providing evidence that a hyperbolic discount function may be the first-order heuristic for handling such issues, with patches applied where appropriate.
The description of how a reward function works sounds like how a utility function works.
What distinction are you attempting to draw between a reward function and a utility function?
I think Matt Newport's comment answers it pretty well. The distinction I'm attempting to draw is between situation dependence and situation independence. If you assign 100 utils to a new car, then that car is worth 100 utils, period. You can still kinda get a utility function by saying I assign 100 utils to a new car now, but 50 utils to a new car next week. But when you assign different amounts to gaining the car vs. losing the car (loss aversion), or different amounts if you consider each facet of having the car separately than you get when you add all of them together (subadditive utility), or that you'd value winning the car in a contest a different amount than inheriting the car from a relative, then eventually it starts to look less like you're calculating a term for "value of car" and using it in different decisions, and more like you've got a number of decision-making processes, all of which return different results for different decisions involving cars.
I admit it's not a hard-and-fast distinction so much as a difference in connotations.
I am pretty sure I don't approve of Matt Newport's comment. If uncertainty about receiving something means that it is worth less then you would conventionally just build that into the utility function.
IMO, rewards are more-or-less the brain's attempt to symbolically represent utility.
What kind of idiot designed the human brain?!? Sheesh!
A blind idiot, apparently.
If the idea of temporal discounting can usefully be extended to effort, is there anything else it can be extended to? How about financial expenditure, for instance? Is the first penny the hardest one?
Some friends are in the process of buying a house costing about a million dollars. There was some serious haggling over the final price, to which my friend finally replied "Forget about it, it's just thirty thousand dollars, it's not worth the conflict." And after all, paying $1,100,000 vs. $1,130,000 doesn't seem like an interesting difference.
I imagine that if they were haggling over a car that cost $20,000, they would move heaven and earth to avoid paying $30,000 more; $20,000 vs. $50,000 seems a major difference.
This seems a lot like hyperbolic discounting, where having to wait ten minutes makes a big difference if it's ten minutes from now, but very little difference if it's a year vs. a year + ten minutes. Spending $30,000 makes a big difference if it's the first $30,000, but very little if it's $1.1 million + $30,000.
See today's post on prospect theory for more.
The Kahneman & Tversky jacket-calculator study is the classic example of this, if you want to switch from thought experiment to actual experiment.
This topic also came up in another post; I left a comment about it there.
One view of what's happening is that discounting just reflects people's intuitive sense of magnitude, which is nonlinear. It may not be completely logarithmic, but it's at least somewhere in between linear & logarithmic. So someone faced with a temporal discounting choice effectively thinks "9 years is farther away than 8 years, so I'll demand more money to wait 9 years based on my intuitive sense of how much farther away it is." Because of the nonlinear sense of magnitude, you can't just subtract the two numbers and call it a 1 year gap, since it feels smaller than the gap between 1 year vs. 2 years from now. Similarly, the friend in your house example effectively thinks "$1,130,000 is more money than $1,100,000, so I'll put in more effort (and put up with more conflict) to try to get $1,130,000 based on my intuitive sense of how much more it is." But their intuitive sense of magnitude makes that gap seem relatively small, smaller than the gap between $50,000 and $20,000, so they don't haggle. One gets called temporal discounting and the other gets called diminishing marginal utility, but they can both reflect this same nonlinear sense of magnitude.
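A rough sketch of that idea, under the simplifying assumption that the felt size of a gap is purely logarithmic (the comment above only claims something between linear and logarithmic), so that a gap is felt as the log of the ratio between the two quantities. The function name is hypothetical.

```python
import math


def felt_gap(smaller, larger):
    """Subjective size of a gap if magnitudes are sensed logarithmically (assumed)."""
    return math.log(larger / smaller)


gaps = {
    "1 year vs 2 years": felt_gap(1, 2),
    "8 years vs 9 years": felt_gap(8, 9),
    "$20,000 vs $50,000": felt_gap(20_000, 50_000),
    "$1,100,000 vs $1,130,000": felt_gap(1_100_000, 1_130_000),
}

for label, gap in gaps.items():
    print(f"{label}: {gap:.3f}")
```

On this assumption the one-year gap between 8 and 9 years feels several times smaller than the gap between 1 and 2 years, and the same $30,000 feels far smaller on top of $1.1 million than on top of $20,000.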
It seems as though this idea is closely related to "diminishing marginal utility":
I don't think so.
Diminishing marginal utility is a fundamentally rational process: I really do need my first $20,000 more than I need the next $100,000, because when spending the first $20,000 to increase my utility, I can knock off my low-hanging fruit preferences like food, water, and housing - but when spending the next $100,000 I come to more complicated preferences like social status and comfort that aren't quite as important.
But the discounting I'm mentioning here is per item. I would be more likely to excuse a $50 cost overrun on a $200 item than on a $20 item, even if I am a millionaire and in the end $50 makes no difference to my total amount of money either way. Even if I know I'm going to buy both a $20 item and a $200 item, I'd still prefer getting the $50 surcharge attached to the $200 item, even though it doesn't affect my total expenditure. That's irrational, and so it's got to be a bias rather than an instance of diminishing marginal utility.
Good call - I think. Diminishing marginal utility does seem like a rather nice name for phenomena such as temporal discounting, expenditure discounting and effort discounting, though - even if it is currently defined to mean something else. Is there a better name for these things? Or is some terminology hijacking required?
You are correct, timtyler is wrong.
I'd be interested to know whether LW readers, or adherents of our particular genus of self-optimization scheme more generally, had a different coefficient on that hyperbola than average before being exposed to LW-like ideas. Anecdotal evidence suggests that we'd discount less aggressively, but the sample's too small and unrepresentative for me to draw any reliable conclusions.
Pessimistically, I would expect exposure to LessWrong would not alter the coefficient of our entire reward scheme, but rather patch certain graphs, as it were.
Some people would prefer being shocked to gaining ten pounds.
I think we need to differentiate between the utility function you might use to model your actions, and one you'd make for what you consciously think would be better. Essentially, akrasia is when the two contradict each other.
For the former, making a more complicated model would make perfect sense. For the latter, you'd probably pick something simpler, especially exponential or constant, so it doesn't require constantly changing your mind.
Isn't the fact that we perceive all quantities on a logarithmic scale far broader than these specific effects? We find it again and again and again, from sound perception to explicit amount perception to this. (Note that this is an even wider claim than mattnewport's.) So the distance between 8 and 9 years is not the same as the distance between 1 month and 1 year + 1 month, because the differences of the logarithms... erm... differ; and an offer to give something "now" is then just as infinite as probability 1.
Present Self vs Future Self on the Wikipedia page for hyperbolic discounting seems interesting but has no citation or explanation or much of anything. Has anyone read about (or derived) any results around not caring as much about your future self turning into time discounting? Is it equivalent to hazard rate or something?
I love your posts. Exceptionally clear and interesting!