Followup to: Evaluability
"Psychophysics", despite the name, is the respectable field that links physical effects to sensory effects. If you dump acoustic energy into air—make noise—then how loud does that sound to a person, as a function of acoustic energy? How much more acoustic energy do you have to pump into the air, before the noise sounds twice as loud to a human listener? It's not twice as much; more like eight times as much.
Acoustic energy and photons are straightforward to measure. When you want to find out how loud an acoustic stimulus sounds, or how bright a light source appears, you usually ask the listener or watcher. This can be done using a bounded scale from "very quiet" to "very loud", or "very dim" to "very bright". You can also use an unbounded scale, whose zero is "not audible at all" or "not visible at all", but which increases from there without limit. When you use an unbounded scale, the observer is typically presented with a constant stimulus, the modulus, which is given a fixed rating: for example, a sound that is assigned a loudness of 10. Then the observer can indicate a sound twice as loud as the modulus by writing 20.
And this has proven to be a fairly reliable technique. But what happens if you give subjects an unbounded scale with no modulus? Zero to infinity, with no reference point for a fixed value? Then they make up their own modulus, of course. The ratios between stimuli will continue to correlate reliably between subjects. Subject A says that sound X has a loudness of 10 and sound Y has a loudness of 15. If subject B says that sound X has a loudness of 100, then it's a good guess that subject B will assign a loudness of around 150 to sound Y. But if you don't know what subject C is using as their modulus (their scaling factor), then there's no way to guess what subject C will say for sound X. It could be 1. It could be 1000.
For a subject rating a single sound, on an unbounded scale, without a fixed standard of comparison, nearly all the variance is due to the arbitrary choice of modulus, rather than the sound itself.
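A toy model of the situation above, assuming each subject's rating is simply their private modulus times a perceived magnitude that every subject shares. The sounds, magnitudes, and moduli are hypothetical values mirroring the example of subjects A, B, and C.

```python
# Hypothetical perceived magnitudes for two sounds, shared by every subject.
PERCEIVED = {"X": 1.0, "Y": 1.5}

# Hypothetical private moduli: the number each subject happens to attach
# to a perceived magnitude of 1.0. Nothing in the task fixes this value.
MODULI = {"A": 10.0, "B": 100.0, "C": 1000.0}


def rating(subject: str, sound: str) -> float:
    """Rating on an unbounded scale = private modulus * perceived magnitude."""
    return MODULI[subject] * PERCEIVED[sound]


for subject in MODULI:
    x, y = rating(subject, "X"), rating(subject, "Y")
    print(f"{subject}: X={x:g}, Y={y:g}, Y/X={y / x:g}")
```

Every subject reports Y/X = 1.5, but the rating for sound X alone ranges from 10 to 1000. In this model, a single rating says almost nothing about the sound unless you know the subject's modulus.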
"Hm," you think to yourself, "this sounds an awful lot like juries deliberating on punitive damages. No wonder there's so much variance!" An interesting analogy, but how would you go about demonstrating it experimentally?
Kahneman et al., 1998 and 1999, presented 867 jury-eligible subjects with descriptions of legal cases (e.g., a child whose clothes caught on fire) and asked them to either
- Rate the outrageousness of the defendant's actions, on a bounded scale
- Rate the degree to which the defendant should be punished, on a bounded scale, or
- Assign a dollar value to punitive damages
And, lo and behold, while subjects correlated very well with each other in their outrage ratings and their punishment ratings, their punitive damages were all over the map. Yet subjects' rank-ordering of the punitive damages—their ordering from lowest award to highest award—correlated well across subjects.
If you asked how much of the variance in the "punishment" scale could be explained by the specific scenario—the particular legal case, as presented to multiple subjects—then the answer, even for the raw scores, was .49. For the rank orders of the dollar responses, the amount of variance predicted was .51. For the raw dollar amounts, the variance explained was .06!
Which is to say: if you knew the scenario presented—the aforementioned child whose clothes caught on fire—you could take a good guess at the punishment rating, and a good guess at the rank-ordering of the dollar award relative to other cases, but the dollar award itself would be completely unpredictable.
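A toy model reproduces the qualitative pattern, though not the study's actual numbers. Suppose each scenario carries a shared "outrage" level, and each subject converts outrage into dollars by multiplying by a private modulus. The sketch measures the share of variance explained by scenario as eta squared (1 - SS_within / SS_total), an assumed choice of statistic that may differ from the paper's exact analysis. The outrage levels and moduli are invented for illustration. The real figures (.06 and .51) are less extreme, presumably because real subjects also genuinely disagree about the cases, which this model leaves out.

```python
from statistics import mean

# Hypothetical shared outrage level for each of ten scenarios.
OUTRAGE = list(range(1, 11))

# Hypothetical private moduli (dollars per unit of outrage), one per subject,
# spanning six orders of magnitude.
MODULI = [10.0 ** k for k in range(7)]


def eta_squared(table: list[list[float]]) -> float:
    """Share of variance explained by scenario.

    `table[i][j]` is subject i's response to scenario j. Returns
    1 - SS_within / SS_total, where "within" means within each scenario.
    """
    values = [v for row in table for v in row]
    grand = mean(values)
    ss_total = sum((v - grand) ** 2 for v in values)
    ss_within = 0.0
    for j in range(len(table[0])):
        column = [row[j] for row in table]
        col_mean = mean(column)
        ss_within += sum((v - col_mean) ** 2 for v in column)
    return 1 - ss_within / ss_total


def ranks(row: list[float]) -> list[float]:
    """Rank one subject's responses from lowest (1) to highest; assumes no ties."""
    order = sorted(range(len(row)), key=lambda j: row[j])
    result = [0.0] * len(row)
    for rank, j in enumerate(order, start=1):
        result[j] = float(rank)
    return result


dollars = [[m * o for o in OUTRAGE] for m in MODULI]
ranked = [ranks(row) for row in dollars]

print(f"raw dollars: {eta_squared(dollars):.3f}")  # small, about 0.04
print(f"ranks:       {eta_squared(ranked):.3f}")   # 1.000
```

Because every subject orders the cases identically, the ranks are fully predicted by the scenario, while the raw dollar amounts are dominated by whichever modulus each subject happened to pick.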
Taking the median of twelve randomly selected responses didn't help much either.
So a jury award for punitive damages isn't so much an economic valuation as an attitude expression—a psychophysical measure of outrage, expressed on an unbounded scale with no standard modulus.
I observe that many futuristic predictions are, likewise, best considered as attitude expressions. Take the question, "How long will it be until we have human-level AI?" The responses I've seen to this are all over the map. On one memorable occasion, a mainstream AI guy said to me, "Five hundred years." (!!)
Now, the reason why time-to-AI is just not very predictable is a long discussion in its own right. But it's not as if the guy who said "Five hundred years" was looking into the future to find out. And he can't have gotten the number using the standard bogus method with Moore's Law. So what did the number 500 mean?
As far as I can guess, it's as if I'd asked, "On a scale where zero is 'not difficult at all', how difficult does the AI problem feel to you?" If this were a bounded scale, every sane respondent would mark "extremely hard" at the right-hand end. Everything feels extremely hard when you don't know how to do it. But instead there's an unbounded scale with no standard modulus. So people just make up a number to represent "extremely difficult", which may come out as 50, 100, or even 500. Then they tack "years" on the end, and that's their futuristic prediction.
"How hard does the AI problem feel?" isn't the only substitutable question. Others respond as if I'd asked "How positive do you feel about AI?", only lower numbers mean more positive feelings, and then they also tack "years" on the end. But if these "time estimates" represent anything other than attitude expressions on an unbounded scale with no modulus, I have been unable to determine it.
Kahneman, D., Schkade, D. A., and Sunstein, C. 1998. Shared Outrage and Erratic Awards: The Psychology of Punitive Damages. Journal of Risk and Uncertainty, 16: 49-86.
Kahneman, D., Ritov, I. and Schkade, D. A. 1999. Economic Preferences or Attitude Expressions? An Analysis of Dollar Responses to Public Issues. Journal of Risk and Uncertainty, 19: 203-235.