by taw

# 11

GDP measures essentially how good we are at making widgets - and while widgets are useful, it is a very weak and indirect measure of welfare. For example UK GDP per capita doubled between 1975 and 2007 - and people's quality of life indeed improved - but it would be extremely difficult to argue that this improvement was "doubling", and that the gap between 2007's and 1975's quality of life is greater than between 1975's and hunter-gatherer times.

It's not essential to this post, but my very quick theory is that we overestimate GDP growth thanks to an economic equivalent of Amdahl's Law. Suppose someone's optimal consumption mix consists of 9 units of widgets and 1 unit of personalized services, and their purchasing power increases so that they can now acquire 100x as many widgets but still only the same number of services as before. The amount of the mix they can purchase has increased only about 9x, not the 90x you'd get from a weighted average of the original consumption levels (and they now spend 92% of their purchasing power on services). The least scalable factor - whichever it is - will be the bottleneck.
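The arithmetic behind the 9x, 90x and 92% figures can be checked with a small sketch. It assumes that prices are measured in units of one service and that a widget originally costs the same as a service; the constant and method names are made up for illustration.

```ruby
# Toy model of the "economic Amdahl's Law" argument.
# Assumption: prices are measured in units of one service, and before the
# change a widget costs the same as a service.
WIDGETS_IN_MIX  = 9.0
SERVICES_IN_MIX = 1.0

# Cost of one consumption mix, given the price of a widget (service price = 1).
def mix_cost(widget_price)
  WIDGETS_IN_MIX * widget_price + SERVICES_IN_MIX
end

budget      = mix_cost(1.0)   # originally affords exactly one mix
cheap_price = 1.0 / 100       # widgets become 100x more affordable

# How many mixes the same budget buys once widgets are cheap.
mixes_now = budget / mix_cost(cheap_price)

# The naive view: average the per-good gains, weighted by the original mix.
naive_estimate = (WIDGETS_IN_MIX * 100 + SERVICES_IN_MIX * 1) /
                 (WIDGETS_IN_MIX + SERVICES_IN_MIX)

# Fraction of spending that now goes to the unscalable good.
service_share = SERVICES_IN_MIX / mix_cost(cheap_price)

puts "mixes affordable now:  #{mixes_now.round(2)}x"
puts "weighted-average view: #{naive_estimate.round(2)}x"
puts "spending on services:  #{(service_share * 100).round}%"
```

Under these assumptions the budget buys a bit over 9 mixes, while the weighted average suggests about 90x, which is the gap the paragraph describes.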

If we're unhappy with GDP there are alternative measures like HDI, but they're highly artificial. It would be very easy to construct completely different measures which would "feel" about as right.

Fortunately there exists a very natural measure of welfare, which I haven't seen used before in this context - preference utilitarian lotteries. Would you rather live in 1700 for certain, or take a gamble with a 50% chance of living in 2010 and a 50% chance of living in 700? Make a list of such bets, assign numbers consistent with your answers (with 100 for your highest-valued era and 0 for your lowest) and you're done! By averaging many people's estimates we can hopefully reduce the noise, and get some pretty reasonable welfare estimates.

And now disclaimer time. This approach has countless problems. Here are just a few, but I'm sure you can think of more.

• Probabilities are difficult - People are really bad at intuiting the difference between a 1% chance of something and a 3% chance, even though the latter counts for three times as much in the results. We can mostly work around this problem by not comparing extremes, but instead sorting situations by desirability and only comparing the nth situation against a lottery with chance p of the (n+1)st and chance (1-p) of the (n-1)st. Such probabilities will usually be in a comfortable medium range.
• Risk aversion - you prefer the certainty of a moderate outcome to a chance of getting either a good or a bad outcome. It tends to overestimate past welfare.
• Status quo bias - you prefer situations closer to your current even if there's no actual welfare difference. It tends to underestimate past welfare, perhaps balancing risk aversion.
• Knowledge problem - how much do you really know about life in Industrial Revolution era Britain, let alone ancient Sumer? Even professional historians have problems with that, and unfortunately we might all be biased the same way, negating some of the benefit of averaging out estimates.
• Values problem - you might find some civilizations more repulsive and others less because of your modern values, even if their welfare is really not that different. It can be infanticide (extremely common historically), slavery, racial discrimination, human sacrifice, particular religion or political system etc.
• Hindsight - reverse of knowledge problem - life in 1345 Florence was nowhere near as bad as our hindsight estimates would make it be.
• Representative sampling - the life of exactly whom? In many eras a randomly chosen newborn wouldn't survive to adulthood - yet it seems unreasonable to include those. Let's say we focus on a healthy adult somewhere near median social status and income.

I tried to think about such series of bets and my results are:

• Western Europe 2010 CE - 100
• Western Europe 1980 CE - 97
• Western Europe 1950 CE - 91
• Western Europe 1900 CE - 65
• Western Europe 1800 CE - 26
• Western Europe 1700 CE - 16
• Western Europe 1500 CE - 10
• High Middle Ages Europe (1250 CE) - 7.6
• Early Middle Ages Europe (700 CE) - 6.4
• Roman Empire around 100 CE - 7.1
• Mediterranean World 500 BCE - 7.0
• Neolithic Middle East (5000 BCE) - 1.6
• Paleolithic anywhere (20000 BCE) - 0

This seems far more reasonable than GDP's illusion of exponentially accelerating progress.

I used this Ruby code to convert bets to values on scale of 0 to 100 (bets ordered by preference, not chronologically):

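```ruby
# Each ratio is the probability of the better neighbour at which the bet is
# indifferent; eras run from most to least preferred.
def linearize_ratios(*ratios)
  # Gap between consecutive eras, relative to the first gap (set to 1.0)
  diffs = ratios.inject([1.0]) { |d, r| d + [d[-1] * r / (1 - r)] }
  scale = diffs.inject { |a, b| a + b }
  # Walk down from 100, subtracting each gap rescaled so the gaps total 100
  diffs.inject([100]) { |v, d| v + [v[-1] - 100.0 * d / scale] }
end

p linearize_ratios(0.7, 0.8, 0.6, 0.2, 0.4, 0.25, 0.2, 0.1, 0.9, 0.9, 0.25)
```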
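One way to read the conversion: being indifferent between era n for certain and a lottery with chance p of the next-better era (otherwise the next-worse one) means value(n) = p * value(better) + (1 - p) * value(worse). Each gap between neighbouring eras is therefore p / (1 - p) times the gap above it. The sketch below repeats the function so that it runs on its own, then checks that the resulting values satisfy this indifference condition. The check is my reading of the code, not something stated in the post.

```ruby
# Same conversion as in the post, repeated so this check runs on its own.
def linearize_ratios(*ratios)
  diffs = ratios.inject([1.0]) { |d, r| d + [d[-1] * r / (1 - r)] }
  scale = diffs.inject { |a, b| a + b }
  diffs.inject([100]) { |v, d| v + [v[-1] - 100.0 * d / scale] }
end

ratios = [0.7, 0.8, 0.6, 0.2, 0.4, 0.25, 0.2, 0.1, 0.9, 0.9, 0.25]
values = linearize_ratios(*ratios)

# For each interior era, the sure thing should be worth exactly as much as
# the lottery between its better and worse neighbours.
errors = ratios.each_with_index.map do |p_better, i|
  better, sure, worse = values[i], values[i + 1], values[i + 2]
  (sure - (p_better * better + (1 - p_better) * worse)).abs
end

puts values.map { |v| v.round(1) }.inspect
puts "largest indifference error: #{errors.max}"
```

The error should be zero up to floating-point rounding, which suggests each ratio is the indifference probability for the better neighbour.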

# 11


I have a question about how this would work that is related to both the values and representative sampling issues you raise. Would I or would I not adjust my gender (female) in placing my bets? Would I assume that I would be a woman of around median status and income when compared to other women of the era, or when compared to the overall population? It practically goes without saying that in many of these eras, most women had low status compared to men. Even if the more important factor in determining your position would be income, many women in many of these eras would only have income/property as a member of a household and in relation to men.

Somewhat relatedly, would someone of African descent assume he or she would be of the same ethnicity if rating, for example, the antebellum South of the United States (or many other eras/locales in the U.S.?)

It seems to me that in many cases the values problem you identify isn't just about what one would find morally repulsive even if one's physical welfare was not so bad. In many eras and locales, a woman or a member of some particular ethnic minority would have a very different lot in life in a way that directly affects his or her welfare (e.g., as a woman, not having any property rights and thus lacking security and independence; as an ethnic minority, being oppressed or even a slave).

Should everyone making these ratings take that sort of thing into account (that is, the possibility of ending up being a woman or an oppressed ethnic minority in the new era), or only raters who are female or of an ethnic minority? Did you take this into account in making your own ratings? It seems like that sort of thing could greatly affect how one rated different eras.

[-]taw10

I did the standard historical thing and looked at median adult males. Average is usually worse than median.

This is fair, because I'm comparing only Western Europe for the last few centuries, and divisions now are mostly geographical - the situation of different people in the same country tends to be similar, but the situation of people in different countries varies drastically. It used to be the other way around - and correcting for only one of these would be rather unfair.

As for women, I'm definitely not going to compare that, as the situation of a large-family stay-at-home housewife (a model which goes back at least to Ancient Greece) is simply a completely different way of life from what 20th century women do. This kind of separation of gender roles is incompatible with the modern economy, and the modern separation of gender roles is incompatible with most historical economies. I don't think that assigning a low value to such a life is based on much more than prejudice, but for this discussion look at the clusterfuck which has spread out of Bryan Caplan's article about 19th century women's freedoms to half of the blogosphere by now.

There's evidence that happiness is proportional to the log of income (blog post, pdf article). That suggests that GDP per capita is a decent measure of quality of life, we just shouldn't treat it as a linear relationship. Exponentially increasing GDP translates into linearly increasing happiness.

Log income predicts happiness at a given point in time, not for a whole life, so I'd expect your method to produce ratings that are closely correlated with log income multiplied by life expectancy at age 20 (to match your decision to exclude child mortality). If we can find historical data on life expectancy and GDP per capita then we could test that prediction.
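To make the "exponential GDP, linear happiness" point concrete, here is a minimal sketch of a log-linear model. The intercept, slope and starting income are made-up illustration values, not estimates from any of the linked studies.

```ruby
# Hypothetical log-linear model: happiness = intercept + slope * log2(income).
# The intercept and slope below are made-up illustration values, not estimates.
INTERCEPT = 2.0
SLOPE     = 0.5

def happiness(income)
  INTERCEPT + SLOPE * Math.log2(income)
end

# Income doubling five times from an arbitrary starting level.
incomes = (0..5).map { |k| 1_000 * 2**k }
levels  = incomes.map { |income| happiness(income) }

# Happiness gained at each doubling; rounded to hide floating-point noise.
steps = levels.each_cons(2).map { |lo, hi| (hi - lo).round(6) }

incomes.zip(levels).each { |income, h| puts "#{income}: #{h.round(3)}" }
puts "step per doubling: #{steps.uniq.inspect}"
```

Every doubling adds the same amount (the slope), so under such a model a fixed number of doublings corresponds to a fixed number of equal-sized steps in measured happiness.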

[-]taw00

There's evidence that happiness is proportional to the log of income

I don't believe this is even a remotely possible result if interpreted absolutely, for if it was true, every single person born before 20th century would need to be suicidally depressed.

There's a reason I point to preference utilitarianism, not happiness utilitarianism.

That depends on the slope of the line and how far we are above the 0 utility threshold for a life worth living. I found this book with a table (table B21 on p. 264) of estimated historical GDP going back to the year 0, and there have been less than six doublings in that time. Current countries with per capita GDP at the same levels as pre-1900 Western Europe (and even year 0 Western Europe) are included in some of those analyses that found the log-linear fit, and their self-reported well-being is in line with the regression that fits the rest of the world. The log-linear relationship might break down if we go far enough into the past (or future), to places that are poor enough (or rich enough), or to societies that are different enough so that GDP won't be a good measure of their material quality of life, but the data I've seen suggest that the relationship is more robust than I would've expected.

The Stevenson & Wolfers paper that I linked uses self-report measures of welfare including happiness, satisfaction with life, and amount of smiling, and finds similar log-linear relationships with income on all of them (though with different slopes and intercepts), which suggests that this relationship will apply to whichever definition of utility we use.

The possibility of other factors, such as social ones, affecting how happy people are wasn't excluded, as far as I can tell.

"measured happiness" isn't necessarily happiness in the sense you are thinking of. In fact, it probably isn't.

Being transported from 2010 to 1700 isn't the same as being born in 1700.

Your formulation of the question sounds unintuitive to me. We could ask a simpler question: would you rather live one life starting 2010, or two lives starting 1700?

Also you could try applying your technique to comparing the welfare of different countries right now. Many of the problems you listed will be easier to overcome.

Also you could try applying your technique to comparing the welfare of different countries right now. Many of the problems you listed will be easier to overcome.

This also has the advantage that we actually know how to transport someone to another country. If someone wanted to put resources into this, they could ask people to actually make the choice, not just imagine what choice they would make.

[-]taw00

Unless you speak that country's language natively, have a social network there etc., this is just not at all comparable in the real world. It would still be just a thought experiment.

You are right, those are confounding factors.

Though if you look at the willingness of people to take the bets moving in both directions, you may be able to account for it. For example (ignoring bilinguals for simplicity), if in England most people don't speak French, so they are less willing to move to France, and in France most people don't speak English, so they are less willing to move to England, maybe the effect cancels out, if both are equally represented.

Though, the experiment seems overly elaborate. We can look at immigration rates, and at the costs of immigration people are willing to pay.

Neolithic Middle East (5000 BCE) - 1.6, Paleolithic anywhere (20000 BCE) - 0

Why do you prefer the Neolithic Middle East to the Paleolithic? The advent of agriculture probably decreased life expectancy, and I don't see anything which could compensate for it in early agricultural societies. Put another way, I would strongly prefer the dangerous, but at least a bit adventurous, life of a hunter-gatherer to the slavish work of a primitive peasant.

[-]taw10

Agriculture, city life, and large scale trade networks arose together, and I prefer this higher population and culture density and lower risk of violent death to the Paleolithic's somewhat higher quality of food.

Hey taw, did you write code to present yourself with lotteries, log your choices, and compute your utilities?

[-]taw00

I ordered the choices until I was happy about them (that part wasn't too difficult as they're mostly chronological). Then I did "if I was about to live in nth, and there was a time machine that with p% moved me to n+1th, and otherwise to n-1th era, what would p need to be for me to take it" kind of lottery thinking.

Conversion of these to utilities was done by Ruby code in the post.
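For anyone who does want the lottery step automated, a minimal sketch of generating the questions from an ordered list could look like this. The method name `lottery_questions` and the question wording are hypothetical, and only the first four eras from the post are listed to keep it short.

```ruby
# Eras ordered from most to least preferred (labels taken from the post).
ERAS = [
  "Western Europe 2010 CE",
  "Western Europe 1980 CE",
  "Western Europe 1950 CE",
  "Western Europe 1900 CE",
]

# Build one question per interior era: what chance p of the better neighbour
# (otherwise the worse neighbour) would make you accept the time machine?
def lottery_questions(eras)
  eras.each_cons(3).map do |better, current, worse|
    "You live in #{current}. A time machine sends you to #{better} with " \
    "probability p, otherwise to #{worse}. What p makes you indifferent?"
  end
end

lottery_questions(ERAS).each { |question| puts question }
```

A list of n eras yields n - 2 questions, which matches the 11 ratios given for the 13 eras in the post. The answers, in the same order, are what the conversion code takes as input.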

This was proposed as an alternative to GDP, but it's not clear that it actually measures something similar. Even broadly understanding both as attempts to measure human happiness, it doesn't seem similar.

Since we have no access to time-machines, we cannot give anyone a real choice between travelling back to 1700 and staying in 2010. There are no actual consequences to what they choose. So we are not even measuring people's naive preferences, we are just measuring what they like to say or believe about 1700 vs 2010.

If we're unhappy with GDP there are alternative measures like HDI, but they're highly artificial. It would be very easy to construct completely different measures which would "feel" about as right.

Running with this side point: I wonder if it would be possible to invent a less arbitrary HDI by feeding the variables the HDI uses (life expectancy, literacy rate, educational enrolment, and log GDP/capita) into a PCA and using the first principal component as an HDI replacement. That'd be less arbitrary than the current average of arbitrary indices method.

[-]taw90

PCA components cluster correlated input variables, with component weights essentially proportional to number of inputs corresponding to it. If you put 10 health indicators, 2 economy indicators, and 2 education indicators - your principal component will be health-based. If you put 10 education indicators, 2 economy, 2 health, your principal component will be education-based etc. In no case will it be meaningfully "welfare".

That's how you get 5-factor models in psychology - you just know what kind of questions to put on the questionnaire, and as long as you don't stray too far from it, you'll get exactly the 5 factors you want.

PCA can only be insightful if all inputs are equally important - something that people using PCA rarely bother sanity-checking.
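The dependence on how many indicators of each kind go in can be illustrated with a toy correlation matrix. This is only a sketch under assumed numbers: indicators within a group correlate at 0.8, indicators in different groups at 0.2, and the first principal component is found by plain power iteration. All method names are made up for illustration.

```ruby
# Build a toy correlation matrix from groups of indicators.
# Assumptions (illustrative only): indicators in the same group correlate at
# `within`, indicators in different groups correlate at `between`.
def block_correlation(block_sizes, within: 0.8, between: 0.2)
  labels = block_sizes.each_with_index.flat_map { |size, block| [block] * size }
  n = labels.length
  Array.new(n) do |i|
    Array.new(n) do |j|
      if i == j then 1.0
      elsif labels[i] == labels[j] then within
      else between
      end
    end
  end
end

# First principal component via power iteration (dominant eigenvector).
def first_component(matrix, iterations: 200)
  v = Array.new(matrix.length, 1.0)
  iterations.times do
    w = matrix.map { |row| row.zip(v).sum { |a, b| a * b } }
    norm = Math.sqrt(w.sum { |x| x * x })
    v = w.map { |x| x / norm }
  end
  v
end

# Share of the first component's squared loadings falling on the first group.
def first_block_share(block_sizes)
  pc1 = first_component(block_correlation(block_sizes))
  pc1.first(block_sizes.first).sum { |x| x * x }
end

puts first_block_share([10, 2])  # 10 health indicators, 2 economy indicators
puts first_block_share([2, 10])  # 2 health indicators, 10 economy indicators
```

With the same correlation structure, the first component loads almost entirely on whichever group has more indicators, which is the effect described above.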

Thanks for this comment, taw. I'd been wondering whether PCA is solid evidence that the Big Five personality traits carve reality at the joints.

The Big Five personality model was originally developed by researchers who raided dictionaries for every personality trait term that they could find, had people rate themselves (or others) on hundreds or even thousands of them, and kept finding this five factor solution that explained a lot of variance. Studies in other languages and cultures typically find similar results, although it doesn't always replicate perfectly (e.g., a missing factor, an extra factor or two, a slightly different meaning for one factor). In some ways it reflects people's lay theories of personality more strongly than actual personality, so it might share some widespread blind spots or misconceptions, but it was constructed in a thorough, systematic way (and there is evidence that each factor predicts behaviors, so it can't be too wildly off).

Good point, thanks.

This makes sense to me, but my ratings would be very different from yours. Also, is your rating for Western Europe 1900 colored by hindsight of two world wars, viral encephalitis, Spanish flu and the rise to domination of the bureaucratic state? How clear are you being about socio-economic class? Are we just assuming the population distributions that existed? If so, ancient world slavery might make it less appealing than the Paleolithic, I think.
As noted, time travel is problematic, but in what sense could a Paleolithic person 'be' me?

[-]taw-10

Also, is your rating for Western Europe 1900 colored by hindsight of two world wars, viral encephalitis, Spanish flu and the rise to domination of the bureaucratic state?

Definitely not, I estimate that in most parallel universes such things didn't happen. Their very low likelihood was the very strong expert consensus of the time, and we really don't have any new knowledge leading us to believe that they were likely.

Well, they're at least more likely than our priors for them... they happened. Even with only a tiny prior that a coin is heads-biased, it landing heads is evidence for it.

[-]taw00

You're privileging the hypothesis of events that happened. It was never 50% current world : 50% something else, such that adding the fact that the current world happened puts us over the 50% line.

Plenty of things which have happened had negligibly low probabilities.
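The coin example from this exchange can be put into numbers with Bayes' rule. The figures are assumptions chosen for illustration: a 1% prior that the coin is heads-biased, and a biased coin that lands heads 90% of the time.

```ruby
# Posterior probability of a hypothesis after one observation (Bayes' rule).
def posterior(prior, likelihood_if_true, likelihood_if_false)
  evidence = prior * likelihood_if_true + (1 - prior) * likelihood_if_false
  prior * likelihood_if_true / evidence
end

# Assumed numbers: a 1% prior that the coin is heads-biased, and a biased
# coin that lands heads 90% of the time versus 50% for a fair one.
prior = 0.01
after_one_head = posterior(prior, 0.9, 0.5)

puts "prior:     #{prior}"
puts "posterior: #{after_one_head.round(4)}"
```

Under these assumptions a single head does raise the probability of bias, but only from 1% to a little under 2%, so both points hold at once: the observation is evidence, and the hypothesis can still be very unlikely afterwards.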