[ Question ]

Historical forecasting: Are there ways I can get lots of data, but only up to a certain date?

by elityre, 1 min read, 21st Nov 2019, 10 comments


Suppose I wanted to get good intuitions about how the world works on historical timescales.

I could study history, but just reading history is rife with historical hindsight bias, both on my own part, and even worse, on the part of the authors I'm reading.

So if I wanted to master history, a better way would be to do it forecasting-style. I read what was happening in some part of the world, up to a particular point in time, and then make bets about what will happen next. This way, I have feedback as I'm learning, and I'm training an actual historical predictor.

However, this requires a strong limit be enforced on the materials I'm reading: no information about "what's going to happen" can leak backwards. And unfortunately, this kind of leakage is standard in history books. Usually, the author talks about how events are leading towards other events that they know will occur.

Are there databases (or something similar) where I might be able to read a wide range of primary sources and economic / socioeconomic indicators (like the amount of pottery fragments, average skeleton size, how far specialized goods traveled, how much money was in circulation, the literacy rate, etc.), but which will only show me data up to a certain date, with a strong constraint against accidentally seeing spoilers?


4 Answers

Ray Dalio mentions in his Big Debt Crises book that he did this by reading through newspaper archives. Obviously this has some shortcomings - not a lot of consistent quantitative data (other than asset prices), comes with a lot of interpretation from the writers, the writers are journalists, only works for relatively recent history, etc. But it seems like a great way to learn what sounded reasonable to laymen at the time.

This is a great idea! I think your best bet is to look for databases of primary sources. There are a number of paid searchable online databases of primary sources. Many of them are cost-prohibitive or unavailable to individuals, so you might need to find a friend who has access to them through a university.

Unfortunately, this will only work for areas and time periods where you speak the language. Most written history, by contrast, is put in context with the past and future, which makes it a non-starter.

The short answer is no.

The long answer is that coming up with reasonable estimates of these things (even "easy" things like how far goods traveled or the amount of money in circulation) is a nontrivial task. Moreover, the very act of choosing metrics imposes modern interpretations and values upon past societies.

For example, let's take the amount of money in circulation. That's important today, because most of our commercial transactions are impersonal, conducted with people whom we don't know and may never see again. But historically, that wasn't the case. In older societies, with small tight-knit communities, the amount of cash in circulation didn't matter very much. The vast majority of economic transactions took place on a credit basis, with people keeping tabs on who owed whom what, and settling tabs on a periodic basis with goods, rather than cash. In this world, commercial relations are inseparable from social relations, and, as a result, cash is far less important. Fixating on the amount of cash in circulation therefore risks imposing severe distortions on one's view of the level and sophistication of commercial transactions in historical economies.

Nope. There's already SUCH a strong selection bias in what actually got recorded and survived, and what's important enough to publish and teach, that you can never disentangle your model from that.

Note that this is true for forecasting the future as well - the data you have and the topics you're considering to forecast are massively constrained, to the point that you're pretty much p-hacking by the time you write down ANY hypothesis.