Historical forecasting: Are there ways I can get lots of data, but only up to a certain date?

Suppose I wanted to get good intuitions about how the world works on historical timescales.

I could study history, but just reading history is rife with historical hindsight bias, both on my own part, and even worse, on the part of the authors I'm reading.

So if I wanted to master history, a better way would be to do it forecasting-style. I read what was happening in some part of the world, up to a particular point in time, and then make bets about what will happen next. This way, I have feedback as I'm learning, and I'm training an actual historical predictor.

However, this requires that a strong limit be enforced on the materials I'm reading: no information about "what's going to happen" can leak backwards. And unfortunately, that kind of leakage is standard in history books. Usually, the author talks about how events are leading towards other events that they know will occur.

Are there databases (or something similar) where I might be able to read a wide range of primary sources and economic / socioeconomic indicators (like the amount of pottery fragments, average skeleton size, how far specialized goods traveled, how much money was in circulation, the literacy rate, etc.), but which will only show me data up to a certain date, with a strong constraint of not accidentally seeing spoilers?
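To make the "only up to a certain date" constraint concrete, here is a minimal sketch in Python. It assumes you already have records tagged with the date they were written or measured; the `Record` type, its fields, and the sample rows are all hypothetical placeholders, not taken from any real database.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Record:
    # Hypothetical schema: the date the source was written or measured,
    # a label for the kind of source, and its content.
    recorded_on: date
    source: str
    text: str


def visible_up_to(records, cutoff):
    """Return only the records dated on or before the cutoff, oldest first."""
    allowed = [r for r in records if r.recorded_on <= cutoff]
    return sorted(allowed, key=lambda r: r.recorded_on)


# Placeholder rows for illustration only.
records = [
    Record(date(1875, 1, 2), "newspaper", "placeholder text C"),
    Record(date(1850, 3, 1), "newspaper", "placeholder text A"),
    Record(date(1860, 7, 15), "ledger", "placeholder text B"),
]

visible = visible_up_to(records, date(1860, 12, 31))
for r in visible:
    print(r.recorded_on, r.source, r.text)
```

Note that a filter like this only blocks spoilers if `recorded_on` is the date the material was created, not the date of the events it describes. A later account of earlier events would pass an event-date filter and still leak hindsight.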

Ray Dalio mentions in his Big Debt Crises book that he did this by reading through newspaper archives. Obviously this has some shortcomings - not a lot of consistent quantitative data (other than asset prices), comes with a lot of interpretation from the writers, the writers are journalists, only works for relatively recent history, etc. But it seems like a great way to learn what sounded reasonable to laymen at the time.

This is a great idea! I think your best bet is to look for databases of primary sources. There are a number of paid searchable online databases of primary sources. Many of them are cost-prohibitive or unavailable to individuals, so you might need to find a friend who has access to them through a university.

This unfortunately will only work for areas and time periods where you speak the language, but most written history is put in context with the past and future, which makes it a non-starter.

Do you know the names or URLs of any of those databases in particular?

The short answer is no.

The long answer is that coming up with reasonable estimates of these things (even "easy" things like how far goods traveled or the amount of money in circulation) is a nontrivial task. Moreover, the very act of choosing metrics imposes modern interpretations and values upon past societies.

For example, let's take the amount of money in circulation. That's important today, because most of our commercial transactions are impersonal, conducted with people whom we don't know and may never see again. But historically, that wasn't the case. In older societies, with small tight-knit communities, the amount of cash in circulation didn't matter very much. The vast majority of economic transactions took place on a credit basis, with people keeping tabs on who owed whom what, and settling tabs on a periodic basis with goods, rather than cash. In this world, commercial relations are inseparable from social relations, and, as a result, cash is far less important. Fixating on the amount of cash in circulation therefore risks imposing severe distortions on one's view of the level and sophistication of commercial transactions in historical economies.

I think these are the wrong types of questions to forecast for history. You want to ask much lower-resolution questions, like: Who will end up with more resources after this war? What country will be dominant in 20 years' time? And so on.

These questions are subjective, obviously, but you can still do a decent job of figuring out how good you are at predicting the future from the past. One way you could resolve them is to ask someone whose analysis of history you respect.
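One way to turn resolved questions like these into a single number is the Brier score, a standard forecasting metric (my suggestion, not something proposed in this thread): the mean squared difference between the probability you stated and what actually happened, where lower is better. A minimal sketch, with made-up probabilities and outcomes:

```python
def brier_score(forecasts):
    """Mean squared error between stated probabilities and 0/1 outcomes.

    `forecasts` is a list of (probability, happened) pairs, where
    `happened` is True or False once the question has been resolved.
    """
    if not forecasts:
        raise ValueError("need at least one resolved forecast")
    total = sum((p - float(happened)) ** 2 for p, happened in forecasts)
    return total / len(forecasts)


# Made-up example: three yes/no questions, each resolved by a judge.
resolved = [(0.8, True), (0.3, False), (0.6, False)]
score = brier_score(resolved)
print(round(score, 3))
```

Always answering 50% scores 0.25, so a score consistently below that suggests you are extracting some real signal. The score is only as trustworthy as the person resolving the questions.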

quanticle: They're what the OP is looking to forecast, though. I pulled the "money in circulation" example straight from the OP's post.
Eli Tyre: Well, I would be happy with whatever I can get. I'm not attached to those particular metrics.

Nope. There's already SUCH a strong selection bias in what actually got recorded and survived, and what's important enough to publish and teach, that you can never disentangle your model from that.

Note that this is true for forecasting the future as well - the data you have and the topics you're considering to forecast are massively constrained, to the point that you're pretty much p-hacking by the time you write down ANY hypothesis.

I think this is a challenging and non-trivial question, which I've considered before, but I'm less pessimistic than some other commenters.

I think what we really should do is fund someone to research and build a rigorous training set along these lines, using some kind of bias-avoiding methodology (e.g. clever pre-registration, systematic protocols for what data to include, etc.).

I find it conceivable but very implausible that doing this will make you worse, and can certainly imagine that doing it might make you a lot better. Though most plausibly it will have a small positive effect (though that might entirely be due to the benefits of just doing deliberate practice in thinking at all).

Also, Tegan McCaslin did some work on this, and at one point we ran a test workshop with some superforecasters trying to predict decades of steamship development in the 19th century based on a dataset she'd made. Could dig that out for you.

What about reading modern history textbooks written at a particular time? I'm not sure how long textbooks like that have existed, but it seems like a good way to get secondary data about a specified time frame without data snooping.