Suppose you had to answer the following question: what were the chances of the Confederate States winning the American Civil War on the eve of the Battle of Gettysburg in 1863?

At first glance, it's not terribly clear what this question means. It's vaguely posed: we haven't made it clear what it means for the Confederacy to "win" the war. We could make this more precise, but it's the less serious of the question's problems. More serious is that it's not obvious what we're even asking when we ask about the probability of an event taking place when we already know that it happened.

We could, for example, interpret the question as being about the extent to which macroscopic events in the world are influenced by the fact that fundamental physics is not deterministic. For example, it's sensible to ask what the probability of an atom of uranium-235 decaying in a given year was, even if we know that it did decay - in fact, in the ideal setting, this knowledge wouldn't affect our assessment of the chances at all, since the decay rate is an intrinsic property of the nucleus (given, for instance, by Fermi's golden rule). It's an age-old question how much this microscopic lack of determinism cascades upward into a similar lack of determinism on macroscopic scales, but while that question is interesting, I don't have anything novel to say about it.

What I can say is that even if Einstein had been right that God doesn't play dice with the universe, it would still make sense to ask questions such as the one above about the Civil War. The way to make sense of the question in this context is to treat it as conditional only on what we know about the world prior to the Battle of Gettysburg - we exclude any information obtained after the battle. While in a deterministic world Laplace's demon could know the outcome of the Civil War with certainty starting from perfect knowledge of an initial state, we don't actually have perfect knowledge about the state of the world in June 1863. We also can't simply condition on the outcome of the war and know it with certainty, since the outcome was only revealed after the deadline to which we restrict ourselves. This puts us in a similar epistemic situation to the one in the previous paragraph, except that now the question is about the impact of everything we don't know on the question we care about, rather than only the impact of "irreducible" nondeterminism.

It's important to note that while we condition only on facts about the state of the world that were available in June 1863, we impose no such restriction when picking a probability distribution over models of the world. In other words, the question we're asking is the following: given our best current understanding of how the world works and the knowledge we have of the state of the world before the deadline, what odds would we give on a particular event taking place?

We can, of course, still cheat by sneaking what we know happened into our model of how the world works; but this is no different from the usual problem of overfitting and doesn't merit any special consideration.

I won't go into the specific question at the start of the post, but I will direct you to an old blog post by Robin Hanson in which he cites a paper that extracted implied odds of Confederate victory (defined as the Confederacy being able to pay back its debts after the war) from Confederate bond prices traded on the Amsterdam market. Of course markets are often wrong, but we should keep in mind that while we may boast a superior understanding of the world today compared to participants in this market, we're also out of touch with what the state of the world was like in 1863 compared to contemporaries. It's not clear if the net impact would be to make our estimates more or less reliable than theirs.

We might also care about what contemporaries thought of the probability of various possible futures at the time. In this case, we would have to restrict not only our information about the state of the world but also our beliefs about models of the world to those of contemporaries. This is only a mild variation on the same exercise, however, and doesn't change the spirit of it.

I'll call this activity "retrospective forecasting". It's acting as if we're forecasting an event that has yet to occur when we in fact know whether it took place or not, and I think it should be one of the primary lenses through which we view (and write about) history.

Why do retrospective forecasting?

One answer is surprisingly simple and elegant: Bayes' theorem.

Retrospective forecasting is about calculating the probability $P(E \mid M)$ of an event $E$ under each of a variety of models $M$ we use, or perhaps integrating that against a probability distribution over a family of models if we want to produce a single probability as our final answer. In other words, it's about computing likelihoods. This is important because if we want to do Bayesian updating on our beliefs about which models of the world are how probable, we need to use Bayes' theorem:

$$P(M \mid E) = \frac{P(E \mid M) \, P(M)}{P(E)}$$

Since the likelihood $P(E \mid M)$ appears on the right hand side, this means "learning from history" actually requires us to do retrospective forecasting. We need to know how likely various events in history were conditional on our models of the world in order to update our beliefs about those models based on what happened in the past. For example, if the disintegration of the USSR is going to be a feather in the cap of free-market economists who believe central planning of an economy is a terrible idea, their model of the world must have assigned that outcome a higher probability in advance than competing models did.
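As an illustration of the mechanics only, here is a minimal sketch in Python. The two "models" and every probability number in it are invented assumptions, not taken from any actual analysis, and in practice the models could just as well be informal intuitions rather than anything this explicit:

```python
# A minimal sketch of the Bayesian update described above.
# The two "models" and all probability numbers are hypothetical, chosen only to
# illustrate the mechanics: retrospective forecasting supplies the likelihoods
# P(event | model), and Bayes' theorem turns them into posterior beliefs.

# Prior beliefs over two competing models of the world.
priors = {
    "central planning can work": 0.5,
    "central planning is unworkable": 0.5,
}

# Retrospective forecasts: how likely was the observed historical event
# (say, the Soviet collapse) under each model? Invented numbers.
likelihoods = {
    "central planning can work": 0.1,
    "central planning is unworkable": 0.4,
}

# Bayes' theorem: posterior is proportional to likelihood * prior,
# normalized over all models under consideration.
evidence = sum(likelihoods[m] * priors[m] for m in priors)
posteriors = {m: likelihoods[m] * priors[m] / evidence for m in priors}

for model, p in posteriors.items():
    print(f"P({model} | event) = {p:.2f}")
# Here the posterior moves from 50/50 to 20/80: the event favors the second
# model, but only as strongly as the 4:1 likelihood ratio warrants.
```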

There are other reasons to want more retrospective forecasting to be done. I'll give one more in this post: without a solid grasp of what contemporaries of an event thought about its chances of turning out one way or another, and of how well their beliefs actually aligned with the best that could have been done with the information and understanding in their possession, we are at a loss when we attempt to judge their actions and their character. Was Lincoln a skillful politician who won an inevitable war to preserve the Union and free the slaves, or was he a reckless leader who flipped a coin to decide the fate of the Union instead of pursuing a prudent and more promising course of negotiation? Without retrospective forecasting we can't answer this question.

Another prominent historical figure whose reputation runs the whole gamut from genius to incompetence is Napoleon. He's often portrayed as having been successful in the early phase of the Napoleonic Wars due to his military genius, but as having grown too enamored with his own success and made the poor decision to launch an invasion of Russia in 1812. The disaster that befell the French army in this expedition paved the way for his eventual downfall two years later.

Perhaps. Or perhaps all of Napoleon's decisions were as reckless as his invasion of Russia, and he merely got lucky as his opponents made one mistake after another confronting him at Austerlitz, Jena-Auerstedt and Friedland. His luck finally ran out when he invaded Russia. Once again, our assessment of a historical figure depends crucially on both our view and their view of the odds that they faced.

I hope the examples of Lincoln and Napoleon are sufficient to make my case.

Conclusion

The idea of retrospective forecasting is simple, and yet its near-total absence from history texts, whether written by amateurs or professionals, makes it difficult to properly learn from history or to have grounded opinions about the people who lived in the past and the events they lived through. We're left only with vague verbiage, which is in my opinion a deeply unsatisfying state of affairs.

You can think of this post as a plea to apply the methods and logic of forecasting to the past, in an attempt to both learn from it and understand it better. The forecasting community has so far focused almost exclusively on the future, but I think its way of thinking has broader applicability than that, and I hope it doesn't remain confined to the realm of what is yet to happen.

Comments

Maybe I missed something, or maybe it's simply that the study of history as portrayed to us laypeople is usually so qualitative, but this just sounds like a call to apply quantitative model building and testing to the study of history. With some choice word replacements, you could get the post to sound like basic statistical modeling.

In some respects, people do this all the time with "event studies" (which they then generalize to future events), or in "economic history." Perhaps they don't really address broad strokes of history, but "cliodynamics" tries.

I want to make a pitch for the usual historical analysis though - the need for quantitative modeling to judge the prowess of specific historical figures, for instance, comes mainly from a lack of knowledge of what contemporaries thought, since they would presumably be the best judges. But that same lack of primary sources will often be accompanied by a lack of quantitative data. Not always, which is why both have a place in studying history! In particular, if record-keeping makes these available, but the only usable primary evaluations are obviously biased, resorting to the quantitative data may help. But these models have the same concerns as all other statistical models - validity (are your measures accurate and picking up the construct you think?), endogeneity (the actors certainly interact with their environment, so causal estimates of the effect of one variable on others may be hard to ascertain), omitted variables bias, and generalizability (probably the most critical, since time marches ever forward and a model that allows us to evaluate Napoleon might not be the right model to evaluate Lincoln...but we need multiple observations and will have to select our supposedly relevant population extremely carefully).

> Maybe I missed something, or maybe it's simply that the study of history as portrayed to us laypeople is usually so qualitative, but this just sounds like a call to apply quantitative model building and testing to the study of history. With some choice word replacements, you could get the post to sound like basic statistical modeling.

In a lot of my forecasts about the future, I don't actually use quantitative modeling at all. In fact, the best forecasters tend to be those who draw on such models where they're useful, but whose forecasts are ultimately based on their own judgment.

If anything, calling for quantitative modeling to be used can easily result in a kind of "scientism"; cliodynamics is actually a good example of that. I would instead recommend, for example, taking forecasters who have a good track record when making predictions about the future and having them do retrospective forecasting through whatever means they deem appropriate.

> I want to make a pitch for the usual historical analysis though - the need for quantitative modeling to judge the prowess of specific historical figures, for instance, comes mainly from a lack of knowledge of what contemporaries thought, since they would presumably be the best judges. But that same lack of primary sources will often be accompanied by a lack of quantitative data. Not always, which is why both have a place in studying history! In particular, if record-keeping makes these available, but the only usable primary evaluations are obviously biased, resorting to the quantitative data may help. But these models have the same concerns as all other statistical models - validity (are your measures accurate and picking up the construct you think?), endogeneity (the actors certainly interact with their environment, so causal estimates of the effect of one variable on others may be hard to ascertain), omitted variables bias, and generalizability (probably the most critical, since time marches ever forward and a model that allows us to evaluate Napoleon might not be the right model to evaluate Lincoln...but we need multiple observations and will have to select our supposedly relevant population extremely carefully).

I don't know what you're talking about here. When I talk about "models" in the post, these "models" could just be heuristics in your head, inexplicit intuitions, et cetera. I never called for using statistical modeling to study history, and I think excessive reliance on such models at the expense of your own judgment is actually a mistake.

You must have misunderstood what I was trying to say in my post for you to make a comment that's so orthogonal to the point I tried to make, and I think that's my fault for not being sufficiently clear.

Yes, I misunderstood your post. I appreciate your taking some responsibility as the communicator (e.g., probabilities and likelihoods are pretty quant!), but your post could also have reasonably been read as referring to inexplicit models, and that is on me. Communication breakdowns are rarely on one party alone.

I agree that cliodynamics has been a dicey application of quant modeling to history - the valuable parts of it are generally in the inexplicit modeling rather than the quant models per se. Inexplicit forecasting is more common, but it's also less testable (anything but the most extreme falsification fits!) and then again not really all that different from what historians already do. The status quo in history is inexplicit modeling via expert judgment, so I'm not sure that relabeling it or asking historians to think less-inexplicitly-but-not-quite-explicitly will do much to move the field.

Qualitative work is not fated to fall into "just-so" stories, and neither is quantitative work destined to be "scientism." The key is understanding the internal and external validity of your research.

What is the use of retrospective forecasting unless you can come up with testable predictions?

What problem do you have with the two use cases I provide in the post?

If you want to make testable predictions about the future, you need to have good models of the world. To have good models of the world, you often need to learn from the past. As mentioned in the post, this requires you to do retrospective forecasting.

Concrete example: if you're going to make forecasts about whether there will be a civil war in the United States before the end of the century, you need to reason from models of what causes civil wars to happen. For your models of that to be good, you need to have updated your beliefs based on what you know about past civil wars, which requires you to know how likely they were to occur both under different models of the world and overall, since both probabilities go into Bayesian updating.

This question helped me realize that, if we have a theory that retrospective forecasting works, we can use that theory to make testable predictions, and then we can build up evidence for or against retrospective forecasting.

Suppose we have two models, Model A and Model B, and the prior on each is 50%. We also have a history to look at. We can apply retrospective forecasting to determine P(History | Model A) and P(History | Model B), and then from Bayes' theorem we can update our estimate as to which model is more likely. Suppose this tells us that Model A is much more likely.

Now we can use Model A and Model B to make testable predictions about the future. As the future unfolds, if events occur as predicted by Model A, this is evidence that retrospective forecasting works. If events occur as predicted by Model B, this is evidence that retrospective forecasting doesn't work.
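A minimal sketch of this two-stage procedure in Python might look like the following. All of the numbers (the likelihoods, the forward predictions, and the observed outcomes) are invented assumptions for illustration, and the Brier score is just one possible choice of scoring rule for the forward-testing step:

```python
# Sketch of the two-stage test described above. All numbers are invented.

# Stage 1: retrospective forecasting. Each model assigns a likelihood to the
# observed history; starting from a 50/50 prior, Bayes' theorem gives a posterior.
prior_a, prior_b = 0.5, 0.5
lik_a, lik_b = 0.6, 0.2              # P(History | Model A), P(History | Model B)
evidence = lik_a * prior_a + lik_b * prior_b
post_a = lik_a * prior_a / evidence  # 0.75 with these numbers: Model A favored
post_b = lik_b * prior_b / evidence

# Stage 2: forward testing. Both models assign probabilities to future events;
# as outcomes arrive, compare their Brier scores (lower is better).
forecasts_a = [0.8, 0.3, 0.9]        # Model A's probabilities for three future events
forecasts_b = [0.4, 0.6, 0.5]        # Model B's probabilities for the same events
outcomes = [1, 0, 1]                 # what actually happened (1 = event occurred)

brier_a = sum((f - o) ** 2 for f, o in zip(forecasts_a, outcomes)) / len(outcomes)
brier_b = sum((f - o) ** 2 for f, o in zip(forecasts_b, outcomes)) / len(outcomes)

print(f"Posterior on Model A after the retrospective step: {post_a:.2f}")
print(f"Brier scores going forward: A = {brier_a:.2f}, B = {brier_b:.2f}")
# If the model favored retrospectively (A) also forecasts the future better,
# that is evidence the retrospective step was tracking something real.
```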