I'll start the post with an example because it's the best way I know to explain what I want to talk about.
If you had to evaluate how well someone understands macroeconomics but you yourself didn't have the mental tools to do so on the basis of their arguments, how would you go about doing it?
Two answers are popular: one that has been popular for a long time, and a second, related one that has been gaining popularity recently. These are:
I will argue that both of these are flawed methods for assessing how good someone's understanding of macroeconomics is, and that this is a serious problem when we use forecasting track records to assess someone's ability to win real-world competitions or the quality of their world models.
This argument is not new. It has been a topic of discussion in macro circles for a long time: if modern macro models give us so much insight into how the economy works, then how come big hedge funds or quant shops rarely rely on them? They may have some people looking at these models, but the general impression is that academic macro models are pretty useless if you want to make money in trading.
In addition to this, academics themselves find that modern macro models are not good at forecasting the future. For example, research by Christiano, Eichenbaum and Evans from before the 2008 financial crisis consistently found that New Keynesian dynamic stochastic general equilibrium (DSGE) models, the workhorse models of modern macro, fail to explain changes in wages and prices in the economy. Formally speaking, inflation and wage growth in these models seem to "dance to their own tune": they are mostly moved around by "wage and price shocks" that are only included to close the model, rather than by the monetary or fiscal policy shocks that are conventionally thought of as important drivers of inflation. What gives? And if the models forecast so poorly, why do academics keep using them instead of throwing them out?
I do think that there are some serious problems with these models, but that's not what I want to talk about in this post. The more salient point for this post is that understanding is local, where locality means restricting a huge causal network to a subgraph consisting of only a few of its nodes.
For instance, Newtonian mechanics gives us good predictions only in regimes where we can localize the phenomena we're studying. Celestial mechanics was historically one of the most important sources of information for Newton's gravitational theory, and that's because, by a stroke of fortune, the celestial bodies form an almost closed system with a few large objects in it. If you evaluated a theory of mechanics by how well it's able to predict camera footage of what's happening outside your window, the theory would do pretty badly.
"Okay", you might say. "Newtonian mechanics is not good if we try to use it to predict messy and complicated phenomena like the shape of the clouds in the sky or the trajectory of leaves falling from a tree. That doesn't matter, though, because there are situations in which it can make good predictions, such as what happens if we slide a block down an inclined plane."
That's correct, but in general we have no reason to expect this to be true. Physics is a domain in which it's particularly easy to cut out external interference from an experiment and ensure that an understanding of just a few nodes in a causal network and their interactions will be sufficient to make good predictions about the results. If you have a similarly good understanding of some subset of a complicated causal network, though, you may never get to express it as measurable forecast quality, because the real world never lets you ask "local questions".
Instead, other considerations will dominate when you attempt to forecast variables in such a complex network. For instance, having a good forecast of inflation requires understanding how the central bank will react to various situations, the risks of a shortage in oil or food, whether a particular stimulus bill is going to pass, et cetera. All this interference with your simple picture of how inflation works is going to make your model pretty useless. There's a real risk that you'll be beaten by a simple LSTM or Transformer that's been fit to past inflation data or some ensemble of such models because these models learn how to do autoregressive predictions in the whole causal net while your understanding only applies to a very restricted subset of it. It may be of higher quality and depth locally, but it will lack the breadth over the whole variety of effects you need to consider to outperform other methods.
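To make the point about purely statistical baselines concrete, here is a minimal sketch (in Python with numpy; not an actual LSTM or Transformer, and the simulated series and its coefficients are made up for illustration) of an autoregressive model that recovers the dynamics of a noisy series from past data alone, with no causal understanding of what generated it:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated "inflation" series: an AR(2) process standing in for
# the aggregate outcome of many interacting causes.
n = 500
y = np.zeros(n)
for t in range(2, n):
    y[t] = 0.6 * y[t - 1] + 0.3 * y[t - 2] + rng.normal(scale=0.1)

# Fit an AR(2) baseline by least squares on lagged values.
X = np.column_stack([y[1:-1], y[:-2]])  # lag-1 and lag-2 regressors
coef, *_ = np.linalg.lstsq(X, y[2:], rcond=None)

# One-step-ahead forecast from the last two observations.
forecast = coef @ np.array([y[-1], y[-2]])
```

The fitted coefficients come out close to the true (0.6, 0.3), which is the whole trick: the baseline absorbs whatever autocorrelation structure the full causal net produces, without representing any of the nodes.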
For example, if I wanted to forecast when the next earthquake with a magnitude over 8 in Chile will occur, I would just look at how many have occurred in e.g. the past 100 years and use that as a base rate for forecasting the date of the next one. In contrast, someone who knows in great detail how earthquakes happen and the tectonic plate structure of Earth, but not the dates of past earthquakes in that region, would perform much worse. One way to model this is to imagine that the right model of the situation is some Markov chain with lots of states and a huge transition probability matrix. I know almost nothing about the chain except the probability mass that the stationary distribution puts on states of the chain where earthquakes are happening. The expert may know much more than me about the structure of the chain, but his knowledge is simply useless because even a single node he's missing can dramatically change the stationary distribution of the whole chain.
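The Markov-chain picture can be made concrete with a toy example (a sketch in Python with numpy; the chain and all its numbers are invented for illustration). Changing the outgoing transitions of a single state dramatically shifts the stationary mass on the "earthquake" state, even though the rest of the chain is untouched:

```python
import numpy as np

def stationary(P):
    """Stationary distribution of a row-stochastic transition matrix P."""
    evals, evecs = np.linalg.eig(P.T)
    v = np.real(evecs[:, np.argmax(np.real(evals))])  # eigenvector for eigenvalue 1
    return v / v.sum()

# A toy 3-state chain; state 2 is the "earthquake" state.
P = np.array([
    [0.90, 0.09, 0.01],
    [0.50, 0.45, 0.05],
    [0.30, 0.30, 0.40],
])

# The same chain with only state 2's row changed: earthquakes
# now persist much longer once they start.
P2 = P.copy()
P2[2] = [0.05, 0.05, 0.90]

pi = stationary(P)
pi2 = stationary(P2)
print(pi[2], pi2[2])  # stationary mass on the earthquake state
```

The expert who knows the first two rows perfectly but has the third one wrong gets the long-run earthquake frequency badly wrong, while someone who only observed the historical frequency gets it right for free.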
It would, however, be a mistake to conclude from this that I understand earthquakes better than the expert, or that I'm more reliable on questions about e.g. how to reduce the frequency of earthquakes. I likely know nothing about such questions, and you had better listen to the expert, since he knows much more about what actually causes earthquakes than I do. That also means that for anyone who wants to take action to affect earthquakes in the real world, the expert's advice will be more relevant; someone who just wants to sell earthquake insurance will probably prefer my approach.
In general, having a better understanding of a system becomes more important the more you want to intervene on it. This is also why, in my opinion, forecasting has failed to gain traction inside organizations whose work mostly consists of such interventions on the physical world. Forecasts for variables you can't influence are quite useful, but if your job is to act on a particular variable, then you really want a model of how your actions will influence that variable, one that is much more compressed than a huge array of conditional forecasts. That's when understanding, or gears-level models, becomes important.
Going back to the original question about modern macro models, I think the main reason they are not used by traders very much is that traders individually don't have much influence over the prices of the assets they are trading and they don't profit by having agency over where the price is going to go. Their agency is over portfolio and trading decisions and they take the behavior of prices as (kind of, though not quite) fixed by the external world. This means they value forecasts about prices and understanding of e.g. market mechanics or portfolio sizing, which I think maps onto the behavior of real world trading firms quite well. It's also telling that an exception to the rule of "trading firms don't care about understanding prices" is the price impact of their trades, which they try to understand in much greater detail because it's a case where they do have a short-term impact on the price by how they size and execute their orders.
To conclude, I think we should be cautious when we use the fact that experts underperform when making forecasts about the future to infer that their expertise is flawed. This is valid if their expertise is specifically in trying to forecast the future, but if we evaluate macroeconomists by their ability to forecast recessions or diplomats by their ability to forecast wars, I think we would be making a mistake.
You could test this to some extent by asking the forecasters to answer more complicated causal questions. If they lose most of their edge, then you may be right.
Speaking as a forecaster, my first instinct when someone does that is to go on Google and look up whatever information may be relevant to the question that I've been asked.
If I'm not allowed to do that and the subject is not one I'm an expert in, then I anticipate doing quite badly; but if I'm allowed a couple of hours of research and the question is grounded enough, I think most of the edge the expert has over me would disappear. The thing is, an expert is really someone who has done that for their whole field for thousands of hours, but if your only goal is to answer a very specific question, you can often outsource most of the work (e.g. download Sage and run some code on it to figure out the group of rational points of an elliptic curve).
So I don't think what you said really tests my claim, since my argument for why understanding is important when you want to have impact is that in the real world it's prohibitively expensive to set up conditional prediction markets on all the different actions you could take at a given moment every time you need to make a decision. As davidad puts it, "it's mostly that conditional prediction markets are too atomized/uncompressed/incoherent".
I still think your experiment would be interesting to run, but depending on how it was set up, I can imagine not updating on it much even if forecasters do about as well as experts.
It seems like you're saying that the practical weakness of forecasters vs experts is their inability to make numerous causal forecasts. Personally, I think the causal issue is the main one, whereas you think it's that the predictions are so numerous. But they are not always numerous - sometimes you can effect big changes by intervening at a few pivot points, such as at elections. And the idea that you can avoid dealing with causal interventions by conditioning on every parent is usually not practical, because conditioning on every parent/confounder means that you have to make too many predictions, whereas you can just run one RCT.
I thought this post was great; thanks for writing it.
Speaking of macroeconomics, there's a nice connection here to the famous Lucas critique:
The Lucas critique, named for American economist Robert Lucas's work on macroeconomic policymaking, argues that it is naive to try to predict the effects of a change in economic policy entirely on the basis of relationships observed in historical data, especially highly aggregated historical data. More formally, it states that the decision rules of Keynesian models—such as the consumption function—cannot be considered as structural in the sense of being invariant with respect to changes in government policy variables.