Prediction evaluations may be best when minimally novel
Imagine a prediction pipeline is resolved with a human/judgemental evaluation. For instance, a group today starts predicting what a trusted judge 10 years from now will say for the question, "How much counterfactual GDP benefit did policy X make, from 2020-2030?"
So, there are two stages:
One question for the organizer of such a system is how many resources to delegate to the prediction step vs. the evaluation step. It could be expensive to both pay for predictors and evaluators,