One way to look at this is, where is the variance coming from? Any particular forecasting question has implied sub-questions, which the predictor needs to divide their attention between. For example, given the question "How much value has this organization created?", a predictor might spend their time comparing the organization to others in its reference class, or they might spend time modeling the judges and whether they tend to give numbers that are higher or lower.
Evaluation consistency is a way of reducing the amount of resources that you need to spend modeling the judges, by providing a standard that you can calibrate against. But there are other ways of achieving the same effect. For example, if you have people predict the ratio of value produced between two organizations, then if the judges consistently predict high or predict low, this no longer matters since it affects both equally.