Aggregating forecasts

Thank you for pointing this out!

I have a sense that log-odds are an underappreciated tool, and this makes me excited to experiment with them more - the "shared and distinct bits of evidence" framework also seems very natural.

On the other hand, if the Goddess of Bayesian evidence likes log odds so much, why did she make expected utility linear in probability? (I am genuinely confused about this)
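
To make the contrast concrete, here is a minimal sketch that pools a few forecasts two ways: by averaging the probabilities directly, and by averaging their log-odds and mapping back. I am assuming that "aggregating with log-odds" means taking the plain unweighted mean of the log-odds; the function names are mine and purely illustrative.

```python
import math

def logit(p):
    """Map a probability in (0, 1) to log-odds."""
    return math.log(p / (1 - p))

def sigmoid(x):
    """Map log-odds back to a probability."""
    return 1 / (1 + math.exp(-x))

def pool_mean_probability(probs):
    """Aggregate by averaging the probabilities directly."""
    return sum(probs) / len(probs)

def pool_mean_log_odds(probs):
    """Aggregate by averaging log-odds (assumed: unweighted mean)."""
    mean_log_odds = sum(logit(p) for p in probs) / len(probs)
    return sigmoid(mean_log_odds)

forecasts = [0.99, 0.5]
print(pool_mean_probability(forecasts))  # arithmetic mean of the probabilities
print(pool_mean_log_odds(forecasts))     # pulled further towards the confident forecast
```

The two pools agree when the forecasts are symmetric around 50%, and diverge when one forecaster is much more confident than the others.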

Aggregating forecasts


I had not realized, and this makes so much sense.

Can an agent use interactive proofs to check the alignment of successors?

Paul Christiano has explored the framing of interactive proofs before, see for example this or this.

I think this is an exciting framing for AI safety, since, as you point out in your question, it gets to the crux of one of the issues.

What confidence interval should one report?

It's good to know that this is a widespread practice (do you have handy examples to see how others approach this issue?)

However, to clarify, my question is not whether those should be distinguished, but rather what confidence interval I should report, given that we are distinguishing between model prediction and model error.

Assessing Kurzweil's 1999 predictions for 2019

I do not understand prediction 86.

In other words, the difference between those "productively" engaged and those who are not is not always clear.

As context, prediction 84 says

While there is sufficient prosperity to provide basic necessities (secure housing and food,
among others) without significant strain to the economy, old controversies persist
regarding issues of responsibility and opportunity.

And prediction 85 says

The issue is complicated by the
growing component of most employment's being concerned with the employee's own
learning and skill acquisition.

What is Kurzweil talking about? Is this about whether we can tell when employees are doing useful work and when they are shirking?

Assessing Kurzweil's 1999 predictions for 2019

Sorry for being dense, but how should we fill it?

By default I am going to add a third column with the prediction. Is that how you want to receive the data?

Call for volunteers: assessing Kurzweil, 2019

Sure, sign me up. Happy to do up to 10 for now, plausibly more later depending on how hard it turns out to be.

Is there an intuitive way to explain how much better superforecasters are than regular forecasters?

Brier scores are scoring three things:

  • How uncertain the forecasting domain is (because of this, Brier scores are not comparable between domains - if I have a low (good) Brier score in short-term weather predictions and you have a high (bad) Brier score on geopolitical forecasting, that does not imply I am a better forecaster than you)
  • How well-calibrated the forecaster is (eg we would say that a forecaster is well-calibrated if 80% of the predictions that they assigned 80% confidence to actually come true)
  • How much information the forecaster conveys in their predictions (eg if I am predicting coin flips and say 50% all the time, my calibration will be perfect but I will not be conveying extra information)
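
One standard way to make these three components explicit is the Murphy decomposition, Brier score = reliability - resolution + uncertainty. Below is a minimal sketch, assuming binary outcomes coded as 0/1 and grouping predictions by their exact stated probability (with that grouping the identity holds exactly). The function names are illustrative, not taken from any particular library.

```python
from collections import defaultdict

def brier_score(forecasts, outcomes):
    """Mean squared error between stated probabilities and 0/1 outcomes."""
    n = len(forecasts)
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / n

def murphy_decomposition(forecasts, outcomes):
    """Return (reliability, resolution, uncertainty).

    reliability: calibration error (0 is perfectly calibrated)
    resolution:  how far group frequencies move from the base rate
    uncertainty: base_rate * (1 - base_rate), a property of the domain
    """
    n = len(forecasts)
    base_rate = sum(outcomes) / n

    # Group outcomes by the exact probability that was stated.
    groups = defaultdict(list)
    for f, o in zip(forecasts, outcomes):
        groups[f].append(o)

    reliability = 0.0
    resolution = 0.0
    for f, group in groups.items():
        observed = sum(group) / len(group)
        reliability += len(group) * (f - observed) ** 2
        resolution += len(group) * (observed - base_rate) ** 2

    uncertainty = base_rate * (1 - base_rate)
    return reliability / n, resolution / n, uncertainty

# The coin-flip forecaster: always says 50%.
rel, res, unc = murphy_decomposition([0.5] * 4, [1, 0, 1, 0])
print(rel, res, unc)
```

In the coin-flip case the reliability and resolution terms both vanish, so the whole score comes from the uncertainty of the domain: perfect calibration, no extra information.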

Note that in Tetlock's research there is no hard cutoff from regular forecasters to superforecasters - he arbitrarily declared that the top 2% were superforecasters, and showed that 1) the top 2% of forecasters tended to remain in the top 2% between years and 2) that some of the techniques they used for thinking about forecasts could be shown in an RCT to improve the forecasting accuracy of most people.

On characterizing heavy-tailedness

Sadly I have not come across many definitions of heavy-tailedness that are compatible with finite support, so I don't have any ready examples of action relevance AND finite support.

Another example involving a moment-centric definition:

Distributions which are heavy-tailed in the sense of not having a finite moment generating function in a neighbourhood of zero heavily reward exploration over exploitation in multi-armed bandit scenarios.

See for example an invocation of light tailedness to simplify an analysis at the beginning of this paper, implying that the analysis does not carry over directly to heavy tail scenarios (disclaimer, I have not read the whole thing).
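
For reference, the moment generating function definition above can be written out as follows. I am stating it for the right tail only, and the exponential and Pareto distributions are standard textbook examples I am adding for illustration.

```latex
% Moment generating function of a random variable X
M_X(t) = \mathbb{E}\left[ e^{tX} \right]

% X is heavy (right) tailed in this sense if
M_X(t) = \infty \quad \text{for all } t > 0

% Light-tailed example: exponential with rate \lambda
M_X(t) = \frac{\lambda}{\lambda - t} < \infty \quad \text{for } t < \lambda

% Heavy-tailed example: Pareto with scale x_m and shape \alpha
M_X(t) = \int_{x_m}^{\infty} e^{tx} \, \frac{\alpha x_m^{\alpha}}{x^{\alpha + 1}} \, dx = \infty \quad \text{for all } t > 0
```

Note that any distribution with bounded support has a finite moment generating function everywhere, so this definition cannot label it heavy-tailed.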

On characterizing heavy-tailedness

The point you are making - that distributions with infinite support may be used to represent model error - is a valid one.

And in fact I am less confident about that point relative to others.

I still think that is a nice property to have, though I find it hard to pinpoint exactly what my intuition is here.

One plausible hypothesis is that I think it makes a lot of sense to talk about the frequency of outliers in bounded contexts. For example, I expect that my beliefs about the world are heavy-tailed - I am mostly ignorant about everything (eg, "is my flatmate brushing their teeth right now?"), but have some outlier strong beliefs about reality which drive my decision making (eg, "after I click submit this comment will be read by you").

Thus if we sample the confidence of my beliefs the emerging distribution seems to be heavy tailed in some sense, even though the distribution has finite support.

One could argue that this is because I am plotting my beliefs in a weird space, and that if I plot them on a proper scale like an odds scale, which is unbounded, the problem dissolves. But since expected value is linear in probabilities, not odds, this seems a hard pill to swallow.

Another intuition is that if you focus on studying asymptotic tails you expose yourself to Pascal's mugging scenarios - but this may be a consideration which requires separate treatment (eg Pascal's mugging may require a patch from the decision-theoretic side of things anyway).

As a different point, I would not be surprised if allowing finite support requires significantly more complicated assumptions / mathematics, and ends up making the concept of heavy tails less useful. Infinities are useful to simplify unimportant details, as with complexity theory for example.

TL;DR: I agree that infinite support can be used to conceptualize model error. However, I think there are examples of bounded contexts where we want to talk about dominating outliers - ie heavy tails.
