I enjoy the idea of thinking in probabilities, and also, as part of my job, I have to deal with the hellish task of figuring out how to present probabilities to people in a simple way.

It seems fairly easy to reason about probabilities for discrete events; there's a whole apparatus of reasoning about binary outcomes that has more or less "made it into the zeitgeist" at this point.

For > 2 discrete outcomes you can pretty much reason in terms of accuracy + balanced accuracy + (in specific cases) a custom-weighted accuracy.
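
To make those discrete-case scores concrete, here is a minimal sketch in plain Python. The function names, the toy labels, and the reading of "custom-weighted accuracy" as a weighted mean of per-class recall are my assumptions for illustration, not a standard anyone has to follow.

```python
from collections import defaultdict

def accuracy(observed, inferred):
    """Fraction of predictions that match the observed label."""
    correct = sum(o == p for o, p in zip(observed, inferred))
    return correct / len(observed)

def weighted_class_accuracy(observed, inferred, class_weights=None):
    """Weighted mean of per-class recall.

    With class_weights=None every class counts equally, which is the
    usual definition of balanced accuracy. Passing weights is one
    possible (assumed) reading of "custom-weighted accuracy".
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for o, p in zip(observed, inferred):
        totals[o] += 1
        hits[o] += int(o == p)
    recalls = {c: hits[c] / totals[c] for c in totals}
    if class_weights is None:
        class_weights = {c: 1.0 for c in recalls}
    weight_sum = sum(class_weights[c] for c in recalls)
    return sum(class_weights[c] * recalls[c] for c in recalls) / weight_sum

# Toy data: a model that always answers "a" on an imbalanced problem.
observed = ["a"] * 8 + ["b", "c"]
inferred = ["a"] * 10

plain = accuracy(observed, inferred)
balanced = weighted_class_accuracy(observed, inferred)
custom = weighted_class_accuracy(
    observed, inferred, class_weights={"a": 1.0, "b": 2.0, "c": 2.0}
)
print(plain, balanced, custom)
```

On this toy data plain accuracy looks fine while the balanced version does not, which is the kind of contrast that tends to "click" with people.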

Again, these seem intuitive to me for two reasons:

  • you can present them to intelligent people that aren't too familiar with mathematics overall and they "click" fairly quickly.
  • they are simple enough that our monkey brains can use them, to some extent, in day-to-day reasoning

On the other hand, I can't find a solution that's as intuitive for numerical predictions. The three most obvious are:

  • r2 score, which in a "scientific" context will usually make sense, to some extent, under certain assumptions about the distribution of your data
  • a "percentage accuracy" score (e.g. abs(inferred - observed)/observed ) which indicates the error as some percentage of the original value... however, this gets really tricky when trying to summarize more than one observation, and in many situations it seems to scale oddly.
  • a "difference accuracy" score (e.g. abs(inferred - observed) )
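
As a rough illustration of the three scores above, here is a sketch in plain Python. The function names and the toy numbers are made up for the example, averaging over observations is just one way to "sum up" several of them, and taking abs() of the denominator in the percentage score is my addition so that negative observed values don't flip the sign.

```python
def r2_score(observed, inferred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, inferred))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot

def mean_percentage_error(observed, inferred):
    """Mean of abs(inferred - observed) / abs(observed), as a fraction."""
    ratios = [abs(p - o) / abs(o) for o, p in zip(observed, inferred)]
    return sum(ratios) / len(ratios)

def mean_difference_error(observed, inferred):
    """Mean of abs(inferred - observed)."""
    diffs = [abs(p - o) for o, p in zip(observed, inferred)]
    return sum(diffs) / len(diffs)

# Toy data: both predictions are off by the same amount (1.0),
# but the observed values differ by three orders of magnitude.
observed = [0.1, 100.0]
inferred = [1.1, 101.0]

r2 = r2_score(observed, inferred)
pct = mean_percentage_error(observed, inferred)
diff = mean_difference_error(observed, inferred)
print(r2, pct, diff)
```

Here the difference score is about 1.0 and r2 is very close to 1, while the averaged percentage score comes out around 500% because the small observation dominates it. That is one instance of the odd scaling mentioned above.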

Even worse, given all 3 of those scores combined, you're still left with many problem types for which they are unrepresentative.

Granted, one can throw 20 other error measures at the problem and paint a basically clear picture, but then the "understandable" bit is lost. Even an r2 score is not really obvious unless you work in specific domains. The "percentage" and "difference" based accuracy scores, on the other hand, behave horribly chaotically even with seemingly small inferential differences and can often be completely misleading even when taken together ... which, I assume, is why they are never used ... but they are the easiest to explain to people.

The alternative I currently have is always having confidence intervals, and thinking about accuracy as being bound by those. But that just tosses you into a whole different problem space: figuring out the acceptable distance between the inferential bounds.
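
One way to put numbers on the interval approach is to track two things together: how often the observed value lands inside the predicted interval, and how wide the intervals are. The sketch below assumes each prediction comes with a lower and an upper bound; the function names and the toy values are hypothetical.

```python
def interval_coverage(observed, lower, upper):
    """Fraction of observed values that fall inside [lower, upper]."""
    inside = sum(lo <= o <= hi for o, lo, hi in zip(observed, lower, upper))
    return inside / len(observed)

def mean_interval_width(lower, upper):
    """Average distance between the upper and lower bounds."""
    widths = [hi - lo for lo, hi in zip(lower, upper)]
    return sum(widths) / len(widths)

# Toy data: four predictions, each with its own interval.
observed = [10, 20, 30, 40]
lower = [8, 15, 31, 35]
upper = [12, 25, 35, 45]

coverage = interval_coverage(observed, lower, upper)
width = mean_interval_width(lower, upper)
print(coverage, width)
```

Coverage alone can be gamed by making every interval enormous, so it only means something next to the width, and whether a given width is acceptable still depends on the problem.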

Obviously, there's no solution here; even the combination of balanced + unbalanced accuracy for the discrete case leaves us with a lot of situations where it doesn't paint a representative picture. But I'm curious what you guys would use to explain, and more importantly to reason internally, about the confidence/certainty/probability of a continuous inference. What's a good starting point that seems to give "decent" results for the vast majority of continuous-target problems you encounter in life?
