In an earlier post, I made a serious mistake, and this is an attempt to remedy it. I conclude here that improper priors are fundamentally flawed, so, in a sense, this post is a more formal version of the intuition sketched in Against improper priors.

Imagine Omega tells you he has constructed a uniform distribution with minimum 0 and maximum B. Before he gives you any numbers, what should your prior distribution be for B?

I don’t have an answer, but I do have some light to shed on the issue. From the inside view, it’s hard to say what prior I should choose. You might make an intuitive argument for a uniform (improper) prior over all positive real numbers, but (as far as I know) there is no generally accepted theoretical justification for this.

Recall that Eliezer Yudkowsky mentioned that the proper way to score the accuracy of your probability assignments is to use a logarithmic scoring rule. In particular, for discrete outcomes, the score you get when you are correct should equal the logarithm of the probability you assigned to that outcome (plus some constant value if purely negative scores bother you).

We can extend this scoring system to continuous distributions by noting that the density function is defined as

$$f(x) = \frac{P(x \le X \le x + h)}{h},$$

where we take the limit as h approaches 0.

Doing some math, we can compute a scoring rule:

$$\text{score}(f) = \log\big(f(x)\,h\big) = \log f(x) + \log h,$$

where x is the correct answer. Since the $\log h$ term is the same constant for every distribution, we can drop it.

Therefore, if we have two density functions f and g, we can compute their difference in scores:

$$\text{score}(f) - \text{score}(g) = \log f(x) - \log g(x) = \log\frac{f(x)}{g(x)}.$$

In short, we can take the logarithm of the density assigned to X = x and use that as our score.
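
As a quick illustration (the densities here are hypothetical examples, not anything from the setup above), a minimal Python sketch of this score and of the score difference $\log(f(x)/g(x))$:

```python
import math

def log_score(density, x):
    """Logarithmic score for a continuous forecast: the log of the
    density assigned to the realized value x. Higher is better."""
    return math.log(density(x))

# Two hypothetical candidate densities for a positive quantity:
f = lambda x: math.exp(-x)               # Exponential with rate 1
g = lambda x: 0.5 * math.exp(-0.5 * x)   # Exponential with rate 1/2

x_true = 3.0  # suppose this turns out to be the correct answer
print(log_score(f, x_true))                         # log f(x) = -3.0
print(log_score(f, x_true) - log_score(g, x_true))  # log(f(x)/g(x))
```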

Using this metric, I claim that some prior distributions for answering Omega's question are strictly better than others, in the sense that no matter what B’s value is, you can expect to score higher using one than the other. For legibility/accessibility reasons, I'm not going to include a full proof of this claim, but I outline it in this footnote.1
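
To make this concrete, here is a Monte Carlo sketch (my own illustration, not part of the original argument) using the $x^a$ family of improper priors discussed in the footnote. Under the setup described there, the prior with $a = -1$ outscores the prior with $a = 0$ by the same margin for every $B$:

```python
import numpy as np

rng = np.random.default_rng(0)

def log_posterior_at_truth(a, B, m, n):
    """Log of the normalized posterior density at the true value B under
    the improper prior f(b) = b**a, after observing a sample maximum m
    among n draws (requires n - a > 1 so the posterior normalizes):
        B**(a - n) / integral_m^inf b**(a - n) db
      = (n - a - 1) * m**(n - a - 1) * B**(a - n)
    """
    return (np.log(n - a - 1)
            + (n - a - 1) * np.log(m)
            + (a - n) * np.log(B))

n, trials = 5, 200_000
for B in (0.5, 1.0, 10.0):
    m = rng.uniform(0.0, B, size=(trials, n)).max(axis=1)  # sample maxima
    gap = (log_posterior_at_truth(-1, B, m, n)
           - log_posterior_at_truth(0, B, m, n)).mean()
    print(B, gap)  # roughly 0.023 in favor of a = -1, for every B
```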

In my opinion, this result is extremely counterintuitive: shouldn’t every prior distribution have some possible scenario in which it is better than the alternative? It seems intuitively obvious to me that, given any two prior distributions, which one ends up being better should depend on the parameter’s actual value; otherwise, it feels like one of them is cheating. If you and a friend were betting on the land area of Asia, you wouldn’t let them say “see, I was right” regardless of what the correct answer actually was.


You can see, then, why I favor the interpretation that improper prior distributions just shouldn’t be used. Indeed, if you try to find the optimal prior of the form x^a under the added assumption that the maximum B is greater than 1 (which allows for proper prior distributions), you find that no prior strictly dominates all the others in this class. I still don’t know whether this holds in general, but I’ve spent a good deal of time trying to come up with a counterexample and have failed.


1. To prove that some priors score objectively better than others, you want to calculate the expected score of a prior given some B:

$$E[\text{score} \mid B] = \int_0^B \frac{n\,m^{n-1}}{B^n} \cdot \log\!\left(\frac{f(B)/B^n}{\int_m^{\infty} f(b)/b^n \, db}\right) dm,$$

where n is the sample size and f is the prior distribution.

The first factor in the integrand is the probability density of a given maximum, m, in the sample. The term inside the logarithm represents something like Bayes' theorem: the numerator is the probability density we assign to B being correct before normalizing to get a total area of 1, and the denominator is the area under our (improper) posterior probability density function. So, in effect, the denominator normalizes the numerator.

If you let f(x) = x^a, then you find that a = -1 maximizes the expected score, regardless of the values of B and n.
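
As a numerical sanity check (my own sketch, using scipy's quad in place of the closed-form calculation), evaluating the integral above on a grid of exponents shows the expected score peaking at $a = -1$:

```python
import numpy as np
from scipy.integrate import quad

def expected_score(a, B, n):
    """Numerically evaluate the expected-score integral above for the
    improper prior f(b) = b**a (requires n - a > 1)."""
    def integrand(m):
        max_density = n * m**(n - 1) / B**n  # density of the sample maximum
        log_posterior = (np.log(n - a - 1)          # log of the normalized
                         + (n - a - 1) * np.log(m)  # posterior density at
                         + (a - n) * np.log(B))     # the true value B
        return max_density * log_posterior
    return quad(integrand, 0.0, B)[0]

n, B = 5, 10.0
for a in (-2.0, -1.5, -1.0, -0.5, 0.0):
    print(a, round(expected_score(a, B, n), 4))  # maximized at a = -1
```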
