This question exists in the awkward space between "things undergrads google for homework" and "things on the cutting edge," so Google isn't being super helpful.

I have a number I want a computer to estimate. Right now I have two regression models and an insider methodology. The former can be used to create two normal curves. The latter creates a point estimate only, but I can back into a confidence interval/normal curve with an acceptable amount of arbitrary hand-waving. If necessary, this could be conceived of as a prior.
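One common way to "back into" a normal curve from a point estimate is to supply a hand-waved range and treat it as a central interval of a normal distribution. The sketch below assumes the range is a roughly symmetric 95% interval; both of those are exactly the hand-waving part, and the numbers are made up.

```python
from statistics import NormalDist


def normal_from_interval(point, lower, upper, coverage=0.95):
    """Return a NormalDist centred on `point` whose central `coverage`
    interval has the same half-width as (lower, upper).

    Assumes the interval is roughly symmetric around `point`.
    """
    # z-value leaving (1 - coverage) / 2 in each tail, about 1.96 for 95%
    z = NormalDist().inv_cdf(0.5 + coverage / 2)
    half_width = (upper - lower) / 2
    return NormalDist(mu=point, sigma=half_width / z)


# Hypothetical insider estimate: 120, with a hand-waved 95% range of 100 to 140
insider = normal_from_interval(120, 100, 140)
print(insider.mean, round(insider.stdev, 2))
```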

How can I automatically weight the three curves into a single point estimate? I vaguely remember something from an econometrics class about weighting forecasts in a way that minimizes the standard error of the combined forecast, but when I tried to work out the math myself I didn’t know how to deal with the covariances of the forecasts. Can I simply assume the forecast covariances are zero?
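The econometrics result is probably the minimum-variance forecast combination usually credited to Bates and Granger: assuming each forecast is unbiased, the weights w = C^-1 1 / (1' C^-1 1) minimize the variance of the combined error, where C is the covariance matrix of the forecast ERRORS (not of the forecasts themselves). A pure-Python sketch follows; the error standard deviations and the 0.5 correlation are made-up illustrative numbers. With all covariances set to zero it reduces to inverse-variance weighting, which is also what a normal-normal Bayesian update gives if one curve is treated as the prior and the others as independent normal measurements.

```python
def solve(a, b):
    """Solve the linear system a @ x = b (Gauss-Jordan elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def min_variance_weights(err_cov):
    """Weights w minimising Var(sum_i w_i * e_i) subject to sum_i w_i = 1,
    where err_cov is the covariance matrix of the forecast errors e_i.
    Closed form: w = C^-1 1 / (1' C^-1 1)."""
    u = solve(err_cov, [1.0] * len(err_cov))  # u = C^-1 1
    total = sum(u)                             # 1' C^-1 1
    return [ui / total for ui in u], 1.0 / total  # (weights, combined error variance)


# Hypothetical error standard deviations: regression A, regression B, insider estimate
sd = [10.0, 12.0, 15.0]

# Case 1: assume all error covariances are zero (gives inverse-variance weights)
independent = [[sd[i] ** 2 if i == j else 0.0 for j in range(3)] for i in range(3)]

# Case 2: the two regressions' errors are correlated (rho = 0.5, an assumption)
correlated = [row[:] for row in independent]
correlated[0][1] = correlated[1][0] = 0.5 * sd[0] * sd[1]

for name, cov in [("independent", independent), ("correlated", correlated)]:
    w, var = min_variance_weights(cov)
    print(name, [round(x, 3) for x in w], round(var ** 0.5, 2))
```

In this made-up example, correlating the two regressions' errors shifts weight toward the insider estimate, since the regressions partly duplicate each other's information.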

This seems like a good place to use Bayes’ law, but I don't know how to formally set it up.

 

Edit to Add: Bayesian statistics is still new to me, so forgive me for being a bit dense. Here's my understanding of the methodology right now.

What exactly is D in this scenario?

 


I don't have an answer for this case, but I find that http://stats.stackexchange.com/ is usually a better place to get good answers for things like that.

Great link. Thanks.

Can I simply assume the forecast covariances are zero?

Nope. Just a basic sanity check here: If your forecast covariances are zero, then their correlations are zero. You want your forecasts to be correlated with the truth, so if they are, they should also be correlated with each other.
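A quick simulation of that sanity check, with made-up unit variances: two forecasts built as truth plus independent noise come out correlated with each other (theory says 1 / (sqrt(2) * sqrt(2)) = 0.5 here), even though their errors are uncorrelated. For the weighting question, it's the covariance of the forecast errors that enters the minimum-variance formula, so it helps to keep the two apart.

```python
import random
from statistics import fmean, pstdev


def pearson(xs, ys):
    """Sample Pearson correlation (population-style moments)."""
    mx, my = fmean(xs), fmean(ys)
    cov = fmean((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (pstdev(xs) * pstdev(ys))


rng = random.Random(0)
n = 100_000
truth = [rng.gauss(0, 1) for _ in range(n)]
f1 = [t + rng.gauss(0, 1) for t in truth]  # forecast 1 = truth + its own error
f2 = [t + rng.gauss(0, 1) for t in truth]  # forecast 2, error independent of forecast 1's
err1 = [f - t for f, t in zip(f1, truth)]
err2 = [f - t for f, t in zip(f2, truth)]

print(round(pearson(f1, f2), 2))      # forecasts: expected to be close to 0.5
print(round(pearson(err1, err2), 2))  # errors: expected to be close to 0
```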

The Google keyword you're looking for is "Bayesian model averaging." I've never seen anyone use Bayesian model averaging to average frequentist estimates, but I'd bet at even odds that it's been done before.

As a quick run-through, this is what you do. Denote your three models as M1, M2, and M3 (with Mi or Mk denoting model i or k). Set a prior probability for each model being 'true'; e.g. if you have no information to differentiate the models, set P(M1)=P(M2)=P(M3)=1/3. You might underweight the regression models because you trust the insider methodology more, or perhaps the opposite. Whatever prior you choose, it's then just a matter of calculating the posterior model weights using Bayes' rule. Formally, if X is the data and Lk() is the likelihood function for model k:

P(Mk | X) = Lk(X)*P(Mk) / sum_i[ Li(X)*P(Mi) ]

Then you average the forecasts according to P(Mk | X).
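A minimal sketch of that calculation, with made-up values standing in for log Lk(X) and for each model's forecast. It works in log space because likelihoods of even moderately sized datasets underflow ordinary floats. The second function returns the spread of the averaged (mixture) forecast as well as its mean; the spread includes the disagreement between models.

```python
import math


def posterior_model_probs(log_L, prior):
    """P(M_k | X) = L_k(X) P(M_k) / sum_i L_i(X) P(M_i), computed in log space.

    log_L[k] is log L_k(X); strictly it should be the log marginal
    likelihood log p(X | M_k), with model k's parameters integrated out.
    """
    logs = [l + math.log(p) for l, p in zip(log_L, prior)]
    top = max(logs)                            # log-sum-exp trick avoids underflow
    unnorm = [math.exp(l - top) for l in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def bma_mean_and_sd(means, sds, probs):
    """Mean and sd of the model-averaged forecast (a mixture of the model forecasts)."""
    mean = sum(p * m for p, m in zip(probs, means))
    # mixture variance = within-model variance + between-model spread
    var = sum(p * (s ** 2 + (m - mean) ** 2) for p, m, s in zip(probs, means, sds))
    return mean, math.sqrt(var)


# Hypothetical inputs for M1, M2, M3 (regression A, regression B, insider)
log_L = [-120.3, -121.0, -119.8]
prior = [1 / 3, 1 / 3, 1 / 3]
probs = posterior_model_probs(log_L, prior)
print([round(p, 3) for p in probs])
print(bma_mean_and_sd([105.0, 98.0, 120.0], [10.0, 12.0, 15.0], probs))
```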

If I were using Bayesian model averaging, I'd also want to do Bayesian analyses of the individual models (I'm not sure what using frequentist estimates does to the posterior model probabilities), but you can probably find more details on Google.

edit: notation

This makes sense. Using Bayes' rule to develop the weights was the (/a) missing link for me. I was trying to do it all conditional on the possible outcomes.

Correct me if I'm wrong, but shouldn't the weights on the models differ across values of the dependent variable? When the dependent variable is near its mean, the regressions will have narrower forecast distributions, so less weight should go to the insider methodology.
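For ordinary least squares with a single covariate x, the standard error for forecasting a new observation at x0 is s * sqrt(1 + 1/n + (x0 - xbar)^2 / Sxx), where s is the residual standard error and Sxx = sum_i (x_i - xbar)^2. So a regression's forecast curve is narrowest when the covariate is near its sample mean (where the fitted value equals the mean of y) and widens away from it. A stdlib-only sketch with made-up data:

```python
import math
from statistics import fmean


def prediction_se(xs, ys, x0):
    """Standard error for forecasting a new y at x0 from an OLS fit of y on x."""
    n = len(xs)
    xbar, ybar = fmean(xs), fmean(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    slope = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    intercept = ybar - slope * xbar
    resid = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    s = math.sqrt(sum(r * r for r in resid) / (n - 2))  # residual standard error
    return s * math.sqrt(1 + 1 / n + (x0 - xbar) ** 2 / sxx)


xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]                            # made-up data
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 20.2]
for x0 in (5.5, 10, 20):
    print(x0, round(prediction_se(xs, ys, x0), 3))
```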

This particular method doesn't do that. Think of the weight for a given model as the probability that the model is 'true.'

I think you can make the weights depend on the dependent variable by specifying the prior weights conditional on the dependent variable. For example, if your dependent variable, x, is continuous, you might set P(M1|x)=P(M2|x)=invlogit(x)/2 and P(M3|x)=1-invlogit(x), where invlogit(x)=1/(1+exp(-x)) is the inverse logit (logistic) function, so every weight stays between 0 and 1 and the three sum to 1. The key would be choosing appropriate functions of x that reflect your actual prior knowledge.
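A small sketch of prior weights that depend on x. It uses the inverse-logit (logistic) function so each weight stays in [0, 1] and the three sum to 1 for every x. The particular split between models is just the hypothetical example from the comment, and in practice x would probably need centring and scaling before going through the logistic function.

```python
import math


def inv_logit(x):
    """Logistic (inverse-logit) function, mapping any real x into (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # avoids overflow in exp(-x) for large negative x
    return z / (1.0 + z)


def prior_weights(x):
    """Hypothetical x-dependent prior [P(M1|x), P(M2|x), P(M3|x)]:
    the two regressions share inv_logit(x); the insider method gets the rest."""
    p = inv_logit(x)
    return [p / 2, p / 2, 1.0 - p]


for x in (-3.0, 0.0, 3.0):
    print(x, [round(w, 3) for w in prior_weights(x)])
```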

On the other hand, there's probably a method that automatically takes into account each model's prediction error as a function of the dependent variable(s), but I'm not aware of it.

In general Bayesian regression is pretty similar to frequentist regression - it should be easy to convert them over.
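One concrete sense in which the two are close: with independent normal priors on the coefficients and the noise standard deviation treated as known (both simplifying assumptions made for this sketch), the posterior mean of a simple linear regression has a closed form. It shrinks toward the prior mean of 0 and collapses to the OLS estimate as the prior becomes flat.

```python
def posterior_mean(xs, ys, sigma, tau):
    """Posterior mean of (b0, b1) for y_i = b0 + b1 * x_i + N(0, sigma^2) noise,
    with independent N(0, tau^2) priors on b0 and b1 and sigma treated as known.

    Equals (X'X + (sigma/tau)^2 I)^-1 X'y, i.e. ridge-style shrinkage toward 0;
    tau = inf (a flat prior) gives back the OLS estimate.
    """
    lam = (sigma / tau) ** 2
    n = len(xs)
    sx, sxx = sum(xs), sum(x * x for x in xs)
    sy, sxy = sum(ys), sum(x * y for x, y in zip(xs, ys))
    # Solve the 2x2 system [[n + lam, sx], [sx, sxx + lam]] @ (b0, b1) = (sy, sxy)
    a, b, d = n + lam, sx, sxx + lam
    det = a * d - b * b
    return (d * sy - b * sxy) / det, (a * sxy - b * sy) / det


xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]                            # made-up data
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 20.2]
for tau in (float("inf"), 100.0, 0.5):
    print(tau, posterior_mean(xs, ys, sigma=0.3, tau=tau))
```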

It can be more difficult in the context of model averaging, though. RJMCMC (reversible-jump Markov chain Monte Carlo) isn't intuitive for many people, and the alternatives aren't necessarily simple. This really depends on whether the dimension of the parameter space is constant across models.


What exactly is D in this scenario?

D is your data.

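First, I misspoke: you don't want the likelihood, you want the marginal distribution of the data under each model (the marginal likelihood, with the model's parameters integrated out over their prior). See http://www-personal.umich.edu/~bnyhan/montgomery-nyhan-bma.pdf, especially the first 5 or so pages.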
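When the individual models are fit by ordinary (frequentist) regression, a common shortcut for the log marginal likelihood is the BIC (Schwarz) approximation: log p(X | Mk) ~= maximized log-likelihood - (number of parameters / 2) * log(n). A sketch with made-up residual sums of squares follows. It is only an approximation, it assumes every model is fit to the same observations, and it can't score the insider point estimate, which has no likelihood.

```python
import math


def max_loglik_normal(rss, n):
    """Maximised log-likelihood of a normal linear model, using the MLE sigma^2 = RSS / n."""
    return -0.5 * n * (math.log(2 * math.pi * rss / n) + 1)


def bic_log_evidence(rss, n, n_params):
    """Schwarz/BIC approximation: log p(X | M) ~= max log-lik - (n_params / 2) * log(n)."""
    return max_loglik_normal(rss, n) - 0.5 * n_params * math.log(n)


# Two hypothetical regressions fit to the same n = 40 observations:
# model A has 3 parameters (intercept, slope, sigma); model B adds one covariate.
n = 40
log_ev = [bic_log_evidence(310.0, n, 3), bic_log_evidence(290.0, n, 4)]
top = max(log_ev)
w = [math.exp(l - top) for l in log_ev]  # equal prior model probabilities
probs = [x / sum(w) for x in w]
print([round(p, 3) for p in probs])
```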

Second, your likelihood will look different from what you think anyway. Assuming normal distributions and only one covariate x, letting y denote the response with n total observations, it will be:

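$$\mathcal{L}(\beta_0, \beta_1, \sigma \mid y, x) = \left(2\pi\sigma^2\right)^{-n/2} \exp\left( -\frac{1}{2\sigma^2} \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_i \right)^2 \right)$$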

Here sigma (σ) is the standard deviation of the regression error term (often reported as the residual standard error), NOT the forecast standard error. Your likelihoods might look different depending on the particular models you are using. Multiple regression, for example, will have more covariates and thus more regression parameters in the mean function.
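For computation it's easier to work with the log of that likelihood, log L = -(n/2) * log(2πσ²) - sum_i (y_i - β0 - β1 * x_i)² / (2σ²), with β0 and β1 the intercept and slope (names chosen for this sketch). A minimal version with made-up data and parameter values, cross-checked against the sum of per-observation normal log-densities:

```python
import math
from statistics import NormalDist


def linreg_loglik(beta0, beta1, sigma, xs, ys):
    """log L(beta0, beta1, sigma | y, x) for independent y_i ~ N(beta0 + beta1 * x_i, sigma^2)."""
    n = len(ys)
    rss = sum((y - beta0 - beta1 * x) ** 2 for x, y in zip(xs, ys))
    return -0.5 * n * math.log(2 * math.pi * sigma ** 2) - rss / (2 * sigma ** 2)


xs = [1.0, 2.0, 3.0, 4.0]  # made-up data and parameter values
ys = [2.2, 3.9, 6.1, 8.0]
print(linreg_loglik(0.1, 1.95, 0.3, xs, ys))

# Cross-check: the same number as a sum of per-observation normal log-densities
print(sum(math.log(NormalDist(0.1 + 1.95 * x, 0.3).pdf(y)) for x, y in zip(xs, ys)))
```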