Solomonoff induction is generally given as the correct way to penalise more complex hypotheses when calculating priors. A great introduction can be found here.
My question is, how is this actually calculated in practice?
As an example, say I have 2 hypotheses:
A. The probability distribution of the output is given by the same normal distribution for all inputs, with mean and standard deviation .
B. The probability distribution of the output is given by a normal distribution depending on an input with mean and standard deviation .
It is clear that hypothesis B is more complex (using an additional input [], having an additional parameter [] and requiring 2 additional operations to calculate) but how does one calculate the actual penalty that B should be given vs A?
Well, that explains why I was struggling to find anything online!
Thanks for the link, I’ve been going through some of the techniques.
Using AIC, the penalty for each additional parameter is a factor of e in the likelihood. For BIC the equivalent is √n, so the more samples there are, the more a complex model is penalised. For large n the two criteria diverge. Are there principled methods for choosing which regularisation to use?
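To make the two penalties concrete, here is a minimal sketch of where those factors come from, using the standard definitions AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L (k parameters, n samples, maximised likelihood L). The function names are my own and purely illustrative; they are not taken from any library.

```python
import math


def aic_penalty_factor(extra_params: int = 1) -> float:
    """Likelihood factor a model must gain to break even under AIC.

    AIC = 2k - 2 ln L, so exp(-AIC / 2) = L * e^(-k):
    each extra parameter costs a factor of e.
    """
    return math.exp(extra_params)


def bic_penalty_factor(n_samples: int, extra_params: int = 1) -> float:
    """Likelihood factor a model must gain to break even under BIC.

    BIC = k ln n - 2 ln L, so exp(-BIC / 2) = L * n^(-k / 2):
    each extra parameter costs a factor of sqrt(n).
    """
    return n_samples ** (extra_params / 2)


if __name__ == "__main__":
    # Illustrative sample sizes only; not data from the question.
    for n in (5, 8, 100, 10_000):
        print(f"n={n}: AIC factor={aic_penalty_factor():.3f}, "
              f"BIC factor={bic_penalty_factor(n):.3f}")
```

The two penalties coincide when √n = e, so BIC is the harsher of the two whenever n > e² ≈ 7.39, and the gap keeps growing with n.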