Solomonoff induction is generally given as the correct way to penalise more complex hypotheses when calculating priors. A great introduction can be found here.

My question is, how is this actually calculated in practice?

As an example, say I have 2 hypotheses:

A. The probability distribution of the output is given by the same normal distribution for all inputs, with a fixed mean and standard deviation.

B. The probability distribution of the output is given by a normal distribution whose parameters (mean and standard deviation) depend on an input.

It is clear that hypothesis B is more complex (it uses an additional input, has an additional parameter, and requires 2 additional operations to calculate), but how does one calculate the actual penalty that B should be given vs A?

Well that explains why I was struggling to find anything online!

Thanks for the link, I’ve been going through some of the techniques.

Using AIC, the penalty for each additional parameter is a factor of e. For BIC the equivalent is √n, so the more samples there are, the more heavily a complex model is penalised. For large n the two criteria diverge. Are there principled methods for choosing which regularisation to use?
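
To make the comparison concrete, here is a rough sketch of how I am reading those factors. It assumes models are weighted by exp(-AIC/2) and exp(-BIC/2), with AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, so k extra parameters divide the weight by e^k and n^(k/2) respectively. The function names are my own, purely for illustration.

```python
import math


def aic_penalty_factor(extra_params: int) -> float:
    """Factor by which exp(-AIC/2) shrinks for `extra_params` extra parameters."""
    # AIC = 2k - 2 ln L, so exp(-AIC/2) = L * e**(-k)
    return math.exp(extra_params)


def bic_penalty_factor(extra_params: int, n_samples: int) -> float:
    """Factor by which exp(-BIC/2) shrinks for `extra_params` extra parameters."""
    # BIC = k ln n - 2 ln L, so exp(-BIC/2) = L * n**(-k/2)
    return n_samples ** (extra_params / 2)


if __name__ == "__main__":
    # One extra parameter (as in hypothesis B vs A) at several sample sizes.
    for n in (5, 10, 100, 10_000):
        aic = aic_penalty_factor(1)
        bic = bic_penalty_factor(1, n)
        print(f"n={n:>6}: AIC factor={aic:.3f}, BIC factor={bic:.3f}")
```

Under these assumptions the two penalties coincide when √n = e, i.e. around n ≈ 7.4. Below that BIC is the more lenient of the two, and above it BIC penalises the extra parameter increasingly harder.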